ccmbioinfo / MetaFusion

GNU Lesser General Public License v3.0

Is there a way to return the transcript_id that metafusion is using? #11

Closed. pintoa1-mskcc closed this issue 1 year ago.

pintoa1-mskcc commented 1 year ago

I would like to return the assumed transcript_id for each gene name in a fusion. I can see that MetaFusion selects a transcript_id to use: https://github.com/ccmbioinfo/MetaFusion/blob/81df5123ffca922f3c35f0639c640c14badde40e/scripts/pygeneann_MetaFusion.py#L1207

However, I cannot figure out how it decides which transcript_id is the proper one.

Thanks!

mike8115 commented 1 year ago

That function seems to be used in only one place. Based on the comments, it looks like every head/tail transcript pairing is tested for valid codons.

https://github.com/ccmbioinfo/MetaFusion/blob/81df5123ffca922f3c35f0639c640c14badde40e/scripts/pygeneann_MetaFusion.py#L1305-L1306

A bit farther down in the function, there's a comment about which pair is chosen from all the candidates.

https://github.com/ccmbioinfo/MetaFusion/blob/81df5123ffca922f3c35f0639c640c14badde40e/scripts/pygeneann_MetaFusion.py#L1358-L1360

Instead of keeping only the top candidate, you could save every value of infered_fusion_seq_info.

https://github.com/ccmbioinfo/MetaFusion/blob/81df5123ffca922f3c35f0639c640c14badde40e/scripts/pygeneann_MetaFusion.py#L1408-L1413
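As a rough illustration of that suggestion, here is a minimal sketch that scores every head/tail transcript pairing and returns the whole ranked list rather than only the best pair. The `Candidate` record, `rank_all_pairs`, the toy scores, and the transcript IDs are all hypothetical placeholders; they are not MetaFusion's actual `infered_fusion_seq_info` structure or scoring logic.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    """One head/tail transcript pairing and its score (hypothetical record)."""
    head_tx: str
    tail_tx: str
    score: float


def rank_all_pairs(
    head_txs: List[str],
    tail_txs: List[str],
    score_fn: Callable[[str, str], float],
) -> List[Candidate]:
    """Score every head/tail pairing and return all of them, best first.

    Returning the whole ranked list, instead of only its first element,
    keeps every candidate available for reporting.
    """
    candidates = [
        Candidate(head, tail, score_fn(head, tail))
        for head, tail in product(head_txs, tail_txs)
    ]
    # sorted() is stable even with reverse=True, so equal scores keep
    # the order in which the pairs were enumerated.
    return sorted(candidates, key=lambda c: c.score, reverse=True)


# Toy scores standing in for whatever the real scoring step computes.
toy_scores = {("H1", "T1"): 2.0, ("H1", "T2"): 3.0,
              ("H2", "T1"): 3.0, ("H2", "T2"): 1.0}
ranked = rank_all_pairs(["H1", "H2"], ["T1", "T2"],
                        lambda h, t: toy_scores[(h, t)])
for c in ranked:
    print(c.head_tx, c.tail_tx, c.score)
```

Keeping the full list also makes ties visible: here H1/T2 and H2/T1 share the top score, which a "keep only the best" approach would hide.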

Hopefully that helps!

pintoa1-mskcc commented 1 year ago

So when I was going through the code, I found that check_codon is never called: the line that calls it is currently commented out. https://github.com/ccmbioinfo/MetaFusion/blob/81df5123ffca922f3c35f0639c640c14badde40e/scripts/reann_cff_fusion.py#L36

From what I can see, MetaFusion selects a transcript during the scoring portion of _check_gene_pairs: https://github.com/ccmbioinfo/MetaFusion/blob/81df5123ffca922f3c35f0639c640c14badde40e/scripts/pygeneann_MetaFusion.py#L874-L903

I managed to get MetaFusion to return the highest-scoring transcript. However, I noticed that if two transcripts have the same score, MetaFusion picks the first one it encounters. Am I understanding that correctly?

Is there a way to make MetaFusion select the canonical transcript when the scores are equal?
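A small sketch of both points, under stated assumptions. A running maximum that only replaces the current best on a strict `>` (and Python's built-in `max()`) keeps the first of several equally scored items, which matches the behaviour described above. One hypothetical way to prefer a canonical transcript on ties is a secondary sort key. The `canonical` set is an assumption: it would have to be loaded from an external annotation (for example, a list of Ensembl canonical or MANE Select transcript IDs), and both function names are illustrative, not part of MetaFusion.

```python
from typing import Dict, Optional, Set


def pick_first_max(scores: Dict[str, float]) -> Optional[str]:
    """Running maximum with a strict '>': ties keep the first transcript seen."""
    best_tx, best_score = None, float("-inf")
    for tx, score in scores.items():
        if score > best_score:
            best_tx, best_score = tx, score
    return best_tx


def pick_max_prefer_canonical(scores: Dict[str, float],
                              canonical: Set[str]) -> Optional[str]:
    """Highest score wins; among equal scores, a canonical transcript wins.

    `canonical` is a hypothetical set of canonical transcript IDs supplied
    from an external annotation source.
    """
    if not scores:
        return None
    # Tuples compare element by element: score first, then True > False.
    return max(scores, key=lambda tx: (scores[tx], tx in canonical))


# Toy example: TX_A and TX_B tie on score; only TX_B is canonical.
scores = {"TX_A": 5.0, "TX_B": 5.0, "TX_C": 3.0}
print(pick_first_max(scores))                       # TX_A (first encountered)
print(pick_max_prefer_canonical(scores, {"TX_B"}))  # TX_B (canonical tie-break)
```

If no tied transcript is canonical, the tie-break key is equal for all of them and `max()` falls back to the first one encountered, so the behaviour degrades to the current rule rather than failing.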

pintoa1-mskcc commented 1 year ago

Also, is line 877 above a bug? It checks score1 == max_t2[0]; should this be score1 == max_t1[0]?
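To illustrate why that comparison could matter, here is a hypothetical reconstruction of the pattern, not MetaFusion's actual code: the names `score1`, `max_t1`, and `max_t2` come from the question, but the loop, the tie list, and both functions are assumptions. When each gene keeps its own running maximum, a tie check against the other gene's maximum can drop genuine ties or record spurious ones.

```python
from typing import List, Tuple

# Hypothetical input: (score, transcript_id) candidates for one gene.
Scored = Tuple[float, str]


def track_max(cands: List[Scored]) -> Tuple[float, List[str]]:
    """Running maximum for one gene that also collects tied transcripts.

    The tie check compares against this gene's own maximum (max_t[0]),
    i.e. the `score1 == max_t1[0]` form proposed in the question.
    """
    max_t: Tuple[float, List[str]] = (float("-inf"), [])
    for score, tx in cands:
        if score > max_t[0]:
            max_t = (score, [tx])   # new best score: restart the tie list
        elif score == max_t[0]:
            max_t[1].append(tx)     # ties this gene's best: keep it too
    return max_t


def track_max_wrong_gene(cands: List[Scored],
                         other_max: float) -> Tuple[float, List[str]]:
    """Same loop, but the tie check uses the OTHER gene's maximum."""
    max_t: Tuple[float, List[str]] = (float("-inf"), [])
    for score, tx in cands:
        if score > max_t[0]:
            max_t = (score, [tx])
        elif score == other_max:    # analogous to `score1 == max_t2[0]`
            max_t[1].append(tx)
    return max_t


gene1 = [(4.0, "A"), (4.0, "B")]   # A and B genuinely tie for gene 1
gene2_max = 7.0                    # gene 2's best score is unrelated
print(track_max(gene1))                        # (4.0, ['A', 'B'])
print(track_max_wrong_gene(gene1, gene2_max))  # (4.0, ['A']), tie dropped
```

With the wrong comparison, the result depends on the other gene's scores, so whether a tie is recorded can change between otherwise identical runs.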

mike8115 commented 1 year ago

Sorry for the delay!

Honestly, you probably need to make extensive changes to get the canonical transcript. I've been told the following:

Personally I don’t think it is possible to get the transcript ID with certainty from the output of the fusion callers alone unless they provided the transcript IDs already (which, if I remember correctly, they do not). The best we could do is get a set of compatible transcript IDs.
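A minimal sketch of what "a set of compatible transcript IDs" could look like, under a deliberately loose assumption: a transcript counts as compatible if its genomic span contains the breakpoint. A real check would likely also need exon boundaries and strand. The `Transcript` record, the function name, and the coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Transcript:
    """Minimal hypothetical transcript record (1-based, inclusive coordinates)."""
    tx_id: str
    chrom: str
    start: int
    end: int


def compatible_transcripts(transcripts: List[Transcript],
                           chrom: str, bp_pos: int) -> List[str]:
    """Return every transcript ID whose span contains the breakpoint.

    This only checks that the breakpoint falls inside the transcript,
    not exon boundaries or strand.
    """
    return [
        t.tx_id
        for t in transcripts
        if t.chrom == chrom and t.start <= bp_pos <= t.end
    ]


gene_transcripts = [
    Transcript("TX1", "chr1", 100, 500),
    Transcript("TX2", "chr1", 300, 900),
    Transcript("TX3", "chr1", 600, 1200),
]
print(compatible_transcripts(gene_transcripts, "chr1", 400))  # ['TX1', 'TX2']
print(compatible_transcripts(gene_transcripts, "chr1", 700))  # ['TX2', 'TX3']
```

Returning a list instead of a single ID leaves the final choice (canonical, longest, highest-scoring) to the caller.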

Line 877 looks like a bug to me. Based on the context, I would expect that to affect the final outputs, but I'm not sure.

pintoa1-mskcc commented 1 year ago

Understood! For now it is enough that MetaFusion returns a compatible transcript (in my fork at least, haha).