pintoa1-mskcc closed this issue 1 year ago.
That function seems to be called from only one other function. Based on the comments, it looks like every head/tail pairing is tested for valid codons.
A bit farther down in the function, there's a comment about which pair is chosen from all the candidates.
Instead of keeping only the top candidate, you could always save all values of infered_fusion_seq_info.
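A minimal sketch of the "keep every candidate" idea. It assumes each head/tail transcript pairing gets a score and some inferred-sequence record; the names Candidate, score_pairing and all_candidates, the dict-based transcript records, and the scoring rule itself are all hypothetical and are not MetaFusion's actual code. The only point is to return every scored pairing, best first, instead of discarding all but the top one.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Candidate:
    head_tx: str
    tail_tx: str
    score: int
    seq_info: str  # placeholder for whatever infered_fusion_seq_info holds


def score_pairing(head, tail):
    # Hypothetical scoring: one point per side whose breakpoint sits on an
    # exon boundary, plus one point if both sides share a reading frame.
    score = int(head["on_boundary"]) + int(tail["on_boundary"])
    return score + int(head["frame"] == tail["frame"])


def all_candidates(head_txs, tail_txs):
    """Score every head/tail pairing and keep all of them, best first."""
    candidates = []
    for head, tail in product(head_txs, tail_txs):
        seq_info = f"{head['id']}>{tail['id']}"
        candidates.append(
            Candidate(head["id"], tail["id"], score_pairing(head, tail), seq_info)
        )
    # sorted() is stable even with reverse=True, so tied candidates
    # keep the order in which the pairings were generated.
    return sorted(candidates, key=lambda c: c.score, reverse=True)
```

The first element of the returned list is what a "top candidate only" version would keep; everything after it is what currently gets thrown away.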
Hopefully that helps!
So when I was going through the code, I found that check_codon was never called. The line which calls it is currently commented out. https://github.com/ccmbioinfo/MetaFusion/blob/81df5123ffca922f3c35f0639c640c14badde40e/scripts/reann_cff_fusion.py#L36
From what I can see, metafusion is selecting a transcript during the scoring portion of _check_gene_pairs https://github.com/ccmbioinfo/MetaFusion/blob/81df5123ffca922f3c35f0639c640c14badde40e/scripts/pygeneann_MetaFusion.py#L874-L903
I managed to get metafusion to return the highest-scoring transcript. However, I noticed that if two transcripts have the same score, metafusion picks the first one it encounters. Am I understanding that correctly?
Is there a way to make metafusion select the canonical transcript when the scores are equal?
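If the selection is done with Python's max() or a strict greater-than comparison (not verified against the linked lines), "first one encountered wins" is exactly the expected behaviour. One way to prefer a canonical transcript on ties is to make canonical status a secondary sort key. This is a sketch under assumptions: pick_transcript is a hypothetical helper, scored is assumed to be a list of (score, transcript_id) tuples, and canonical_ids is a set you would have to build yourself (for example from an Ensembl canonical or MANE Select annotation); metafusion does not provide it as far as this thread shows.

```python
def pick_transcript(scored, canonical_ids):
    """Return the (score, transcript_id) pair with the highest score.

    Ties on score are broken in favour of transcripts in canonical_ids
    (True sorts above False). Any remaining tie falls back to the first
    pair encountered, because max() returns the first maximal element.
    """
    return max(scored, key=lambda st: (st[0], st[1] in canonical_ids))
```

With placeholder IDs, pick_transcript([(5, "ENST_A"), (5, "ENST_B")], set()) keeps "ENST_A", while passing {"ENST_B"} as canonical_ids returns "ENST_B". A higher score still beats a canonical transcript with a lower score.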
Also, is line 877 above a bug? It reads score1 == max_t2[0]; should this be score1 == max_t1[0]?
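A toy illustration of why that comparison could matter. It assumes (hypothetically, not checked against lines 874-903) that max_t1 and max_t2 are (score, transcript_id) tuples for each side's best transcript, and that the check is meant to collect side-1 transcripts tied with side 1's best score. All scores and IDs below are made up.

```python
# Made-up (score, transcript_id) lists for the two fusion partners
t1 = [(4, "A1"), (4, "A2"), (2, "A3")]
t2 = [(7, "B1"), (5, "B2")]
max_t1, max_t2 = max(t1), max(t2)  # only the score at index 0 is used below

# Suspected bug: side-1 scores compared against side 2's best score
buggy = [tx for score1, tx in t1 if score1 == max_t2[0]]
# Suspected intent: side-1 scores compared against side 1's own best score
fixed = [tx for score1, tx in t1 if score1 == max_t1[0]]

print(buggy)  # []
print(fixed)  # ['A1', 'A2']
```

Unless the two sides happen to share the same top score, the buggy version matches nothing (or the wrong transcripts), which is why it could plausibly change the final outputs.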
Sorry for the delay!
Honestly, you probably need to make extensive changes to get the canonical transcript. I've been told the following:
Personally I don’t think it is possible to get the transcript ID with certainty from the output of the fusion callers alone unless they provided the transcript IDs already (which, if I remember correctly, they do not). The best we could do is get a set of transcript IDs that are compatible.
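A minimal sketch of what "a set of compatible transcript IDs" could mean for one side of a fusion, assuming each transcript is represented as a list of 1-based, inclusive (start, end) exon coordinates on the same chromosome and strand as the breakpoint. The function name, data layout and the require_boundary option are hypothetical and not part of metafusion.

```python
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]  # 1-based inclusive genomic (start, end)


def compatible_transcripts(breakpoint: int,
                           transcripts: Dict[str, List[Exon]],
                           require_boundary: bool = False) -> Set[str]:
    """Return IDs of transcripts consistent with a single breakpoint.

    A transcript is kept if the breakpoint falls inside one of its exons,
    or, when require_boundary is True, exactly on an exon start or end.
    """
    hits = set()
    for tx_id, exons in transcripts.items():
        for start, end in exons:
            if require_boundary:
                ok = breakpoint in (start, end)
            else:
                ok = start <= breakpoint <= end
            if ok:
                hits.add(tx_id)
                break  # one matching exon is enough for this transcript
    return hits
```

For example, with {"TX1": [(100, 200), (300, 400)], "TX2": [(150, 250)], "TX3": [(500, 600)]}, a breakpoint at 200 is compatible with {"TX1", "TX2"}, and only with {"TX1"} when require_boundary=True. Picking a single ID from that set still needs an extra rule, such as the canonical tie-break discussed earlier in the thread.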
Line 877 looks like a bug to me. Based on the context, I would expect that to affect the final outputs, but I'm not sure.
Understood! For now it is enough that metafusion is returning a compatible transcript (in my fork at least haha)
I would like to return the assumed transcript_id for each gene name in a fusion. I can see metafusion is selecting a transcript_id to use: https://github.com/ccmbioinfo/MetaFusion/blob/81df5123ffca922f3c35f0639c640c14badde40e/scripts/pygeneann_MetaFusion.py#L1207
However, I cannot figure out how it decides which transcript_id is the proper one.
Thanks!