Magdoll / cDNA_Cupcake

Miscellaneous collection of Python and R scripts for processing Iso-Seq data
BSD 3-Clause Clear License

Some questions about the output of fusion_collate_info.py. #150

Open 1398206876 opened 3 years ago

1398206876 commented 3 years ago

Hi Liz,

I'm using fusion_finder.py to identify fusion genes in an allopolyploid. Some fusion genes in this allopolyploid are known to have been generated by homologous exchange. Can a fusion gene composed of two homologous genes that differ by only a few bases be found with the workflow in your wiki?

Also, after running fusion_collate_info.py, I got two output files: prefix.annotated.txt and prefix.annotated_ignored.txt. What criteria distinguish these two files?

Thank you for your attention, Best wishes!

Magdoll commented 3 years ago

Hi @1398206876,

fusion_finder.py only considers exonic structure. If the two copies differ by only a few bases and the difference doesn't change the exonic structure (number of exons, junctions, etc.), it will report them as a single fusion event.
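To illustrate why a few base differences go unnoticed, here is a minimal sketch of comparing two transcripts by exonic structure alone. It is not taken from fusion_finder.py, whose actual comparison logic may differ (for example, it may tolerate small junction offsets). The function names and the `(start, end)` exon representation are assumptions made for this example.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # hypothetical (start, end) representation of one exon


def junction_chain(exons: List[Exon]) -> Tuple[Tuple[int, int], ...]:
    """Return the ordered splice junctions as (end of exon i, start of exon i+1)."""
    ordered = sorted(exons)
    return tuple(
        (ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)
    )


def same_exonic_structure(a: List[Exon], b: List[Exon]) -> bool:
    """True when both transcripts have the same exon count and the same junctions.

    The nucleotide sequence is never inspected, so two alleles that differ
    only by a few SNPs compare as identical here. Differences in the outer
    transcript start and end are also ignored.
    """
    return len(a) == len(b) and junction_chain(a) == junction_chain(b)


# Two alleles with the same junction (200 -> 300) but different transcript ends.
allele_a = [(100, 200), (300, 400)]
allele_b = [(90, 200), (300, 420)]
# A transcript whose second exon starts at a different position.
shifted = [(100, 200), (310, 400)]
```

Under this kind of comparison, `allele_a` and `allele_b` collapse into one event, while `shifted` stays separate because its junction differs.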

However, I can think of ways to identify the fusion alleles.

One way is to run them later through IsoPhase.

Minor modifications may be needed, because IsoPhase expects one gene locus per run. Since a fusion consists of two or more genes, running standard IsoPhase gives you separate phasing results for each gene component. I haven't tried this myself, but in theory you can make a "fake" genome for each fusion that consists of all the gene components of that fusion, and then alter the fusion transcript coordinates to match. Let me know if you want to explore this option. It is not difficult to do, just a lot of annoying scripting.
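The "fake" genome idea can be sketched roughly as follows. This is an untested illustration under several assumptions, not part of cDNA_Cupcake or IsoPhase: the genome is a plain dict of sequences, component regions use 0-based half-open coordinates, components are joined with a run of `N` bases, and strand is ignored. All function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

Component = Tuple[str, int, int]    # hypothetical (chrom, start, end), 0-based half-open
Offset = Tuple[str, int, int, int]  # (chrom, start, end, start on the fake contig)


def build_fake_contig(
    genome: Dict[str, str], components: List[Component], spacer_len: int = 100
) -> Tuple[str, List[Offset]]:
    """Concatenate the gene components of one fusion into a single contig.

    Components are separated by `spacer_len` N bases. Returns the contig
    sequence and, for each component, where it begins on the contig.
    """
    pieces: List[str] = []
    offsets: List[Offset] = []
    pos = 0
    for i, (chrom, start, end) in enumerate(components):
        if i > 0:
            pieces.append("N" * spacer_len)
            pos += spacer_len
        pieces.append(genome[chrom][start:end])
        offsets.append((chrom, start, end, pos))
        pos += end - start
    return "".join(pieces), offsets


def to_fake_coords(
    offsets: List[Offset], chrom: str, start: int, end: int
) -> Tuple[int, int]:
    """Shift an interval (e.g. an exon) from genome to fake-contig coordinates."""
    for c, s, e, fake_start in offsets:
        if c == chrom and s <= start and end <= e:
            shift = fake_start - s
            return start + shift, end + shift
    raise ValueError(f"{chrom}:{start}-{end} is not inside any component")


# Toy example: a fusion with one component on chr1 and one on chr2.
toy_genome = {"chr1": "AAAAACCCCCGGGGG", "chr2": "TTTTTGGGGGAAAAA"}
toy_components = [("chr1", 5, 10), ("chr2", 0, 5)]
contig, offsets = build_fake_contig(toy_genome, toy_components, spacer_len=3)
```

Each exon of the fusion transcript would then be passed through `to_fake_coords` so that the whole fusion sits on one contig, which is what a one-locus-per-run tool expects. Handling components on the minus strand would need additional reverse-complement logic that is not shown here.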

Re: annotated vs. annotated_ignored: the ignored file contains the "filtered" fusions, meaning fusions that were filtered out for a variety of reasons, such as one or both genes being novel (unannotated), not enough read support, etc.

-Liz

1398206876 commented 3 years ago

@Magdoll Thank you for your prompt reply; your answer is very helpful. I'm going to try some other approaches, including the one you mentioned.