Genomon-Project / fusionfusion

script for detecting fusion genes from several transcript alignment tools
GNU General Public License v3.0

Missing all IGH fusions #3

Open ndaniel opened 6 years ago

ndaniel commented 6 years ago

It looks like all IGH fusions are missed. IGH fusions are for example found in lymphoblastic leukemias.

For example, the IGH-DUX4 fusion is missed in NALM-6 cell line (using this RNA-seq data from CCLE https://gdc-portal.nci.nih.gov/legacy-archive/files/6fa77b04-bb16-49c5-8033-79dd76860c97 ).

friend1ws commented 6 years ago

Thank you very much for your interest in our software. Yes, the IGH-DUX4 fusion is one of the few examples that our approach misses. Our approach accepts a list of chimeric reads generated by an aligner (e.g., .Chimeric.out.sam from STAR) and filters them to identify highly reliable fusions (so it is largely similar to the approach of STAR-fusion). In fact, there are no chimeric reads supporting the IGH-DUX4 fusion at the .Chimeric.out.sam stage (when using STAR), so IGH-DUX4 cannot be found by our software.
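One way to check this kind of claim is to count, directly in the aligner's chimeric output, how many reads have one alignment in each of the two loci. The following is only a minimal sketch, not part of fusionfusion: the function names are hypothetical, the regions must be supplied by the user for their own reference build, and it assumes that every segment of a chimeric read appears as a separate SAM record sharing the same read name. It also looks only at the leftmost alignment position (SAM field POS), not at the full aligned span.

```python
from collections import defaultdict


def in_region(rname, pos, region):
    """Return True if a leftmost alignment position falls inside (chrom, start, end)."""
    chrom, start, end = region
    return rname == chrom and start <= pos <= end


def count_supporting_reads(sam_lines, region_a, region_b):
    """Count read names with at least one record in region_a and one in region_b.

    Hypothetical helper for illustration only. `sam_lines` is any iterable of
    SAM text lines, for example an open file handle on a chimeric SAM file.
    """
    hits = defaultdict(lambda: [False, False])
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # not a valid alignment record
        qname, rname = fields[0], fields[2]
        if rname == "*":
            continue  # unmapped record
        pos = int(fields[3])  # 1-based leftmost position
        if in_region(rname, pos, region_a):
            hits[qname][0] = True
        if in_region(rname, pos, region_b):
            hits[qname][1] = True
    return sum(1 for a, b in hits.values() if a and b)


# Toy records on made-up contigs (not real data).
toy_sam = [
    "@HD\tVN:1.4",
    "read1\t0\tchrA\t150\t255\t50M50S\t*\t0\t0\t*\t*",
    "read1\t2048\tchrB\t920\t255\t50S50M\t*\t0\t0\t*\t*",
    "read2\t0\tchrA\t160\t255\t100M\t*\t0\t0\t*\t*",
]
n_supporting = count_supporting_reads(
    toy_sam, ("chrA", 100, 200), ("chrB", 900, 1000)
)
print(n_supporting)
```

On a real file, the same function could be called with `open("sample.Chimeric.out.sam")` (a placeholder path) and the user's own coordinates for the two loci. A count of zero would be consistent with the observation that no supporting chimeric reads reach the filtering stage.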

I guess that either IGH or DUX4 is a highly repetitive sequence and STAR fails to accurately align the short reads covering these genes...

I'm considering resolving this issue with other approaches.

ndaniel commented 6 years ago

> IGH-DUX4 fusion is one of the few examples that our approach misses.

There are more than 40 known IGH fusions (see: http://atlasgeneticsoncology.org/Genes/GC_IGH.html ), so a lot of fusion genes are being missed! I would also guess that the CIC-DUX4 fusion is missed too (see: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0099439 ).

STAR and STAR-fusion are known to miss almost all IGH fusions. There are fusion callers that are well known to be able to find IGH fusions in RNA-seq data, for example CICERO and FusionCatcher.

> I guess that either IGH or DUX4 is a highly repetitive sequence and STAR fails to accurately align the short reads covering these genes...

I guess that the IGH-DUX4 fusion is missed because of all three of these reasons together:

Here is a small set of FASTQ test files containing 17 known fusions; it can be used to quickly assess which fusions are missed by a fusion caller: https://sourceforge.net/projects/fusioncatcher/files/test/
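Such an assessment boils down to comparing a list of expected fusions against what the caller reports. A minimal sketch follows, under stated assumptions: the function names are hypothetical, gene pairs are matched case-insensitively and regardless of partner order (a simplification, since callers differ in how they report the 5' and 3' partners), and the gene pairs below are illustrative only and are not the contents of the test set.

```python
def normalize(pair):
    """Return an order-independent, upper-cased key for a gene pair."""
    gene_a, gene_b = pair
    return tuple(sorted((gene_a.upper(), gene_b.upper())))


def missed_fusions(known, called):
    """List the known fusions that do not appear among the called fusions.

    Hypothetical helper for illustration; `known` and `called` are iterables
    of (gene, gene) tuples. The order of `known` is preserved in the result.
    """
    called_keys = {normalize(pair) for pair in called}
    return [pair for pair in known if normalize(pair) not in called_keys]


# Illustrative pairs only: two fusions discussed in this thread plus a placeholder.
known = [("IGH", "DUX4"), ("CIC", "DUX4"), ("GENE1", "GENE2")]
called = [("gene2", "gene1")]  # pretend output of some fusion caller

missed = missed_fusions(known, called)
print(missed)
```

In practice, `known` would be filled from the documentation of the test set and `called` from the caller's result table. Gene naming differences between annotations would need extra handling that this sketch does not attempt.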