WGLab / LongGF

A computational algorithm and software tool for fast and accurate detection of gene fusion by long-read transcriptome sequencing
GNU General Public License v3.0

Does LongGF ignore contigs in the reference not prefixed with "chr"? #8

Open oneillkza opened 3 years ago

oneillkza commented 3 years ago

Hi,

We're trying to call fusion transcripts in cervical cancers with HPV, using a reference we created by adding various HPV strains to hg38. When we run LongGF on data aligned to this reference, using a combined hg38/HPV gtf file, it calls fusions between human genes, but ignores the HPV-containing reads. (We can definitely see fusion reads between the HPV and hg38 genes when we look in IGV).

The HPV "chromosomes" all have names like "HPV16", "HPV18", etc. Does LongGF ignore genes on chromosomes that don't start with the string "chr"? If so, how could we override that behaviour?

Thanks!

liuqianhn commented 3 years ago

@oneillkza LongGF does not ignore genes or chromosomes whose names lack the chr prefix, but it will not consider any chromosome whose name contains an underscore (_). If you suspect chr is the issue, could you please rename HPV16 or HPV18 to chr26 or chr28 and see what happens? Please note that if the chromosome names do not start with 'chr', the names in the bam and in the gtf need to follow the same naming convention. If a chromosome name starts with chr, LongGF will try to match it both with and without the prefix: that is, chr20 is treated the same as 20 in LongGF. If you do not mind, you can generate a small bam/gtf dataset and share it with me via liuqianhn@gmail.com so that I can debug it.
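To check a reference against the naming rules described above, it may help to compare the contig names in the bam header with those in the gtf. The following is a minimal sketch, not LongGF's actual code: the function names are hypothetical, the rules (skip names containing `_`, treat `chr20` and `20` as equal) are taken only from the description in this comment, and it assumes the bam header has been dumped as text (for example with `samtools view -H`).

```python
def normalize_contig(name):
    """Strip a leading 'chr' so that 'chr20' and '20' compare equal (assumed rule)."""
    return name[3:] if name.startswith("chr") else name


def is_skipped(name):
    """Names containing an underscore are reportedly not considered (assumed rule)."""
    return "_" in name


def sam_header_contigs(header_text):
    """Collect SN: values from the @SQ lines of a SAM/BAM header dumped as text."""
    contigs = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    contigs.append(field[3:])
    return contigs


def gtf_contigs(gtf_text):
    """Collect the distinct values of the first (seqname) column of a GTF."""
    seen = []
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        name = line.split("\t", 1)[0]
        if name not in seen:
            seen.append(name)
    return seen


def compare_contigs(bam_names, gtf_names):
    """Report which contigs match between bam and gtf under the assumed rules."""
    bam = {normalize_contig(n) for n in bam_names if not is_skipped(n)}
    gtf = {normalize_contig(n) for n in gtf_names if not is_skipped(n)}
    skipped = {n for n in list(bam_names) + list(gtf_names) if is_skipped(n)}
    return {
        "shared": sorted(bam & gtf),
        "bam_only": sorted(bam - gtf),
        "gtf_only": sorted(gtf - bam),
        "skipped": sorted(skipped),
    }


# Illustrative, made-up inputs (not real data from this issue).
example_header = (
    "@HD\tVN:1.6\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:HPV16\tLN:1000\n"
    "@SQ\tSN:chr1_KI270706v1_random\tLN:1000\n"
)
example_gtf = (
    "# example annotation\n"
    '1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "A";\n'
    'HPV16\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "E6";\n'
)
report = compare_contigs(sam_header_contigs(example_header), gtf_contigs(example_gtf))
print(report)
```

Any HPV contig that shows up under `bam_only`, `gtf_only`, or `skipped` would be a candidate explanation for missing fusions, under the assumptions stated above.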

oneillkza commented 3 years ago

Thanks @liuqianhn. Unfortunately our data is patient data, so I can't share the bam (or even a subset of it). But I'll see if I can make some simulated data that behaves the same way. I can share the gtf, however.