OpenGene / GeneFuse

Gene fusion detection and visualization
MIT License

No fusion found using tests data #16

Open chizhenfen opened 5 years ago

chizhenfen commented 5 years ago

Hi Shifu, no fusions were found using the test data:

./genefuse -r Homo_sapiens_assembly19.fasta -f druggable.hg19.csv -1 R1.fq.gz -2 R2.fq.gz -h r1r2n.html

15:51:11 start with 4 threads
15:51:50 mapper indexing done
15:52:20 sequence number before filtering: 0
15:52:20 removeByComplexity: 0
15:52:20 removeByDistance: 0
15:52:20 removeIndels: 0
15:54:5 matcher indexing done
15:54:5 removeAlignables: 0
15:54:5 found 0 fusions

./genefuse -r Homo_sapiens_assembly19.fasta -f druggable.hg19.csv -1 genefuse.R1.fq.gz -2 genefuse.R2.fq.gz -h genefuser1r2n.html

15:55:45 start with 4 threads
15:56:25 mapper indexing done
15:56:36 sequence number before filtering: 0
15:56:36 removeByComplexity: 0
15:56:36 removeByDistance: 0
15:56:36 removeIndels: 0
15:58:25 matcher indexing done
15:58:25 removeAlignables: 0
15:58:25 found 0 fusions

The dataset was downloaded from: http://opengene.org/dataset.html

Thanks.

sfchen commented 5 years ago

I just tried again with this command:

./genefuse -r ~/data/ref/hg19.fa -1 ~/data/fq/genefuse.R1.fq.gz -2 ~/data/fq/genefuse.R2.fq.gz -h test.html -j test.json -f genes/druggable.hg19.csv

and got:

15:1:4 start with 4 threads
15:1:47 mapper indexing done
15:2:38 sequence number before filtering: 1329
15:2:38 removeByComplexity: 0
15:2:38 removeByDistance: 39
15:2:38 removeIndels: 67
15:4:3 matcher indexing done
15:4:3 removeAlignables: 8

Perhaps you used an incorrect reference? I used hg19 downloaded from UCSC. Did you check the downloaded files using MD5?

dickyornot commented 5 years ago

Hi sfchen,

I have the same problem. I used all the demo files you provided, including the reference genome, but I still got nothing in my result.

MarcHiggins commented 5 years ago

Hi sfchen,

I have experienced the same problem as others have reported here. Have you gotten to the bottom of this?

Thanks.

sfchen commented 5 years ago

Can you guys check md5 for the downloaded FASTQ file?

sfchen commented 5 years ago

http://opengene.org/dataset.html

You should download the following files:

Paired-end FASTQ files for GeneFuse testing (Illumina platform)
genefuse.R1.fq.gz (size: 62 M, MD5: 171e6dfa0af37fe95c826005bc5fcdf9)
genefuse.R2.fq.gz (size: 66 M, MD5: e756cf01e256186dccaa9e700d85a342)
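A quick way to compare the downloads against the checksums listed above is sketched here in Python. It assumes both FASTQ files sit in one directory; the helper names `md5_of` and `check_downloads` are made up for illustration.

```python
import hashlib
from pathlib import Path

# Checksums as quoted from the dataset page in this thread.
EXPECTED = {
    "genefuse.R1.fq.gz": "171e6dfa0af37fe95c826005bc5fcdf9",
    "genefuse.R2.fq.gz": "e756cf01e256186dccaa9e700d85a342",
}

def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to keep memory low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_downloads(directory="."):
    """Map each expected file name to 'ok', 'mismatch' or 'missing'."""
    status = {}
    for name, expected in EXPECTED.items():
        path = Path(directory) / name
        if not path.is_file():
            status[name] = "missing"
        else:
            status[name] = "ok" if md5_of(path) == expected else "mismatch"
    return status

# Example: print one status line per file in the current directory.
# for name, state in check_downloads(".").items():
#     print(f"{name}: {state}")
```

A matching checksum only rules out a corrupted download; it says nothing about whether the reference genome fits the gene list.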

MarcHiggins commented 5 years ago

Hi sfchen,

Yes, those are the same MD5 sums I get when I check the downloaded FASTQs. The command I run is:

./genefuse -r Homo_sapiens_assembly19.fasta -f druggable.hg19.csv -1 genefuse.R1.fq.gz -2 genefuse.R2.fq.gz -h report.html >result

I have downloaded the .fasta file from Ensembl.

Thanks.

sfchen commented 5 years ago

The druggable.hg19.csv is in the genes folder

Have you checked the error message?

sfchen commented 5 years ago

I mean, you should run:

./genefuse -r Homo_sapiens_assembly19.fasta -f genes/druggable.hg19.csv -1 genefuse.R1.fq.gz -2 genefuse.R2.fq.gz -h report.html >result

MarcHiggins commented 5 years ago

I have downloaded the druggable.hg.csv from the genes folder via wget. No errors are reported in the result file.

sfchen commented 5 years ago

Errors are saved to STDERR, not STDOUT. So you cannot find errors in the result file.

Can you just run the command without redirecting to result?

MarcHiggins commented 5 years ago

I do not get any STDERR or STDOUT files, regardless of whether I redirect to result or not. I am running the binary, in case that makes a difference. Thank you for your help, by the way.

MarcHiggins commented 5 years ago

Apologies, I meant I do not get an STDOUT file at all.

sfchen commented 5 years ago

You used >, which redirected STDOUT to the file you specified.

MarcHiggins commented 5 years ago

But even if I leave out >, there is no STDERR file. That is what I meant, not the lack of STDOUT. Apologies for the confusion.

sfchen commented 5 years ago

You didn't redirect STDERR, so it is printed to the terminal.

You can use the following command to also redirect STDERR:

./genefuse -r Homo_sapiens_assembly19.fasta -f druggable.hg19.csv -1 genefuse.R1.fq.gz -2 genefuse.R2.fq.gz -h report.html >result 2>err.log
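For anyone scripting the run, the two streams can also be captured separately from Python. This is only a sketch: `run_and_split` is a hypothetical helper, and nothing here is specific to genefuse beyond the command line already shown in this thread.

```python
import subprocess
import sys

def run_and_split(cmd, out_path, err_path):
    """Run cmd, writing STDOUT to out_path and STDERR to err_path.

    Same effect as the shell form `cmd >out_path 2>err_path`.
    Returns the exit code of the child process.
    """
    with open(out_path, "w") as out, open(err_path, "w") as err:
        completed = subprocess.run(cmd, stdout=out, stderr=err)
    return completed.returncode

# Example call, using the command from this thread:
# run_and_split(
#     ["./genefuse", "-r", "Homo_sapiens_assembly19.fasta",
#      "-f", "druggable.hg19.csv",
#      "-1", "genefuse.R1.fq.gz", "-2", "genefuse.R2.fq.gz",
#      "-h", "report.html"],
#     "result", "err.log",
# )
```

Both files are created even when a stream stays empty, so an empty err.log simply means nothing was written to STDERR.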

MarcHiggins commented 5 years ago

I have run it again and no errors are reported. However, I notice that the FASTQ files don't have the 15 million lines you mention in a different thread; they have more like 800,000. Maybe this is the issue?
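For reference, the line count of a gzip-compressed FASTQ can be checked with a few lines of Python. This is a sketch that assumes the usual four-lines-per-record layout; the function names are illustrative.

```python
import gzip

def count_fastq_lines(path):
    """Count lines in a gzip-compressed FASTQ file without loading it into memory."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle)

def count_fastq_reads(path):
    """Estimate the number of reads, assuming four lines per FASTQ record."""
    return count_fastq_lines(path) // 4

# Example:
# print(count_fastq_lines("genefuse.R1.fq.gz"), count_fastq_reads("genefuse.R1.fq.gz"))
```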

MarcHiggins commented 5 years ago

Hi sfchen, I have run genefuse again on FASTQs which I know to contain translocations. Your software did not call these. This is more just to let you know than a specific request or question.

sfchen commented 5 years ago

Thanks, I will try to reproduce it.

tkcaccia commented 4 years ago

Hi sfchen,

I am in the same situation. I do not find any gene fusions in the test dataset.

amacbride commented 2 years ago

I found the answer in https://github.com/OpenGene/GeneFuse/issues/31 -- the NCBI version of hg19 had a different chromosome naming convention, so it doesn't work. The version downloadable from UCSC is fine:

wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz (then decompress it with gunzip)
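Which convention a given reference uses can be read off its FASTA header lines. The Python sketch below lists the sequence names and reports whether they carry the `chr` prefix found in the UCSC hg19 download. The function names are illustrative, and the check covers only the prefix, not any other naming differences.

```python
import gzip

def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")

def fasta_contig_names(path):
    """Return the sequence names taken from the '>' header lines of a FASTA file."""
    names = []
    with open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                names.append(fields[0] if fields else "")
    return names

def naming_style(names):
    """Classify names as 'chr-prefixed', 'no chr prefix', 'mixed' or 'empty'."""
    if not names:
        return "empty"
    prefixed = [name.startswith("chr") for name in names]
    if all(prefixed):
        return "chr-prefixed"
    if not any(prefixed):
        return "no chr prefix"
    return "mixed"

# Example:
# names = fasta_contig_names("hg19.fa")
# print(names[:5], naming_style(names))
```

Scanning a whole-genome FASTA this way reads every line, so it may take a minute or two. A reference reported as "no chr prefix" would be consistent with the zero-read logs at the top of this thread.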