bioinformatics-centre / bayesembler

A Bayesian method for doing transcriptome assembly from RNA-seq data
MIT License

error running bayesembler 1.2.0 #11

Open bosmont opened 9 years ago

bosmont commented 9 years ago

I downloaded the Bayesembler 1.2.0 binary for Linux and ran it on one of my TopHat2-generated BAM files as follows:

bayesembler -b ~/work/ngs_dat/project1/tophat2/sample1/accepted_hits.bam -o test

But I got the following error:

You are using the Bayesembler v1.2.0. For more information go to bayesembler.binf.ku.dk

bam_nd_pe_plus_file_nametestaccepted_hits_nd_plus.bam
[06/05/2015 09:00:31] Removing duplicate reads
[06/05/2015 09:33:21] Removed duplicates from 99688126 mapped read pairs
[06/05/2015 09:33:21] Wrote 41048494 read pairs used for splice-graph construction

[06/05/2015 09:33:21] Spawning graph construction thread
[06/05/2015 09:33:21] Generating splice-graphs from testaccepted_hits_nd_unstranded.bam using cem
[06/05/2015 09:59:12] Parsed 1056584 graph(s) from cem instance file

[06/05/2015 09:59:12] Parsed 1056584 splice graph(s) from cem instance file and collapsed them to 15469 assembly graph(s) (1040087 graph(s) excluded due to inference issues resulting from unstranded data).
[06/05/2015 09:59:12] 40223160 unique, non-redundant read pairs being used for quantification
[06/05/2015 09:59:12] 4.02232e+07 read pairs being used for FPKM normalisation

[06/05/2015 09:59:12] Sorting splice-graphs by read count
[06/05/2015 09:59:13] Finished sorting splice-graphs by read count

[06/05/2015 09:59:13] Spawning 1 thread(s) for fetching alignments and 1 i/o thread
[06/05/2015 09:59:34] Estimating fragment length distribution from 60 transcripts longer than 2500 nucleotides
[06/05/2015 09:59:34] Estimated fragment length "median"=178 and "median absolute deviation"=57 using 48850 observations
[06/05/2015 09:59:34] Using Gaussian fragment length distribution with parameters: Mean=178 and SD=84.5082

[06/05/2015 09:59:34] Starting Bayesembler on 11609 multi-path graph(s) and 3860 single-path graph(s)
[06/05/2015 09:59:34] Spawning 1 Bayesembler thread(s) and 2 i/o threads

bayesembler: /seqdata/krogh/jola/projects/transcriptome_assembly/code/release/bayesembler_1_2_0/src/assembler.cpp:1411: double Assembler::calculateSequencingProbability(std::string&, std::string&, std::vector<BamTools::CigarOp>&): Assertion `*deletions_it == qualities.size() + 1' failed.

What is the problem? How do I fix this? Thank you very much for your help.
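
For anyone hitting the same assertion: the failing function receives a read sequence, its qualities, and the CIGAR operations, so one inexpensive thing to rule out is a BAM record whose CIGAR, SEQ and QUAL lengths disagree. The sketch below is not Bayesembler's own check and it is only a guess that this is relevant to the crash. It applies the general SAM rule that the M, I, S, = and X operations must add up to the length of SEQ. The function names are made up for illustration.

```python
import re

# CIGAR operations that consume bases of the read (per the SAM format).
QUERY_CONSUMING = set("MIS=X")
CIGAR_FULL = re.compile(r"(\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '50M2D26M' into (length, op) pairs."""
    if cigar == "*":
        return []
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def query_length(ops):
    """Number of read bases implied by the CIGAR operations."""
    return sum(n for n, op in ops if op in QUERY_CONSUMING)


def check_record(sam_line):
    """Return a list of length inconsistencies in one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar, seq, qual = fields[5], fields[9], fields[10]
    problems = []
    ops = parse_cigar(cigar)
    if ops and seq != "*" and query_length(ops) != len(seq):
        problems.append("CIGAR/SEQ length mismatch")
    if seq != "*" and qual != "*" and len(seq) != len(qual):
        problems.append("SEQ/QUAL length mismatch")
    return problems
```

The input lines are expected to be text alignment records, for example the output of `samtools view` on the BAM file. An empty list for every record would suggest the file is at least internally consistent on this point, which says nothing about other possible causes.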

lassemaretty commented 9 years ago

Hi bosmont,

Sorry for the late reply. I've quickly looked through the offending part of the code, and I'm quite sure that your issue is caused by a bug (one that shouldn't affect assembly accuracy), but I will need a bit more time to diagnose it (and issue a patch if needed). It would be very helpful if you could make the offending BAM file (or a sample that reproduces the error) available to us for debugging purposes; it would of course be treated confidentially, used solely for testing, and removed from our system as soon as the testing is completed. But no worries if this is not possible :-)

/Lasse
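
A note on producing "a sample that reproduces the error": one possible approach, sketched here under assumptions, is to keep the header plus the records of the first few read pairs from the SAM text of the BAM file. The function name and the `max_pairs` parameter are hypothetical, and whether such a small sample still triggers the assertion has to be verified by rerunning Bayesembler on it.

```python
def subsample_sam(lines, max_pairs):
    """Yield header lines plus all records of the first `max_pairs` read names.

    Works in a single pass: once a read name has been selected, every later
    record with the same name (for example its mate) is kept as well.
    """
    kept_names = set()
    for line in lines:
        if line.startswith("@"):
            # Header lines are always passed through unchanged.
            yield line
            continue
        name = line.split("\t", 1)[0]
        if name in kept_names:
            yield line
        elif len(kept_names) < max_pairs:
            kept_names.add(name)
            yield line
```

The input would be SAM text with its header, such as the output of `samtools view -h`, and the result can be converted back to BAM with samtools before sharing.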

bosmont commented 9 years ago

Hi Lasse, Thanks for your response. I tried a few different BAM files, and all of them gave the same error, so it does not look specific to any particular BAM file. Could you try it with any of your BAM files? Thanks again.

lassemaretty commented 9 years ago

Hi Bosmont, You can try any of the ENCODE datasets used in our paper (e.g. SRA accession SRR387661) mapped with TopHat2 (default parameters); however, I would be surprised if it crashed on this set, as I suspect a BAM file issue. By the way, which TopHat2 parameters did you use for mapping? We actually have a patch for the suspected bug ready now, but I really want to test it on your BAM files before releasing it. Would you be so kind as to try running it for us? If you write me an email (my-github-username at binf dot ku dot dk), I will send you the new Linux binary.

/Lasse

bosmont commented 9 years ago

Hi Lasse, I have been using the default TopHat2 parameters, except that "--no-coverage-search" is turned on.

I generated BAM files from SRR387661 with TopHat2 and then ran Bayesembler, but got the same error:

You are using the Bayesembler v1.2.0. For more information go to bayesembler.binf.ku.dk

bam_nd_pe_plus_file_nametestaccepted_hits_nd_plus.bam
[14/05/2015 20:35:58] Removing duplicate reads
[14/05/2015 21:04:34] Removed duplicates from 103319512 mapped read pairs
[14/05/2015 21:04:34] Wrote 32446145 read pairs used for splice-graph construction

[14/05/2015 21:04:34] Spawning graph construction thread
[14/05/2015 21:04:34] Generating splice-graphs from testaccepted_hits_nd_unstranded.bam using cem
[14/05/2015 21:21:22] Parsed 444805 graph(s) from cem instance file

[14/05/2015 21:21:22] Parsed 444805 splice graph(s) from cem instance file and collapsed them to 13801 assembly graph(s) (430047 graph(s) excluded due to inference issues resulting from unstranded data).
[14/05/2015 21:21:22] 31235046 unique, non-redundant read pairs being used for quantification
[14/05/2015 21:21:22] 3.1235e+07 read pairs being used for FPKM normalisation

[14/05/2015 21:21:22] Sorting splice-graphs by read count
[14/05/2015 21:21:22] Finished sorting splice-graphs by read count

[14/05/2015 21:21:22] Spawning 1 thread(s) for fetching alignments and 1 i/o thread
bayesembler: /seqdata/krogh/jola/projects/transcriptome_assembly/code/release/bayesembler_1_2_0/src/assembler.cpp:1411: double Assembler::calculateSequencingProbability(std::string&, std::string&, std::vector<BamTools::CigarOp>&): Assertion `*deletions_it == qualities.size() + 1' failed.
Aborted (core dumped)

Richard-Watkins commented 9 years ago

Hi,

I am getting exactly the same error as reported above when running Bayesembler. Did you succeed in creating a patch for the bug?

Thanks in advance for any help, Richard

lassemaretty commented 9 years ago

Hi Richard,

We actually did come up with a patch, but we didn't release it because it was unclear whether it actually fixes the problem: we do not have access to any data that reproduces it. Would it be possible for you to provide us with a sample of your data?

Best,

Lasse

Richard-Watkins commented 9 years ago

Hi Lasse,

Last week I sent a copy of the data to the email address you posted above, but I have not heard back. Can I check that you received my email?

Best, Richard

lassemaretty commented 9 years ago

Hi Richard,

Your email got caught by our somewhat conservative spam filter :-( I'll have a look at it and get back to you.

Best, Lasse