DaehwanKimLab / tophat

Spliced read mapper for RNA-Seq
http://ccb.jhu.edu/software/tophat
Boost Software License 1.0

tophat with -p > 1 results in missing reads #18

Closed jmack159 closed 9 years ago

jmack159 commented 9 years ago

I am running tophat (version 2.0.13, from the biolinux distribution). Running tophat with increasing values of -p (number of threads) results in more and more reads being lost from the final output. I'm starting with an uncompressed fastq file containing 18,115,321 reads, and running tophat with default parameters except for -p and -o.
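
For anyone wanting to double-check the starting read count independently of tophat, a minimal Python sketch is below. It assumes the common 4-lines-per-record FASTQ layout (no wrapped sequence lines, no trailing blank line); the function name and the in-memory example are hypothetical and not part of the original report.

```python
import io

def count_fastq_reads(handle):
    """Count records in an uncompressed FASTQ stream.

    Assumes exactly 4 lines per record; raises if the line count
    does not fit that layout.
    """
    n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"line count {n_lines} is not a multiple of 4")
    return n_lines // 4

# Tiny in-memory example with two reads. For a real file, pass an open
# file object instead, e.g. open("reads.fastq") (hypothetical path).
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
print(count_fastq_reads(example))  # prints 2
```

The result can then be compared with the Input value that tophat writes to align_summary.txt.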

Running with -p 8 gives the following in align_summary.txt:

    Input : 318640
    Mapped: 191949 (60.2% of input)
    of these: 29316 (15.3%) have multiple alignments (0 have >20)
    60.2% overall read mapping rate.

Running with -p 4 gives the following in align_summary.txt:

    Input : 1302700
    Mapped: 759998 (58.3% of input)
    of these: 115861 (15.2%) have multiple alignments (1 have >20)
    58.3% overall read mapping rate.

Running with -p 1 gives the following in align_summary.txt:

    Input : 18115321
    Mapped: 12014534 (66.3% of input)
    of these: 1867188 (15.5%) have multiple alignments (13 have >20)
    66.3% overall read mapping rate.
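
To make the loss easier to see, here is a rough Python sketch that pulls the Input and Mapped counts out of the three summaries above and compares Input against the 18,115,321 reads in the fastq file. The regular expressions are an assumption based only on the text quoted in this post (single-end summaries); the function name is hypothetical.

```python
import re

def parse_align_summary(text):
    """Extract (input_reads, mapped_reads) from align_summary.txt text.

    Assumes a single-end summary with one 'Input' and one 'Mapped' field,
    as in the excerpts quoted above.
    """
    input_match = re.search(r"Input\s*:\s*(\d+)", text)
    mapped_match = re.search(r"Mapped\s*:\s*(\d+)", text)
    if input_match is None or mapped_match is None:
        raise ValueError("could not find Input/Mapped counts")
    return int(input_match.group(1)), int(mapped_match.group(1))

EXPECTED_READS = 18_115_321  # read count of the fastq file, from the report

# Summary excerpts as quoted in the report, keyed by the -p value used.
summaries = {
    8: "Input : 318640 Mapped: 191949 (60.2% of input)",
    4: "Input : 1302700 Mapped: 759998 (58.3% of input)",
    1: "Input : 18115321 Mapped: 12014534 (66.3% of input)",
}

for threads, text in sorted(summaries.items()):
    n_input, n_mapped = parse_align_summary(text)
    missing = EXPECTED_READS - n_input
    print(f"-p {threads}: input={n_input} mapped={n_mapped} "
          f"missing={missing} ({missing / EXPECTED_READS:.1%} of reads)")
```

Only the -p 1 run reports an Input count equal to the number of reads in the file.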

I also tried running tophat with the --no-sort-bam option to check if samtools was somehow screwing up during the mergesort operation, but I get the same result. I also confirmed the numbers reported in the align_summary.txt file using the samtools flagstat command. Furthermore, using bowtie1 instead of bowtie2 as the alignment engine did not stop these reads from going missing.

This issue seems to have been reported in the following forum posts: http://seqanswers.com/forums/showthread.php?t=33633 https://www.biostars.org/p/93110/

tbooth commented 9 years ago

Tophat in Bio-Linux 8 is now at the latest version 2.1.0 - see https://launchpad.net/~nebc/+archive/ubuntu/bio-linux/+packages?field.name_filter=tophat Also Bowtie/Bowtie2 are the latest versions.

Is this issue still evident after updating? TIM

mbiokyle29 commented 9 years ago

+1. Here is my biostar issue. I am currently re-running my samples with the updated version of tophat to see if the issue is fixed.

infphilo commented 9 years ago

@jmack159 Thanks for letting us know about this problem. We've been trying to reproduce it, but with no success so far. Could you let us know how your recent test (using the latest version, TopHat v2.1.0) goes? Also, which version of samtools was installed on the machine when you ran TopHat 2.0.13?

Thanks, Daehwan

mbiokyle29 commented 9 years ago

I updated my version of tophat to v2.1.0 and no longer encountered the error on these samples. I also looked back at my previous samples (run on v2.0.13) and it seems that this is the first time this has occurred, even though I have always used -p > 1. I will also continue to try and reproduce it.

Here is my samtools version:

$ samtools

Program: samtools (Tools for alignments in the SAM format)
Version: 0.1.19-96b5f2294a

I am on debian jessie

$ uname -a
Linux alpha-helix 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24) x86_64 GNU/Linux

Thanks!

gpertea commented 9 years ago

Whoa, if this was 2.0.13 built from source with a different version of libbam.a present elsewhere on the build system (e.g. 0.1.19 instead of the 0.1.18 packaged with tophat), it looks very much like the subtle libbam linking bug caused by a mismatch between samtools' headers and lib version (the wrong libbam.a being linked). We fixed that in v2.0.14; the last entry in the release notes there mentions it, albeit without getting specific about the devastating consequences of bam indexes being trashed etc.

So yeah, don't run 2.0.13.. In general, reporting bugs found in an older version.. which cannot be reproduced in the current version... makes things easier for us, in a way, I guess ;)
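
For pipelines that want to refuse the release discussed above, a minimal Python sketch of a version guard follows. It only inspects a version string; the banner format "TopHat v2.0.13" and the set of flagged versions are assumptions taken from the comments in this thread, not an official list, and the function names are hypothetical.

```python
import re

# Versions flagged in this thread as affected. This is an assumption drawn
# from the comments here, not an official list.
AFFECTED_VERSIONS = {(2, 0, 13)}

def parse_version(banner):
    """Pull a (major, minor, patch) tuple out of a version banner string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"no version number found in {banner!r}")
    return tuple(int(part) for part in match.groups())

def is_flagged(banner):
    """Return True if the banner names a version listed in AFFECTED_VERSIONS."""
    return parse_version(banner) in AFFECTED_VERSIONS

# Example banners (assumed format).
for banner in ("TopHat v2.0.13", "TopHat v2.1.0"):
    print(banner, "flagged" if is_flagged(banner) else "not flagged")
```

A guard like this says nothing about which libbam.a a given build was linked against, so it is at best a first check.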

jmack159 commented 9 years ago

I upgraded tophat, bowtie1 and bowtie2 to the latest versions (2.1.0, 1.1.1 and 2.2.6, respectively) using the packages @tbooth prepared for the biolinux distro. Running the same test as in my original post gives consistent and correct results across all values for -p (same as -p 1 in my original post).

So it looks like this issue is fixed using the latest versions. Extra thanks to @tbooth for updating the packages!

@infphilo - the best person to ask would be @tbooth for the samtools versions at the time of the bug (I don't recall seeing it in the list of updates). I typically keep my packages pretty up to date, and mostly use apt-get for installing tools. However, it looks like I currently have samtools v0.1.19-96b5f2294a installed.

infphilo commented 9 years ago

As @gpertea commented - only TopHat 2.0.13 has the issue.

Thanks, Daehwan

granek commented 8 years ago

This problem still persists in TopHat v2.1.0 (installed from jessie-backports). I posted details on biostars: https://www.biostars.org/p/93110/#206511

You can replicate it on this docker image (v2): https://hub.docker.com/r/janicemccarthy/dukehtscourse/