deweylab / RSEM

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data
http://deweylab.biostat.wisc.edu/rsem/
GNU General Public License v3.0

Bowtie v1.2.0 outputs interleaving read pairs #73

Open brisk022 opened 7 years ago

brisk022 commented 7 years ago

When using RSEM v1.3.0 with Bowtie v1.2.0, RSEM reported the following error:

Warning: Detected a read pair whose two mates have different names--SRR388248.26 and SRR388248.47!
Read SRR388248.26: The two mates do not align to a same transcript! RSEM does not support discordant alignments.

The temporary BAM file lists the reads in the following order:

SRR388248.26    99      ENSMUST00000140757      34      255     50M     =       185 ...
SRR388248.47    163     ENSMUST00000003268      1548    255     50M     =       1682 ...
SRR388248.26    147     ENSMUST00000140757      185     255     50M     =       34 ...
SRR388248.47    83      ENSMUST00000003268      1682    255     50M     =       1548 ...
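
A minimal Python sketch of what goes wrong, assuming a reader that (as the warning above suggests) takes each two consecutive records as the mates of one pair. Each record is reduced to its QNAME and FLAG, using the four records shown above. The helper names `find_split_pairs` and `regroup_mates` are hypothetical and not part of RSEM. `regroup_mates` only illustrates the property RSEM needs (mates adjacent, first mate first, identified by the SAM FLAG bits 0x40/0x80). It is not a description of how `convert-sam-for-rsem` or `--sort-bam-by-read-name` work internally, and it assumes one alignment per mate, so multi-mapping reads would need real pairing logic.

```python
from typing import Dict, List, Optional, Tuple

# SAM FLAG bits that mark which mate a record belongs to.
FLAG_FIRST_MATE = 0x40   # first segment in the template
FLAG_SECOND_MATE = 0x80  # last segment in the template

# Hypothetical simplification: a record is just (QNAME, FLAG).
Record = Tuple[str, int]


def find_split_pairs(records: List[Record]) -> List[Tuple[str, str]]:
    """Take records two at a time, as a reader expecting adjacent mates
    would, and return each window whose two read names differ."""
    split = []
    for i in range(0, len(records) - 1, 2):
        name_a, name_b = records[i][0], records[i + 1][0]
        if name_a != name_b:
            split.append((name_a, name_b))
    return split


def regroup_mates(records: List[Record]) -> List[Record]:
    """Reorder records so the mates of each read are adjacent, first mate
    first, keeping reads in order of first appearance.
    Assumes at most one alignment per mate (no multi-mapping)."""
    order: List[str] = []
    mates: Dict[str, List[Optional[Record]]] = {}
    for name, flag in records:
        if name not in mates:
            order.append(name)
            mates[name] = [None, None]
        if flag & FLAG_FIRST_MATE:
            mates[name][0] = (name, flag)
        elif flag & FLAG_SECOND_MATE:
            mates[name][1] = (name, flag)
        else:
            raise ValueError(f"{name}: FLAG {flag} is not a paired-end mate")
    regrouped: List[Record] = []
    for name in order:
        regrouped.extend(r for r in mates[name] if r is not None)
    return regrouped


# The four records from the temporary BAM file above (QNAME, FLAG only).
records = [
    ("SRR388248.26", 99),
    ("SRR388248.47", 163),
    ("SRR388248.26", 147),
    ("SRR388248.47", 83),
]

print(find_split_pairs(records))
print(regroup_mates(records))
print(find_split_pairs(regroup_mates(records)))  # prints []
```

With the interleaved order, both windows pair SRR388248.26 with SRR388248.47; after regrouping, SRR388248.26 (FLAG 99, then 147) and SRR388248.47 (FLAG 83, then 163) each sit together and no window mixes names.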

The two mates of each pair are not adjacent in the files produced by RSEM/Bowtie. Adding --sort-bam-by-read-name does not help. Below is the command.

rsem-calculate-expression  --calc-ci  --bowtie-e 200 -p 8 --ci-memory 26000  --paired-end R1.fastq R2.fastq transcripts mut1

brisk022 commented 7 years ago

As far as I can tell, the problem lies with Bowtie >= v1.2.0, which seems to behave differently from versions <= v1.1.2. The same dataset produces no errors with RSEM v1.3.0/Bowtie v1.1.2 but fails with RSEM v1.3.0 and the latest Bowtie v1.2.1.1. At first I suspected the switch to TBB, but I get the error whether or not Bowtie v1.2 was compiled with TBB.

Perhaps rsem-calculate-expression should run the convert-sam-for-rsem step by default, or check the Bowtie version and run that step for Bowtie >= v1.2.
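
The version check proposed above could be sketched in Python roughly as follows. The function names and the `FIRST_AFFECTED` cut-off are assumptions based only on the versions mentioned in this thread (1.2.0 and 1.2.1.1 affected, 1.1.2 unaffected). This is not RSEM code, and where the version string comes from (for example, the output of `bowtie --version`, whose exact format is not assumed here) is left open.

```python
import re
from typing import Tuple

# Assumed cut-off, taken from this thread: 1.2.0 and 1.2.1.1 were reported
# as affected, 1.1.2 as unaffected.
FIRST_AFFECTED = (1, 2, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Return the first dotted version number in text as an int tuple,
    padded to three parts so '1.2' compares like '1.2.0'."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * max(0, 3 - len(parts))


def needs_mate_regrouping(bowtie_version: str) -> bool:
    """True if this Bowtie release is at or above the assumed cut-off."""
    # Tuple comparison is element-wise, so (1, 2, 1, 1) >= (1, 2, 0).
    return parse_version(bowtie_version) >= FIRST_AFFECTED


for v in ("1.1.2", "1.2.0", "1.2.1.1"):
    print(v, needs_mate_regrouping(v))
# prints:
# 1.1.2 False
# 1.2.0 True
# 1.2.1.1 True
```

Padding to three components keeps a short string such as 1.2 from comparing below 1.2.0, since Python treats a tuple prefix as smaller than the longer tuple.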

DarioS commented 7 years ago

I also noticed this and reported it to the Bowtie maintainers: BenLangmead/bowtie#52