deweylab / RSEM

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data
http://deweylab.biostat.wisc.edu/rsem/
GNU General Public License v3.0

two mates are aligned to two different transcripts! #136

Open snijesh opened 4 years ago

snijesh commented 4 years ago

I used the following command to prepare the reference:

rsem-prepare-reference --gtf ref/Homo_sapiens.GRCh37.69.gtf \
                    --bowtie2 --bowtie2-path /home/ngs/bin/bowtie2 \
                    ref/Homo_sapiens.GRCh37.69.dna.primary_assembly.fa ref/homo

Next, I ran the expression calculation on the paired-end data:

rsem-calculate-expression -p 8 --paired-end \
                    --bowtie2 --bowtie2-path /home/ngs/bin/bowtie2 \
                    --estimate-rspd --append-names --output-genome-bam \
                    fastq/sampleX_1.fastq fastq/sampleX_2.fastq ref/homo sampleX

But I got the following error:

A00711:123:HMJNFDMXX:2:1101:15203:1078's two mates are aligned to two different transcripts!
"rsem-tbam2gbam ref/homo sampleX.transcript.bam sampleX.genome.bam" failed! Plase check if you provide correct parameters/options for the pipeline!

I have seen similar errors reported by other users, but none of those reports includes a solution that resolves the problem.
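To gauge how many read pairs are affected, the transcript BAM can be scanned for pairs whose two mates name different transcripts. The sketch below is a diagnostic only, not a fix. The function name `list_split_pairs` is made up for illustration, and it assumes the two mates of each alignment sit on adjacent lines and that the first/second-mate flag bits (0x40 and 0x80) are set.

```shell
# Hypothetical helper: read SAM text on stdin and print the names of read
# pairs whose first and second mate are aligned to different references.
# Assumes the second mate directly follows the first mate of the same pair.
list_split_pairs() {
    awk -F'\t' '
        /^@/ || NF < 11 { next }        # skip header and malformed lines
        int($2 / 64) % 2 == 1 {         # first mate (flag bit 0x40)
            name = $1; ref = $3; next
        }
        int($2 / 128) % 2 == 1 && $1 == name && $3 != ref {   # second mate (0x80)
            print $1
        }
    ' | sort -u
}
```

Possible usage, with the file name from the error above: `samtools view sampleX.transcript.bam | list_split_pairs | head`.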

JieLiBio commented 4 years ago

I have the same error. Have you found a solution?

snijesh commented 4 years ago

> i have the same error, do you have the solution now?

Hi, I downgraded RSEM from the latest version to v1.3.0 and reran the same commands without changing any input files. It ran successfully. You can try v1.3.2 or v1.3.1.

ekageyama commented 2 years ago

I ran into the same problem. In my case the error occurred with an older version of STAR, and switching to 2.7.1 fixed it.

my0916 commented 1 year ago

Hi, I'm also running into the same problem with RSEM v1.3.1 and STAR 2.7.10b. Is there any solution other than downgrading?
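Since the reports in this thread depend on the exact RSEM and aligner versions, it may help to record them alongside the error. A minimal sketch, assuming each tool accepts `--version`; the helper name `report_version` is made up for illustration.

```shell
# Hypothetical helper: print the first line of a tool's --version output,
# or a note if the tool is not on PATH.
report_version() {
    tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        printf '%s: %s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
    else
        printf '%s: not found on PATH\n' "$tool"
    fi
}

# Tools mentioned in this thread.
for tool in rsem-calculate-expression STAR bowtie2 samtools; do
    report_version "$tool"
done
```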

clhatton commented 10 months ago

I hope this helps, because it is the only alternative to downgrading that I have found. Another grad student (Gregg Andrews) came up with a workaround that works if you are using STAR. For the record, I am using RSEM v1.3.1 and STAR 2.7.11a.

You can remove the --output and --sort options and use --star-output-genome-bam in your RSEM block instead. This does not sort the output for you, so you will have to sort it afterwards with samtools sort. My block of code is below for reference. I ran rsem-prepare-reference before building my pipeline.

rsem-calculate-expression \
--paired-end \
--star \
--star-output-genome-bam \
-p 16 \
$sample.R1.P.fastq \
$sample.R2.P.fastq \
$tempDir/STAR/RSEM \
$sample

samtools sort -@ 16 $sample.STAR.genome.bam -o $sample.STAR.genome.sorted.bam
samtools sort -@ 16 $sample.transcript.bam -o $sample.transcript.sorted.bam
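
Because the workaround moves sorting into a separate step, it can be worth confirming that the sorted files declare a sort order in their header. The sketch below only reads the `@HD` header line; the function name `sam_sort_order` is made up for illustration.

```shell
# Hypothetical helper: read a SAM header on stdin and print the sort order
# declared in its @HD line ("unknown" if no SO tag is present).
sam_sort_order() {
    awk -F'\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
        }
        END { if (!found) print "unknown" }
    '
}
```

Possible usage: `samtools view -H "$sample.STAR.genome.sorted.bam" | sam_sort_order`, which should report `coordinate` for a file written by a plain `samtools sort`.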

albustruong commented 6 months ago

@clhatton Hi, I'm also hitting the two-mates error and have been trying to work around it with STAR as you suggested. However, STAR does not seem to recognize the reference index built by RSEM and keeps asking for genomeParameters.txt, and I don't know where that file comes from.

My script:

#!/bin/bash

rsem-calculate-expression \
    --star --paired-end \
    -p 6 --star-output-genome-bam \
    sub1-Ca13mCh-LGC9389_L1_1_cleaned.fq \
    sub1-Ca13mCh-LGC9389_L1_2_cleaned.fq \
    /mnt/d/NGS/rna-seq/03_ref/human \
    sub1-Ca13mCh-LGC9389_L1_PE_quals \
    2>&1 | tee sub1-Ca13mCh-LGC9389_L1_log.txt

Part of the log report:

EXITING because of FATAL ERROR: could not open genome file /mnt/d/NGS/rna-seq/03_ref/genomeParameters.txt
SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsissions

Please see the attached full log: sub1-Ca13mCh-LGC9389_L1_log.txt
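
The path in the error is the directory that holds the reference prefix, which suggests STAR is being pointed at that directory as its --genomeDir. genomeParameters.txt is one of the files STAR writes when it generates a genome index, so its absence may mean the reference was prepared without a STAR index. A minimal check, assuming that layout; the helper name `has_star_index` is made up for illustration.

```shell
# Hypothetical helper: given an RSEM reference prefix, report whether the
# directory that holds it also contains STAR's genomeParameters.txt.
has_star_index() {
    ref_dir=$(dirname "$1")
    [ -f "$ref_dir/genomeParameters.txt" ]
}

ref=/mnt/d/NGS/rna-seq/03_ref/human   # reference prefix from the script above
if has_star_index "$ref"; then
    echo "STAR index found next to $ref"
else
    echo "No STAR index next to $ref; the reference may have been built without --star"
fi
```

If the index is missing, one option is to rerun rsem-prepare-reference with its --star option, for example `rsem-prepare-reference --gtf annotation.gtf --star -p 6 genome.fa /mnt/d/NGS/rna-seq/03_ref/human`, where `annotation.gtf` and `genome.fa` are placeholders for the actual annotation and genome files.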