deweylab / RSEM

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data
http://deweylab.biostat.wisc.edu/rsem/
GNU General Public License v3.0

Fragment XXX is hung over the end of transcript YYY! #115

Closed: sklarz-bgu closed this issue 5 years ago

sklarz-bgu commented 5 years ago

Dear RSEM developers

I've been using RSEM for a while but have never seen anything like this. I've got a non-model genome with a GTF annotation. After building a STAR reference for the genome with the GTF file, I map some reads to the genome and then quantify with RSEM. I'm following the regular pipeline, which has succeeded for me before, even with this genome.

However, I'm trying the same pipeline for a new set of reads, and I get several reads with the following comment:

Fragment HWI-ST132_0470:2:1101:1007:48766#GCGGGC is hung over the end of transcript 33563! It is possible that the aligner you use gave different read lengths for a same read in SAM file.

I've traced this down to line 115 in PairedEndQModel.h. These are the values calculated for the variables there: fpos=408; insertLen=210; totLen=582.

The read fails because fpos+insertLen>totLen. However, in the header of the 'toTranscriptome' BAM file, the length is given as LN:1054, according to which the read does not fail the condition.
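The comparison can be reproduced with the numbers above. This is a minimal sketch of the arithmetic only, not RSEM's actual code; the function name `hangs_over` is hypothetical and chosen for illustration.

```python
def hangs_over(fpos: int, insert_len: int, tot_len: int) -> bool:
    """Return True if the fragment would extend past the end of the transcript."""
    return fpos + insert_len > tot_len


fpos, insert_len = 408, 210  # values reported above

# Against the length RSEM computed (totLen=582): 408 + 210 = 618 > 582
print(hangs_over(fpos, insert_len, 582))   # True

# Against the length in the BAM header (LN:1054): 618 <= 1054
print(hangs_over(fpos, insert_len, 1054))  # False
```

So the same fragment passes or fails depending only on which transcript length is used.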

How does RSEM calculate the transcript length? What could be the reason for this failure? What am I doing wrong?

Thank you very much! Menachem

Attached are the offending sections of the BAM files and GTF file.

offend.genome.sam.txt offend.transcriptome.sam.txt offending.gtf.txt

minnieanne commented 4 years ago

Dear sklarz-bgu

How did you solve this problem? I'm struggling with the same problem now.

nicozuniga commented 4 years ago

Dear sklarz-bgu and minnieanne

I'm having the same problem and I can't find any solution at all.

jjtch commented 4 years ago

Dear all,

I am facing the exact same problem and can't find any solution either :(

davetang commented 3 years ago

In my case, the solution was to use a different GTF file. Specifically, I had problems with the GENCODE primary assembly GTF file (as recommended in the STAR manual), but switching to the GTF file for reference chromosomes only mitigated the error.

uros-sipetic commented 3 years ago

The problem here seems to be in the GTF, specifically overlapping exons within the same transcript. Most likely this happens if you're using a non-model GTF. These exons cause STAR to report incorrect transcript lengths, and although the alignment step finishes, RSEM can't handle the resulting BAM. More info here: https://github.com/alexdobin/STAR/issues/1128
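One way to check a GTF for this is to look for transcripts whose exons overlap. The following is a rough sketch, not part of RSEM or STAR: the function names are made up for illustration, and it assumes a standard tab-separated GTF with `exon` features and a `transcript_id "..."` attribute. For each affected transcript it reports two lengths: the plain sum of exon lengths and the length after merging overlapping exons. The two differ only when exons overlap; which of them a given tool uses is not established here.

```python
import re
from collections import defaultdict


def exons_by_transcript(gtf_lines):
    """Collect (start, end) exon intervals per transcript_id from GTF lines."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            exons[match.group(1)].append((int(fields[3]), int(fields[4])))
    return exons


def overlap_report(gtf_lines):
    """Return {transcript_id: (summed_len, merged_len)} for transcripts
    whose exons overlap (GTF coordinates are 1-based and inclusive)."""
    report = {}
    for tid, intervals in exons_by_transcript(gtf_lines).items():
        intervals.sort()
        summed = sum(end - start + 1 for start, end in intervals)
        merged = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:          # shares at least one base
                cur_end = max(cur_end, end)
            else:
                merged += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        merged += cur_end - cur_start + 1
        if summed != merged:
            report[tid] = (summed, merged)
    return report


# Made-up example: t1 has overlapping exons, t2 does not.
example = [
    'chr1\ttest\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\ttest\texon\t150\t300\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\ttest\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
    'chr1\ttest\texon\t400\t500\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
]
print(overlap_report(example))  # {'t1': (252, 201)}
```

To run it on a real annotation, pass an open file handle instead of the list, for example `overlap_report(open("annotation.gtf"))`, where the file name is a placeholder. Any transcript that shows up in the report is a candidate for the length mismatch described above.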