deweylab / RSEM

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data
http://deweylab.biostat.wisc.edu/rsem/
GNU General Public License v3.0

Warning: aligner you use gave different read lengths for a same read in SAM file #146

Open shelfey opened 3 years ago

shelfey commented 3 years ago

When I use RSEM to calculate expression, most samples finish without problems, but some samples produce warnings like:

Fragment A00262:496:HFVGVDSXY:3:1168:16550:9048 is hung over the end of transcript X! It is possible that the aligner you use gave different read lengths for a same read in SAM file. Found unknown sequence letter !

I have done a lot of transcriptome analysis with RSEM, but this is the first time I have come across this problem. How should I deal with it? What went wrong when the BAM was written out?

uros-sipetic commented 3 years ago

Hey @shelfey, did you figure this one out? I'm in a similar boat: I have done a lot of analysis with RSEM, but this came up only recently for the first time, on samples from a non-standard model organism (human mixed with HPV, for example). It could be that the added HPV GTF has some "issues", but again this only happens for some samples, and I'd like to find out why.

uros-sipetic commented 3 years ago

The problem here seems to be in the GTF, specifically when there are overlapping exons within the same transcript. This most likely happens when you're using a non-model GTF. These exons cause STAR to report improper transcript lengths, and although the alignment step finishes, RSEM can't handle the resulting BAM. More info here: https://github.com/alexdobin/STAR/issues/1128
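To check whether a GTF has this problem, something like the sketch below might help. It is not part of RSEM or STAR; the function name `find_overlapping_exons` is made up for illustration. It assumes a standard 9-column, tab-separated GTF with 1-based inclusive coordinates and a `transcript_id "..."` attribute on each exon line, and it reports exon pairs within the same transcript whose coordinates overlap.

```python
import re
from collections import defaultdict


def find_overlapping_exons(gtf_lines):
    """Map transcript_id -> list of overlapping exon pairs ((s1, e1), (s2, e2)).

    Hypothetical helper: assumes tab-separated GTF lines with the feature
    type in column 3, start/end in columns 4-5 (1-based, inclusive), and a
    transcript_id attribute in column 9.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records matter here
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue  # exon without a transcript_id cannot be grouped
        exons[match.group(1)].append((int(fields[3]), int(fields[4])))

    overlaps = {}
    for transcript_id, intervals in exons.items():
        intervals.sort()
        hits = []
        furthest = intervals[0]  # exon reaching furthest right so far
        for current in intervals[1:]:
            # Inclusive coordinates: sharing even one base counts as overlap.
            if current[0] <= furthest[1]:
                hits.append((furthest, current))
            if current[1] > furthest[1]:
                furthest = current
        if hits:
            overlaps[transcript_id] = hits
    return overlaps
```

It can be run on an open file handle, for example `find_overlapping_exons(open("annotation.gtf"))`, where `annotation.gtf` is a placeholder for the GTF passed to STAR and RSEM. Any transcript it reports would be a candidate for fixing or removing before rebuilding the reference.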