deweylab / RSEM

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data
http://deweylab.biostat.wisc.edu/rsem/
GNU General Public License v3.0
403 stars 118 forks source link

Problem implementing a reverse-strand specific analysis #204

Open sknaack opened 10 months ago

sknaack commented 10 months ago

Hi All,

I'm applying RSEM to a (reverse-strand-specific) RibTag-seq data set in mouse for the mouse 39 assembly. Such data is successfully treated as RNA-seq data in a reverse strand orientation. I've successfully aligned this data with STAR v2.7.10, and even obtained forward and reverse strand counts from this. When I apply RSEM 1.3.0 to the same data for the same 39 assembly (calling the same version of STAR for the default - internally organized - alignment per rsem-calculate-expression) with the --forward-prob 0 and/or --strandedness reverse options, the expected counts are in line with the number of + strand counts , rather than the - strand counts. A few example genes present several 10k alignments in - orientation, but a few 100 in + strand orientation. These same genes present expected counts that are closer to the numbers of + strand alignments by several factors.

In an earlier analysis of these data for the 38 assembly the --forward-prob 0 option successfully produced expected counts similar to the raw counts of reverse strand alignments. Basically the --forward-prob 0 and --strandedness reverse options are somehow insufficient to specify reverse strand alignments.

Does anyone have experience that speaks to this? with newer versions of STAR and RSEM 1.3.0? Any tips to correct this? Should I try bowtie2 in case it's a particularity of using STAR that is causing this? Thanks much in advance fro your assistance.

Best,

Sara

cndewey commented 10 months ago

Hi Sara,

I'm not aware of any issues with these options and I use them regularly in my own analyses. Specifying "--forward-prob 0" should be sufficient for your use case. To help debug the situation, I would suggest running STAR externally (as you have done) with the "--quantMode TranscriptomeSAM" option, and then feeding the resulting transcriptome BAM to RSEM as input. This should ensure that the RSEM output is as consistent as possible with the alignments that you are getting from running STAR separately. Another aspect to look at is how you are obtaining "raw" forward and reverse strand counts from your STAR alignments. It can be easy to mistakenly flip the strands in such analyses. Hopefully, one of those two strategies will resolve your issue.

Best, Colin

sknaack commented 10 months ago

Thanks Colin! In fact I traced it down to this previously indicated matter: https://stackoverflow.com/questions/74971057/unable-to-run-rsem-tool-because-of-error-rsem-parse-alignments-no-such-file-or In short, it was a technical error in the indexing as it was running on my machine.

This was a bit of a OS hitch, but perhaps something useful to note. Once I recompiled RSEM with the (extremely few) line changes to buildReadIndex.cpp, PairedEndHit.h and SingleHit.h that were recommended my results made sense. In the end I technically used the --strandedness reverse argument, but the RSEM expected counts results are now plenty comparable to the raw (reverse strand) counts from STAR, and identical to results generated in a different OS. Thanks again!

Best,

Sara