deweylab / RSEM

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data
http://deweylab.biostat.wisc.edu/rsem/
GNU General Public License v3.0

rsem-parse-alignments and rsem-build-read-index #211

Status: Open. project-FSL opened this issue 3 months ago.

project-FSL commented 3 months ago

Hello, I generated transcriptome.bam files by running the STAR aligner myself (not through the RSEM package), and I now want to use RSEM to calculate gene expression. Everything runs correctly, but very slowly: it takes about 60 minutes to generate expression estimates for one sample.

Question 1: Is this normal?
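For reference, quantifying an existing transcriptome BAM with rsem-calculate-expression can be sketched as a small command builder. The --alignments, --paired-end, --no-bam-output and -p options are real rsem-calculate-expression options, but whether --paired-end applies depends on the library (the post does not say), and --no-bam-output (which skips writing RSEM's own BAM files) is shown only as a possible time saver, not a measured one. The helper name rsem_command, the file paths and the sample name are hypothetical.

```python
import shlex


def rsem_command(bam, reference, sample, threads=8, paired_end=False, no_bam_output=False):
    """Build the argv for quantifying a transcriptome BAM (e.g. from STAR) with RSEM."""
    argv = ["rsem-calculate-expression", "--alignments"]  # input is already aligned
    if paired_end:
        argv.append("--paired-end")  # only if the library is paired-end
    if no_bam_output:
        argv.append("--no-bam-output")  # skip writing RSEM's own BAM files
    argv += ["-p", str(threads), bam, reference, sample]  # options first, then positionals
    return argv


# Hypothetical names; the reference prefix is whatever was given to rsem-prepare-reference.
cmd = rsem_command("sample1.transcriptome.bam", "ref/gencode_v45", "sample1",
                   threads=8, paired_end=True, no_bam_output=True)
print(shlex.join(cmd))
```

To actually run it, the list can be passed unchanged to subprocess.run(cmd, check=True).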

FYI, I had already built the RSEM reference with rsem-prepare-reference:
Input: gencode.v45.primary_assembly.annotation.gtf, GRCh38.primary_assembly.genome.fa
Output: .chrlist, .grp, .idx.fa, .n2g.idx.fa, .seq, .ti, .transcripts.fa
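Because the reference is built once and then reused for every sample, a batch script only needs to confirm that its files are present instead of re-running rsem-prepare-reference. A minimal sketch, assuming the seven suffixes listed above are the complete set (taken from this post, not from RSEM's documentation); the helper missing_reference_files and the prefix gencode_v45 are hypothetical.

```python
import tempfile
from pathlib import Path

# Output suffixes of rsem-prepare-reference, as listed in the post (assumed complete).
REFERENCE_SUFFIXES = [".chrlist", ".grp", ".idx.fa", ".n2g.idx.fa", ".seq", ".ti", ".transcripts.fa"]


def missing_reference_files(prefix):
    """Return the expected reference files that do not exist for this prefix."""
    base = Path(prefix)
    missing = []
    for suffix in REFERENCE_SUFFIXES:
        path = base.with_name(base.name + suffix)  # e.g. gencode_v45.chrlist
        if not path.exists():
            missing.append(path)
    return missing


# Demo: create every file except .transcripts.fa in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    prefix = Path(tmp) / "gencode_v45"  # hypothetical reference prefix
    for suffix in REFERENCE_SUFFIXES[:-1]:
        prefix.with_name(prefix.name + suffix).touch()
    demo_missing = [p.name for p in missing_reference_files(prefix)]

print(demo_missing)
```

An empty result means the existing reference can be passed to every rsem-calculate-expression run as is.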

While rsem-calculate-expression was running, I noticed in the terminal that it involves three steps:
1) rsem-parse-alignments (about 25 min)
2) rsem-build-read-index (about 5 min)
3) rsem-run-em (about 30 min)

Therefore, the total runtime of rsem-calculate-expression is about 60 minutes per sample.
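The per-step timings above add up as follows; parsing and EM together account for 55 of the 60 minutes (about 92%). The batch size of 10 samples below is a hypothetical value used only to show how the per-sample cost scales when samples are run one after another.

```python
# Per-step wall-clock times (minutes) for one sample, as reported in the post.
step_minutes = {
    "rsem-parse-alignments": 25,
    "rsem-build-read-index": 5,
    "rsem-run-em": 30,
}

total = sum(step_minutes.values())  # 60 minutes per sample
for step, minutes in step_minutes.items():
    print(f"{step:<24}{minutes:>4} min  {100 * minutes / total:5.1f}%")

n_samples = 10  # hypothetical batch size
hours_for_batch = total * n_samples / 60  # samples run sequentially
print(f"{n_samples} samples at {total} min each: {hours_for_batch:.1f} h")
```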

Question 2: Does RSEM need to parse the alignments again for every sample?
Question 3: Does RSEM need to build the read index again for every sample?
Question 4: What can I do to save time? (My CPU has 8 cores, and I'm running with -p 7 or sometimes -p 8.)
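One option to explore for Question 4 is running several samples at once with fewer threads each, instead of one sample at a time with -p 8. Whether this is faster is an assumption to measure, not a given: it depends on how much of each step actually uses the extra threads, and every concurrent job needs its own memory and disk bandwidth. A minimal sketch of the scheduling arithmetic, with a hypothetical helper plan_waves and hypothetical sample names:

```python
def plan_waves(samples, total_cores=8, threads_per_job=4):
    """Group samples into waves of concurrent jobs whose threads fit in total_cores."""
    if not 1 <= threads_per_job <= total_cores:
        raise ValueError("threads_per_job must be between 1 and total_cores")
    jobs_per_wave = total_cores // threads_per_job  # e.g. 8 // 4 = 2 concurrent jobs
    return [samples[i:i + jobs_per_wave] for i in range(0, len(samples), jobs_per_wave)]


samples = [f"sample{i}" for i in range(1, 6)]  # hypothetical sample names
for wave in plan_waves(samples, total_cores=8, threads_per_job=4):
    print(wave)  # each sample in a wave would get its own "-p 4" RSEM job
```

To launch a wave, each job could be started with subprocess.Popen and all of them waited on before starting the next wave; timing one wave against two sequential -p 8 runs would show which layout suits this machine.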

I'm new to this, so any comments or suggestions are appreciated. Thank you!

Fee