rsem-parse-alignments and rsem-build-read-index

Hello, I generated transcriptome.bam files using the STAR aligner (not from the RSEM package), and I now want to use RSEM to calculate gene expression. Everything runs without errors, but it is very slow: it takes about 60 minutes to generate expression estimates for one sample.

Question 1 - Is this normal?

FYI, I had already generated the RSEM reference using rsem-prepare-reference.

Input: gencode.v45.primary_assembly.annotation.gtf, GRCh38.primary_assembly.genome.fa

Output: .chrlist, .grp, .idx.fa, .n2g.idx.fa, .seq, .ti, .transcripts.fa

I noticed from the terminal that rsem-calculate-expression runs three steps:

1) rsem-parse-alignments (took 25 mins)

2) rsem-build-read-index (took 5 mins)

3) rsem-run-em (took 30 mins)

So the total runtime of rsem-calculate-expression was 60 mins for one sample.
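For context, a run like this can be invoked and timed roughly as in the sketch below. This is a minimal sketch under assumptions, not the exact command used here: the BAM file name, the reference prefix `rsem_ref/GRCh38`, the sample name, and the helper `rsem_cmd` are all placeholders. `--alignments`, `-p`, and `--no-bam-output` are existing rsem-calculate-expression options. Whether skipping the transcript BAM output shortens the runtime on this data is untested.

```shell
#!/usr/bin/env bash
# Sketch only: file names, reference prefix and sample name are placeholders.
set -euo pipefail

# Build the rsem-calculate-expression command line for one sample.
# $1 = transcriptome BAM from STAR, $2 = RSEM reference prefix,
# $3 = sample/output prefix, $4 = number of threads.
# For paired-end reads, --paired-end would need to be added after --alignments.
rsem_cmd() {
  printf 'rsem-calculate-expression --alignments -p %s --no-bam-output %s %s %s\n' \
    "$4" "$1" "$2" "$3"
}

# Dry run by default: only print the command. Set RUN=1 to execute and time it.
cmd=$(rsem_cmd sample1.Aligned.toTranscriptome.out.bam rsem_ref/GRCh38 sample1 8)
echo "$cmd"
if [ "${RUN:-0}" = "1" ]; then
  time $cmd
fi
```

Running the script as-is prints the command without starting RSEM. Running it with `RUN=1` executes the command and reports the wall-clock time, which can be compared against the per-step times above.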

Question 2 - Does RSEM need to parse alignments every time for each sample?

Question 3 - Does RSEM need to build the read index every time for each sample?

Question 4 - What can I do to save time? (My CPU has 8 cores, and I'm running with -p 7, sometimes -p 8.)

I'm new to this, so any comments or suggestions are appreciated. Thank you!

Fee
