deweylab / RSEM

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data
http://deweylab.biostat.wisc.edu/rsem/
GNU General Public License v3.0

Does it need to create the fastq-index files? #24

Closed. Hecate08 closed this issue 8 years ago

Hecate08 commented 8 years ago

Hello,

We are trying to analyze the expression and read depth of TCGA ovarian cancer data. The alignment was already done with STAR, so we just want to calculate the expression with RSEM. Below is the command we used. However, the program starts by building indexed FASTQ files ("*_alignable.fastq"), and this takes very long: more than 5 days for some of the samples. Is this step necessary even if we don't want .bam files in the end?

rsem-calculate-expression -p 5 --paired-end \
    --alignments \
    --estimate-rspd \
    --append-names \
    --no-bam-output \
    --calc-ci \
    --ci-memory 30000 \
    --seed 5 \
    $input_file.Aligned.toTranscriptome.out.bam \
    $ref_file $output_prefix

Thanks

bli25wisc commented 8 years ago

Hi @Hecate08 ,

I have to admit that, in principle, there is no need to extract reads from the BAM file. Unfortunately, this is how RSEM is programmed: during the first 10 EM iterations, RSEM needs to load these files to estimate the sequencing error parameters.

However, I am surprised that the indexing step took that long. In theory, this step should be about as fast as parsing and writing the BAM file. Unless your BAM file is several TB in size, the only reason I can think of is that the server ran out of disk space. Can you check this possibility?

Thanks, Bo
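For anyone who wants to run the disk space check suggested above, here is a minimal sketch. It is not part of RSEM; the helper names `free_kb` and `has_room` are made up for illustration, and the amount of space you need is something you have to estimate yourself.

```shell
#!/bin/sh
# free_kb DIR: print the free space, in KiB, of the file system holding DIR.
# Uses POSIX df output (-P) in 1024-byte units (-k); column 4 is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_room DIR NEED_KB: succeed if DIR has at least NEED_KB KiB free.
has_room() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Report the free space of the current directory.
echo "Free space here: $(free_kb .) KiB"
```

Running `has_room /path/to/output 20000000 || echo "low on space"` before starting a job would flag a directory with less than roughly 20 GB free.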

Hecate08 commented 8 years ago

Hello,

The server has 20 threads and about 60 GB of memory. For this job I gave RSEM 5 threads and 12 GB of memory in total, so this should be OK. I have now tried RSEM with a smaller sample, using the STAR alignment run through RSEM. The temp files are listed below. It took about 5 hours to build the FASTQ files for a 3 GB BAM file, while the whole STAR alignment step took 2 hours. What could make it take so long?

-rw-rw-r-- 1 lneums jclab 3001763491 Apr 27 09:23 JZ-3.FCC8R8HACXX_L6_IGCCAAT.bam
-rw-rw-r-- 1 lneums jclab       7090 Apr 27 09:23 JZ-3.FCC8R8HACXX_L6_IGCCAATLog.progress.out
-rw-rw-r-- 1 lneums jclab      52225 Apr 27 09:23 JZ-3.FCC8R8HACXX_L6_IGCCAATLog.out
-rw------- 1 jchien jclab    7107497 Apr 27 09:24 JZ-3.FCC8R8HACXX_L6_IGCCAATSJ.out.tab
-rw------- 1 jchien jclab       1867 Apr 27 09:24 JZ-3.FCC8R8HACXX_L6_IGCCAATLog.final.out
-rw------- 1 jchien jclab          0 Apr 27 09:24 JZ-3.FCC8R8HACXX_L6_IGCCAAT.omit
-rw------- 1 jchien jclab  457810116 Apr 27 14:36 JZ-3.FCC8R8HACXX_L6_IGCCAAT_un_2.fq
-rw------- 1 jchien jclab  457810116 Apr 27 14:36 JZ-3.FCC8R8HACXX_L6_IGCCAAT_un_1.fq
-rw------- 1 jchien jclab  277061770 Apr 27 14:36 JZ-3.FCC8R8HACXX_L6_IGCCAAT.dat
-rw------- 1 jchien jclab 2997789796 Apr 27 14:36 JZ-3.FCC8R8HACXX_L6_IGCCAAT_alignable_2.fq
-rw------- 1 jchien jclab 2997789796 Apr 27 14:36 JZ-3.FCC8R8HACXX_L6_IGCCAAT_alignable_1.fq
-rw------- 1 jchien jclab    6032032 Apr 27 14:39 JZ-3.FCC8R8HACXX_L6_IGCCAAT_alignable_1.fq.ridx
-rw------- 1 jchien jclab    6032032 Apr 27 14:41 JZ-3.FCC8R8HACXX_L6_IGCCAAT_alignable_2.fq.ridx
-rw------- 1 jchien jclab         31 Apr 27 14:41 JZ-3.FCC8R8HACXX_L6_IGCCAAT.mparams
-rw------- 1 jchien jclab  739293536 Apr 27 15:53 JZ-3.FCC8R8HACXX_L6_IGCCAAT.ofg
-rw------- 1 jchien jclab    3704242 Apr 27 15:53 JZ-3.FCC8R8HACXX_L6_IGCCAAT.iso_res
-rw------- 1 jchien jclab    3169194 Apr 27 15:53 JZ-3.FCC8R8HACXX_L6_IGCCAAT.gene_res

Thanks Hecate08
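As a rough sanity check on the listing above, the effective write rate of the read-extraction step can be estimated from the file sizes and timestamps. This is only a back-of-the-envelope sketch: it assumes extraction started right after the last STAR output (09:24) and ended when the read files were last modified (14:36), and the function name `throughput_mbs` is made up for illustration.

```shell
#!/bin/sh
# throughput_mbs BYTES SECONDS: print the average rate in MB/s (10^6 bytes per second).
throughput_mbs() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f\n", b / s / 1e6 }'
}

# Sizes from the listing: two *_alignable_*.fq files, two *_un_*.fq files, and the .dat file.
total_bytes=$((2 * 2997789796 + 2 * 457810116 + 277061770))
# Assumed window: 09:24 to 14:36, which is 312 minutes.
elapsed_s=$((312 * 60))

throughput_mbs "$total_bytes" "$elapsed_s"   # prints 0.38
```

About 7.2 GB written in a little over 5 hours works out to roughly 0.38 MB/s, which is far below what a local disk normally sustains.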

bli25wisc commented 8 years ago

How much time does it take to finish RSEM once you have extracted read files?

bli25wisc commented 8 years ago

@Hecate08 That's so weird. Is it possible that writing to disk is very slow on your server?

P.S. By "space" I meant disk space, not memory.
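One way to test the slow-disk hypothesis is to time a plain sequential write in the directory RSEM writes to and compare it with a local disk. The sketch below is not part of RSEM; `write_test` is a made-up helper, and it assumes a `dd` that supports `conv=fsync` (GNU coreutils does), so that the reported rate includes flushing the data to disk.

```shell
#!/bin/sh
# write_test DIR [MIB]: write MIB mebibytes of zeros (default 256) to a
# temporary file in DIR, let dd report the throughput on stderr, then clean up.
write_test() {
    dir=$1
    mib=${2:-256}
    tmp=$(mktemp "$dir/write_test.XXXXXX") || return 1
    # conv=fsync (assumes GNU dd) forces the data to disk before dd reports its rate.
    dd if=/dev/zero of="$tmp" bs=1048576 count="$mib" conv=fsync
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example (directories are placeholders):
# write_test /path/to/rsem/output 1024
# write_test /tmp 1024
```

Zero-filled data can look faster than real reads on file systems that compress, so treat the numbers as a rough comparison between two locations rather than an exact measurement.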

Hecate08 commented 8 years ago

Sorry, there are 6 TB of disk space on this server. The speed of writing to the disk may be the problem. Is there a way to specify a local temp folder instead of writing to the distributed file system? Thanks

bli25wisc commented 8 years ago

@Hecate08 Yes, see the --temporary-folder option.
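Applied to the command from the first post, that could look like the sketch below. The wrapper `run_rsem` and the `RSEM_DRY_RUN` switch are made up for illustration (they are not RSEM features), and the paths in the example are placeholders; only `--temporary-folder` itself comes from the answer above.

```shell
#!/bin/sh
# run_rsem BAM REF PREFIX TMPDIR: the command from this thread, with the
# temporary files redirected to TMPDIR via --temporary-folder.
# RSEM_DRY_RUN is a hypothetical switch for this sketch: when it is set to a
# non-empty value, the command line is printed instead of executed.
run_rsem() {
    ${RSEM_DRY_RUN:+echo} rsem-calculate-expression -p 5 --paired-end \
        --alignments \
        --estimate-rspd \
        --append-names \
        --no-bam-output \
        --calc-ci \
        --ci-memory 30000 \
        --seed 5 \
        --temporary-folder "$4" \
        "$1" "$2" "$3"
}

# Example (all paths are placeholders):
# run_rsem sample.Aligned.toTranscriptome.out.bam /path/to/ref sample_out /local/scratch/sample.temp
```

Pointing the last argument at a folder on a local disk keeps the large `*_alignable_*.fq` files off the slower shared storage.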