deweylab / RSEM

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data
http://deweylab.biostat.wisc.edu/rsem/
GNU General Public License v3.0

rsem-plot-transcript-wiggles failing #107

Open sdwien opened 5 years ago

sdwien commented 5 years ago

Dear colleagues, despite trying different formats, inputs, and combinations of options, rsem-plot-transcript-wiggles keeps failing with this message: "Transcript/Allele IDs in the expression file is not exactly the same as the ones in the readdepth file!" I have tried RSEM versions 1.2.28 and 1.3.1, and both fail in the same way. In the best case, the execution completes the following steps:

`samtools sort -@ 1 -m 1G -o sample.transcript.sorted.bam sample.transcript.bam

[bam_sort_core] merging from 30 files...

rsem-bam2readdepth sample.transcript.sorted.bam sample.transcript.readdepth

rsem-get-unique 1 sample.transcript.bam sample.uniq.transcript.bam

.................................................................................................

done!

samtools sort -@ 1 -m 1G -o sample.uniq.transcript.sorted.bam sample.uniq.transcript.bam

[bam_sort_core] merging from 3 files...

rsem-bam2readdepth sample.uniq.transcript.sorted.bam sample.uniq.transcript.readdepth

rsem-gen-transcript-plots sample gene_list.txt 0 2 1 DE_transcripts_plots.pdf

Loading read depth files is done!

Transcript/Allele IDs in the expression file is not exactly the same as the ones in the readdepth file!

"rsem-gen-transcript-plots sample gene_list.txt 0 2 1 DE_transcripts_plots.pdf" failed! Plase check if you provide correct parameters/options for the pipeline!`

My gene_list.txt file looks like this:

ENSG00000111642_CHD4

I have also tried a list of transcripts that looks like this:

ENST00000544040.6_CHD4-206
ENST00000645022.1_CHD4-231
ENST00000646462.1_CHD4-244

or like this:

ENST00000544040.6_CHD4
ENST00000645022.1_CHD4
ENST00000646462.1_CHD4

or like this (which is how sequence names appear in the transcript.bam file):

ENST00000544040.6
ENST00000645022.1
ENST00000646462.1

From the error, I presume that the transcript and gene names in the transcript.bam and readdepth files must match those in the expression results files. Is that correct?
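One way to test that presumption is to compare the two ID sets directly. The sketch below is only an illustration, not RSEM's own check. It assumes that the expression file is tab-separated with a header row containing a `transcript_id` column (as in sample.isoforms.results), and that each line of the readdepth file starts with the transcript ID as its first whitespace-separated field. The function names are hypothetical.

```python
import csv


def read_expression_ids(path, column="transcript_id"):
    """Collect the IDs from one column of a tab-separated results file.

    Assumes the file has a header row that names the column.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[column] for row in reader}


def read_readdepth_ids(path):
    """Collect the first field of every non-empty line.

    Assumption: the ID is the first whitespace-separated field of each
    readdepth line; the rest of the line is ignored.
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split(None, 1)
            if fields:
                ids.add(fields[0])
    return ids


def compare_ids(expr_ids, depth_ids):
    """Return (IDs only in the expression file, IDs only in the readdepth file)."""
    return sorted(expr_ids - depth_ids), sorted(depth_ids - expr_ids)
```

Calling `compare_ids(read_expression_ids("sample.isoforms.results"), read_readdepth_ids("sample.transcript.readdepth"))` and printing the first few entries of each list should show whether the two files differ only by an appended name or by something else.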

Thanks in advance for any clarification and suggestions to get this to run, and thank you for this very complete and useful tool!

PS: In my sample.genes.results file, the first column looks like this:

ENSG00000000003.14_TSPAN6

and the second column looks like this:

ENST00000373020.8_TSPAN6-201,ENST00000494424.1_TSPAN6-202,ENST00000496771.5_TSPAN6-203,ENST00000612152.4_TSPAN6-204,ENST00000614008.4_TSPAN6-205

I used a GENCODE (human) GTF file as the source of the annotations.
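The IDs quoted above differ from the bare names in the transcript.bam file only by a trailing `_NAME` part. That pattern would be consistent with names having been appended to the results files, for example by the `--append-names` option of rsem-calculate-expression, although the post does not say which options were used. The following sketch shows how the two forms could be reconciled for a comparison. The helper names are hypothetical, and cutting at the first underscore is an assumption that would truncate any ID that legitimately contains one.

```python
def strip_name_suffix(identifier):
    """Drop everything from the first underscore onward.

    Assumption: the underscore only separates the ID from an appended name.
    """
    return identifier.split("_", 1)[0]


def split_transcript_column(value):
    """Split a comma-separated transcript list and strip each appended name."""
    return [strip_name_suffix(item) for item in value.split(",") if item]


# IDs taken from the post above
print(strip_name_suffix("ENSG00000000003.14_TSPAN6"))
# -> ENSG00000000003.14
print(split_transcript_column("ENST00000373020.8_TSPAN6-201,ENST00000494424.1_TSPAN6-202"))
# -> ['ENST00000373020.8', 'ENST00000494424.1']
```

If the stripped IDs match the sequence names in the BAM file one-to-one, the appended names are the likely source of the mismatch.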

sdwien commented 5 years ago

Some additional information: a possible reason it is not working is that I am using the rsem-prepare-reference command with the STAR aligner, and STAR does not take a transcripts.fasta file as input, only the genome and the GTF file with annotations.