deweylab / RSEM

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data
http://deweylab.biostat.wisc.edu/rsem/
GNU General Public License v3.0

rsem-calculate-expression error SAM/BAM file declares more reference sequences than Rsem #105

Open VilainLab opened 5 years ago

VilainLab commented 5 years ago

Hi,

I am getting an error from rsem-calculate-expression when processing FASTQ files with RSEM's STAR alignment option. The alignment itself completes successfully, but as soon as rsem-parse-alignments starts, it fails with an error saying that the SAM/BAM file declares more reference sequences than RSEM knows. The command and its output are below.

Input command:

rsem-calculate-expression --star \
    --star-path /home/sbhattach2/STAR-2.6.0a/bin/ \
    --star-gzipped-read-file \
    -p 8 --paired-end --strandedness reverse \
    /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S21_L002_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L006_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L007_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L008_R1_001.fastq.gz \
    /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S21_L002_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L006_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L007_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L008_R2_001.fastq.gz \
    /data/Suro/Fasta/Rsem_Human_Ref1 \
    /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood

Output:

/home/sbhattach2/STAR-2.6.0a/bin//STAR --genomeDir /data/Suro/Fasta \
    --outSAMunmapped Within --outFilterType BySJout \
    --outSAMattributes NH HI AS NM MD \
    --outFilterMultimapNmax 20 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.04 \
    --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 \
    --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --sjdbScore 1 \
    --runThreadN 8 --genomeLoad NoSharedMemory \
    --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM \
    --outSAMheaderHD \@HD VN:1.4 SO:unsorted \
    --outFileNamePrefix /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood \
    --readFilesCommand zcat \
    --readFilesIn /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S21_L002_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L006_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L007_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L008_R1_001.fastq.gz \
    /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S21_L002_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L006_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L007_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L008_R2_001.fastq.gz

Nov 14 11:53:41 ..... started STAR run
Nov 14 11:53:41 ..... loading genome
Nov 14 11:55:02 ..... started mapping
Nov 14 12:10:05 ..... finished successfully

rsem-parse-alignments /data/Suro/Fasta/Rsem_Human_Ref1 \
    /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood \
    /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.stat/UDN163672_UF_Blood \
    /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood.bam \
    3 -tag XM

The SAM/BAM file declares more reference sequences (203798) than RSEM knows (196483)!
"rsem-parse-alignments /data/Suro/Fasta/Rsem_Human_Ref1 /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.stat/UDN163672_UF_Blood /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood.bam 3 -tag XM" failed! Plase check if you provide correct parameters/options for the pipeline!
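One way to narrow this down is to compare the reference names the BAM header declares (its @SQ lines, which is what the 203798 figure counts) with the transcripts in the RSEM reference. The sketch below is a hypothetical helper, not part of RSEM. It assumes the header has been dumped with `samtools view -H` and that the reference prefix /data/Suro/Fasta/Rsem_Human_Ref1 has a companion `Rsem_Human_Ref1.transcripts.fa` written by rsem-prepare-reference. It reports both counts and the names found on only one side.

```python
"""Hypothetical diagnostic (not part of RSEM): compare the @SQ names in a
BAM header with the transcript names in RSEM's <ref_name>.transcripts.fa.

Assumed usage:
    samtools view -H sample.bam > header.sam
    python compare_refs.py header.sam <ref_name>.transcripts.fa
"""

import sys
from typing import Dict, Iterable, List


def sam_header_refs(header_lines: Iterable[str]) -> List[str]:
    """Return the SN: value of every @SQ line, in header order."""
    refs = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                refs.append(field[3:])
                break
    return refs


def fasta_names(fasta_lines: Iterable[str]) -> List[str]:
    """Return the first whitespace-delimited token of every FASTA header line."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.append(parts[0])
    return names


def compare(bam_refs: List[str], rsem_refs: List[str]) -> Dict[str, object]:
    """Report both counts and the names that appear on only one side."""
    bam_set, rsem_set = set(bam_refs), set(rsem_refs)
    return {
        "bam_count": len(bam_refs),
        "rsem_count": len(rsem_refs),
        "only_in_bam": sorted(bam_set - rsem_set),
        "only_in_rsem": sorted(rsem_set - bam_set),
    }


# Only run the command-line part when both file paths are given.
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as header, open(sys.argv[2]) as fasta:
        result = compare(sam_header_refs(header), fasta_names(fasta))
    print(f"BAM declares {result['bam_count']} references; RSEM knows {result['rsem_count']}")
    print(f"only in BAM ({len(result['only_in_bam'])}): {result['only_in_bam'][:5]}")
    print(f"only in RSEM ({len(result['only_in_rsem'])}): {result['only_in_rsem'][:5]}")
```

A long "only in BAM" list would suggest, as an unconfirmed assumption, that STAR (run with --genomeDir /data/Suro/Fasta in the output above) used an index built from a different transcript set than Rsem_Human_Ref1.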

I previously ran the same command with STAR 2.5.3a and obtained TPM values without any problems. However, when I re-ran the same scripts with STAR 2.5.3a and RSEM 1.3.0, I ran into this error. Even after reinstalling STAR and RSEM and rebuilding the reference, the error persists. The reference was built from the FASTA file Homo_sapiens.GRCh37.dna.primary_assembly.fa and the GTF file gencode.v19.annotation_mod.gtf.
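Since the reference combines an Ensembl-style FASTA (chromosomes named `1`, `2`, ...) with a GENCODE GTF (which normally uses `chr1`, `chr2`, ...), a quick sanity check is to confirm that every sequence name in column 1 of the GTF also appears as a FASTA header. What the `_mod` edit changed is not stated, so this is only a check, not a diagnosis. The sketch below is hypothetical and assumes both files are uncompressed plain text.

```python
"""Hypothetical check: list GTF sequence names (column 1) that do not
appear as FASTA header names. Assumes uncompressed plain-text inputs.

Assumed usage:
    python check_seqnames.py annotation.gtf genome.fa
"""

import sys
from typing import Iterable, Set


def fasta_seqnames(lines: Iterable[str]) -> Set[str]:
    """First whitespace-delimited token of each '>' header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def gtf_seqnames(lines: Iterable[str]) -> Set[str]:
    """Column 1 of every non-comment, non-blank GTF line."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_fasta(gtf_lines: Iterable[str], fasta_lines: Iterable[str]) -> Set[str]:
    """GTF sequence names that have no matching FASTA record."""
    return gtf_seqnames(gtf_lines) - fasta_seqnames(fasta_lines)


# Only run the command-line part when both file paths are given.
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as gtf, open(sys.argv[2]) as fasta:
        missing = sorted(missing_from_fasta(gtf, fasta))
    print(f"{len(missing)} GTF sequence names not found in the FASTA: {missing[:10]}")
```

An empty result means the naming conventions line up; names such as `chr1` in the output would indicate a chr-prefix mismatch between the two files.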

Please let me know if you need any other information.

Thanks in advance for your help.

Surajit