Closed hackerzone85 closed 8 years ago
Hi Mahendra Gaur,
Thanks for reporting this bug to us. We have already fixed it. Please update your RSEM to v1.2.27 and everything should work smoothly.
Best, Bo
On 2016-01-29 06:41, Mahendra Gaur wrote:
I have encountered a bug while preparing reference sequence of Arabidopsis_thaliana using RSEM (latest developer version). The fasta and gtf files were downloaded from ftp://ftp.ensemblgenomes.org/pub/release-30/plants/. Here is the cmd output:
blab@blab: rsem-prepare-reference -gtf filteredgtf --bowtie2 --bowtie-path /usr/bin/bowtie2 Arabidopsis_thalianaTAIR1030dnatoplevelfa attair1030
Warning: If Bowtie is not used, no need to set --bowtie-path option!
rsem-extract-reference-transcripts attair1030 0 filteredgtf None 0 Arabidopsis_thalianaTAIR1030dnatoplevelfa
The GTF file might be corrupted!
Stop at line : 1 tair exon 4810488 4811109 + gene_id "AT1G14040"; gene_version "1"; transcript_id "AT1G140401"; transcript_version "1"; exon_number "1"; gene_name "PHO1;H3"; gene_source "tair"; gene_biotype "protein_coding"; transcript_name "PHO1;H3"; transcript_source "tair"; transcript_biotype "protein_coding"; exon_id "AT1G140401exon1"; exon_version "1";
Error Message: Cannot separate the identifier from the value for attribute H3"!
"rsem-extract-reference-transcripts attair1030 0 filteredgtf None 0 Arabidopsis_thalianaTAIR1030dnatoplevelfa" failed! Plase check if you provide correct parameters/options for the pipeline!
What is the correct format of a GTF file?
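The error message points at the attribute gene_name "PHO1;H3": the value itself contains a semicolon, so splitting the attribute column on every ';' leaves a stray fragment H3" with no identifier in front of it. A minimal Python sketch of a quote-aware split is shown here, assuming attributes follow the usual key "value"; layout. The function names are hypothetical and this is not RSEM's actual parser.

```python
def _store(pairs, raw):
    """Record one 'key "value"' fragment in the dict, skipping empty fragments."""
    raw = raw.strip()
    if not raw:
        return
    key, _, value = raw.partition(' ')
    pairs[key] = value.strip().strip('"')


def split_gtf_attributes(attr_field):
    """Split a GTF attribute column on ';', ignoring any ';' inside double quotes."""
    pairs = {}
    token = []
    in_quotes = False
    for ch in attr_field:
        if ch == '"':
            in_quotes = not in_quotes  # toggle on every quote character
            token.append(ch)
        elif ch == ';' and not in_quotes:
            _store(pairs, ''.join(token))  # end of one attribute
            token = []
        else:
            token.append(ch)
    _store(pairs, ''.join(token))  # trailing fragment, if any
    return pairs


attrs = 'gene_id "AT1G14040"; gene_name "PHO1;H3"; exon_number "1";'
parsed = split_gtf_attributes(attrs)
print(parsed['gene_name'])

# For comparison, a naive split breaks the quoted value in two.
naive = [part.strip() for part in attrs.split(';') if part.strip()]
print(naive)
```

With the naive split, the third fragment is H3" on its own, which matches the attribute named in the error message.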
Hi Bo, this could be unrelated, but I have been receiving the same error with rsem-prepare-reference despite updating to v1.2.27. I have verified the error with the GTF files linked on the RSEM website.
/idi/sabeti-data/rtewhey/bin/RSEM-1.2.27/rsem-prepare-reference --gtf gencode.v19.annotation.gtf -p 8 GRCh37.p13.genome.fa GRCh37.gencode_v19.RSEM
rsem-extract-reference-transcripts GRCh37.gencode_v19.RSEM 0 gencode.v19.annotation.gtf None 0 GRCh37.p13.genome.fa
The GTF file might be corrupted!
Stop at line : chr1 HAVANA exon 11869 12227 . + . gene_id "ENSG00000223972.4"; transcript_id "ENST00000456328.2"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
Error Message: Cannot locate the identifier from attribute gene_id "ENSG00000223972.4";!
"rsem-extract-reference-transcripts GRCh37.gencode_v19.RSEM 0 gencode.v19.annotation.gtf None 0 GRCh37.p13.genome.fa" failed! Plase check if you provide correct parameters/options for the pipeline!
btw, I can confirm dropping down to 1.1.25 resolves the issue.
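The GENCODE line above mixes quoted values (gene_id "ENSG00000223972.4") with bare ones (exon_number 1; level 2), and GENCODE files may repeat a key such as tag. A rough Python sketch of a parser that tolerates both is shown here, assuming the attribute column has already been cut out of the tab-separated line. ATTR_RE and parse_attributes are illustrative names, not part of RSEM.

```python
import re

# key, whitespace, then either a double-quoted string or a bare token, then ';'
ATTR_RE = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^\s;]+))\s*;')


def parse_attributes(attr_field):
    """Return {key: [values...]} so that repeated keys are not overwritten."""
    out = {}
    for m in ATTR_RE.finditer(attr_field):
        key = m.group(1)
        # group 2 is set for quoted values, group 3 for bare ones
        value = m.group(2) if m.group(2) is not None else m.group(3)
        out.setdefault(key, []).append(value)
    return out


line_attrs = ('gene_id "ENSG00000223972.4"; transcript_id "ENST00000456328.2"; '
              'exon_number 1; level 2; tag "basic";')
result = parse_attributes(line_attrs)
print(result['gene_id'], result['exon_number'], result['level'])
```

Values are kept as strings; converting exon_number or level to integers is left to the caller.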
Hi rtewhey,
Sorry for the bugs. I just fixed them and released a new version.
Best, Bo
Hi Bo Li, thanks for fixing the bug. It works now and my reference is prepared. Thank you very much, Bo.
Hi, it appears that the GTF file at ftp://ftp.ensembl.org/pub/release-98/gtf/drosophila_melanogaster/ in Ensembl is corrupted. Could you help by uploading a fixed version for me? Thank you.
Bright