deweylab / RSEM

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data
http://deweylab.biostat.wisc.edu/rsem/
GNU General Public License v3.0

The GTF file might be corrupted! Stop at line : 1 #17

Closed hackerzone85 closed 8 years ago

hackerzone85 commented 8 years ago

I have encountered a bug while preparing the reference sequences for Arabidopsis thaliana using RSEM (latest developer version). The FASTA and GTF files were downloaded from ftp://ftp.ensemblgenomes.org/pub/release-30/plants/. Here is the command output:

blab@blab: rsem-prepare-reference -gtf filtered.gtf --bowtie2 --bowtie-path /usr/bin/bowtie2 Arabidopsis_thaliana.TAIR10.30.dna.toplevel.fa at.tair10.30
Warning: If Bowtie is not used, no need to set --bowtie-path option!
rsem-extract-reference-transcripts at.tair10.30 0 filtered.gtf None 0 Arabidopsis_thaliana.TAIR10.30.dna.toplevel.fa
The GTF file might be corrupted!
Stop at line : 1    tair    exon    4810488 4811109 .   +   .   gene_id "AT1G14040"; gene_version "1"; transcript_id "AT1G14040.1"; transcript_version "1"; exon_number "1"; gene_name "PHO1;H3"; gene_source "tair"; gene_biotype "protein_coding"; transcript_name "PHO1;H3"; transcript_source "tair"; transcript_biotype "protein_coding"; exon_id "AT1G14040.1.exon1"; exon_version "1";
Error Message: Cannot separate the identifier from the value for attribute H3"!
"rsem-extract-reference-transcripts at.tair10.30 0 filtered.gtf None 0 Arabidopsis_thaliana.TAIR10.30.dna.toplevel.fa" failed! Plase check if you provide correct parameters/options for the pipeline!

What is the correct format of the GTF file?
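The error message above names the attribute `H3"`, which is what remains when the attribute column is split on every semicolon, including the one inside the quoted value "PHO1;H3". The following is a minimal Python sketch of that failure mode and of a quote-aware split that avoids it. It is an illustration only, not RSEM's actual parser; the function name `split_attributes` is hypothetical.

```python
def split_attributes(field):
    """Split a GTF attribute column on ';', ignoring semicolons inside double quotes.

    Hypothetical helper for illustration; not taken from RSEM.
    """
    parts, current, in_quotes = [], [], False
    for ch in field:
        if ch == '"':
            in_quotes = not in_quotes  # toggle on every double quote
            current.append(ch)
        elif ch == ';' and not in_quotes:
            token = ''.join(current).strip()
            if token:
                parts.append(token)
            current = []
        else:
            current.append(ch)
    tail = ''.join(current).strip()
    if tail:
        parts.append(tail)
    return parts


# Shortened attribute column from the failing line in the report.
attrs = 'gene_id "AT1G14040"; gene_name "PHO1;H3"; gene_source "tair";'

# Naive split: the quoted value is cut in two, leaving a stray 'H3"' token.
naive = [p.strip() for p in attrs.split(';') if p.strip()]
# naive == ['gene_id "AT1G14040"', 'gene_name "PHO1', 'H3"', 'gene_source "tair"']

# Quote-aware split: the value "PHO1;H3" stays intact.
aware = split_attributes(attrs)
# aware == ['gene_id "AT1G14040"', 'gene_name "PHO1;H3"', 'gene_source "tair"']
```

The sketch assumes values never contain escaped double quotes, which holds for the line shown here but is not guaranteed for every GTF file.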

bli25wisc commented 8 years ago

Hi Mahendra Gaur,

Thanks for reporting this bug to us. We have already fixed it. Please update your RSEM to v1.2.27 and everything should work smoothly.

Best, Bo

On 2016-01-29 06:41, Mahendra Gaur wrote: [original report, quoted in full above]

rtewhey commented 8 years ago

Hi Bo, this could be unrelated, but I have been receiving the same error with rsem-prepare-reference despite updating to v1.2.27. I have verified the error with the GTF files linked on the RSEM website.

/idi/sabeti-data/rtewhey/bin/RSEM-1.2.27/rsem-prepare-reference --gtf gencode.v19.annotation.gtf -p 8 GRCh37.p13.genome.fa GRCh37.gencode_v19.RSEM
rsem-extract-reference-transcripts GRCh37.gencode_v19.RSEM 0 gencode.v19.annotation.gtf None 0 GRCh37.p13.genome.fa
The GTF file might be corrupted!
Stop at line : chr1 HAVANA  exon    11869   12227   .   +   .   gene_id "ENSG00000223972.4"; transcript_id "ENST00000456328.2"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 1;  exon_id "ENSE00002234944.1";  level 2; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
Error Message: Cannot locate the identifier from attribute gene_id "ENSG00000223972.4";!
"rsem-extract-reference-transcripts GRCh37.gencode_v19.RSEM 0 gencode.v19.annotation.gtf None 0 GRCh37.p13.genome.fa" failed! Plase check if you provide correct parameters/options for the pipeline!

By the way, I can confirm that dropping down to 1.2.25 resolves the issue.
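The GENCODE line above differs from the Ensembl Plants line in a few visible ways: some values are unquoted (`exon_number 1;`, `level 2;`) and some attributes are separated by two spaces. The thread does not say which of these, if any, triggered the error inside RSEM. As a hedged sketch, the Python snippet below shows one tolerant way to parse such an attribute column and to check that an exon line carries `gene_id` and `transcript_id`. The names `parse_attributes` and `check_exon_line` are hypothetical, and restricting the check to exon features is an assumption made for this example.

```python
import re

# A key, then either a double-quoted value or a bare token, then ';'.
ATTR_RE = re.compile(r'\s*(\w+)\s+(?:"([^"]*)"|([^\s;"]+))\s*;')


def parse_attributes(field):
    """Parse a GTF attribute column into a dict (hypothetical helper, not RSEM code)."""
    attrs, pos = {}, 0
    while pos < len(field):
        m = ATTR_RE.match(field, pos)
        if m is None:
            rest = field[pos:].strip()
            if rest:
                raise ValueError(f"cannot parse attribute text: {rest!r}")
            break  # only trailing whitespace was left
        key = m.group(1)
        value = m.group(2) if m.group(2) is not None else m.group(3)
        attrs.setdefault(key, value)  # keep the first value of repeated keys such as tag
        pos = m.end()
    return attrs


def check_exon_line(line):
    """Return a list of problems found on one GTF line; an empty list means none."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    if cols[2] != "exon":
        return []  # assumption: only exon features are checked in this sketch
    try:
        attrs = parse_attributes(cols[8])
    except ValueError as err:
        return [str(err)]
    return [f"missing {key}" for key in ("gene_id", "transcript_id") if key not in attrs]


# Shortened version of the GENCODE line quoted above, with tabs restored.
sample = (
    'chr1\tHAVANA\texon\t11869\t12227\t.\t+\t.\t'
    'gene_id "ENSG00000223972.4"; transcript_id "ENST00000456328.2"; '
    'exon_number 1;  exon_id "ENSE00002234944.1";  level 2; tag "basic";'
)
problems = check_exon_line(sample)  # empty list for this line
```

Running `check_exon_line` over every line of a GTF file before calling rsem-prepare-reference is one way to tell a malformed file apart from a parser bug, under the assumptions stated above.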

bli25wisc commented 8 years ago

Hi rtewhey,

Sorry for the bugs. I just fixed them and released a new version.

Best, Bo

On 2016-02-02 06:50, rtewhey wrote: [previous comment, quoted in full above]

hackerzone85 commented 8 years ago

Hi Bo Li, thanks for fixing the bug. It works now, and my reference has been prepared. Thank you very much, Bo.

bamankwa commented 5 years ago

Hi, it appears that the GTF file at ftp://ftp.ensembl.org/pub/release-98/gtf/drosophila_melanogaster/ in Ensembl is corrupted. Could you help by uploading a fixed version for me? Thank you.

Bright