genomeannotation / GAG

Generates an NCBI .tbl file of annotations on a genome.
MIT License

GFF3 incompatibilities #195

Open mictadlo opened 5 years ago

mictadlo commented 5 years ago

Hi, I have run into GFF3 incompatibilities with several plant genomes:

Source for files ftp://ftp.solgenomics.net/genomes/Nicotiana_tomentosiformis/
python2 gag.py --fasta Ntom_ASAG01.fa --gff Ntom_ASAG01-itag_2.4.80-80.gff3 --out gag
Done.
Calculating stats on original genome
Traceback (most recent call last):
  File "/work/waterhouse_team/apps/GAG/gag.py", line 50, in <module>
    main()
  File "/work/waterhouse_team/apps/GAG/gag.py", line 46, in main
    controller.execute(args)
  File "/lustre/work-lustre/waterhouse_team/apps/GAG/src/controller.py", line 80, in execute
    self.stats_mgr.update_ref(seq.stats())
  File "/lustre/work-lustre/waterhouse_team/apps/GAG/src/sequence.py", line 489, in stats
    stats["Shortest intron"] = int(self.get_shortest_intron())
  File "/lustre/work-lustre/waterhouse_team/apps/GAG/src/sequence.py", line 386, in get_shortest_intron
    length = gene.get_shortest_intron()
  File "/lustre/work-lustre/waterhouse_team/apps/GAG/src/gene.py", line 188, in get_shortest_intron
    length = mrna.get_shortest_intron()
  File "/lustre/work-lustre/waterhouse_team/apps/GAG/src/xrna.py", line 313, in get_shortest_intron
    raise Exception("Intron with negative length on " + self.identifier)
Exception: Intron with negative length on g55910.t1
Source for files ftp://ftp.solgenomics.net/genomes/Arabidopsis_thaliana/

python2 gag.py --fasta TAIR10_genome.fas --gff TAIR10_GFF3_genes.gff --out gag
python2 /work/waterhouse_team/apps/GAG/gag.py --fasta NIATTr2.scaffold.fa --gff NIATTr2.an5.gff --out gag
Source for files ftp://ftp.solgenomics.net/genomes/Nicotiana_tabacum/

python2 gag.py --fasta Nitab-v4.5_genome_Chr_Edwards2017.fasta --gff Nitab-v4.5_gene_models_Chr_Edwards2017.gff --out gag-chr
python2 gag.py --fasta Nitab-v4.5_genome_Scf_Edwards2017.fasta --gff Nitab-v4.5_gene_models_Scf_Edwards2017.gff --out gag-scf
Source for files ftp://ftp.solgenomics.net/genomes/Nicotiana_sylvestris/

python2 gag.py --fasta Nsyl_ASAF01.fa --gff Nsyl_ASAF01-itag_2.4.80-80.gff3 --out gag

What did I miss?

Thank you in advance.

Michal
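
The "Intron with negative length" exception in the traceback above suggests that, for at least one transcript (g55910.t1), consecutive exons overlap or are not listed in ascending coordinate order. One way to look for such models before running GAG is sketched below. It is only a rough check under stated assumptions: it assumes the intron length between two consecutive exons is computed in file order as next start minus previous end minus 1, which may not match GAG's internal logic exactly. The function name `find_negative_introns` is hypothetical and not part of GAG.

```python
from collections import defaultdict


def find_negative_introns(gff_lines):
    """Map each transcript ID to the negative gaps found between its exons.

    Assumption (not taken from GAG's source): exons are compared in the
    order they appear in the file, and the gap between two consecutive
    exons is next_start - previous_end - 1.
    """
    exons = defaultdict(list)
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # only well-formed exon features are considered
        start, end = int(cols[3]), int(cols[4])
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.strip().split("=", 1)
            for field in cols[8].split(";")
            if "=" in field
        )
        # An exon may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((start, end))

    problems = {}
    for parent, coords in exons.items():
        gaps = [nxt[0] - prev[1] - 1 for prev, nxt in zip(coords, coords[1:])]
        negative = [gap for gap in gaps if gap < 0]
        if negative:
            problems[parent] = negative
    return problems


example = [
    "chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=e1;Parent=t1",
    "chr1\tsrc\texon\t150\t300\t.\t+\t.\tID=e2;Parent=t1",
    "chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=e3;Parent=t2",
    "chr1\tsrc\texon\t301\t400\t.\t+\t.\tID=e4;Parent=t2",
]
print(find_negative_introns(example))
```

In the made-up example, transcript t1 is flagged because its second exon starts before the first one ends, while t2 is not. Run against a real file with `find_negative_introns(open("Ntom_ASAG01-itag_2.4.80-80.gff3"))`, a non-empty result would point to the transcripts worth inspecting by hand.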

RAJESHKMAURYA commented 4 years ago

I am getting the same error here with our file.

./gag.py --fasta pathto/abc.fasta --gff pathto/abc.gff3 -a pathto/abc.annotations --out annotated

Reading fasta... Done.
Reading gff... Traceback (most recent call last):
  File "./gag.py", line 50, in <module>
    main()
  File "./gag.py", line 46, in main
    controller.execute(args)
  File "/export/home/smrtanalysis/GenePrediction/Submission/genomeannotation-GAG-997e384/src/controller.py", line 74, in execute
    self.read_gff(gffpath, out_dir)
  File "/export/home/smrtanalysis/GenePrediction/Submission/genomeannotation-GAG-997e384/src/controller.py", line 286, in read_gff
    genes, comments, invalids, ignored = gffreader.read_file(reader)
  File "/export/home/smrtanalysis/GenePrediction/Submission/genomeannotation-GAG-997e384/src/gff_reader.py", line 336, in read_file
    if len(line) == 0 or line.startswith('#'):
TypeError: startswith first arg must be bytes or a tuple of bytes, not str

Kindly let me know the solution.

ScientistJake commented 4 years ago

@RAJESHKMAURYA make sure you run GAG with Python 2.7:

python2.7 gag.py  ....
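
For context, the TypeError quoted above is what Python 3 raises when `startswith` is called on a `bytes` object with a `str` argument. Under Python 2 the two types are the same, so the check passes. The sketch below reproduces only that behaviour. It is not GAG's code: the helper `check_line` is hypothetical, and the assumption that the GFF reader hands bytes to the comparison under Python 3 is inferred from the error message alone.

```python
def check_line(line):
    """Apply the same test as the failing line in the traceback.

    Returns (result, error_message); exactly one of the two is None.
    """
    try:
        return (len(line) == 0 or line.startswith('#')), None
    except TypeError as err:
        return None, str(err)


text_line = "##gff-version 3"     # str, as read from a file opened in text mode
bytes_line = b"##gff-version 3"   # bytes, as read from a file opened in binary mode

print(check_line(text_line))                   # str input works on any version
print(check_line(bytes_line))                  # fails under Python 3 only
print(check_line(bytes_line.decode("utf-8")))  # decoding first avoids the error
```

Running GAG under Python 2.7, as suggested above, sidesteps the mismatch without touching the code.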