Gaius-Augustus / Augustus

Genome annotation with AUGUSTUS
http://bioinf.uni-greifswald.de/webaugustus/

GFF3 format of AUGUSTUS #74

Open KatharinaHoff opened 5 years ago

KatharinaHoff commented 5 years ago

The native AUGUSTUS GFF3 output lacks a correct, unique ID= attribute for CDS features (they should have an incremented number for each gene). Most features other than gene, transcript and CDS lack the ID= attribute in column 9 entirely.

This causes problems for some users with the https://github.com/NBISweden/EMBLmyGFF3 parser.
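To see whether a given file is affected, a small check along the following lines can help. This is only a minimal sketch, not part of AUGUSTUS or EMBLmyGFF3: the function name audit_gff3_ids is hypothetical, and it assumes tab-separated, nine-column GFF3 lines with key=value attributes in column 9. The GFF3 specification itself allows a discontinuous feature to reuse one ID across several lines; the check below applies the stricter "one ID per line" expectation described in this issue.

```python
from collections import Counter


def audit_gff3_ids(lines):
    """Report repeated IDs and features without an ID (hypothetical helper).

    Returns a tuple (duplicates, missing):
      duplicates: dict mapping each ID used on more than one line to its count
      missing:    list of (line_number, feature_type) for lines with no ID=
    """
    seen = Counter()
    missing = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and the '#' comment lines that AUGUSTUS emits.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if "ID" in attrs:
            seen[attrs["ID"]] += 1
        else:
            missing.append((line_no, cols[2]))
    duplicates = {key: count for key, count in seen.items() if count > 1}
    return duplicates, missing
```

Passing an open file handle (for example, open("augustus_out.gff3"), where the file name is a placeholder) works as well as a list of lines. For output like that described above, one would expect the repeated CDS ID under duplicates and the intron and exon lines under missing.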

KatharinaHoff commented 5 years ago

I have fixed the GFF3 output of the script gtf2gff.pl (https://github.com/Gaius-Augustus/Augustus/commit/842a069ad0d065e08c182eb5f86724d7232601e6), but not that of native AUGUSTUS. For now, we therefore recommend that users do not run AUGUSTUS with the --gff3 option, but instead produce GTF output with AUGUSTUS and then convert it to GFF3 with:

gtf2gff.pl < input.gtf --out=out.gff3 --gff3

The issue remains open because I think this should also be fixed at the level of the AUGUSTUS C++ code.

KatharinaHoff commented 4 years ago

The solution here is not to run augustus with --gff3. Instead, take the output of a run without the --gff3 flag and transform it with gtf2gff.pl.

On Tue, Oct 29, 2019 at 3:44 AM longzhangnation wrote:

So if I run AUGUSTUS with the --gff3 option and get the final result, how can I transform it into real GFF3 format? I tried $EVM_BASE_DIR/EvmUtils/gff3_gene_prediction_file_validator.pl to test the final result, but it tells me 'Fatal Error: cannot parse ID from entry'. Can I use the result for my gene prediction with EVidenceModeler? Here is my augustus --gff3 output:

# Predicted genes for sequence number 1 on both strands
# start gene g1
ctg1	AUGUSTUS	gene	1	25691	0.03	+	.	ID=g1
ctg1	AUGUSTUS	transcript	1	25691	0.03	+	.	ID=g1.t1;Parent=g1
ctg1	AUGUSTUS	intron	1	2410	0.49	+	.	Parent=g1.t1
ctg1	AUGUSTUS	intron	2572	3490	0.73	+	.	Parent=g1.t1
ctg1	AUGUSTUS	intron	3701	9701	0.5	+	.	Parent=g1.t1
ctg1	AUGUSTUS	intron	9817	12439	0.55	+	.	Parent=g1.t1
ctg1	AUGUSTUS	intron	12538	21642	0.61	+	.	Parent=g1.t1
ctg1	AUGUSTUS	intron	21853	25432	0.65	+	.	Parent=g1.t1
ctg1	AUGUSTUS	CDS	2411	2571	0.46	+	2	ID=g1.t1.cds;Parent=g1.t1
ctg1	AUGUSTUS	exon	2411	2571	.	+	.	Parent=g1.t1
ctg1	AUGUSTUS	CDS	3491	3700	0.57	+	0	ID=g1.t1.cds;Parent=g1.t1
ctg1	AUGUSTUS	exon	3491	3700	.	+	.	Parent=g1.t1


longzhangnation commented 4 years ago

OK, thanks for your reply. I have rerun my script without the --gff3 parameter.

Juke34 commented 4 years ago

Running agat_sp_gxf_to_gff3.pl from AGAT will turn the wrong GFF3 output into correct GFF3. It can also be used to convert GTF into GFF.

This problem has been present in AUGUSTUS for a while; it should definitely be fixed.
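For readers who want to see what such a repair amounts to, here is a rough Python sketch. It is not how AGAT or gtf2gff.pl work internally. It simply gives every feature below the transcript level its own ID built from its Parent, the feature type and a running number. The function name assign_unique_ids and the ID scheme (for example g1.t1.CDS1) are assumptions made for illustration only; for real data, the tools recommended in this thread are the safer choice.

```python
from collections import defaultdict

# Feature types whose existing IDs are left untouched.
KEEP_AS_IS = {"gene", "transcript"}


def assign_unique_ids(lines):
    """Yield GFF3 lines in which child features carry distinct IDs.

    Hypothetical helper: IDs follow the made-up pattern
    <Parent>.<feature type><counter>, e.g. g1.t1.CDS1, g1.t1.CDS2.
    """
    counters = defaultdict(int)
    for line in lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        # Pass comments, blank lines and malformed lines through unchanged.
        if line.startswith("#") or len(cols) != 9:
            yield line
            continue
        feature = cols[2]
        attrs = [field for field in cols[8].split(";") if field]
        parent = next(
            (f[len("Parent="):] for f in attrs if f.startswith("Parent=")),
            None,
        )
        # Without a Parent there is nothing to derive an ID from.
        if feature in KEEP_AS_IS or parent is None:
            yield line
            continue
        counters[(parent, feature)] += 1
        new_id = f"{parent}.{feature}{counters[(parent, feature)]}"
        # Drop any existing (possibly repeated) ID and put the new one first.
        attrs = [f for f in attrs if not f.startswith("ID=")]
        cols[8] = ";".join([f"ID={new_id}"] + attrs)
        yield "\t".join(cols)
```

Numbering per parent and feature type keeps the IDs stable when unrelated genes are added to or removed from the file, which is why the counter is keyed on both rather than being global.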