GMOD / Apollo

Genome annotation editor with a Java Server backend and a Javascript client that runs in a web browser as a JBrowse plugin.
http://genomearchitect.readthedocs.io/

Load gff3 from braker2 (augustus hints) #2330

Closed agroppi closed 4 years ago

agroppi commented 4 years ago

Hi,

I have an instance of Apollo 2.2.0 running. I'm trying to load a GFF3 output from Braker2 v2.1.4 (augustus.hints.gff3). This file is sorted.

Here is a sample from this file:

chr1    AUGUSTUS    gene    19904   36761   .   +   .   jg34414
chr1    AUGUSTUS    transcript  19904   36761   .   +   .   transcript_id "jg34414.t1"; gene_id "jg34414"
chr1    AUGUSTUS    start_codon 19904   19906   .   +   0   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 19904   20044   0.99    +   0   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 20148   20257   1   +   0   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 20365   20437   1   +   1   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 21107   21168   1   +   0   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 21847   21988   1   +   1   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 23272   23490   1   +   0   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 23633   23699   1   +   0   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 25537   25936   0.98    +   2   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 27296   27443   1   +   1   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 27567   27727   1   +   0   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 28374   28473   1   +   1   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 29262   29336   1   +   0   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 29474   29608   1   +   0   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 32091   32180   1   +   0   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 32435   32617   1   +   0   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 32707   32859   0.22    +   0   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 34007   34571   0.49    +   0   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 35234   35338   0.69    +   2   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    CDS 35728   36761   0.76    +   2   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    stop_codon  36759   36761   .   +   0   transcript_id "jg34414.t1"; gene_id "jg34414";
chr1    AUGUSTUS    gene    40737   44003   .   +   .   jg34415
chr1    AUGUSTUS    CDS 40737   40937   0.3 +   0   transcript_id "jg34415.t1"; gene_id "jg34415";
chr1    AUGUSTUS    start_codon 40737   40739   .   +   0   transcript_id "jg34415.t1"; gene_id "jg34415";
chr1    AUGUSTUS    transcript  40737   44003   .   +   .   transcript_id "jg34415.t1"; gene_id "jg34415"
chr1    AUGUSTUS    CDS 41235   41279   1   +   0   transcript_id "jg34415.t1"; gene_id "jg34415";
chr1    AUGUSTUS    CDS 41421   41642   1   +   0   transcript_id "jg34415.t1"; gene_id "jg34415";
chr1    AUGUSTUS    CDS 43304   43410   1   +   0   transcript_id "jg34415.t1"; gene_id "jg34415";
chr1    AUGUSTUS    CDS 43498   43551   1   +   1   transcript_id "jg34415.t1"; gene_id "jg34415";
chr1    AUGUSTUS    CDS 43637   43709   1   +   1   transcript_id "jg34415.t1"; gene_id "jg34415";
chr1    AUGUSTUS    CDS 43842   44003   0.71    +   0   transcript_id "jg34415.t1"; gene_id "jg34415";
chr1    AUGUSTUS    stop_codon  44001   44003   .   +   0   transcript_id "jg34415.t1"; gene_id "jg34415";
chr1    AUGUSTUS    start_codon 47076   47078   .   +   0   transcript_id "jg34416.t1"; gene_id "jg34416";
chr1    AUGUSTUS    gene    47076   51668   .   +   .   jg34416
chr1    AUGUSTUS    CDS 47076   47153   1   +   0   transcript_id "jg34416.t1"; gene_id "jg34416";
chr1    AUGUSTUS    transcript  47076   51668   .   +   .   transcript_id "jg34416.t1"; gene_id "jg34416"
chr1    AUGUSTUS    CDS 47648   47899   1   +   0   transcript_id "jg34416.t1"; gene_id "jg34416";
chr1    AUGUSTUS    CDS 48378   48654   1   +   0   transcript_id "jg34416.t1"; gene_id "jg34416";
chr1    AUGUSTUS    CDS 48832   48986   1   +   2   transcript_id "jg34416.t1"; gene_id "jg34416";
chr1    AUGUSTUS    CDS 51531   51668   0.97    +   0   transcript_id "jg34416.t1"; gene_id "jg34416";
chr1    AUGUSTUS    stop_codon  51666   51668   .   +   0   transcript_id "jg34416.t1"; gene_id "jg34416";

My command line:

perl flatfile-to-json.pl \
--gff /home/ag/data/temp/augustus.hints.sorted.gff3 \
--type gene \
--trackLabel Prediction_Gene \
--out /home/ag/data/my_genome

The loading works, but the exons/introns are not displayed.

Here is a screen capture: (image: Annotation)

The upper part shows the TransDecoder results on the RNA assembly. The lower part is from the Braker2 GFF3 (augustus.hints.gff3).

Should I upgrade to Apollo 2.4.1? Or is there a way to modify my GFF3 (for either 2.2.0 or 2.4.1) so that the correct gene structure is displayed?

Thanks

nathandunn commented 4 years ago

This is more of a JBrowse than an Apollo question. You can search further here (https://sourceforge.net/p/gmod/mailman/search/?q=flat-file-to-json.pl) or ask (preferably on gitter) https://jbrowse.org/en/contact.html. You can flag me on gitter at @nathandunn, as well. They should be able to tell you right away.

I'm not sure if moving to 2.4 will help. My guess is that you need to provide exon coordinates in your GFF3 to make it show up the way you desire.

I'm closing this for now, but feel free to follow-up here if it ends up being an Apollo error.

nathandunn commented 4 years ago

@agroppi Just in case I'm being dense, you might try setting the type to transcript instead:

perl flatfile-to-json.pl \
--gff /home/ag/data/temp/augustus.hints.sorted.gff3 \
--type transcript \
--trackLabel Prediction_Gene \
--out /home/ag/data/my_genome

cmdcolin commented 4 years ago

That file is basically a GTF, not a GFF. I'd convert it to GFF :)
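To make the GTF-versus-GFF3 point concrete, here is a minimal Python sketch that classifies column 9 of a single annotation line. The function name `attribute_style` and the two regular expressions are illustrative assumptions, not part of Apollo, JBrowse, or Braker2, and the check only looks at the first attribute.

```python
import re

def attribute_style(line):
    """Classify column 9 of one tab-separated annotation line.

    Returns "gff3" for key=value attributes, "gtf" for key "value";
    attributes, "other" for anything else (e.g. a bare identifier),
    and "malformed" if the line does not have 9 columns.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return "malformed"
    attrs = cols[8].strip()
    if re.match(r'^[^=;\s]+=', attrs):      # e.g. ID=jg34414.t1;Parent=jg34414
        return "gff3"
    if re.match(r'^\w+ "[^"]*"', attrs):    # e.g. transcript_id "jg34414.t1";
        return "gtf"
    return "other"

# Hypothetical usage on lines shaped like the sample posted above
cds = "\t".join(["chr1", "AUGUSTUS", "CDS", "19904", "20044", "0.99", "+", "0",
                 'transcript_id "jg34414.t1"; gene_id "jg34414";'])
gene = "\t".join(["chr1", "AUGUSTUS", "gene", "19904", "36761", ".", "+", ".",
                  "jg34414"])
print(attribute_style(cds), attribute_style(gene))
```

On lines like those in the posted sample, the CDS line would be classified as GTF-style and the gene line as neither, which is consistent with the file not being valid GFF3.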

nathandunn commented 4 years ago

Good eye @cmdcolin! Yeah, it doesn't reference a Parent or provide an ID in column 9.

Here is how the features should be nested:

http://gmod.org/wiki/GFF3#Nesting_Features
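As an illustration of that nesting, the following Python sketch rewrites column 9 of AUGUSTUS-style lines into GFF3 `ID`/`Parent` attributes. It assumes the layout seen in the sample above (gene lines carrying a bare gene identifier, all other lines carrying `transcript_id` and `gene_id`). The function names are hypothetical, the feature types are left unchanged, and this is not the conversion performed by any of the tools mentioned in this thread.

```python
import re

def parse_gtf_attributes(text):
    """Return a dict of the key "value" pairs found in a GTF attribute string."""
    return dict(re.findall(r'(\w+) "([^"]*)"', text))

def to_gff3_column9(feature_type, raw):
    """Build a GFF3 attribute string (assumes AUGUSTUS-style input)."""
    raw = raw.strip()
    attrs = parse_gtf_attributes(raw)
    if feature_type == "gene":
        # Gene lines in the sample carry only the bare gene identifier
        return "ID=" + attrs.get("gene_id", raw)
    tid, gid = attrs["transcript_id"], attrs["gene_id"]
    if feature_type == "transcript":
        # The transcript is a child of its gene and a parent of its CDS lines
        return f"ID={tid};Parent={gid}"
    # CDS, start_codon, stop_codon, ...: children of the transcript
    return f"Parent={tid}"

def convert_line(line):
    """Rewrite column 9 of one tab-separated line; other columns are untouched."""
    cols = line.rstrip("\n").split("\t")
    cols[8] = to_gff3_column9(cols[2], cols[8])
    return "\t".join(cols)

# Hypothetical usage
line = "\t".join(["chr1", "AUGUSTUS", "transcript", "19904", "36761", ".", "+", ".",
                  'transcript_id "jg34414.t1"; gene_id "jg34414"'])
print(convert_line(line))
```

Child features get only a `Parent` attribute here because `ID` is optional in GFF3 for features that have no children of their own.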

agroppi commented 4 years ago

Thanks @nathandunn and @cmdcolin.

@nathandunn I have already tried --type transcript. The original output file from Braker was a GTF, which I converted to GFF3 with the Braker2 script gtf2gff.pl. But as you pointed out, it seems the script does not correctly add Parent references or IDs in column 9.

I just found this tool: https://github.com/The-Sequence-Ontology/GAL/blob/master/bin/gtf2gff3 Maybe it can do the conversion properly.

Cheers from Bordeaux

nathandunn commented 4 years ago

Good luck. Let us know how it goes.

agroppi commented 4 years ago

gtf2gff3 has a lot of parsing bugs. I had to clean the raw GTF output from Braker2 v2.1.4 (augustus.hints.gff3). And despite this, gtf2gff3 produced a GFF3 that did not display correctly in Apollo. What I've done:

The resulting GFF3 (after sorting it) works perfectly in Apollo.
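The sorting step is not spelled out in the thread. As one possible approach, the Python sketch below orders feature lines by sequence, start, and descending end, with a tie-breaker that puts a gene before its transcript. The function name `sort_gff3` and the `RANK` table are assumptions for illustration, and this is not necessarily the sort that was used here.

```python
# Tie-breaker so that parents precede children sharing the same coordinates
RANK = {"gene": 0, "transcript": 1, "mRNA": 1}

def sort_gff3(lines):
    """Return header lines first, then features sorted parent-before-child."""
    headers = [l for l in lines if l.startswith("#")]
    features = [l for l in lines if l.strip() and not l.startswith("#")]

    def key(line):
        cols = line.split("\t")
        # seqid is compared as a string, so chr10 sorts before chr2
        return (cols[0], int(cols[3]), -int(cols[4]), RANK.get(cols[2], 2))

    return headers + sorted(features, key=key)

# Hypothetical usage with coordinates taken from the sample above
rows = [
    ["chr1", "AUGUSTUS", "start_codon", "47076", "47078", ".", "+", "0", "Parent=jg34416.t1"],
    ["chr1", "AUGUSTUS", "gene", "47076", "51668", ".", "+", ".", "ID=jg34416"],
    ["chr1", "AUGUSTUS", "CDS", "47076", "47153", "1", "+", "0", "Parent=jg34416.t1"],
    ["chr1", "AUGUSTUS", "transcript", "47076", "51668", ".", "+", ".", "ID=jg34416.t1;Parent=jg34416"],
]
for line in sort_gff3(["\t".join(r) for r in rows]):
    print(line.split("\t")[2])
```

Sorting on start position alone can leave a child feature ahead of its parent when both begin at the same coordinate, which is why the end coordinate and feature type are part of the key.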

cmdcolin commented 4 years ago

GTF-to-GFF conversion is always awkward, even though the formats are quite similar.

I have used the gffread tool before, but it is good to know the workflow you used too.

http://jbrowse.org/docs/faq_data_loading.html#how-do-i-convert-gtf-to-gff