genometools / genometools

GenomeTools genome analysis system.
http://genometools.org

gff3validator "Sequence Ontology" out of date? #1019

Status: Closed (rzelle-lallemand closed this issue 1 year ago)

rzelle-lallemand commented 1 year ago

Problem description

The support team of the Saccharomyces Genome Database just wrote to me: "Our current GFF3 file can be found here: https://sgd-prod-upload.s3.amazonaws.com/S000342616/saccharomyces_cerevisiae.20230315.gff.gz".

I tried to validate this GFF with http://genometools.org/cgi-bin/gff3validator.cgi, but am getting the validation error:

Validation unsuccessful!

GenomeTools error: type "uORF" on line 164 in file "/var/www/servers/genometools.org/htdocs/cgi-bin/gff3/saccharomyces_cerevisiae.20230315.gff" is not a valid one

However, it looks like this is a valid Sequence Ontology term that was added in 2014:

https://github.com/The-Sequence-Ontology/SO-Ontologies/blob/07b30b453295147efa9f9f8c017907cd147fcaa9/Ontology_Files/so.obo#L18966-L18975

The gff3validator webpage also says "Last update: 2015-01-25", while the Sequence Ontology has received updates since then (although they don't seem to have (many) versioned releases: https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/606 ). Could the online gff3validator be updated with a current copy of the Sequence Ontology?
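To see in advance which feature types a GFF3 file uses, and so which names the validator's Sequence Ontology copy must contain, something like the following Python sketch could help. The function name `feature_types` and the sample lines are illustrative assumptions, not part of GenomeTools. It only reads column 3 and does no validation itself.

```python
def feature_types(gff3_lines):
    """Map each feature type (GFF3 column 3) to the first line number using it."""
    first_seen = {}
    for lineno, line in enumerate(gff3_lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence data
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line; a real validator would report this
        first_seen.setdefault(cols[2], lineno)
    return first_seen


if __name__ == "__main__":
    # Made-up sample lines for illustration only (not taken from the SGD file).
    sample = [
        "##gff-version 3",
        "\t".join(["chrI", "SGD", "gene", "1", "100", ".", "+", ".", "ID=g1"]),
        "\t".join(["chrI", "SGD", "uORF", "10", "40", ".", "+", ".", "Parent=g1"]),
    ]
    for ftype, lineno in feature_types(sample).items():
        print(f"{ftype}\tfirst used on line {lineno}")
```

For a real file, the lines could come from `gzip.open(path, "rt")` or a plain `open(path)`.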

Exact command line call triggering the problem

N/A

Example minimal input triggering the problem

What GenomeTools version are you reporting an issue for (as output by gt -version)?

GFF3 online validator Last update: 2015-01-25

Did you compile GenomeTools from source? If so, please state the make parameters used.

N/A

What operating system (e.g. Ubuntu, Mac OS X), OS version (e.g. 15.10, 10.11) and platform (e.g. x86_64) are you using?

N/A

satta commented 1 year ago

You're right, the OBO file on the webserver hasn't been updated in a while; it doesn't have this type yet:

$ fgrep uORF genometools_for_web/gtdata/obo_files/so.obo
$

Should be easy to update though, since the current GenomeTools distribution from git already has a newer version of so.obo that includes this type:

$ fgrep uORF gtdata/obo_files/so.obo
name: uORF
synonym: "regulatory uORF" EXACT []
name: AUG_initiated_uORF
def: "A uORF beginning with the canonical start codon AUG." [PMID:26684391, PMID:27313038]
synonym: "AUG initiated uORF" EXACT []
is_a: SO:0002027 ! uORF
name: non_AUG_initiated_uORF
def: "A uORF beginning with a codon other than AUG." [PMID:26684391, PMID:27313038]
synonym: "non AUG initiated uORF" EXACT []
is_a: SO:0002027 ! uORF
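Since `fgrep` matches substrings, it also reports derived terms such as AUG_initiated_uORF. A stricter check for an exact term name could look like the Python sketch below. It is a simplified reading of the OBO format that assumes `[Term]` stanzas with one `name:` line each and ignores obsolete terms. The helper name `obo_term_names` and the `EX:` identifiers in the sample are made up for illustration; only SO:0002027 for uORF comes from the output above.

```python
def obo_term_names(obo_lines):
    """Collect the 'name:' value of every [Term] stanza in an OBO file."""
    names = set()
    in_term = False
    for line in obo_lines:
        line = line.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas define feature types.
            in_term = (line == "[Term]")
        elif in_term and line.startswith("name:"):
            names.add(line[len("name:"):].strip())
    return names


if __name__ == "__main__":
    # Tiny hand-written sample; the EX: ids are placeholders, not real SO ids.
    sample = """\
format-version: 1.2

[Term]
id: SO:0002027
name: uORF

[Term]
id: EX:0000001
name: AUG_initiated_uORF
is_a: SO:0002027 ! uORF

[Typedef]
id: part_of
name: part_of
""".splitlines()
    names = obo_term_names(sample)
    print("uORF" in names)      # exact match on the term name
    print("part_of" in names)   # Typedef names are not feature types
```

Run against `gtdata/obo_files/so.obo` with `obo_term_names(open(path))`, this would give the set of names to compare a GFF3 file's types against.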

When trying to validate your file against this more recent version (with the standalone validator), I get a new issue:

$ ./bin/gt gff3validator -typecheck gtdata/obo_files/so.obo ~/Downloads/saccharomyces_cerevisiae.20230315.gff
./bin/gt gff3validator: error: the child feature with type 'transposable_element' on line 401 in file "/home/satta/Downloads/saccharomyces_cerevisiae.20230315.gff" is not part-of parent feature with type 'transposable_element_gene' given on line 399 (according to type checker 'OBO file gtdata/obo_files/so.obo')

which is correct since the part-of relationship is swapped in that situation:

chrI    SGD     transposable_element_gene       160597  164187  .       -       .       ID=YAR009C;Name=YAR009C;Alias=YARCTyB1-1,truncated%20gag-pol%20fusion%20protein;Ontology_term=GO:0000943,GO:0003723,GO:0003887,GO:0003964,GO:0004540,GO:0005634,GO:0005737,GO:0008233,GO:0032197,SO:0000704;Note=Retrotransposon%20TYA%20Gag%20and%20TYB%20Pol%20genes%3B%20Gag%20processing%20produces%20capsid%20proteins%2C%20Pol%20is%20cleaved%20to%20produce%20protease%2C%20reverse%20transcriptase%20and%20integrase%20activities%3B%20in%20YARCTy1-1%20TYB%20is%20mutant%20and%20probably%20non-functional%3B%20protein%20product%20forms%20cytoplasmic%20foci%20upon%20DNA%20replication%20stress;display=Retrotransposon%20TYA%20Gag%20and%20TYB%20Pol%20genes;dbxref=SGD:S000000067;curie=SGD:S000000067
chrI    SGD     transposable_element    160597  164187  .       -       .       ID=YAR009C_transposable_element;Name=YAR009C_transposable_element;Parent=YAR009C

The transposable_element_gene should be part-of the transposable_element, not the other way around. See http://www.sequenceontology.org/browser/current_svn/term/SO:0000111.
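To list which child/parent type pairs a GFF3 file actually contains, so swapped relationships like this one stand out, a rough Python sketch might look as follows. The function names are hypothetical, and the sample reuses the two feature lines above with column 9 shortened to the ID and Parent attributes. It does not consult so.obo; it only reports the pairs, which can then be checked against the ontology by hand or with the standalone validator.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict (no URL-decoding)."""
    attrs = {}
    for field in column9.split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def child_parent_type_pairs(gff3_lines):
    """Return (child_type, parent_type, line_number) for each Parent reference."""
    type_by_id = {}
    pending = []  # (child_type, parent_id, line_number), resolved at the end
    for lineno, line in enumerate(gff3_lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            type_by_id[attrs["ID"]] = cols[2]
        for parent_id in attrs.get("Parent", "").split(","):
            if parent_id:
                pending.append((cols[2], parent_id, lineno))
    # parent_type is None when the Parent ID is not defined in the file
    return [(child, type_by_id.get(pid), n) for child, pid, n in pending]


if __name__ == "__main__":
    sample = [
        "\t".join(["chrI", "SGD", "transposable_element_gene", "160597",
                   "164187", ".", "-", ".", "ID=YAR009C"]),
        "\t".join(["chrI", "SGD", "transposable_element", "160597",
                   "164187", ".", "-", ".",
                   "ID=YAR009C_transposable_element;Parent=YAR009C"]),
    ]
    for child, parent, lineno in child_parent_type_pairs(sample):
        print(f"line {lineno}: {child} is a child of {parent}")
```

On the two sample lines this reports transposable_element as a child of transposable_element_gene, which is the direction the validator rejects.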

I will update the so.obo file on the webserver soon; in the meantime, please use the standalone validator. Thanks for letting us know!

rzelle-lallemand commented 1 year ago

> I will update the so.obo file on the webserver soon

Thanks, also for the additional sleuthing into this GFF!

satta commented 1 year ago

This is done now; the validator on the website runs with a more recent SO version.