geneontology / go-site

A collection of metadata, tools, and files associated with the Gene Ontology public web presence.
http://geneontology.org
BSD 3-Clause "New" or "Revised" License

new qualifiers should be allowed in GAF #802

Closed pgaudet closed 6 years ago

pgaudet commented 6 years ago

@dougli1sqrd Protein2GO is exporting these qualifiers, which are producing false errors in the reports (see, e.g., http://release.geneontology.org/2018-09-05/reports/dictybase-report.html).

1561 PARSER ERROR Unknown qualifier: acts_upstream_of_or_within dictyBase DDB_G0267962 tsuA acts_upstream_of_or_within GO:0043326 PMID:18708585 IMP P Probable serine/threonine-protein kinase tsuA tsuA|DDB_G0267962 protein taxon:44689 20080929 dictyBase

We need to allow these. I think this is a modification to gorule-0000001.md.

Thanks, Pascale

tonysawfordebi commented 6 years ago

@pgaudet Really? It shouldn't be - there was a bug in our GAF export process when the new qualifiers were first introduced which meant that they did appear in some GAFs, but I'm pretty sure we fixed that.

@alexsign Can you verify that the new qualifiers are not finding their way into the GAFs?

tonysawfordebi commented 6 years ago

Also, I'm pretty sure that it was agreed at the NY meeting that we wouldn't extend GAF any further, and that the new qualifiers were to be GPAD-only.

pgaudet commented 6 years ago

Well, they show up in the error reports; see http://release.geneontology.org/2018-09-05/reports/dictybase-report.html

Any idea where they come from?

tonysawfordebi commented 6 years ago

What's the source file that's being checked?

pgaudet commented 6 years ago

AFAIK http://release.geneontology.org/2018-09-05/products/annotations/dictybase-src.gaf.gz

pgaudet commented 6 years ago

@kltm Is this right?

tonysawfordebi commented 6 years ago

Hmmm... dunno where that comes from

pgaudet commented 6 years ago

According to the YAML file, the dictyBase file is picked up here: ftp://ftp.ebi.ac.uk/pub/contrib/goa/dictyBase.gpa.gz

Is this right?

tonysawfordebi commented 6 years ago

Well, it's certainly true that we're currently generating & hosting the files for dictyBase, yes

pgaudet commented 6 years ago

Sorry, it must be this one: ftp://ftp.ebi.ac.uk/pub/contrib/goa/dictyBase.gaf.gz (not the GPAD)

pgaudet commented 6 years ago

... and in that file I do see the 'acts_upstream_of_or_within' qualifiers

For example: dictyBase DDB_G0267376 acrA acts_upstream_of_or_within GO:0006171 PMID:11566867 IGI UniProtKB:Q03100 P Adenylate cyclase, terminal-differentiation specific
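
A check like this can be scripted. The following is only a minimal sketch, not the parser that produces the release reports: it assumes a tab-separated GAF with the qualifier in column 4, and it assumes the GAF 2.1 qualifier set (`NOT`, `contributes_to`, `colocalizes_with`, pipe-separated). The function name `unknown_qualifiers` and the second sample row are made up for illustration.

```python
from collections import Counter

# Assumption: the qualifier values permitted in column 4 of a GAF 2.1 file.
GAF_QUALIFIERS = {"NOT", "contributes_to", "colocalizes_with"}


def unknown_qualifiers(lines):
    """Count column-4 qualifier values that are not in GAF_QUALIFIERS.

    `lines` is any iterable of GAF lines (hypothetical helper, not part
    of any GO tooling).
    """
    counts = Counter()
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 4:
            continue
        # Column 4 may hold several pipe-separated qualifiers, or be empty.
        for qualifier in cols[3].split("|"):
            if qualifier and qualifier not in GAF_QUALIFIERS:
                counts[qualifier] += 1
    return counts


sample = [
    "!gaf-version: 2.1",
    # Row taken from the example above (truncated to the first nine columns).
    "dictyBase\tDDB_G0267376\tacrA\tacts_upstream_of_or_within\tGO:0006171"
    "\tPMID:11566867\tIGI\tUniProtKB:Q03100\tP",
    # Made-up row with qualifiers that GAF 2.1 does allow.
    "dictyBase\tDDB_G0000001\tfoo\tNOT|contributes_to\tGO:0000001"
    "\tPMID:1\tIDA\t\tF",
]

for qualifier, n in unknown_qualifiers(sample).items():
    print(f"{qualifier}\t{n}")
```

To run it over a downloaded, gzipped file, the lines could be supplied with `gzip.open(path, "rt")`, where `path` is wherever the file was saved locally.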

tonysawfordebi commented 6 years ago

AAAARRRRGGGGGHHHHH!!!!! Found it - there's one (dicty-specific) script that slipped through the net...

I'll fix it.

tonysawfordebi commented 6 years ago

Script now fixed, and files being regenerated.

pgaudet commented 6 years ago

Great! You mean no other species had that problem?

tonysawfordebi commented 6 years ago

Yes - the problem was in a script that we put together specifically for dictyBase (to generate files with dicty IDs in column 2, rather than UniProtKB / RNAcentral ones); the scripts that we use for generating files as part of a GOA release had already been fixed.

pgaudet commented 6 years ago

Awesome, thanks!