geneontology / neo

noctua entity ontology

SGD has incomplete GPI #36

Closed suzialeksander closed 2 years ago

suzialeksander commented 5 years ago

SGD currently provides only UniProt identifiers in the GPI listed in the metadata, taken directly from protein2go upstream as a "stub". At the time, SGD was not using Noctua beyond basic experimentation, and it was decided that a stub was better than no information. Now that SGD is using Noctua more, the identifier issue has come up and needs to be fixed as we proceed with more serious annotation.

For example, in the Noctua Form, "STE3 Scer" pops up with the UniProt ID instead of the SGD ID, so the "search database" doesn't work.

The potential fix in this case: GO can derive a GPI from some other file.

kltm commented 5 years ago

@cmungall Should we derive here or work with SGD to provide a better source for us?

cmungall commented 5 years ago

I think right now we should remove the gpi entry from sgd.yaml. neo will then fall back on the GAF, which is good enough for a well-annotated genome like SGD's.

After that we can try another solution. SGD provides a BGI for the Alliance, so providing a GPI should not be hard, and we can have some shared conversion code.
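To make the GAF fallback concrete, here is a rough sketch of how entity rows could be derived from a GAF. It is not neo's actual code: the function name `gaf_to_gpi_rows` is hypothetical, the column positions assume GAF 2.x, the output layout assumes the ten-column GPI 1.2 format, and the identifier in the usage note is a placeholder rather than a real SGD ID.

```python
# Assumed GAF 2.x column positions (0-based) for the fields a GPI row needs.
GAF_DB, GAF_ID, GAF_SYMBOL = 0, 1, 2
GAF_NAME, GAF_SYNONYMS, GAF_TYPE, GAF_TAXON = 9, 10, 11, 12


def gaf_to_gpi_rows(gaf_text):
    """Return one GPI-like row per distinct (DB, ID) pair found in GAF text.

    Hypothetical helper: assumes tab-separated GAF 2.x input and emits rows
    in the assumed GPI 1.2 column order, leaving the last three columns
    (parent object, xrefs, properties) empty.
    """
    seen = set()
    rows = []
    for line in gaf_text.splitlines():
        # Skip blank lines and "!" header/comment lines.
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        if len(cols) < 13:
            continue  # too short to carry the fields we need
        key = (cols[GAF_DB], cols[GAF_ID])
        if key in seen:
            continue  # a GAF has one line per annotation; we want one per entity
        seen.add(key)
        # The GAF taxon column may hold "taxon:A|taxon:B"; keep the first entry.
        taxon = cols[GAF_TAXON].split("|")[0]
        rows.append([
            cols[GAF_DB], cols[GAF_ID], cols[GAF_SYMBOL], cols[GAF_NAME],
            cols[GAF_SYNONYMS], cols[GAF_TYPE], taxon, "", "", "",
        ])
    return rows
```

Given two annotation lines for the same gene (say, a placeholder ID `S000000001` with symbol `STE3`), this returns a single row keyed on the SGD namespace, which is the property the Noctua lookup needs.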

suzialeksander commented 5 years ago

Sounds like there's a plan? Let me know if there's something SGD specifically needs to do. Thanks

kltm commented 5 years ago

@cmungall https://github.com/geneontology/go-site/pull/953 correct?

kltm commented 5 years ago

Merged and waiting for effect over the weekend.

suzialeksander commented 5 years ago

Update: There is no SGD Noctua GPAD yet, but our existing annotations to UniProt accessions are being rejected in the Form summary page (good, I suppose) and SGD IDs are autofilling. What steps do we need to take now to get a Noctua GPAD?

kltm commented 5 years ago

Well, if the IDs are SGD, the next time the models are reduced to GPADs in the pipeline they should come out naturally (assuming everything else is okay). If I'm missing the answer you're wanting, feel free to tag me and we can get some of the others to take a look at it.
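One quick way to confirm this once a GPAD shows up is to tally the identifier namespaces in it. The sketch below is only an illustration: `tally_gpad_namespaces` is a hypothetical helper, and it assumes a GPAD 1.x layout in which the first tab-separated column holds the database name.

```python
from collections import Counter


def tally_gpad_namespaces(gpad_text):
    """Count annotation lines per namespace, assuming GPAD 1.x (DB in column 1)."""
    counts = Counter()
    for line in gpad_text.splitlines():
        # Skip blank lines and "!" header/comment lines.
        if not line or line.startswith("!"):
            continue
        counts[line.split("\t")[0]] += 1
    return counts
```

If the identifier fix has taken effect, the tally should be dominated by `SGD`, with few or no `UniProtKB` entries.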

suzialeksander commented 5 years ago

"next time the models are reduced to GPADs in the pipeline"

Is this a "daily/24h" occurrence, or do you mean the monthly release?

kltm commented 5 years ago

This is a daily occurrence, which should be completed sometime in the evening, every day. The pickup location is the snapshot set of URLs, but progress can be seen here: https://build.geneontology.org/job/geneontology/job/pipeline/job/snapshot/

suzialeksander commented 5 years ago

Thanks for the info, I'll bet we missed today's cutoff. We'll check it later. Thanks!

suzialeksander commented 2 years ago

ah well it looks like this is fine now