geneontology / pipeline

Declarative pipeline for the Gene Ontology.
https://build.geneontology.org/job/geneontology/job/pipeline/
BSD 3-Clause "New" or "Revised" License

Imported ComplexPortal IDs not resolving as SGD IDs #364

Open suzialeksander opened 8 months ago

suzialeksander commented 8 months ago

The model on the right (MOT2) was edited by SGD curators; the model on the left (CDC20) is as imported. SGD would prefer that the complex display as "CPX-# Scer", like the other yeast gene products. After a quick discussion with @dustine32, this might be a straightforward find & replace.

Image

suzialeksander commented 8 months ago

tagging @vanaukenk to see if this looks like a simple "fix the SGD GPI" or something, or if this might be a larger issue.

dustine32 commented 8 months ago

To clarify, the "CPX-1306 Scer" in the right-side, edited model is the resolved label for NEO class SGD:S000218180, which is the SGD ID tying back to ComplexPortal:CPX-1306.

For updating the left-side, as-imported model, we would need some lookup to map ComplexPortal:CPX-756 to its SGD namespace NEO class SGD:S000217886. It sounds like the SGD GPI could be this lookup.

Note that there are some ComplexPortal IDs in NEO but these example complex classes only exist in NEO using their SGD namespaces.

vanaukenk commented 8 months ago

@suzialeksander @dustine32

So, the idea here is to take the existing ComplexPortal entries, strip them of the ComplexPortal prefix, match the unique id to column three of SGD's GPI file (version 1.2?) and then replace any ComplexPortal curies in the Noctua models with the SGD curies so that the name will resolve properly for display?

@suzialeksander - going forward, will SGD include the ComplexPortal curies as dbxrefs to the SGD protein_complex entries in the gpi file?

srengel commented 7 months ago

@vanaukenk the ComplexPortal curies are already in col9 of the SGD GPI. should they be somewhere else?

some example rows from our current GPI:

SGD S000217570  CPX-532 Adaptor complex AP-1    APL2:APL4:APM1:APS1|EBI-11896492|Adaptor complex AP-1   protein_complex taxon:559292        ComplexPortal:CPX-532   
SGD S000217571  CPX-533 Adaptor complex AP-1R   APL2:APL4:APM2:APS1|EBI-11896583|Adaptor complex AP-1R  protein_complex taxon:559292        ComplexPortal:CPX-533   
SGD S000217572  CPX-534 Adapter complex AP-2    APL1:APL3:APM4:APS2|EBI-11896755|Adapter complex AP-2   protein_complex taxon:559292        ComplexPortal:CPX-534   
SGD S000217573  CPX-535 Adapter complex AP-3    APL5:APL6:APM3:APS3|EBI-11898515|Adapter complex AP-3   protein_complex taxon:559292        ComplexPortal:CPX-535   
SGD S000217574  CPX-536 cAMP-dependent protein kinase complex variant 1 2xBCY1:2xTPK1|EBI-11963349|cAMP-dependent protein kinase complex variant 1  protein_complex taxon:559292        ComplexPortal:CPX-536   
SGD S000217575  CPX-537 cAMP-dependent protein kinase complex variant 2 2xBCY1:2xTPK2|EBI-12003988|cAMP-dependent protein kinase complex variant 2  protein_complex taxon:559292        ComplexPortal:CPX-537   
SGD S000217576  CPX-571 cAMP-dependent protein kinase complex variant 3 2xBCY1:2xTPK3|EBI-12424950|cAMP-dependent protein kinase complex variant 3  protein_complex taxon:559292        ComplexPortal:CPX-571   
SGD S000217577  CPX-572 cAMP-dependent protein kinase complex variant 4 2xBCY1:TPK1:TPK2|EBI-12424978|cAMP-dependent protein kinase complex variant 4   protein_complex taxon:559292        ComplexPortal:CPX-572   
SGD S000217578  CPX-573 cAMP-dependent protein kinase complex variant 5 2xBCY1:TPK1:TPK3|EBI-12425007|cAMP-dependent protein kinase complex variant 5   protein_complex taxon:559292        ComplexPortal:CPX-573   
SGD S000217579  CPX-574 cAMP-dependent protein kinase complex variant 6 2xBCY1:TPK2:TPK3|EBI-12425036|cAMP-dependent protein kinase complex variant 6   protein_complex taxon:559292        ComplexPortal:CPX-574   
SGD S000217580  CPX-575 Ste12/Dig1/Dig2 transcription regulation complex    DIG1:DIG2:STE12|EBI-12448881|Ste12/Dig1/Dig2 transcription regulation complex   protein_complex taxon:559292        ComplexPortal:CPX-575   
SGD S000217581  CPX-576 Tec1/Ste12/Dig1 transcription regulation complex    DIG1:STE12:TEC1|EBI-12453638|Tec1/Ste12/Dig1 transcription regulation complex   protein_complex taxon:559292        ComplexPortal:CPX-576   
SGD S000217596  CPX-1150    SWI/SNF chromatin remodelling complex   ARP7:ARP9:RTT102:SNF2:SNF5:SNF6:SNF11:SNF12:SWI1:SWI3:SWP82:TAF14|EBI-15100957|SWI/SNF chromatin remodelling complex    protein_complex taxon:559292        ComplexPortal:CPX-1150  

vanaukenk commented 7 months ago

@srengel - that's correct; the ComplexPortal xrefs should be in column 9 of the gpi. I was looking at the gpi file available for download on current.geneontology.org which doesn't have those xrefs because it is derived from the GAF. Sorry for any confusion!
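For illustration, the lookup discussed above could be built from the SGD GPI roughly as follows. This is a minimal sketch, assuming a tab-delimited GPI 1.2 file in which column 1 is the database, column 2 the object ID, and column 9 the pipe-separated xrefs (the tabs appear to have been collapsed in the rows pasted above). The function name `complexportal_to_sgd` is hypothetical and not part of any pipeline code.

```python
def complexportal_to_sgd(gpi_lines):
    """Map ComplexPortal CURIEs to SGD CURIEs using GPI column 9 (xrefs).

    Assumes tab-delimited GPI 1.2 rows; header lines starting with "!"
    and rows with fewer than nine columns are skipped.
    """
    mapping = {}
    for line in gpi_lines:
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        db, object_id, xrefs = cols[0], cols[1], cols[8]
        for xref in xrefs.split("|"):
            if xref.startswith("ComplexPortal:"):
                mapping[xref] = f"{db}:{object_id}"
    return mapping
```

Called on an open file handle (for example `complexportal_to_sgd(open(path))`, where `path` points at the SGD GPI), this returns a dictionary such as `ComplexPortal:CPX-532` -> `SGD:S000217570` for the first row shown above.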

suzialeksander commented 3 months ago

Current models:
ComplexPortal:CPX http://noctua.geneontology.org/editor/graph/gomodel:SGD_S000000240
CPX- Scer gomodel:SGD_S000000870

@dustine32 does this sound like a fix you can make? And does this sound like a one-off fix, or would something have to be fixed with each load?

dustine32 commented 3 months ago

@suzialeksander This sounds like some form of SPARQL UPDATE query run against the minerva modelstore, though I think @balhoff can correct me on that. I don't think I've ever done a query sourcing a lookup file like ComplexPortal:CPX-1739 -> SGD:S000218211. Maybe we'd need to inject this lookup (using another query) as xrefs on NEO entities into the modelstore first? I could look at the regular ontology update process for reference. This is likely more of a project than a quick fix.

We'd have to schedule this update during a Noctua outage and, of course, we'd test this on noctua-dev's minerva first.
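As a rough idea of what such an update might look like, the sketch below generates one SPARQL 1.1 `DELETE`/`INSERT` statement per mapping pair instead of injecting the lookup into the store. It rests on several assumptions that would need checking against the actual modelstore: that each model is a named graph, that the complex is asserted as an `rdf:type` on an individual, and that the CURIE prefixes expand to the IRIs passed in `prefixes`. The function names and the example.org prefixes are placeholders, not real minerva values.

```python
def expand(curie, prefixes):
    """Expand a CURIE such as 'SGD:S000218211' using a prefix -> IRI map."""
    prefix, local_id = curie.split(":", 1)
    return prefixes[prefix] + local_id


def retype_update(old_curie, new_curie, prefixes):
    """Build a SPARQL UPDATE that swaps one class IRI for another.

    Assumes (unverified) that models are named graphs and that the class
    is attached to individuals via rdf:type.
    """
    old_iri = expand(old_curie, prefixes)
    new_iri = expand(new_curie, prefixes)
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        f"DELETE {{ GRAPH ?g {{ ?ind rdf:type <{old_iri}> }} }}\n"
        f"INSERT {{ GRAPH ?g {{ ?ind rdf:type <{new_iri}> }} }}\n"
        f"WHERE  {{ GRAPH ?g {{ ?ind rdf:type <{old_iri}> }} }}"
    )


# Placeholder expansions; the real ones would come from the prefix
# context the modelstore actually uses.
EXAMPLE_PREFIXES = {
    "ComplexPortal": "https://example.org/complexportal/",
    "SGD": "https://example.org/sgd/",
}
```

For the pair mentioned above, `retype_update("ComplexPortal:CPX-1739", "SGD:S000218211", EXAMPLE_PREFIXES)` yields a single statement that could be tried against noctua-dev's minerva first.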

kltm commented 3 months ago

There could be a migration (sed on models on disk or SPARQL), but these are fiddly and I'd like to be clear on the mapping (file) to be used, or if it's just a couple of one-offs?
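One reason an on-disk migration is fiddly is that a plain find & replace on `CPX-75` would also rewrite the front of `CPX-756`. The sketch below shows one way to guard against that in Python rather than sed. It assumes the mapping keys are written exactly as they appear in the model files (CURIE or full IRI), which would need to be confirmed, and the function name `rewrite_ids` is hypothetical.

```python
import re


def rewrite_ids(text, mapping):
    """Replace every key of `mapping` found in `text` with its value.

    Returns a tuple (new_text, number_of_replacements).
    """
    if not mapping:
        return text, 0
    # Try longer keys first and reject a match followed by another digit,
    # so a key ending in CPX-75 never rewrites part of CPX-756.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("(?:" + "|".join(map(re.escape, keys)) + r")(?!\d)")
    return pattern.subn(lambda m: mapping[m.group(0)], text)
```

Returning the replacement count makes it easy to log how many identifiers changed per model file and to confirm that a one-off fix touched only the expected models.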