suzialeksander opened this issue 8 months ago
Tagging @vanaukenk to see whether this is a simple "fix the SGD GPI" problem or a larger issue.
To clarify, the "CPX-1306 Scer" in the right-side, edited model is the resolved label for NEO class SGD:S000218180, which is the SGD ID tying back to ComplexPortal:CPX-1306. To update the left-side, as-imported model, we would need some lookup to map ComplexPortal:CPX-756 to its SGD-namespace NEO class SGD:S000217886. It sounds like the SGD GPI could serve as this lookup.
Note that there are some ComplexPortal IDs in NEO, but these example complex classes only exist in NEO under their SGD namespaces.
@suzialeksander @dustine32
So, the idea here is to take the existing ComplexPortal entries, strip the ComplexPortal prefix, match the remaining ID to column three of SGD's GPI file (version 1.2?), and then replace any ComplexPortal curies in the Noctua models with the SGD curies so that the names resolve properly for display?
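A minimal Python sketch of the lookup proposed above, assuming a tab-separated GPI 1.2 file in which column 1 is the database ("SGD"), column 2 the SGD object ID, column 3 the complex symbol (e.g. CPX-532) and column 6 the object type. The function names are hypothetical and not part of any existing GO or Noctua tool. It strips the ComplexPortal prefix and matches the bare ID against column 3, exactly as described.

```python
# Hypothetical helpers (not part of any existing GO/Noctua tool).
# Assumes a tab-separated GPI 1.2 file: col 1 = DB, col 2 = DB object ID,
# col 3 = symbol (e.g. "CPX-532"), col 6 = object type.

def build_symbol_index(gpi_lines):
    """Map GPI column 3 symbols ("CPX-532") to SGD CURIEs ("SGD:S000217570")."""
    index = {}
    for line in gpi_lines:
        # Skip blank lines and header lines such as "!gpi-version: 1.2".
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Only protein_complex rows carry ComplexPortal symbols.
        if len(cols) < 6 or cols[5] != "protein_complex":
            continue
        index[cols[2]] = f"{cols[0]}:{cols[1]}"
    return index


def to_sgd_curie(cpx_curie, index):
    """Strip the ComplexPortal prefix and look the bare ID up in column 3.

    Returns None for non-ComplexPortal CURIEs or IDs missing from the GPI.
    """
    prefix, _, local_id = cpx_curie.partition(":")
    if prefix != "ComplexPortal":
        return None
    return index.get(local_id)
```

With the CPX-532 row from the example GPI rows later in this thread, `to_sgd_curie("ComplexPortal:CPX-532", index)` would return `"SGD:S000217570"`; IDs absent from the GPI come back as `None` so they can be reviewed rather than silently rewritten.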
@suzialeksander - going forward, will SGD include the ComplexPortal curies as dbxrefs to the SGD protein_complex entries in the gpi file?
@vanaukenk the ComplexPortal curies are already in col9 of the SGD GPI. Should they be somewhere else?
Some example rows from our current GPI:
SGD S000217570 CPX-532 Adaptor complex AP-1 APL2:APL4:APM1:APS1|EBI-11896492|Adaptor complex AP-1 protein_complex taxon:559292 ComplexPortal:CPX-532
SGD S000217571 CPX-533 Adaptor complex AP-1R APL2:APL4:APM2:APS1|EBI-11896583|Adaptor complex AP-1R protein_complex taxon:559292 ComplexPortal:CPX-533
SGD S000217572 CPX-534 Adapter complex AP-2 APL1:APL3:APM4:APS2|EBI-11896755|Adapter complex AP-2 protein_complex taxon:559292 ComplexPortal:CPX-534
SGD S000217573 CPX-535 Adapter complex AP-3 APL5:APL6:APM3:APS3|EBI-11898515|Adapter complex AP-3 protein_complex taxon:559292 ComplexPortal:CPX-535
SGD S000217574 CPX-536 cAMP-dependent protein kinase complex variant 1 2xBCY1:2xTPK1|EBI-11963349|cAMP-dependent protein kinase complex variant 1 protein_complex taxon:559292 ComplexPortal:CPX-536
SGD S000217575 CPX-537 cAMP-dependent protein kinase complex variant 2 2xBCY1:2xTPK2|EBI-12003988|cAMP-dependent protein kinase complex variant 2 protein_complex taxon:559292 ComplexPortal:CPX-537
SGD S000217576 CPX-571 cAMP-dependent protein kinase complex variant 3 2xBCY1:2xTPK3|EBI-12424950|cAMP-dependent protein kinase complex variant 3 protein_complex taxon:559292 ComplexPortal:CPX-571
SGD S000217577 CPX-572 cAMP-dependent protein kinase complex variant 4 2xBCY1:TPK1:TPK2|EBI-12424978|cAMP-dependent protein kinase complex variant 4 protein_complex taxon:559292 ComplexPortal:CPX-572
SGD S000217578 CPX-573 cAMP-dependent protein kinase complex variant 5 2xBCY1:TPK1:TPK3|EBI-12425007|cAMP-dependent protein kinase complex variant 5 protein_complex taxon:559292 ComplexPortal:CPX-573
SGD S000217579 CPX-574 cAMP-dependent protein kinase complex variant 6 2xBCY1:TPK2:TPK3|EBI-12425036|cAMP-dependent protein kinase complex variant 6 protein_complex taxon:559292 ComplexPortal:CPX-574
SGD S000217580 CPX-575 Ste12/Dig1/Dig2 transcription regulation complex DIG1:DIG2:STE12|EBI-12448881|Ste12/Dig1/Dig2 transcription regulation complex protein_complex taxon:559292 ComplexPortal:CPX-575
SGD S000217581 CPX-576 Tec1/Ste12/Dig1 transcription regulation complex DIG1:STE12:TEC1|EBI-12453638|Tec1/Ste12/Dig1 transcription regulation complex protein_complex taxon:559292 ComplexPortal:CPX-576
SGD S000217596 CPX-1150 SWI/SNF chromatin remodelling complex ARP7:ARP9:RTT102:SNF2:SNF5:SNF6:SNF11:SNF12:SWI1:SWI3:SWP82:TAF14|EBI-15100957|SWI/SNF chromatin remodelling complex protein_complex taxon:559292 ComplexPortal:CPX-1150
@srengel - that's correct; the ComplexPortal xrefs should be in column 9 of the GPI. I was looking at the GPI file available for download on current.geneontology.org, which doesn't have those xrefs because it is derived from the GAF. Sorry for any confusion!
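Since column 9 already carries the ComplexPortal xrefs, the lookup could instead be built from that column. The sketch below, with a hypothetical function name, also collects protein_complex rows that have no ComplexPortal xref, which is one way to notice that a GPI lacking those xrefs (such as the GAF-derived file mentioned above) was picked up by mistake. It assumes the tab-separated GPI 1.2 layout, with column 6 as the object type and column 9 as pipe-separated xrefs.

```python
# Hypothetical helper; assumes tab-separated GPI 1.2 rows where
# col 6 = object type and col 9 = pipe-separated DB_Xrefs.

def cpx_lookup_from_col9(gpi_lines):
    """Return (lookup, missing).

    lookup maps ComplexPortal CURIEs found in column 9 to SGD CURIEs;
    missing lists SGD CURIEs of protein_complex rows with no ComplexPortal xref.
    """
    lookup, missing = {}, []
    for line in gpi_lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6 or cols[5] != "protein_complex":
            continue
        sgd_curie = f"{cols[0]}:{cols[1]}"
        # Column 9 (index 8) may be absent or empty in some GPI files.
        xrefs = cols[8].split("|") if len(cols) > 8 and cols[8] else []
        cpx = [x for x in xrefs if x.startswith("ComplexPortal:")]
        if not cpx:
            missing.append(sgd_curie)
        for x in cpx:
            lookup[x] = sgd_curie
    return lookup, missing
```

A non-empty `missing` list for a file that should describe every SGD complex is a cheap signal that the wrong GPI is being used as the mapping source.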
Current models:
ComplexPortal:CPX http://noctua.geneontology.org/editor/graph/gomodel:SGD_S000000240
CPX- Scer gomodel:SGD_S000000870
@dustine32, does this sound like a fix you can make? And would it be a one-off fix, or would it need to be reapplied with each load?
@suzialeksander This sounds like some form of SPARQL UPDATE query run against the minerva modelstore, though @balhoff can correct me on that. I don't think I've ever written a query that sources a lookup file like ComplexPortal:CPX-1739 -> SGD:S000218211. Maybe we'd first need to inject this lookup (using another query) into the modelstore as xrefs on NEO entities? I could look at the regular ontology update process for reference. This is likely more of a project than a quick fix.
We'd have to schedule this update during a Noctua outage and, of course, we'd test this on noctua-dev's minerva first.
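One way to avoid having SPARQL read a lookup file at all is to inline the mapping as a VALUES block inside a single SPARQL 1.1 UPDATE. The Python sketch below generates such an update from a dict. The prefix expansions in `HYPOTHETICAL_PREFIXES` are example.org placeholders, not the IRIs minerva actually uses, and the update only rewrites triples where the complex IRI is the object (e.g. instance type assertions), not the subject. It has not been run against the modelstore; as noted, anything like this would be tried on noctua-dev first.

```python
# Placeholder prefix expansions; substitute the IRIs the modelstore really uses.
HYPOTHETICAL_PREFIXES = {
    "ComplexPortal": "https://example.org/complexportal/",
    "SGD": "https://example.org/sgd/",
}


def expand(curie, prefixes):
    """'SGD:S000218211' -> '<https://example.org/sgd/S000218211>'."""
    prefix, _, local_id = curie.partition(":")
    return f"<{prefixes[prefix]}{local_id}>"


def sparql_update_for(mapping, prefixes):
    """Build one SPARQL 1.1 UPDATE swapping every mapped IRI in object position."""
    # Each mapping entry becomes one (?old ?new) row of inline data.
    rows = "\n".join(
        f"    ({expand(old, prefixes)} {expand(new, prefixes)})"
        for old, new in sorted(mapping.items())
    )
    return (
        "DELETE { GRAPH ?g { ?s ?p ?old } }\n"
        "INSERT { GRAPH ?g { ?s ?p ?new } }\n"
        "WHERE {\n"
        "  VALUES (?old ?new) {\n"
        f"{rows}\n"
        "  }\n"
        "  GRAPH ?g { ?s ?p ?old }\n"
        "}"
    )
```

Inlining the rows keeps the whole migration in one reviewable text artifact; the trade-off is that a very large mapping produces a very large update, which might need to be split into batches.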
There could be a migration (sed on models on disk, or SPARQL), but these are fiddly, so I'd like to be clear on which mapping (file) would be used, or whether it's just a couple of one-offs.
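If the model files on disk contain the compact CURIEs as plain text (an assumption; they may hold expanded IRIs instead, in which case the pattern would need adjusting), the sed-style migration could be a dictionary-driven substitution. Matching the whole `ComplexPortal:CPX-<digits>` token and looking it up avoids a common sed pitfall, where a literal pattern for CPX-75 would also rewrite the start of CPX-756. The function name is hypothetical; unmapped IDs are left untouched and reported.

```python
import re

# Whole ComplexPortal CURIE token; \d+ is greedy, so "CPX-756" is always
# matched in full rather than as "CPX-75" plus a leftover "6".
CPX_CURIE = re.compile(r"ComplexPortal:CPX-\d+")


def migrate_text(text, mapping):
    """Replace mapped ComplexPortal CURIEs with SGD CURIEs.

    Returns the rewritten text and a sorted list of CURIEs that had no
    mapping (those are left unchanged).
    """
    unmapped = set()

    def swap(match):
        curie = match.group(0)
        if curie in mapping:
            return mapping[curie]
        unmapped.add(curie)
        return curie

    return CPX_CURIE.sub(swap, text), sorted(unmapped)
```

For example, with `{"ComplexPortal:CPX-756": "SGD:S000217886"}` as the mapping, every occurrence of ComplexPortal:CPX-756 becomes SGD:S000217886, while any other CPX ID is returned in the second value for review.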
Model on right (MOT2) was edited by SGD curators; model on left (CDC20) is as-imported. SGD would prefer the "CPX-# Scer" label, consistent with the other yeast gene products. After a quick discussion with @dustine32, this might be a straightforward find & replace.