Closed: kltm closed this issue 6 months ago.
Tagging @thomaspd
It looks like we have a mismatch between the URI stored in Minerva for SGD (http) and the one exposed in db-xrefs.yaml (https).
GO-API expands SGD:S000003407 to https://identifiers.org/sgd/S000003407 using the go.csv context in prefixmaps. The go.csv context in prefixmaps is generated by pulling in the rdf_uri_prefix from db-xrefs.yaml. Since the URI generated by Minerva in the model is http://identifiers.org/sgd/S000003407 (note the http vs. https), the GO-API query returns no results.
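The failure is a plain string mismatch: CURIE expansion concatenates the mapped namespace with the local ID, and the SPARQL filter then compares IRIs exactly. A minimal sketch, assuming a simple concatenating expander (not the actual prefixmaps or curies library code) and a one-entry dictionary standing in for the go.csv context:

```python
# Hypothetical one-entry excerpt of the go.csv context; in reality the value comes
# from the rdf_uri_prefix field in db-xrefs.yaml.
GO_CONTEXT = {"SGD": "https://identifiers.org/sgd/"}

# IRI as Minerva writes it into the model (http, not https).
MINERVA_IRI = "http://identifiers.org/sgd/S000003407"


def expand_curie(curie: str, prefix_map: dict) -> str:
    """Expand PREFIX:LOCAL by concatenating the mapped namespace and the local ID."""
    prefix, local_id = curie.split(":", 1)
    return prefix_map[prefix] + local_id


if __name__ == "__main__":
    expanded = expand_curie("SGD:S000003407", GO_CONTEXT)
    print(expanded)
    # An exact IRI comparison (as in the SPARQL FILTER) fails on the scheme alone.
    print(expanded == MINERVA_IRI)
```

Pointing the SGD entry at http://identifiers.org/sgd/ instead, which is what this PR does via db-xrefs.yaml, makes the expanded IRI identical to the one stored in the model.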
This PR would fix it, but I am not sure about the consequences of changing the db-xrefs.yaml.
I wanted to specifically tag @balhoff and @cmungall, who might have context about knock-on effects outside of GO.
The http version is also in the Biolink prefix map. If we change the namespace used by Minerva to https, then besides updating the Minerva prefix map, we would need to apply a SPARQL Update to modify all models that mention SGD identifiers.
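To make the scope of that migration concrete, here is a minimal sketch of the rewrite such an update would perform, done in plain Python over an in-memory list of (subject, predicate, object) tuples rather than against the triplestore. The namespace constants and the sample triple are illustrative assumptions; an actual migration would be a SPARQL Update run against Minerva's store.

```python
# Illustrative namespaces for the http -> https change being discussed.
OLD_NS = "http://identifiers.org/sgd/"
NEW_NS = "https://identifiers.org/sgd/"


def rewrite_iri(term: str, old_ns: str, new_ns: str) -> str:
    """Swap the namespace of a term that starts with old_ns; leave other terms alone."""
    if term.startswith(old_ns):
        return new_ns + term[len(old_ns):]
    return term


def rewrite_triples(triples, old_ns: str, new_ns: str):
    """Apply rewrite_iri to every position of every (s, p, o) triple."""
    return [tuple(rewrite_iri(t, old_ns, new_ns) for t in triple) for triple in triples]


if __name__ == "__main__":
    # Hypothetical triple in the shape used for gene-product nodes in a model.
    model = [
        ("http://model.geneontology.org/gp1",
         "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
         "http://identifiers.org/sgd/S000003407"),
    ]
    for triple in rewrite_triples(model, OLD_NS, NEW_NS):
        print(triple)
```

Because already-migrated https IRIs do not start with the http namespace, running the rewrite twice leaves the data unchanged.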
@balhoff Right, I think the thinking here is that the http version should be the "real" one, as that's what we use elsewhere, and that the rdf_uri_prefix is "wrong". This PR would revert it to http, allowing that change to propagate so we can continue.
@kltm That sounds good! I thought SGD wanted the https version.
I went ahead and made the rest of the go-site changes, tested them in the API, and found a secondary problem. We do need the HTTPS->HTTP change in the identifier, but even with that change, the pathway endpoint still does not return models for this SGD gene.
Experimenting further: without the causalmf=2 filter, this is the SPARQL query, and the model is returned:
```sparql
PREFIX wikidata: <http://www.wikidata.org/entity/>
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX metago: <http://model.geneontology.org/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX enabled_by: <http://purl.obolibrary.org/obo/RO_0002333>
SELECT DISTINCT ?gocam ?title
WHERE {
  GRAPH ?gocam {
    ?gocam metago:graphType metago:noctuaCam .
    ?s enabled_by: ?gpnode .
    ?gpnode rdf:type ?identifier .
    ?gocam dc:title ?title .
    FILTER(?identifier = <http://identifiers.org/sgd/S000003407>) .
  }
}
ORDER BY ?gocam
```
With causalmf=2, this is the query, and the model is not returned:
```sparql
PREFIX wikidata: <http://www.wikidata.org/entity/>
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX pr: <http://purl.org/ontology/prv/core#>
PREFIX metago: <http://model.geneontology.org/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX providedBy: <http://purl.org/pav/providedBy>
PREFIX MF: <http://purl.obolibrary.org/obo/GO_0003674>
PREFIX causally_upstream_of_or_within: <http://purl.obolibrary.org/obo/RO_0002418>
PREFIX causally_upstream_of_or_within_negative_effect: <http://purl.obolibrary.org/obo/RO_0004046>
PREFIX causally_upstream_of_or_within_positive_effect: <http://purl.obolibrary.org/obo/RO_0004047>
PREFIX causally_upstream_of: <http://purl.obolibrary.org/obo/RO_0002411>
PREFIX causally_upstream_of_negative_effect: <http://purl.obolibrary.org/obo/RO_0002305>
PREFIX causally_upstream_of_positive_effect: <http://purl.obolibrary.org/obo/RO_0002304>
PREFIX regulates: <http://purl.obolibrary.org/obo/RO_0002211>
PREFIX negatively_regulates: <http://purl.obolibrary.org/obo/RO_0002212>
PREFIX positively_regulates: <http://purl.obolibrary.org/obo/RO_0002213>
PREFIX directly_regulates: <http://purl.obolibrary.org/obo/RO_0002578>
PREFIX directly_positively_regulates: <http://purl.obolibrary.org/obo/RO_0002629>
PREFIX directly_negatively_regulates: <http://purl.obolibrary.org/obo/RO_0002630>
PREFIX directly_activates: <http://purl.obolibrary.org/obo/RO_0002406>
PREFIX indirectly_activates: <http://purl.obolibrary.org/obo/RO_0002407>
PREFIX directly_inhibits: <http://purl.obolibrary.org/obo/RO_0002408>
PREFIX indirectly_inhibits: <http://purl.obolibrary.org/obo/RO_0002409>
PREFIX transitively_provides_input_for: <http://purl.obolibrary.org/obo/RO_0002414>
PREFIX immediately_causally_upstream_of: <http://purl.obolibrary.org/obo/RO_0002412>
PREFIX directly_provides_input_for: <http://purl.obolibrary.org/obo/RO_0002413>
PREFIX enabled_by: <http://purl.obolibrary.org/obo/RO_0002333>
PREFIX hint: <http://www.bigdata.com/queryHints#>
SELECT DISTINCT ?gocam ?title
WHERE {
  GRAPH ?gocam {
    # Inject gene product ID here
    ?gene rdf:type <http://identifiers.org/sgd/S000003407> .
  }
  FILTER EXISTS {
    ?gocam metago:graphType metago:noctuaCam .
  }
  ?gocam dc:title ?title .
  FILTER (
    EXISTS {
      GRAPH ?gocam { ?ind1 enabled_by: ?gene . }
      GRAPH ?gocam { ?ind1 ?causal1 ?ind2 }
      ?causal1 rdfs:subPropertyOf* causally_upstream_of_or_within: .
      ?ind1 causally_upstream_of_or_within: ?ind2 .
      GRAPH ?gocam { ?ind2 enabled_by: ?gpnode2 . }
      GRAPH ?gocam { ?ind2 ?causal2 ?ind3 }
      ?causal2 rdfs:subPropertyOf* causally_upstream_of_or_within: .
      ?ind2 causally_upstream_of_or_within: ?ind3 .
      GRAPH ?gocam { ?ind3 enabled_by: ?gpnode3 . }
      FILTER(?gene != ?gpnode2)
      FILTER(?gene != ?gpnode3)
      FILTER(?gpnode2 != ?gpnode3)
    } ||
    EXISTS {
      GRAPH ?gocam { ?ind1 enabled_by: ?gpnode1 . }
      GRAPH ?gocam { ?ind1 ?causal1 ?ind2 }
      ?causal1 rdfs:subPropertyOf* causally_upstream_of_or_within: .
      ?ind1 causally_upstream_of_or_within: ?ind2 .
      GRAPH ?gocam { ?ind2 enabled_by: ?gene . }
      GRAPH ?gocam { ?ind2 ?causal2 ?ind3 }
      ?causal2 rdfs:subPropertyOf* causally_upstream_of_or_within: .
      ?ind2 causally_upstream_of_or_within: ?ind3 .
      GRAPH ?gocam { ?ind3 enabled_by: ?gpnode3 . }
      FILTER(?gpnode1 != ?gene)
      FILTER(?gpnode1 != ?gpnode3)
      FILTER(?gene != ?gpnode3)
    } ||
    EXISTS {
      GRAPH ?gocam { ?ind1 enabled_by: ?gpnode1 . }
      GRAPH ?gocam { ?ind1 ?causal1 ?ind2 }
      ?causal1 rdfs:subPropertyOf* causally_upstream_of_or_within: .
      ?ind1 causally_upstream_of_or_within: ?ind2 .
      GRAPH ?gocam { ?ind2 enabled_by: ?gpnode2 . }
      GRAPH ?gocam { ?ind2 ?causal2 ?ind3 }
      ?causal2 rdfs:subPropertyOf* causally_upstream_of_or_within: .
      ?ind2 causally_upstream_of_or_within: ?ind3 .
      GRAPH ?gocam { ?ind3 enabled_by: ?gene . }
      FILTER(?gpnode1 != ?gpnode2)
      FILTER(?gpnode1 != ?gene)
      FILTER(?gpnode2 != ?gene)
    }
  )
}
ORDER BY ?gocam
```
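Each EXISTS branch in the causalmf=2 query admits an edge only if its predicate reaches causally_upstream_of_or_within (RO_0002418) through rdfs:subPropertyOf*, a zero-or-more property path, so the relation itself also counts. The sketch below mimics just that check in Python over a small hand-written excerpt of the RO hierarchy; the excerpt is an assumption for illustration and should be verified against the current RO release. It does not model the additional inferred causally_upstream_of_or_within triple the query also requires.

```python
# Hypothetical, simplified excerpt of the RO property hierarchy: child -> direct parents.
SUB_PROPERTY_OF = {
    "RO:0002411": {"RO:0002418"},  # causally upstream of -> causally upstream of or within
    "RO:0002211": {"RO:0002411"},  # regulates -> causally upstream of
    "RO:0002213": {"RO:0002211"},  # positively regulates -> regulates
}

CAUSALLY_UPSTREAM_OF_OR_WITHIN = "RO:0002418"
HAS_INPUT = "RO:0002233"  # has input: not part of the causal hierarchy excerpt


def is_sub_property(prop: str, ancestor: str, hierarchy: dict) -> bool:
    """Mimic `prop rdfs:subPropertyOf* ancestor`: zero or more steps up the hierarchy."""
    stack, seen = [prop], set()
    while stack:
        current = stack.pop()
        if current == ancestor:  # zero steps: a property matches itself
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(hierarchy.get(current, ()))
    return False


if __name__ == "__main__":
    for prop in ("RO:0002213", CAUSALLY_UPSTREAM_OF_OR_WITHIN, HAS_INPUT):
        print(prop, is_sub_property(prop, CAUSALLY_UPSTREAM_OF_OR_WITHIN, SUB_PROPERTY_OF))
```

Under this excerpt, positively regulates and the root relation both pass, while has_input does not, so edges labeled has_input never satisfy the ?causal1 or ?causal2 patterns.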
If you swap in this WB identifier, WB:WBGene00002147, both versions of the query (with and without causalmf=2) return models.
I'm not a SPARQL expert, but from eyeballing the WB Noctua model, it does seem to fit the filters in the causalmf=2 query better than the SGD model does.
WB example model: http://noctua.geneontology.org/editor/graph/gomodel:5b528b1100000489?
SGD model: http://noctua.geneontology.org/editor/graph/gomodel:61f34dd300001044?
@sierra-moxon Wow, that SGD model has much bigness; I imagine it might be easy to miss something in there. @balhoff, I don't suppose you can intuit anything that might be going wrong in there?
I don't think the SGD model fits the query. It's a bunch of activities connected by has_input/has_input, but not by causal relations.
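Concretely, the causalmf=2 query asks whether the gene product sits somewhere in a chain of three activities linked by causal edges, with the three enabling gene-product nodes pairwise distinct. A minimal Python sketch of that condition, under simplifying assumptions: the set of causal relations is given up front instead of being derived via rdfs:subPropertyOf*, and all activity, node, and relation names are hypothetical. It illustrates why a model whose activities are linked only by has_input edges is not returned.

```python
def matches_causalmf2(enabled_by, node_type, edges, gene_iri, causal_relations):
    """Return True if some causal chain a -> b -> c has three distinct enabler
    nodes, at least one of which is typed as gene_iri.

    enabled_by: activity -> gene-product node that enables it
    node_type:  gene-product node -> identifier IRI (its rdf:type)
    edges:      iterable of (activity, relation, activity) triples
    """
    # Keep only edges whose relation counts as causal.
    successors = {}
    for subj, rel, obj in edges:
        if rel in causal_relations:
            successors.setdefault(subj, set()).add(obj)

    # Walk every two-step causal path a -> b -> c.
    for a, bs in successors.items():
        for b in bs:
            for c in successors.get(b, ()):
                nodes = [enabled_by.get(x) for x in (a, b, c)]
                if None in nodes or len(set(nodes)) < 3:
                    continue  # each activity needs its own, distinct enabler
                if any(node_type.get(n) == gene_iri for n in nodes):
                    return True
    return False


if __name__ == "__main__":
    SGD_ERG1 = "http://identifiers.org/sgd/S000003407"
    CAUSAL = {"directly_positively_regulates"}  # hypothetical causal relation set
    enabled_by = {"act1": "gp1", "act2": "gp2", "act3": "gp3"}
    node_type = {"gp1": SGD_ERG1, "gp2": "gene:B", "gp3": "gene:C"}

    input_only = [("act1", "has_input", "act2"), ("act2", "has_input", "act3")]
    causal = [("act1", "directly_positively_regulates", "act2"),
              ("act2", "directly_positively_regulates", "act3")]

    print(matches_causalmf2(enabled_by, node_type, input_only, SGD_ERG1, CAUSAL))
    print(matches_causalmf2(enabled_by, node_type, causal, SGD_ERG1, CAUSAL))
```

The same activities and gene products pass or fail depending only on whether the connecting edges are causal, which matches the difference observed between the WB and SGD models.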
@kltm Is there someone who believes that this model should be returned for this query? Do we need to better document the endpoint or the page that renders the other models, or take some other action here?
@sierra-moxon, this originally came in from @thomaspd:
I’ve previously used a GO-CAM from Marc as an example, and I want to use it in the Alliance paper. But now it doesn’t show up either on the Alliance page or the AmiGO page for the yeast ERG1 gene. There are no GO-CAMs being retrieved for:
https://amigo.geneontology.org/amigo/gene_product/SGD:S000003407
The GO-CAM is http://noctua.geneontology.org/workbench/noctua-visual-pathway-editor/?model_id=gomodel%3A61f34dd300001044
The likely answer, then: it used to be a "legit" model, but it has since been changed so that it no longer qualifies and no longer shows up in displays.
I think this issue can be closed; please let me know if I closed incorrectly.
I'm trying to understand why https://api.geneontology.org/api/gp/SGD:S000003407/models?causalmf=2 returns no values, when 61f34dd300001044.json, 61f34dd300001044.ttl, etc. do seem to contain this identifier.
My guess is some kind of CURIE disconnect, but I wanted to check with @sierra-moxon to see if anything comes to mind before digging in more.
(My initial guess was a date/sync with release gap, but this seems to be a well-established model.)