geneontology / go-fastapi

https://api.geneontology.org/

In some cases, the GO API does not return expected result wrt GO-CAMs #87

Closed kltm closed 2 months ago

kltm commented 8 months ago

I'm trying to understand why https://api.geneontology.org/api/gp/SGD:S000003407/models?causalmf=2 returns no values, when I can see 61f34dd300001044.json/61f34dd300001044.ttl, etc. do seem to have this identifier?

My guess is some kind of CURIE disconnect, but I wanted to check with @sierra-moxon to see if anything comes to mind before digging in more.

(My initial guess was a date/sync with release gap, but this seems to be a well-established model.)

kltm commented 8 months ago

Tagging @thomaspd

sierra-moxon commented 7 months ago

It looks like we have a mismatch between the URI stored in Minerva for SGD (http) and the one exposed in db-xrefs.yaml (https).

GO-API expands SGD:S000003407 to https://identifiers.org/sgd/S000003407 using the go.csv context in prefixmaps. That context is generated by pulling in the rdf_uri_prefix from db-xrefs.yaml. Since the URI that Minerva writes into the model is http://identifiers.org/sgd/S000003407 (note http vs. https), the GO-API query returns no results.

This PR would fix it, but I am not sure about the consequences of changing the db-xrefs.yaml.
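
To make the mismatch concrete, here is a minimal sketch of the expansion step. The two dictionaries are stand-ins for the go.csv context and the Minerva prefix map, and `expand_curie` is a hypothetical helper, not the prefixmaps API.

```python
# Hypothetical stand-ins for the two prefix maps discussed above.
API_PREFIX_MAP = {"SGD": "https://identifiers.org/sgd/"}     # from rdf_uri_prefix in db-xrefs.yaml
MINERVA_PREFIX_MAP = {"SGD": "http://identifiers.org/sgd/"}  # what the stored models use


def expand_curie(curie, prefix_map):
    """Expand a CURIE such as 'SGD:S000003407' into a full IRI."""
    prefix, local_id = curie.split(":", 1)
    return prefix_map[prefix] + local_id


api_iri = expand_curie("SGD:S000003407", API_PREFIX_MAP)
model_iri = expand_curie("SGD:S000003407", MINERVA_PREFIX_MAP)

# The query matches IRIs as exact strings, so a difference in scheme
# alone is enough for the match to fail and no rows to come back.
print(api_iri)
print(model_iri)
print(api_iri == model_iri)  # False
```

Because the comparison is a plain string match, aligning either side (the prefix map or the stored models) would remove the mismatch.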

kltm commented 7 months ago

I wanted to specifically tag @balhoff and @cmungall , who might have context about knock-on effects outside of the GO.

balhoff commented 7 months ago

The http version is also in the Biolink prefix map. If we change the namespace used by Minerva to include https, besides updating the Minerva prefix map, we would need to apply a SPARQL Update to modify all models that mention SGD identifiers.
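
For reference, a rough sketch of what such a SPARQL Update could look like, built as a string from Python. This is an assumption-laden illustration, not something that has been run against the Minerva store: `build_namespace_update` is a hypothetical helper, and the update only rewrites IRIs in the object position, so a real migration would also need to cover subjects.

```python
def build_namespace_update(old_ns, new_ns):
    """Return a SPARQL 1.1 Update that rewrites object IRIs from old_ns to new_ns.

    Sketch only: it touches IRIs in the object position of each triple.
    """
    return f"""
DELETE {{ GRAPH ?g {{ ?s ?p ?old }} }}
INSERT {{ GRAPH ?g {{ ?s ?p ?new }} }}
WHERE {{
  GRAPH ?g {{ ?s ?p ?old }}
  FILTER(isIRI(?old) && STRSTARTS(STR(?old), "{old_ns}"))
  BIND(IRI(CONCAT("{new_ns}", STRAFTER(STR(?old), "{old_ns}"))) AS ?new)
}}
""".strip()


# Direction described above: move stored models from http to https.
update = build_namespace_update(
    "http://identifiers.org/sgd/", "https://identifiers.org/sgd/"
)
print(update)
```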

kltm commented 7 months ago

@balhoff Right, I think the thinking here is that the http version should be the "real" one, as that's what we use elsewhere, but the rdf_uri_prefix is "wrong". This PR would revert to the http, allowing that change to propagate and allowing us to continue.

balhoff commented 7 months ago

@kltm that sounds good! I thought SGD was wanting the https.

sierra-moxon commented 7 months ago

I went ahead and made the rest of the go-site changes, tested the changes in the API, and found a secondary problem. It is true that we need the HTTPS->HTTP change in the identifier. It is also true that even with that change, the pathway endpoint still does not return models for this SGD gene.

experimenting further:

Without the causalmf=2 filter, this is the SPARQL query, and the model is returned:

PREFIX wikidata: <http://www.wikidata.org/entity/>
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

            PREFIX metago: <http://model.geneontology.org/>
            PREFIX dc: <http://purl.org/dc/elements/1.1/>
            PREFIX enabled_by: <http://purl.obolibrary.org/obo/RO_0002333>

            SELECT distinct ?gocam ?title

            WHERE
            {

              GRAPH ?gocam {
                ?gocam metago:graphType metago:noctuaCam .
                ?s enabled_by: ?gpnode .
                ?gpnode rdf:type ?identifier .
                ?gocam dc:title ?title .
                FILTER(?identifier = <http://identifiers.org/sgd/S000003407>) .
              }

            }
            ORDER BY ?gocam

With causalmf=2, this is the query, and the model is not returned:

PREFIX wikidata: <http://www.wikidata.org/entity/>
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

      PREFIX pr: <http://purl.org/ontology/prv/core#>
      PREFIX metago: <http://model.geneontology.org/>
      PREFIX dc: <http://purl.org/dc/elements/1.1/>
      PREFIX providedBy: <http://purl.org/pav/providedBy>
      PREFIX MF: <http://purl.obolibrary.org/obo/GO_0003674>
      PREFIX causally_upstream_of_or_within: <http://purl.obolibrary.org/obo/RO_0002418>
      PREFIX causally_upstream_of_or_within_negative_effect: <http://purl.obolibrary.org/obo/RO_0004046>
      PREFIX causally_upstream_of_or_within_positive_effect: <http://purl.obolibrary.org/obo/RO_0004047>
      PREFIX causally_upstream_of: <http://purl.obolibrary.org/obo/RO_0002411>
      PREFIX causally_upstream_of_negative_effect: <http://purl.obolibrary.org/obo/RO_0002305>
      PREFIX causally_upstream_of_positive_effect: <http://purl.obolibrary.org/obo/RO_0002304>
      PREFIX regulates: <http://purl.obolibrary.org/obo/RO_0002211>
      PREFIX negatively_regulates: <http://purl.obolibrary.org/obo/RO_0002212>
      PREFIX positively_regulates: <http://purl.obolibrary.org/obo/RO_0002213>
      PREFIX directly_regulates: <http://purl.obolibrary.org/obo/RO_0002578>
      PREFIX directly_positively_regulates: <http://purl.obolibrary.org/obo/RO_0002629>
      PREFIX directly_negatively_regulates: <http://purl.obolibrary.org/obo/RO_0002630>
      PREFIX directly_activates: <http://purl.obolibrary.org/obo/RO_0002406>
      PREFIX indirectly_activates: <http://purl.obolibrary.org/obo/RO_0002407>
      PREFIX directly_inhibits: <http://purl.obolibrary.org/obo/RO_0002408>
      PREFIX indirectly_inhibits: <http://purl.obolibrary.org/obo/RO_0002409>
      PREFIX transitively_provides_input_for: <http://purl.obolibrary.org/obo/RO_0002414>
      PREFIX immediately_causally_upstream_of: <http://purl.obolibrary.org/obo/RO_0002412>
      PREFIX directly_provides_input_for: <http://purl.obolibrary.org/obo/RO_0002413>
      PREFIX enabled_by: <http://purl.obolibrary.org/obo/RO_0002333>
      PREFIX hint: <http://www.bigdata.com/queryHints#>
      SELECT DISTINCT ?gocam ?title
      WHERE {
        GRAPH ?gocam  {
          # Inject gene product ID here
          ?gene rdf:type <http://identifiers.org/sgd/S000003407> .
        }
        FILTER EXISTS {
          ?gocam metago:graphType metago:noctuaCam .
        }
        ?gocam dc:title ?title .
        FILTER (
          EXISTS {
            GRAPH ?gocam { ?ind1 enabled_by: ?gene . }
            GRAPH ?gocam { ?ind1 ?causal1 ?ind2 }
            ?causal1 rdfs:subPropertyOf* causally_upstream_of_or_within: .
            ?ind1 causally_upstream_of_or_within: ?ind2 .
            GRAPH ?gocam { ?ind2 enabled_by: ?gpnode2 . }
            GRAPH ?gocam { ?ind2 ?causal2 ?ind3 }
            ?causal2 rdfs:subPropertyOf* causally_upstream_of_or_within: .
            ?ind2 causally_upstream_of_or_within: ?ind3 .
            GRAPH ?gocam { ?ind3 enabled_by: ?gpnode3 . }
            FILTER(?gene != ?gpnode2)
            FILTER(?gene != ?gpnode3)
            FILTER(?gpnode2 != ?gpnode3)
          } ||
          EXISTS {
            GRAPH ?gocam { ?ind1 enabled_by: ?gpnode1 . }
            GRAPH ?gocam { ?ind1 ?causal1 ?ind2 }
            ?causal1 rdfs:subPropertyOf* causally_upstream_of_or_within: .
            ?ind1 causally_upstream_of_or_within: ?ind2 .
            GRAPH ?gocam { ?ind2 enabled_by: ?gene . }
            GRAPH ?gocam { ?ind2 ?causal2 ?ind3 }
            ?causal2 rdfs:subPropertyOf* causally_upstream_of_or_within: .
            ?ind2 causally_upstream_of_or_within: ?ind3 .
            GRAPH ?gocam { ?ind3 enabled_by: ?gpnode3 . }
            FILTER(?gpnode1 != ?gene)
            FILTER(?gpnode1 != ?gpnode3)
            FILTER(?gene != ?gpnode3)
          } ||
          EXISTS {
            GRAPH ?gocam { ?ind1 enabled_by: ?gpnode1 . }
            GRAPH ?gocam { ?ind1 ?causal1 ?ind2 }
            ?causal1 rdfs:subPropertyOf* causally_upstream_of_or_within: .
            ?ind1 causally_upstream_of_or_within: ?ind2 .
            GRAPH ?gocam { ?ind2 enabled_by: ?gpnode2 . }
            GRAPH ?gocam { ?ind2 ?causal2 ?ind3 }
            ?causal2 rdfs:subPropertyOf* causally_upstream_of_or_within: .
            ?ind2 causally_upstream_of_or_within: ?ind3 .
            GRAPH ?gocam { ?ind3 enabled_by: ?gene . }
            FILTER(?gpnode1 != ?gpnode2)
            FILTER(?gpnode1 != ?gene)
            FILTER(?gpnode2 != ?gene)
          }
        )
      }
      ORDER BY ?gocam

If you swap the ID for this WB identifier, WB:WBGene00002147, both versions of the query (with and without causalmf=2) do return models.
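
For anyone repeating the comparison, a small sketch that builds the four request URLs. The endpoint path and the `causalmf` parameter are taken from the URL at the top of this issue; `models_url` is a hypothetical helper, and no request is actually sent here.

```python
from typing import Optional
from urllib.parse import quote, urlencode

API_BASE = "https://api.geneontology.org/api"


def models_url(gene_product_id: str, causalmf: Optional[int] = None) -> str:
    """Build the /gp/{id}/models URL, optionally adding the causalmf filter."""
    # Keep the colon in the CURIE readable rather than percent-encoding it.
    path = f"{API_BASE}/gp/{quote(gene_product_id, safe=':')}/models"
    if causalmf is None:
        return path
    query = urlencode({"causalmf": causalmf})
    return f"{path}?{query}"


for gp in ("SGD:S000003407", "WB:WBGene00002147"):
    print(models_url(gp))
    print(models_url(gp, causalmf=2))
```

Fetching each URL and comparing the returned model lists shows whether the filter, rather than the identifier, is what drops a model.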

Not being a SPARQL expert, from my eyeballing of the WB noctua model, it does seem to fit the filters on the causalmf=2 query better than my eyeball of the SGD model.

WB example model: http://noctua.geneontology.org/editor/graph/gomodel:5b528b1100000489?

SGD model: http://noctua.geneontology.org/editor/graph/gomodel:61f34dd300001044?

kltm commented 7 months ago

@sierra-moxon Wow, that SGD model has much bigness--I imagine it might be easy to miss something in there. @balhoff , I don't suppose you can intuit anything that might be going wrong in there?

balhoff commented 7 months ago

I don't think the SGD model fits the query. It's a bunch of activities connected by has_input/has_input, but not by causal relations.
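
A toy illustration of that point, as a simplified reading of the causalmf=2 query: the gene must enable one of three activities linked in a chain by causal relations, with three different enablers. The relation set and both toy models below are made up for illustration, and the sketch compares gene product IDs where the real query compares nodes.

```python
from itertools import product

# Illustrative subset of relations the query treats as causal (assumption).
CAUSAL_RELATIONS = {
    "causally_upstream_of_or_within",
    "directly_positively_regulates",
    "directly_provides_input_for",
}


def has_causal_chain_of_three(gene, enabled_by, edges):
    """True if `gene` enables an activity in a causal chain a1 -> a2 -> a3
    whose three enabling gene products are all different."""
    causal = [(s, o) for s, rel, o in edges if rel in CAUSAL_RELATIONS]
    for (a1, a2), (b2, a3) in product(causal, repeat=2):
        if a2 != b2:
            continue  # the two edges must share the middle activity
        gps = [enabled_by.get(a) for a in (a1, a2, a3)]
        if None not in gps and gene in gps and len(set(gps)) == 3:
            return True
    return False


enabled_by = {"act1": "GP:A", "act2": "GP:B", "act3": "GP:C"}

causal_edges = [
    ("act1", "directly_positively_regulates", "act2"),
    ("act2", "directly_positively_regulates", "act3"),
]
input_only_edges = [
    ("act1", "has_input", "chem1"),
    ("act2", "has_input", "chem2"),
]

print(has_causal_chain_of_three("GP:A", enabled_by, causal_edges))      # True
print(has_causal_chain_of_three("GP:A", enabled_by, input_only_edges))  # False
```

A model whose activities are linked only through has_input edges has no causal edges to chain, so it fails the filter even though the gene is present in it.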

sierra-moxon commented 7 months ago

@kltm - is there someone who believes that this model should be returned for this query? Do we need to better document the endpoint, the page it's rendering other models on, or take some other action here?

kltm commented 7 months ago

@sierra-moxon, this originally came in from @thomaspd:


I’ve previously used a GO-CAM from Marc as an example, and I want to use it in the Alliance paper. But now it doesn’t show up either on the Alliance page or the AmiGO page for the yeast ERG1 gene. There are no GO-CAMs being retrieved for:

https://amigo.geneontology.org/amigo/gene_product/SGD:S000003407

The GO-CAM is http://noctua.geneontology.org/workbench/noctua-visual-pathway-editor/?model_id=gomodel%3A61f34dd300001044

Do you know what might be going on?

The answer is then likely: it used to be a "legit" model, but has been changed such that it no longer is and no longer shows up in displays.

sierra-moxon commented 2 months ago

I think this issue can be closed; please let me know if I closed incorrectly.