biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://api.bte.ncats.io
Apache License 2.0
8 stars 9 forks source link

KP is returning a curie that wasn't asked for without a query id #815

Open maximusunc opened 2 months ago

maximusunc commented 2 months ago

I can send this curl:

curl -X POST https://api.bte.ncats.io/v1/team/Service%20Provider/query \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "message": {
    "query_graph": {
      "nodes": {
        "e": {
          "ids": [
            "CHEBI:8382",
            "PUBCHEM.COMPOUND:22420",
            "PUBCHEM.COMPOUND:10150081",
            "CHEBI:4911",
            "CHEBI:74947",
            "UNII:4F4X42SYQ6",
            "CHEBI:28748",
            "CHEBI:63635",
            "UNII:T4H8FMA7IM",
            "CHEBI:9168",
            "CHEBI:52717",
            "CHEBI:10110",
            "CHEBI:28445",
            "CHEBI:72690",
            "PUBCHEM.COMPOUND:135565884"
          ],
          "categories": ["biolink:ChemicalEntity"],
          "set_interpretation": "BATCH",
          "constraints": []
        },
        "i": {
          "ids": null,
          "categories": ["biolink:BiologicalEntity"],
          "set_interpretation": "BATCH",
          "constraints": []
        }
      },
      "edges": {
        "edge_2": {
          "subject": "i",
          "object": "e",
          "knowledge_type": null,
          "predicates": ["biolink:affects"],
          "attribute_constraints": [],
          "qualifier_constraints": [
            {
              "qualifier_set": [
                {
                  "qualifier_type_id": "biolink:object_aspect_qualifier",
                  "qualifier_value": "activity_or_abundance"
                },
                {
                  "qualifier_type_id": "biolink:object_direction_qualifier",
                  "qualifier_value": "decreased"
                }
              ]
            }
          ]
        }
      }
    }
  }
}' > service_provider_response.json

And one of the results has CHEBI:4027 as a result for e without a query id, even though it was not asked for.

tokebe commented 2 months ago

CHEBI:4027 is obtained from the node normalizer as a primary equivalent of PUBCHEM.COMPOUND:22420, with options conflate:true and drug_chemical_conflate:true, so while it was not explicitly asked for, it was covered by what was explicitly asked for.

andrewsu commented 2 months ago

Just a quick note that this equivalency is specifically from the dev instance of nodenorm: https://nodenormalization-sri.renci.org/get_normalized_nodes?curie=PUBCHEM.COMPOUND:22420&drug_chemical_conflate=true -- doesn't appear to be found in ci / test / prod nodenorm, so I wouldn't expect you'd see that result from the other instances of BTE...

tokebe commented 2 months ago

I've confirmed that this equivalency is not seen on BTE Prod.

colleenXu commented 2 months ago

Found relevant info (Translator Slack link): NodeNorm Dev was updated 4/19 to use an updated "preferred prefix order", so CHEBI is now preferred over PUBCHEM.COMPOUND. This explains the different primary ID for Dev vs other instances.

tokebe commented 1 month ago

@maximusunc Is this sufficient explanation that this issue may be closed?

maximusunc commented 1 month ago

No, I think that even if you're giving the primary equivalent back, it's still not giving back what was asked and PUBCHEM.COMPOUND:22420 should be listed as the query_id.

tokebe commented 1 month ago

Updating from internal discussion, example node binding:

{ "id": "CHEBI:4027", "query_id": "PUBCHEM.COMPOUND:22420", "attributes": [] }

This can be implemented as a post-pass on node bindings after results assembly.

@maximusunc In the interest of prioritization, can you say how critically this issue affects your team?

maximusunc commented 1 month ago

This issue has pointed out that we need to do some extra error checking on our end to make sure all results are expected. We're currently throwing out the entire response, but I think the plan is to just remove any specific results that can't be linked back to the query graph. We're also going to add the drug_conflation flag to our node norm calls. So this isn't super high priority from our end.

colleenXu commented 1 month ago

@maximusunc

To double-check, is this a problem Aragorn is encountering? Or some other tool? And NodeNorm is updating to CI later today....so maybe these problems will start happening on CI afterwards?

I'm wondering if there's a way for your tool to use/send the same primary IDs that our tool is using (we're using the NodeNorm instance corresponding to our tool's instance + setting conflate to true for gene/protein and drug/chem). (Note that we made a change a few months ago to set drug_chemical_conflate to true, after asking Chris Bizon in Slack about it (issue, Translator Slack link).)

Of course, you make a valid point that query_id should be used. I wasn't aware of that query_id was/should be used this way (oops >.<).

maximusunc commented 1 month ago

Aragorn is encountering this, and we're also using the node norm corresponding to the same env, but we're missing the drug/chem conflation flag. We're going to be adding that flag in soon so that our primary IDs will match, so this specific issue will be resolved, but I think it's also good that we uncovered the query_id issue as well.