biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0

SmartAPI annotation + bug-chasing: ORPHANET -> orphanet #640

Closed colleenXu closed 8 months ago

colleenXu commented 1 year ago

biolink-model folks say the prefix should be orphanet, not the ORPHANET that we've been using. See https://github.com/biolink/biolink-model/issues/1198

MyDisease and BioThings RARe-SOURCE are the two APIs we have that use this ID-namespace. Earlier I tried changing MyDisease to use the all-lowercase prefix (oops, mixed into this commit), but I ran into issues when testing and reverted it.

While the issue could be the x-bte annotation, it could also be a bug in BTE or an issue with the SRI Node Normalizer response. The SRI Node Normalizer currently uses the all-CAPS prefix, but it seems to be case-agnostic for the input so I'm not sure what's going on...

rjawesome commented 1 year ago

The PR allows lowercase orphanet (or uppercase ORPHANET) to be used in input curies. However, the output from BTE is still uppercase ORPHANET due to the node normalizer output.

colleenXu commented 1 year ago

@rjawesome @tokebe

I just tested, and the linked, merged PR and current main-branch code don't seem to address this issue.

Here's how I'm testing:

  1. Take a local copy of the SmartAPI yaml of ncats rare-source. Use an override to the local file like this:

contents of biothings_explorer/src/config/smartapi_overrides.json
```
{
  "conf": {
    "only_overrides": true
  },
  "apis": {
    "b772ebfbfa536bba37764d7fddb11d6f": "file:///Users/colleenxu/Desktop/translator-api-registry/ncats_rare_source/smartapi.yaml"
  }
}
```
  2. Comment out the references to the diseaseUMLS operations (~lines 314-315). This way only the orphanet operations are used. Keep the current annotation (all-caps ORPHANET).
     a. Remember to save the file / run the smartapi-sync to retrieve these modified file contents.
  3. Start local BTE and run the testExamples query for the operation diseaseOrphanet-gene (see TRAPI query below). The sub-query will run successfully using this operation (logs included below).

query for testing
```
{
  "message": {
    "query_graph": {
      "edges": {
        "e01": {
          "subject": "n0",
          "object": "n1"
        }
      },
      "nodes": {
        "n0": {
          "ids": ["ORPHANET:110"],
          "categories": ["biolink:Disease"]
        },
        "n1": {
          "categories": ["biolink:Gene"]
        }
      }
    }
  }
}
```
console logs when ORPHANET is used
```
bte:biothings-explorer-trapi:edge-manager (5) Executing current edge >> "e01" +0ms
bte:biothings-explorer-trapi:batch_edge_query Node Update Start +0ms
bte:biothings-explorer-trapi:nodeUpdateHandler Getting equivalent IDs... +0ms
bte:biothings-explorer-trapi:nodeUpdateHandler curies: {"Disease":["ORPHANET:110"],"PhenotypicFeature":["ORPHANET:110"],"BehavioralFeature":["ORPHANET:110"],"ClinicalFinding":["ORPHANET:110"],"DiseaseOrPhenotypicFeature":["ORPHANET:110"]} +0ms
bte:biomedical-id-resolver:SRI SRI resolved type 'DiseaseOrPhenotypicFeature' doesn't match input semantic type 'Disease' for curie 'MONDO:0015229'. Adding entry for 'Disease'. +0ms
bte:biomedical-id-resolver:SRI SRI resolved type 'DiseaseOrPhenotypicFeature' doesn't match input semantic type 'PhenotypicFeature' for curie 'MONDO:0015229'. Adding entry for 'PhenotypicFeature'. +0ms
bte:biomedical-id-resolver:SRI SRI resolved type 'DiseaseOrPhenotypicFeature' doesn't match input semantic type 'BehavioralFeature' for curie 'MONDO:0015229'. Adding entry for 'BehavioralFeature'. +0ms
bte:biomedical-id-resolver:SRI SRI resolved type 'DiseaseOrPhenotypicFeature' doesn't match input semantic type 'ClinicalFinding' for curie 'MONDO:0015229'. Adding entry for 'ClinicalFinding'. +0ms
bte:biothings-explorer-trapi:nodeUpdateHandler Got Edge Equivalent IDs successfully. +273ms
bte:biothings-explorer-trapi:batch_edge_query Node Update Success +273ms
bte:biothings-explorer-trapi:batch_edge_query Start to convert qEdges into APIEdges.... +0ms
bte:biothings-explorer-trapi:qedge2btedge Input node is n0 +275ms
bte:biothings-explorer-trapi:qedge2btedge Output node is n1 +0ms
bte:biothings-explorer-trapi:qedge2btedge KG Filters: {
bte:biothings-explorer-trapi:qedge2btedge   "input_type": [
bte:biothings-explorer-trapi:qedge2btedge     "Disease",
bte:biothings-explorer-trapi:qedge2btedge     "PhenotypicFeature",
bte:biothings-explorer-trapi:qedge2btedge     "BehavioralFeature",
bte:biothings-explorer-trapi:qedge2btedge     "ClinicalFinding",
bte:biothings-explorer-trapi:qedge2btedge     "DiseaseOrPhenotypicFeature"
bte:biothings-explorer-trapi:qedge2btedge   ],
bte:biothings-explorer-trapi:qedge2btedge   "output_type": [
bte:biothings-explorer-trapi:qedge2btedge     "Gene"
bte:biothings-explorer-trapi:qedge2btedge   ]
bte:biothings-explorer-trapi:qedge2btedge } +0ms
bte:biothings-explorer-trapi:qedge2btedge 1 APIs being used: ["BioThings RARe-SOURCE API"] +1ms
bte:biothings-explorer-trapi:qedge2btedge 1 SmartAPI edges are retrieved.... +0ms
bte:biothings-explorer-trapi:qedge2btedge Input prefix: ORPHANET +0ms
bte:biothings-explorer-trapi:qedge2btedge 1 metaKG are created.... +0ms
bte:biothings-explorer-trapi:qedge2btedge BTE found 1 metaKG for this batch. +0ms
bte:biothings-explorer-trapi:batch_edge_query qEdges are successfully converted into 1 APIEdges.... +3ms
bte:biothings-explorer-trapi:batch_edge_query Start to query APIEdges.... +0ms
bte:call-apis:query Resolving ID feature is turned on +0ms
bte:call-apis:query call-apis: 1 planned queries for edge e01 +0ms
bte:call-apis:query using template builder +0ms
bte:call-apis:query {
bte:call-apis:query   url: 'https://biothings.ncats.io/rare_source/query',
bte:call-apis:query   params: { with_total: true, fields: 'entrezgene,symbol', size: 1000 },
bte:call-apis:query   data: 'q=110&scopes=raresource.disease.orphanet',
bte:call-apis:query   method: 'post',
bte:call-apis:query   timeout: 50000,
bte:call-apis:query   headers: { 'User-Agent': 'BTE/dev Node/v18.16.1 darwin' }
bte:call-apis:query } +6ms
bte:call-apis:query query success, transforming hits->records... +280ms
bte:api-response-transform:index api name BioThings RARe-SOURCE API +0ms
bte:api-response-transform:index api tags: gene,disease,annotation,query,translator,biothings +0ms
bte:call-apis:query Successful POST https://biothings.ncats.io/rare_source (1 ID): Disease > condition_associated_with_gene > Gene (obtained 26 records, took 278ms) +12ms
bte:call-apis:query query completes. +0ms
bte:call-apis:query Total number of records returned for this query is 26 +0ms
```
  4. Then replace ORPHANET -> orphanet in the SmartAPI yaml (match case!). Save / run smartapi-sync. Run the same query (you can replace ORPHANET -> orphanet in the query, but it doesn't matter). The NodeNorm step appears to work, but the sub-query isn't generated properly, so no sub-query is run. This is the bug I am referring to.
console logs when orphanet is used: sub-query not generated properly
```
bte:biothings-explorer-trapi:edge-manager (5) Executing current edge >> "e01" +1ms
bte:biothings-explorer-trapi:batch_edge_query Node Update Start +0ms
bte:biothings-explorer-trapi:nodeUpdateHandler Getting equivalent IDs... +0ms
bte:biothings-explorer-trapi:nodeUpdateHandler curies: {"Disease":["orphanet:110"],"PhenotypicFeature":["orphanet:110"],"BehavioralFeature":["orphanet:110"],"ClinicalFinding":["orphanet:110"],"DiseaseOrPhenotypicFeature":["orphanet:110"]} +0ms
bte:biomedical-id-resolver:SRI SRI resolved type 'DiseaseOrPhenotypicFeature' doesn't match input semantic type 'Disease' for curie 'MONDO:0015229'. Adding entry for 'Disease'. +0ms
bte:biomedical-id-resolver:SRI SRI resolved type 'DiseaseOrPhenotypicFeature' doesn't match input semantic type 'PhenotypicFeature' for curie 'MONDO:0015229'. Adding entry for 'PhenotypicFeature'. +0ms
bte:biomedical-id-resolver:SRI SRI resolved type 'DiseaseOrPhenotypicFeature' doesn't match input semantic type 'BehavioralFeature' for curie 'MONDO:0015229'. Adding entry for 'BehavioralFeature'. +0ms
bte:biomedical-id-resolver:SRI SRI resolved type 'DiseaseOrPhenotypicFeature' doesn't match input semantic type 'ClinicalFinding' for curie 'MONDO:0015229'. Adding entry for 'ClinicalFinding'. +0ms
bte:biothings-explorer-trapi:nodeUpdateHandler Got Edge Equivalent IDs successfully. +272ms
bte:biothings-explorer-trapi:batch_edge_query Node Update Success +272ms
bte:biothings-explorer-trapi:batch_edge_query Start to convert qEdges into APIEdges.... +1ms
bte:biothings-explorer-trapi:qedge2btedge Input node is n0 +275ms
bte:biothings-explorer-trapi:qedge2btedge Output node is n1 +0ms
bte:biothings-explorer-trapi:qedge2btedge KG Filters: {
bte:biothings-explorer-trapi:qedge2btedge   "input_type": [
bte:biothings-explorer-trapi:qedge2btedge     "Disease",
bte:biothings-explorer-trapi:qedge2btedge     "PhenotypicFeature",
bte:biothings-explorer-trapi:qedge2btedge     "BehavioralFeature",
bte:biothings-explorer-trapi:qedge2btedge     "ClinicalFinding",
bte:biothings-explorer-trapi:qedge2btedge     "DiseaseOrPhenotypicFeature"
bte:biothings-explorer-trapi:qedge2btedge   ],
bte:biothings-explorer-trapi:qedge2btedge   "output_type": [
bte:biothings-explorer-trapi:qedge2btedge     "Gene"
bte:biothings-explorer-trapi:qedge2btedge   ]
bte:biothings-explorer-trapi:qedge2btedge } +1ms
bte:biothings-explorer-trapi:qedge2btedge 1 APIs being used: ["BioThings RARe-SOURCE API"] +0ms
bte:biothings-explorer-trapi:qedge2btedge 1 SmartAPI edges are retrieved.... +0ms
bte:biothings-explorer-trapi:qedge2btedge Input prefix: orphanet +0ms
bte:biothings-explorer-trapi:qedge2btedge 0 metaKG are created.... +1ms
bte:biothings-explorer-trapi:qedge2btedge No metaKG found for this query batch. +0ms
bte:biothings-explorer-trapi:batch_edge_query qEdges are successfully converted into 0 APIEdges.... +2ms
bte:biothings-explorer-trapi:edge-manager (X) Terminating..."e01" got 0 records. +275ms
```
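
The two runs above differ only in the annotated input prefix, and the second one yields zero metaKG edges. One way to picture that failure mode is a case-sensitive prefix comparison between the annotation and the resolved IDs. The following is a minimal Python sketch under that assumption; the function name and the data shapes are hypothetical illustrations, not BTE's actual code.

```python
def ids_for_prefix(annotated_prefix, equivalent_curies):
    """Return local IDs whose CURIE prefix equals annotated_prefix exactly (case-sensitive)."""
    matches = []
    for curie in equivalent_curies:
        prefix, _, local_id = curie.partition(":")
        if prefix == annotated_prefix:
            matches.append(local_id)
    return matches


# Assumed shape: equivalent IDs from a resolver that emits the all-caps prefix.
resolved = ["MONDO:0015229", "ORPHANET:110"]

caps_ids = ids_for_prefix("ORPHANET", resolved)   # annotation in all caps
lower_ids = ids_for_prefix("orphanet", resolved)  # annotation in lowercase
print(caps_ids, lower_ids)
```

Under this assumption, the all-caps annotation finds an ID to query with, while the lowercase annotation finds none, which would be consistent with "0 metaKG are created" in the second log.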
rjawesome commented 1 year ago

See new PR

colleenXu commented 1 year ago

Still doesn't seem to work: when I test, records are dropped during the "edge-management" step.

@rjawesome I'd like to pause the PR / coding work, and discuss first (see next post).

console logs
```
bte:biothings-explorer-trapi:QEdge Collected entity ids in records: ["BiologicalEntity","Gene"] +1ms
bte:biothings-explorer-trapi:QNode Node "n1" saving (26) curies... +1s
bte:biothings-explorer-trapi:QEdge (7) Updating Entities in "e01" +0ms
bte:biothings-explorer-trapi:QEdge (7) Collecting Types: "["Disease","PhenotypicFeature","BehavioralFeature","ClinicalFinding","DiseaseOrPhenotypicFeature"]" +0ms
bte:biothings-explorer-trapi:QEdge Collected entity ids in records: [] +0ms
bte:biothings-explorer-trapi:QNode Node "n0" intersecting (1)/(0) curies... +0ms
bte:biothings-explorer-trapi:QNode Node "n0" kept (0) curies... +0ms
bte:biothings-explorer-trapi:edge-manager 'e01' Reversed[false] (0)--(26) entities / (26) records. +1s
bte:biothings-explorer-trapi:edge-manager 'e01' dropped (26) records. +0ms
bte:biothings-explorer-trapi:QEdge (6) Storing records... +0ms
bte:biothings-explorer-trapi:QEdge (6) Applying Node Constraints to 0 records. +0ms
```
colleenXu commented 1 year ago

@rjawesome

Would you say this is mainly happening because of the NodeNorm output for the ID being ORPHANET? And that our tool relies on ID-namespace/prefixes matching exactly (spelling and case) between NodeNorm output and x-bte annotation?

(I may not have understood your earlier post >.<)

If "yes, the main issue is NodeNorm output", then I can raise this issue to NodeNorm / biolink-model folks. It may be more an issue of their output than a bug in our tool's behavior...

rjawesome commented 1 year ago

> Would you say this is mainly happening because of the NodeNorm output for the ID being ORPHANET? And that our tool relies on ID-namespace/prefixes matching exactly (spelling and case) between NodeNorm output and x-bte annotation?

From what I'm seeing, that does seem to be the main issue, and it should be fixed if NodeNorm fixes their capitalization. However, I still think it would be better for BTE to be case-insensitive so that it is easier to use overall.
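
As one possible reading of "case insensitive" here, the prefix comparison could ignore case while the local ID is left untouched. This is a minimal Python sketch of that single interpretation; the helper name is hypothetical and it does not represent what BTE implements.

```python
def same_prefix(curie, annotated_prefix):
    """True when the CURIE's prefix matches annotated_prefix, ignoring case.

    Only the part before the first ':' is compared; a string without ':' never matches.
    """
    prefix, sep, _ = curie.partition(":")
    return bool(sep) and prefix.casefold() == annotated_prefix.casefold()


# Both spellings of the prefix match a lowercase annotation.
upper_matches = same_prefix("ORPHANET:110", "orphanet")
lower_matches = same_prefix("orphanet:110", "orphanet")
other_matches = same_prefix("MONDO:0015229", "orphanet")
print(upper_matches, lower_matches, other_matches)
```

This only covers matching; which spelling should appear in the output is a separate decision.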

colleenXu commented 1 year ago

Okay, I'll raise this as an issue for Node Norm / biolink-model tomorrow.

On your second point on "case insensitive"...it seems like there are multiple ways to define this:

colleenXu commented 1 year ago

~Note that this discussion on "case insensitive" hasn't happened yet...~ Discussion done on this issue's status. See post in PR https://github.com/biothings/bte_trapi_query_graph_handler/pull/160#issuecomment-1662925818

tokebe commented 1 year ago

Related to https://github.com/biothings/biothings_explorer/issues/591

colleenXu commented 10 months ago

Update! NodeNorm is rolling out an update that will change ORPHANET -> orphanet in their responses.

~It looks like we haven't addressed https://github.com/biothings/biothings_explorer/issues/731 yet, so all instances of BTE are still using NodeNorm Prod. So I think we shouldn't deploy x-bte changes for ORPHANET -> orphanet until after NodeNorm Prod is updated.~ EDIT: see next comment

EDIT, NOTE: I'm not sure if the NodeNorm/prefix change will break any of BTE's tests. I see some test info in bte-server that has ORPHANET text-matches. @tokebe

colleenXu commented 10 months ago

We've decided to use overrides to implement the x-bte changes as the NodeNorm update rolls out.

Jackson said he plans to work on the "BTE using instance-specific NodeNorm" feature

colleenXu commented 10 months ago

Update: we're using overrides for the 3 KPs that have orphanet IDs (mydisease, biothings rare-source, ComplexPortal) -> see this commit

I think we can close this issue once:

We'll then have a separate process to remove the overrides (not needed once the yaml PRs are all merged / registrations refreshed).
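
For reference, an overrides file with the shape shown earlier in this thread (`conf.only_overrides` plus an `apis` map from SmartAPI ID to a yaml location) could be generated with a short Python sketch like the one below. The two SmartAPI IDs are the ones that appear in this thread (RARe-SOURCE and MyDisease); the file paths are placeholders, the `only_overrides` value is an assumption, and the ComplexPortal entry is omitted because its ID is not given here.

```python
import json


def build_overrides(api_files, only_overrides=False):
    """Build a dict in the smartapi_overrides.json shape shown in this thread."""
    return {
        "conf": {"only_overrides": only_overrides},
        "apis": dict(api_files),
    }


overrides = build_overrides({
    # SmartAPI ID -> location of the edited yaml (placeholder paths)
    "b772ebfbfa536bba37764d7fddb11d6f": "file:///path/to/ncats_rare_source/smartapi.yaml",
    "671b45c0301c8624abbd26ae78449ca2": "file:///path/to/mydisease.info/smartapi.yaml",
})
overrides_json = json.dumps(overrides, indent=2)
print(overrides_json)
```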

colleenXu commented 10 months ago

@tokebe

I double-checked and it's not working on CI, probably because of the larger cache-update issues (recent lab Slack convo)

My test

POST to MyDisease through BTE CI `https://bte.ci.transltr.io/v1/smartapi/671b45c0301c8624abbd26ae78449ca2/query` (from this [testExample](https://github.com/NCATS-Tangerine/translator-api-registry/blob/77ae9e9dbab7411c4044459d026e8f84cdbbcd3b/mydisease.info/smartapi.yaml#L869))
```
{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:Disease"],
                    "ids": ["orphanet:881"]
                },
                "n1": {
                    "categories": ["biolink:PhenotypicFeature"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:has_phenotype"]
                }
            }
        }
    }
}
```

Right now, it seems like the MetaEdges aren't successfully turned into sub-queries. This could be because NodeNorm CI is using `orphanet` but BTE CI is using the registered yaml (`ORPHANET`) rather than the overrides (`orphanet`).

I only see the logs in the TRAPI response
```
{
    "timestamp": "2023-12-16T06:00:10.748Z",
    "level": "DEBUG",
    "message": "BTE is trying to find metaKG edges (smartAPI registry, x-bte annotation) connecting from BehavioralFeature,ClinicalFinding,Disease,DiseaseOrPhenotypicFeature,PhenotypicFeature to BehavioralFeature,ClinicalFinding,PhenotypicFeature with predicate has_phenotype",
    "code": null
},
{
    "timestamp": "2023-12-16T06:00:10.749Z",
    "level": "DEBUG",
    "message": "BTE found 2 metaKG edges corresponding to e01. These metaKG edges comes from 1 unique APIs. They are MyDisease.info API",
    "code": null
},
{
    "timestamp": "2023-12-16T06:00:10.749Z",
    "level": "WARNING",
    "message": "BTE didn't find any metaKG for this batch. Your query terminates.",
    "code": null
},
{
    "timestamp": "2023-12-16T06:00:10.749Z",
    "level": "INFO",
    "message": "e01 execution: 0 queries (0 success/0 fail) and (0) cached qEdges return (0) records",
    "code": null
},
{
    "timestamp": "2023-12-16T06:00:10.749Z",
    "level": "WARNING",
    "message": "qEdge (e01) got 0 records. Your query terminates.",
    "code": null
}
```
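
When checking a response like this, it can help to pull out only the WARNING entries. Below is a minimal Python sketch, assuming the response is already parsed into a dict with a top-level `logs` list whose entries carry `level` and `message` keys as in the excerpt above. The helper name is hypothetical and the sample data is abridged from that excerpt.

```python
def entries_at_level(trapi_response, level):
    """Return the log entries of a parsed TRAPI response that have the given level."""
    return [e for e in trapi_response.get("logs", []) if e.get("level") == level]


# Abridged sample built from the log excerpt in this comment.
response = {
    "logs": [
        {"level": "DEBUG", "message": "BTE found 2 metaKG edges corresponding to e01. These metaKG edges comes from 1 unique APIs. They are MyDisease.info API"},
        {"level": "WARNING", "message": "BTE didn't find any metaKG for this batch. Your query terminates."},
        {"level": "INFO", "message": "e01 execution: 0 queries (0 success/0 fail) and (0) cached qEdges return (0) records"},
        {"level": "WARNING", "message": "qEdge (e01) got 0 records. Your query terminates."},
    ]
}

warnings = entries_at_level(response, "WARNING")
for entry in warnings:
    print(entry["message"])
```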

tokebe commented 10 months ago

Issue should now be addressed by https://github.com/biothings/biothings_explorer/commit/3019cecf670e5b0fc04877c31956b2bbbc3d7e4e, please test again

colleenXu commented 10 months ago

Now it's working on BTE CI! Yay!

EDIT: And while we are deploying this to Test soon, it may not work until NodeNorm Test gets the orphanet prefix update...

colleenXu commented 8 months ago

I've confirmed that things work as expected after the Prod deployment. Closing issue, updating the registered yamls and registrations, and opening another issue for removing the overrides.

Example: a POST to https://bte.transltr.io/v1/smartapi/671b45c0301c8624abbd26ae78449ca2/query with the body below returns a response with results.

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:Disease"],
                    "ids": ["orphanet:881"]
                },
                "n1": {
                    "categories": ["biolink:PhenotypicFeature"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:has_phenotype"]
                }
            }
        }
    }
}
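
The same request can be scripted. The Python sketch below builds the query body shown above and defines a helper that POSTs it using only the standard library. The function names are hypothetical, the timeout is an arbitrary assumption, and the network call is left commented out so the snippet runs offline.

```python
import json
import urllib.request


def build_query(curie, subject_category, object_category, predicate):
    """Build a one-hop TRAPI query graph from a single input CURIE."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"categories": [subject_category], "ids": [curie]},
                    "n1": {"categories": [object_category]},
                },
                "edges": {
                    "e01": {"subject": "n0", "object": "n1", "predicates": [predicate]},
                },
            }
        }
    }


def post_query(url, query, timeout=60):
    """POST the query as JSON and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


query = build_query(
    "orphanet:881", "biolink:Disease", "biolink:PhenotypicFeature", "biolink:has_phenotype"
)
# Uncomment to send the request to the endpoint named in this comment:
# result = post_query("https://bte.transltr.io/v1/smartapi/671b45c0301c8624abbd26ae78449ca2/query", query)
print(json.dumps(query, indent=4))
```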