biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0
9 stars 10 forks source link

add ID resolution on output IDs of text-mining / multiomics provider APIs #318

Closed colleenXu closed 2 years ago

colleenXu commented 2 years ago

It looks like BTE is not providing ID resolution support when querying by API for the text-mining and multiomics provider APIs.

Do we want to add this feature or not?

Example: POST to text-mining targeted association API through BTE: http://localhost:3000/v1/smartapi/978fe380a147a8641caf72320862697b/query and notice that none of the Disease output IDs have names or additional equivalent IDs (under biolink:xref node attribute).

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["CHEBI:3309", "CHEBI:5147"],
                    "categories":["biolink:SmallMolecule"]
                },
                "n1": {
                    "categories":["biolink:Disease"]
               }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        }
    }
}
colleenXu commented 2 years ago

After speaking with @andrewsu , we agreed that we didn't know the reasoning for doing this and that we'd like to fix this. We're not clear on the priority of this...

tokebe commented 2 years ago

@colleenXu @andrewsu This was a strange journey...I traced the execution to find where these specific results weren't getting resolved, until I came all the way back to the top. ID Resolution for these two APIs is disabled...exclusively in the query_v1_by_api endpoint. Good lesson for me in checking execution top-down briefly before diving into breakpoint tracing in the future, ahaha.

Running this same query on the normal query_v1 endpoint and comparing the IDs shows that they resolve just fine otherwise (confirmed programmatically). This is exclusively due to this line in query_v1_by_api.js, which, tracing back, comes from this commit.

I'm not sure why this was added, but it's trivial to remove...

andrewsu commented 2 years ago

I don't see a reason why text-mining provider and multiomics provider should be treated differently than any other API with respect to ID resolution. I'm going to tag @newgene just in case he remembers any historical rationale for why that should exist. If not, let's remove it...

newgene commented 2 years ago

I also don't know any particular reason for this line. Could be performance issue at that time (likely these two APIs tend to return large output, which triggers too many resolver API requests? At that time, we are probably still using our own resolver module).

I would comment out this line with some explanation notes and then observe how the queries behave.

tokebe commented 2 years ago

Here is the result of the query made to dev, with the line commented out (the same behavior was present in the async by_api endpoint). Looks like it works, I'll make a PR with the changes to both endpoints.