biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0

utilize new x-bte smartAPI info to construct metakg #182

Closed: andrewsu closed this issue 1 year ago

andrewsu commented 3 years ago

Tagging @colleenXu to add documentation on the advantages of the new x-bte system so @newgene and I can work on prioritization.

colleenXu commented 3 years ago

Advantages:

1. reduces total number of queries and redundant queries

This is based on the following definition: an operation/metaKG edge is a unique combination of input type (a Biolink semantic type), output type, and association data (so two operations with the same input and output types are still distinct when they come from different knowledge sources).
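To make the definition concrete, here is a minimal sketch of treating an operation as a hashable key and dropping duplicates. The `Operation` class and its field names are hypothetical; in particular, `predicate` and `source` are only stand-ins for "association data" and are not taken from the x-bte spec.

```python
from typing import Iterable, List, NamedTuple


class Operation(NamedTuple):
    """Hypothetical metaKG edge key; field names are assumptions."""
    input_type: str   # Biolink semantic type of the input
    output_type: str  # Biolink semantic type of the output
    predicate: str    # stand-in for part of the "association data"
    source: str       # knowledge source, also treated as association data


def unique_operations(ops: Iterable[Operation]) -> List[Operation]:
    """Keep the first occurrence of each distinct operation, in order."""
    seen = set()
    unique = []
    for op in ops:
        if op not in seen:
            seen.add(op)
            unique.append(op)
    return unique


ops = [
    Operation("PhenotypicFeature", "Disease", "related_to", "Monarch"),
    Operation("PhenotypicFeature", "Disease", "related_to", "Monarch"),
    Operation("PhenotypicFeature", "Disease", "related_to", "OtherSource"),
]
deduplicated = unique_operations(ops)
```

Under this sketch, the two identical Monarch entries collapse into one edge, while the entry from a different source stays separate.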

Example:

Monarch (Biolink API) has a lot of useful info. However, BTE currently uses a very narrow set of x-bte operations to get data from this resource.

One problem is that, with the current x-bte, multiple operations have to be written for a single kind of query (like PhenotypicFeature -> Disease), one for each ID namespace that Monarch accepts as input or returns as output.

A related problem is that Monarch resolves IDs internally and returns the same data for different IDs of the same entity, so BTE would only need to query once. But with multiple operations written to handle the ID namespaces, BTE queries multiple times and doesn't know that the API is returning the same info each time. This redundancy could cause issues with any scoring that is based on the number of associations/edges.
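The redundancy can be illustrated with a small sketch that picks one ID per entity before querying. This is only an illustration of the problem, not a description of how the new x-bte handles it. The function `plan_queries`, the `canonical_id` mapping, and the IDs are all hypothetical.

```python
from typing import Dict, Iterable, List


def plan_queries(input_ids: Iterable[str], canonical_id: Dict[str, str]) -> List[str]:
    """Return one representative ID per entity.

    canonical_id maps an ID to a shared entity key; IDs missing from the
    mapping are treated as their own entity.
    """
    chosen: Dict[str, str] = {}
    for curie in input_ids:
        entity = canonical_id.get(curie, curie)
        # Keep only the first ID seen for each entity.
        chosen.setdefault(entity, curie)
    return list(chosen.values())


# Placeholder IDs: NS_A:1 and NS_B:9 are assumed to name the same entity.
equivalents = {"NS_A:1": "entity-1", "NS_B:9": "entity-1"}
to_query = plan_queries(["NS_A:1", "NS_B:9", "NS_A:2"], equivalents)
```

Here three input IDs result in two queries, because two of them refer to the same entity and the API would return the same data for both.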

How new x-bte solves this:


2. structure adds useful metadata to response field mapping (context, interpreting variable values)

Example 1

"Context" is something that could be useful, for querying, interpreting results, and scoring results. It also comes up in querying and interpreting clinical data.

However, there's no way in the current x-bte to annotate a response field as context or to give info on how to interpret it (for example, whether it is a species-specific context or an experiment-specific context such as the cell line used).

Example 2

The gene-disease associations from DISEASES in our pending BioThings API include numeric variables (like z-score) and categorical variables (like evidence type and confidence ranking) that could be useful to users for interpreting and scoring results.

However, these variables need additional info to be understood: range, possible values (categorical variables), what a direction means, and where to learn more about this variable and its calculation.
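As a rough sketch of the kind of metadata listed above, the record below holds a range, allowed values, a direction note, and a reference link for one variable. The `VariableInfo` class, its field names, and every example value are hypothetical placeholders; they are not the real DISEASES ranges or categories and not the x-bte format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class VariableInfo:
    """Hypothetical metadata for one response-field variable."""
    name: str
    kind: str  # "numeric" or "categorical"
    value_range: Optional[Tuple[float, float]] = None  # numeric only
    allowed_values: Tuple[str, ...] = ()               # categorical only
    direction: str = ""       # free-text note on what higher/lower means
    reference_url: str = ""   # where to learn more about the variable

    def accepts(self, value: Union[float, str]) -> bool:
        """Check a value against the declared range or allowed values."""
        if self.kind == "numeric":
            if self.value_range is None:
                return True
            low, high = self.value_range
            return low <= value <= high
        return value in self.allowed_values


# Placeholder values only; not the real range or categories.
z_score = VariableInfo(
    name="z_score",
    kind="numeric",
    value_range=(0.0, 10.0),
    direction="placeholder: describe what a higher value means",
)
evidence = VariableInfo(
    name="evidence_type",
    kind="categorical",
    allowed_values=("type_a", "type_b"),
)
```

With a record like this attached to a response field, a consumer could validate values and show users how to read them.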

Note: I found info suggesting that this resource updates weekly, which means we could update our API... Another resource from this lab that we may want to bring into Translator is TISSUES (gene-tissue relationships for humans and other animals).

How new x-bte solves these:

Additional perk:

The structure also makes it easier for the x-bte writer to see which defined info/categories the response fields can be mapped to, rather than using arbitrary names for the mapping. It should also make the process of annotating response fields easier.


3. Built-in automated testing

Automated testing with a known input ID and output ID could help us see whether the endpoint is working, whether there is an issue with processing and transforming the API output to TRAPI, or whether an update introduced a major data change and/or parser issue.
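A check of this kind could look roughly like the sketch below. The function `check_operation` and the `fetch` callable are hypothetical: `fetch` stands for whatever queries the API for an input ID and returns the output IDs after transformation. The IDs are placeholders.

```python
from typing import Callable, Iterable


def check_operation(
    fetch: Callable[[str], Iterable[str]],
    input_id: str,
    expected_output_id: str,
) -> str:
    """Run one known input/output pair and report what went wrong, if anything."""
    try:
        output_ids = set(fetch(input_id))
    except Exception as err:
        # The endpoint is down, or processing/transforming the response failed.
        return f"failed: {err}"
    if expected_output_id not in output_ids:
        # The call worked but the known association is gone:
        # possibly a data change or a parser issue.
        return "missing expected output"
    return "ok"


# Stub fetchers standing in for real API calls.
def working_fetch(input_id: str):
    return ["OUT:1", "OUT:2"]


def broken_fetch(input_id: str):
    raise ConnectionError("endpoint unreachable")


status_ok = check_operation(working_fetch, "IN:1", "OUT:2")
status_missing = check_operation(working_fetch, "IN:1", "OUT:9")
status_failed = check_operation(broken_fetch, "IN:1", "OUT:2")
```

The three return values map onto the three cases above: everything works, the expected association is missing, or the call itself fails.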

How new x-bte solves this:


4. website template for link-outs

Users and NCATS members have expressed interest in having link-outs from the associations/edges returned to the original knowledge sources. However, the current x-bte doesn't support this, and these website URLs are almost always NOT in the raw API response.
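One way to picture a link-out template is a URL pattern with a placeholder that gets filled from the ID on an edge. The sketch below assumes CURIE-style IDs (`prefix:local_id`). The function `build_link`, the placeholder names, and the `example.org` URL are made up for illustration and are not the x-bte syntax.

```python
def build_link(template: str, curie: str) -> str:
    """Fill a URL template from a CURIE such as 'PREFIX:123'.

    Supported placeholders (hypothetical): {curie}, {prefix}, {local_id}.
    """
    prefix, _, local_id = curie.partition(":")
    return template.format(curie=curie, prefix=prefix, local_id=local_id)


# Placeholder template and ID; not a real knowledge-source URL.
link = build_link("https://example.org/record/{local_id}", "NS:123")
```

Because the template lives in the annotation rather than in the API response, the link can be built even when the raw response contains no URL.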

How new x-bte solves this:

colleenXu commented 3 years ago

Issues no matter what x-bte is used:

colleenXu commented 3 years ago

Is new x-bte ready?

No

Yes (or what is done)

andrewsu commented 1 year ago

closing in favor of a new effort led by @colleenXu to redesign SmartAPI annotation