biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0

phase 1: provenance refactor for edges from some multiomics KPs, text-mining KP, TRAPI KPs #617

Closed colleenXu closed 1 year ago

colleenXu commented 1 year ago

Background

Overview

Text-Mining KP and some Multiomics KPs

We will expect Text-Mining KP and some Multiomics KPs (ClinicalTrials, BIGGIM-drug-response) BioThings KP APIs to...

Then we will...

example:

"edge_TMKP_1":
{
  "subject": "CHEBI:12345",
  "object": "MONDO:456",
  "predicate": "biolink:treats",
  "attributes": [ ... ],
  "sources": [
    {
      "resource_id": "infores:biothings-explorer",
      "resource_role": "aggregator_knowledge_source",
      "upstream_resource_ids": [ "infores:text-mining-targeted" ]
    },
    {....},  // other elements are the trapi-1.4 sources data text-mining-targeted provided
    {....}
  ]
}
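
A minimal sketch of the step shown in the example above: placing an aggregator entry for BTE in front of the sources data the KP already provided. The function name, its parameters, and the sample KP entry are hypothetical and only illustrate the shape of the data; this is not BTE's actual implementation.

```python
from copy import deepcopy

# Resource ID taken from the example in this issue.
BTE_INFORES = "infores:biothings-explorer"


def add_aggregator_source(kp_sources, kp_infores, aggregator_infores=BTE_INFORES):
    """Return a new sources list with an aggregator entry placed first.

    kp_sources: the TRAPI 1.4 `sources` list the KP provided (not modified).
    kp_infores: the infores ID of the KP API that was queried.
    """
    aggregator_entry = {
        "resource_id": aggregator_infores,
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": [kp_infores],
    }
    return [aggregator_entry] + deepcopy(kp_sources)


# Made-up KP-provided sources data, for illustration only.
kp_sources = [
    {
        "resource_id": "infores:text-mining-targeted",
        "resource_role": "primary_knowledge_source",
    }
]
sources = add_aggregator_source(kp_sources, "infores:text-mining-targeted")
```

The same helper would cover the TRAPI KP case by passing that KP's infores ID (for example `infores:automat-hetio`) as `kp_infores`.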

TRAPI KPs

We will expect TRAPI KP edges to have a sources property on their edges already. We'll then add an element for BTE that references their KP API infores...

example:

"edge_automat_hetio":
{
  "subject": "thing1",
  "object": "thing2",
  "predicate": "biolink:affects",
  "attributes": [ ... ],
  "sources": [
    {
      "resource_id": "infores:biothings-explorer",
      "resource_role": "aggregator_knowledge_source",
      "upstream_resource_ids": [ "infores:automat-hetio" ]
    },
    {....},  // other elements are the trapi-1.4 sources data that automat-hetio provided
    {....}
  ]
}

colleenXu commented 1 year ago

Note that COHD's dev instance seems to be on TRAPI 1.4 (we can access it through the registration we currently use, but they also registered a separate yaml for TRAPI 1.4)

However, I haven't checked their /query responses to see if they are providing provenance as we expect, and whether we can use it to develop and test our code for this issue...

From my post here: https://github.com/biothings/biothings_explorer/issues/597#issuecomment-1502686927

colleenXu commented 1 year ago

For the multiomics / text-mining KP stuff:

I'm wondering, do we always want to add the resource ID as infores:biothings-explorer? That makes sense for the ARA endpoints, but for the team-specific / API-specific endpoints, maybe it makes sense to add the resource ID as infores:service-provider-trapi...

colleenXu commented 1 year ago

Note that the post above is related to this, and it looks like we haven't done the infores:service-provider-trapi handling yet

colleenXu commented 1 year ago

Pasting my Translator Slack message to Multiomics / Text-Mining KPs below

BTE's dev instance now has support for TRAPI 1.4 provenance (aka the sources section on Edges).

If your BioThings API includes TRAPI 1.4 sources data, the following is needed to hook this up with BTE:

  • make a branch or fork of the SmartAPI yaml registered for your API

  • edit each operation:

    • In parameters.fields: the comma-delimited string of JSON-notation paths listed here should include the sources data field. This part of the query to BioThings APIs specifies which parts of the record to return in the response.
    • Ex: if the sources data was in association.sources, this would work
      parameters:
        fields: object.MONDO,association.edge_attributes,association.sources
  • edit each entry in the x-bte-response-mapping section: add a key-value pair. The key is trapi_sources and the value is the JSON-notation-path to the sources data. BTE will recognize this key and handle the data in the field specified appropriately

    • Ex:
      x-bte-response-mapping:
        mondo-object:
          MONDO: object.MONDO
          edge-attributes: association.edge_attributes
          trapi_sources: association.sources
  • let me know. I'll be writing a SmartAPI-overrides file so BTE will use these TRAPI 1.4-specific files to retrieve the sources data and handle it appropriately. Once this SmartAPI-overrides file is deployed on BTE's dev instance, the changes will go live <=10 min later
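
To illustrate what a JSON-notation path in the response mapping refers to, here is a rough sketch that resolves dot-separated paths such as `association.sources` against a record. The helper names and the sample record are hypothetical, the sketch does not handle lists at intermediate levels, and BTE's own parsing may work differently.

```python
def get_by_path(record, path):
    """Follow a dot-separated path (e.g. "association.sources") through nested dicts.

    Returns None if any step of the path is missing.
    """
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def apply_response_mapping(record, mapping):
    """Build {output_key: value} for every mapping entry whose path resolves."""
    resolved = {}
    for output_key, path in mapping.items():
        value = get_by_path(record, path)
        if value is not None:
            resolved[output_key] = value
    return resolved


# Hypothetical BioThings record, shaped to match the paths in the example above.
record = {
    "object": {"MONDO": "MONDO:456"},
    "association": {
        "edge_attributes": [],
        "sources": [
            {
                "resource_id": "infores:text-mining-targeted",
                "resource_role": "primary_knowledge_source",
            }
        ],
    },
}
mapping = {
    "MONDO": "object.MONDO",
    "edge-attributes": "association.edge_attributes",
    "trapi_sources": "association.sources",
}
mapped = apply_response_mapping(record, mapping)

# Every mapped path also has to be requested via parameters.fields.
fields_param = ",".join(mapping.values())
```

Here `fields_param` comes out as the same comma-delimited string used in the `parameters.fields` example, which is a quick way to check that every path in the response mapping is also requested from the API.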

colleenXu commented 1 year ago

Status of the multiomics / text-mining APIs

[Update in progress 2023-06-01 evening]

We'll be using temporary SmartAPI overrides (currently on main) to direct BTE to query for and ingest the TRAPI 1.4 sources data from Multiomics/Text-Mining KPs.

The override now contains links for all 4 KPs.

However, these KPs' x-bte annotations are in different states for their registered yamls (staying at TRAPI 1.3 and used by BTE prod) and their override yamls (with the changes for TRAPI 1.4 sources data and used by all other BTE instances).

both yamls working

1 yaml working

no yamls working

colleenXu commented 1 year ago

Recording info on hooking BTE up to TRAPI 1.4 KPs in https://github.com/biothings/biothings_explorer/issues/614#issuecomment-1543422889

colleenXu commented 1 year ago

Multiomics ClinicalTrials KP: NEEDED SmartAPI yaml edits

Edits for registered yaml (TRAPI 1.3 instances)

Main issue: The [deployed change](https://github.com/biothings/pending.api/issues/114#issuecomment-1548683127) flipped the subject/object in records (now Treatments are subjects and Diseases are objects). This was a BREAKING change and now BTE is not properly querying this API.

Addressing the main issue: Change this [file](https://github.com/NCATS-Tangerine/translator-api-registry/blob/cc5bf2533fe2803cafa8b002d920c457d9a9b30c/multiomics_clinical_trials/smartapi.yaml), then ask me to update the [registration](http://smart-api.info/registry?q=08a5ddcde71b4bf838327ef469076acd). Once the registration is updated, it'll be <=10 min before BTE picks up this update.

* line 58: `00065273_C0025362_C0009079` -> replace with `00065273_C0009079_C0025362`
* line 122: `00065273_C0025362_C0009079` -> replace with `00065273_C0009079_C0025362`
* line 123: `00065273_C0025362_C0171023` -> replace with `00065273_C0171023_C0025362`
* line 178, 279, 592, 622, 636: `subject.UMLS` -> replace with `object.UMLS`
* line 598, 616, 633: `object.UMLS` -> replace with `subject.UMLS`

Optional things to do:

* add a comment explaining the versioning of the BioThings API (what is this the date of?) (add around line 16, comments start with `#`)
* add an API-level tag "multiomics" (add around line 32)

Edits for forked yaml (TRAPI 1.4 instances)

Main issue: same as above

Addressing the main issue:

* Change this [file in fork](https://github.com/GitHubbit/translator-api-registry/blob/085ed25bcd0e7441404c30cb96ac67f1b6bd890c/multiomics_clinical_trials/smartapi.yaml). Once the changes are pushed, BTE will automatically take the parsed file in <=10 min.
* same list of edits as above, except different line numbers (bold) are involved for the edits below:
  * line 178, 279, 592, **623**, **639**: `subject.UMLS` -> replace with `object.UMLS`
  * line 598, **617**, **635**: `object.UMLS` -> replace with `subject.UMLS`

Optional things to do:

* same as above
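
Because `subject.UMLS` and `object.UMLS` swap in both directions, two sequential find-and-replace passes over the same text would undo each other. A minimal sketch of one way to avoid that, swapping both strings in a single pass and only on the listed line numbers, is below. The function name and the sample text are made up for illustration; the real line numbers are the ones listed above.

```python
import re

SWAP = {"subject.UMLS": "object.UMLS", "object.UMLS": "subject.UMLS"}
PATTERN = re.compile(r"\b(?:subject|object)\.UMLS\b")


def swap_on_lines(text, line_numbers):
    """Swap subject.UMLS <-> object.UMLS on the given 1-indexed lines only."""
    targets = set(line_numbers)
    lines = text.split("\n")
    for index, line in enumerate(lines, start=1):
        if index in targets:
            # One regex pass per line, so a swapped value is never swapped back.
            lines[index - 1] = PATTERN.sub(lambda m: SWAP[m.group(0)], line)
    return "\n".join(lines)


# Made-up three-line sample; only lines 1 and 2 are edited.
sample = "a: subject.UMLS\nb: object.UMLS\nc: subject.UMLS"
swapped = swap_on_lines(sample, [1, 2])
```

Restricting the swap to explicit line numbers mirrors the edit lists above, where only some occurrences in the yaml are meant to change.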

Other notes:

colleenXu commented 1 year ago

Multiomics EHR Risk KP: NEEDED SmartAPI yaml edits

Edits for registered yaml (TRAPI 1.3 instances)

#### Main issue

The [deployed changes](https://github.com/biothings/pending.api/issues/113#issuecomment-1550568443) change a node-category, how edge-attributes are formatted, and other formatting. These are BREAKING changes and now BTE is not properly querying this API or properly parsing responses.

#### Addressing the main issue

Change this [file](https://github.com/Hadlock-Lab/clinical_risk_kp/blob/e036371f0e216ea6e5ddf0fd4e16cddb6875d12a/ehr_risk_kp.yaml), then ask me to update the [registration](http://smart-api.info/registry?q=d86a24f6027ffe778f84ba10a7a1861a) AND remove the primarySource tag [here](https://github.com/biothings/biothings_explorer/blob/4a2d62145e9ceb25ff79b4660620cfdf9f7d3b14/src/config/apis.js#L146). Once the registration is updated, it'll be <=10 min before BTE picks up this update.

Note: There are 148 operations and 12 response-mapping entries.

1. find this section of text (note the indent and commas! 148 instances)

old text

```
association.provenance,
association.auc_roc,association.p_values,
association.feature_coefficient,association.odd_ratio,
association.classifier,association.original_predicate,association.provided_date,
```

and replace it with this text (note the indent and comma!!!)

```
association.edge_attributes,
```

2. find the following and replace:
   a. `"Disease"` (note the quotation marks! 106 instances), replace with `"biolink:Disease"`
   b. `"PhenotypicFeature"` (114 instances), replace with `"biolink:PhenotypicFeature"`
   c. `"Procedure"` (28 instances), replace with `"biolink:Procedure"`
   d. `ChemicalSubstance` (80 instances), replace with `ChemicalEntity`. Then look for `"ChemicalEntity"` (note the quotation marks! 48 instances), replace with `"biolink:ChemicalEntity"`
3. find the following and replace:
   a. `{{ queryInputs | wrap` (73 instances), replace with `{{ queryInputs | rmPrefix() | wrap`
   b. ` | replPrefix('NCIT')` (38 instances), replace with ` | rmPrefix()`
   c. ` | addPrefix("SNOMEDCT")` (27 instances), replace with ` | rmPrefix()`
   d. ` | addPrefix("UNII")` (10 instances), replace with ` | rmPrefix()`
4. and last, find this section of text (there are 12 instances in the response-mapping section)

old text

```
model_url: association.provenance
classifier_used: association.classifier
"biolink:original_predicate": association.original_predicate
date_provided: association.provided_date
auc_roc: association.auc_roc
p-value: association.p_values
feature_coefficient: association.feature_coefficient
odds_ratio: association.odd_ratio
```

and replace it with this text

```
edge-attributes: association.edge_attributes
```

#### Later important tasks

* get a list of meta-triples for this KP: unique combos of subject-prefix / subject-category / predicate / object-prefix / object-category (and qualifier-set, if applicable)
* then edit the x-bte annotations to add / remove operations.
  * this may involve editing all 3 x-bte sections (the list under /query, the written-out operations in /components, and the response-mapping in /components)
* These are the missing operations I found (but I only checked a fraction of the yaml). If you search these operation names, you'll find the notes and link-outs in the yaml on them (but the link-outs will need a find-replace fix, see the optional section below)
known operations to add

* DiseaseNCIT_increased_DiseaseSNOMEDCT
* DiseaseSNOMEDCT_increased_DiseaseMONDO
* DiseaseSNOMEDCT_increased_DiseaseNCIT
* DiseaseSNOMEDCT_increased_DiseaseSNOMEDCT
* DiseaseNCIT_increased_PhenoHP
* DiseaseNCIT_increased_PhenoNCIT
* DiseaseNCIT_increased_PhenoSNOMEDCT
* DiseaseSNOMEDCT_increased_PhenoNCIT
* DiseaseSNOMEDCT_increased_PhenoSNOMEDCT
* DiseaseSNOMEDCT_increased_ProcedureNCIT
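
The meta-triple listing mentioned under "Later important tasks" could be computed along these lines. This is only a sketch: the flat edge shape (`subject_id`, `subject_category`, and so on) and the sample IDs are made up, the real BioThings records are structured differently, and qualifier-sets are left out.

```python
def curie_prefix(curie):
    """Return the part of a CURIE before the first colon, e.g. "MONDO" for "MONDO:456"."""
    return curie.split(":", 1)[0]


def meta_triples(edges):
    """Collect the unique (subject prefix, subject category, predicate,
    object prefix, object category) combinations, sorted for stable output."""
    combos = {
        (
            curie_prefix(edge["subject_id"]),
            edge["subject_category"],
            edge["predicate"],
            curie_prefix(edge["object_id"]),
            edge["object_category"],
        )
        for edge in edges
    }
    return sorted(combos)


# Hypothetical edges with placeholder IDs; the first two share one meta-triple.
edges = [
    {"subject_id": "NCIT:C1", "subject_category": "biolink:Disease",
     "predicate": "biolink:related_to",
     "object_id": "SNOMEDCT:1", "object_category": "biolink:Disease"},
    {"subject_id": "NCIT:C2", "subject_category": "biolink:Disease",
     "predicate": "biolink:related_to",
     "object_id": "SNOMEDCT:2", "object_category": "biolink:Disease"},
    {"subject_id": "SNOMEDCT:3", "subject_category": "biolink:Disease",
     "predicate": "biolink:related_to",
     "object_id": "NCIT:C3", "object_category": "biolink:PhenotypicFeature"},
]
triples = meta_triples(edges)
```

Each resulting tuple would correspond to one candidate x-bte operation to compare against the operations already written out in the yaml.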
#### Optional things to do

* add a comment explaining the versioning of the BioThings API (what is this the date of?) (add around line 12, comments start with `#`)
* add an API-level tag "multiomics" (add around line 28)
* update the commented link-outs to the BioThings API
  * `.type:Disease` -> `.type:"biolink:Disease"` (97 instances)
  * `.type:PhenotypicFeature` -> `.type:"biolink:PhenotypicFeature"` (97 instances)
  * `.type:Procedure` -> `.type:"biolink:Procedure"` (33 instances)
  * `.type:ChemicalEntity` -> `.type:"biolink:ChemicalEntity"` (29 instances)
* update comments on number of records retrievable for each set of operations, the testExamples
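
The find-and-replace steps for the registered yaml come with expected instance counts, so one option is to script them and check each count before replacing. A minimal sketch follows, with hypothetical helper names and a tiny made-up snippet; the counts in the sample are for that snippet, not for the real yaml.

```python
def replace_checked(text, old, new, expected_count=None):
    """Replace every occurrence of `old` with `new`.

    If expected_count is given, raise ValueError when the number of
    occurrences found differs, so a mismatch is noticed before saving.
    """
    found = text.count(old)
    if expected_count is not None and found != expected_count:
        raise ValueError(f"{old!r}: expected {expected_count} instances, found {found}")
    return text.replace(old, new)


def apply_steps(text, steps):
    """Apply (old, new, expected_count) steps in order."""
    for old, new, expected_count in steps:
        text = replace_checked(text, old, new, expected_count)
    return text


# Made-up snippet; the keys "type" and "semantic" are placeholders.
snippet = 'type: "Disease"\nsemantic: ChemicalSubstance\ntype: "ChemicalSubstance"\n'
steps = [
    ('"Disease"', '"biolink:Disease"', 1),
    ("ChemicalSubstance", "ChemicalEntity", 2),
    ('"ChemicalEntity"', '"biolink:ChemicalEntity"', 1),
]
result = apply_steps(snippet, steps)
```

Order matters here, as it does in step 2d: `ChemicalSubstance` is renamed first, and only then are the quoted `"ChemicalEntity"` values given the `biolink:` prefix.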
What's needed for TRAPI 1.4 support

Once the needed fixes above are done for the registered yaml, follow the instructions [here (Translator Slack link)](https://ncatstranslator.slack.com/archives/C022EL8D3AB/p1682640289901529)

colleenXu commented 1 year ago

[as of 2023-07-13 evening]

We're preparing to move all BTE instances to TRAPI 1.4, which will involve removing the SmartAPI overrides and updating the registered yamls.

ready for migration

tokebe commented 1 year ago

Closing as complete. Any future problems with provenance (further/alternate support for KPs, bugs, etc.) can be tracked in future issues.