biothings / pending.api

Set of standalone APIs built with the BioThings SDK for the Translator Project
https://biothings.ncats.io
Apache License 2.0

API Multiomics Wellness update #118

Closed gglusman closed 8 months ago

gglusman commented 1 year ago
erikyao commented 1 year ago

@colleenXu @gglusman API https://biothings.ncats.io/multiomics_wellness_kp updated to version 1.7

colleenXu commented 1 year ago

Putting an example of the x-bte annotation adjustments here...


After looking at the diagram of what's in the KP, I've picked the MetaEdge: ClinicalFinding (LOINC) -(correlated_with)-> SmallMolecule (HMDB) to work on as my example.

1. Finding the corresponding operations
    * I look at the [list of x-bte operation references](https://github.com/Hadlock-Lab/multiomics_wellness_kp/blob/c9fc8a0b2351871cbc878b16f011e9658b4f259f/multiomics_wellness.yaml#L312). They appear to be formatted as `subjectPrefix-objectPrefix`, so I CTRL-F for `LOINC-HMDB`.
    * I find 4 operations listed, 2 per MetaEdge (a forward operation and a "Rev" operation). The "Rev" operations retrieve the data in the "reverse" direction (object ID -> subject).
        * [`LOINC-HMDB` and `LOINC-HMDB-Rev`](https://github.com/Hadlock-Lab/multiomics_wellness_kp/blob/c9fc8a0b2351871cbc878b16f011e9658b4f259f/multiomics_wellness.yaml#L323): after I CTRL-F to find [where they're written out (in the `components.x-bte-kgs-operations` section)](https://github.com/Hadlock-Lab/multiomics_wellness_kp/blob/c9fc8a0b2351871cbc878b16f011e9658b4f259f/multiomics_wellness.yaml#L1164), I see that the annotated predicates are `correlated_with`, which matches the data. **So these are the corresponding operations that I want to update.**
        * [`LOINC-HMDB-related` and `LOINC-HMDB-related-Rev`](https://github.com/Hadlock-Lab/multiomics_wellness_kp/blob/c9fc8a0b2351871cbc878b16f011e9658b4f259f/multiomics_wellness.yaml#L339): [their annotated predicates](https://github.com/Hadlock-Lab/multiomics_wellness_kp/blob/c9fc8a0b2351871cbc878b16f011e9658b4f259f/multiomics_wellness.yaml#L1790) are `related_to`, and [I don't find any data in the BioThings API with predicates other than `correlated_with`](https://biothings.ncats.io/multiomics_wellness_kp/query?q=NOT%20association.label:%22biolink:correlated_with%22). **So these list references and operations can be deleted: [commit here](https://github.com/colleenXu/multiomics_wellness_kp/commit/2e3fe69062e66fa45409aa7f8a1a02c7a5864067)**
2. Updating the operations and response-mapping
    * I go to where [`LOINC-HMDB` and `LOINC-HMDB-Rev`](https://github.com/Hadlock-Lab/multiomics_wellness_kp/blob/c9fc8a0b2351871cbc878b16f011e9658b4f259f/multiomics_wellness.yaml#L1164) are in the `components.x-bte-kgs-operations` section.
    * `LOINC-HMDB`:
        * I need example records/documents from the BioThings API so I can see the data structure/format for the later steps. The link I saved as a comment no longer works because of the data/parser updates. On the BioThings API [webpage](https://biothings.ncats.io/multiomics_wellness_kp), I could pick random queries and look at the responses, which helped me adjust my link. [Commit here](https://github.com/colleenXu/multiomics_wellness_kp/commit/edc14b08421e09ad5ace498a45980cdb50ecb383)
        * `requestBody` field: commit [here](https://github.com/colleenXu/multiomics_wellness_kp/commit/a045ef90d608b93d2d2296fe5389550f3bbd972f)
            * this tells BTE how to query the BioThings API. I want to provide a LOINC ID and retrieve associated SmallMolecules. (It would be more efficient to have BTE ask for only records where the object ID is HMDB, but because that uses a different query format, I'm not making this adjustment right now.)
            * all prefix fields (`subject.LOINC`, `object.HMDB`) appear to have values that are prefixed. So I add the custom BTE handling `replPrefix('LOINC') | ` (based on nunjucks templating) to make sure BTE queries this BioThings API properly (puts the right prefix on IDs). I also remove the `# no prefix` comment.
            * I don't need to specify the predicate in the BioThings API query because all records have the same predicate, so I remove it.
            * `object.type` doesn't seem to be `MolecularEntity`; it seems to be `biolink:SmallMolecule`. So I replace it.
        * `parameters.fields` field: commit [here](https://github.com/colleenXu/multiomics_wellness_kp/commit/bf8699e4ed8362dd96ef1dc8947741bf7134c8d1)
            * this controls which data fields I get in response to the query.
            * many of the fields listed there no longer exist. Instead, the example records have an `association.attributes` field that holds all the info in TRAPI edge-attribute format, so I make that replacement.
            * I keep the `subject.name` and `object.name` fields because BTE has some custom support: it uses the names retrieved from this KP if the SRI Node Normalizer doesn't find a name.
            * I also remove the `# no prefix` comment.
        * `response_mapping`: [commit here](https://github.com/colleenXu/multiomics_wellness_kp/commit/730ffe998e0dcc40bb3b1954916eadc4bf42a9fe)
            * this [line](https://github.com/Hadlock-Lab/multiomics_wellness_kp/blob/c9fc8a0b2351871cbc878b16f011e9658b4f259f/multiomics_wellness.yaml#L1198) in the operation references the [`HMDB` reusable yaml chunk](https://github.com/Hadlock-Lab/multiomics_wellness_kp/blob/c9fc8a0b2351871cbc878b16f011e9658b4f259f/multiomics_wellness.yaml#L8457) stored in the `x-bte-response-mapping` section.
            * this yaml chunk tells BTE what to do with each field of the record in the response (the fields listed in `parameters.fields`):
                * `HMDB: object.HMDB` tells BTE to go into the `object.HMDB` field of the record to get the HMDB ID.
                * `input_name` and `output_name` are KEYWORDS that tell BTE to use the values of those fields for the TRAPI node names if the SRI Node Normalizer doesn't find a name.
                * the rest can be replaced because those fields no longer exist. Instead, use the `edge-attributes` KEYWORD for `association.attributes`. It tells BTE that the value of `association.attributes` is an array of TRAPI edge-attributes that are already properly formatted, so BTE should preserve that formatting as much as possible.
            * I also remove the `# no prefix` comment.
    * `LOINC-HMDB-Rev`: I do basically the same thing for this "reverse" operation, which lets me query with the HMDB ID and get the correlated LOINC (retrieving the data in the "reverse" direction, object ID -> subject). [Commit here](https://github.com/colleenXu/multiomics_wellness_kp/commit/6e3c87039781c7e96ed8428c0d2eeacc6f0b5f0b) The differences are:
        * I don't need the link to example records, since it's the same as for the forward `LOINC-HMDB` operation.
        * the ID prefix is HMDB and the `subject.type` is `biolink:ClinicalFinding`.
        * (not essential: I edited the commented-out section at the bottom of the operations)
3. Testing edits with an example record
    * I picked a [record](https://biothings.ncats.io/multiomics_wellness_kp/query?q=_id:%22WKP-LOINC:13457-7-biolink:correlated_with-HMDB:HMDB07218-gender-male%22). BTE's response should have an Edge with this info when queried starting from the LOINC ID (using operation `LOINC-HMDB`) or from the HMDB ID (using the reverse operation `LOINC-HMDB-Rev`).
    * At the moment, my example changes aren't deployed (pushed to the registered yaml, with the SmartAPI registration refreshed to pull in the changed yaml), so I'll use a local instance of BTE to test. If the example changes were fully deployed, I could query ONLY this KP on BTE's prod instance using this URL (the path includes the KP's SmartAPI registration ID): `https://bte.transltr.io/v1/smartapi/02af7d098ab304e80d6f4806c3527027/query`
    * setting up a local instance of BTE:
        * change the `src/config/smartapi_overrides.json` contents (see below) and adjust the link used. `file:///` needs the 3 slashes.
        * run `API_OVERRIDE=true npm run smartapi_sync` so BTE uses the override file to refresh its internal registry.
        * start the local instance of BTE (`npm start` works). Then POST queries to `http://localhost:3000/v1/smartapi/wellness/query`.
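To make the `response_mapping` part of step 2 concrete, here is a rough Python sketch of what the `HMDB` chunk asks BTE to do with one record. It is not BTE's implementation: `get_path`, `apply_mapping`, and the exact paths for `input_name`/`output_name` (assumed to be `subject.name` and `object.name`) are hypothetical, and the record is a trimmed, made-up example shaped like the fields kept in `parameters.fields`.

```python
# Hypothetical stand-in for the edited HMDB response-mapping chunk:
# output ID or KEYWORD -> dotted path into the BioThings record.
HMDB_MAPPING = {
    "HMDB": "object.HMDB",                        # where the output ID lives
    "input_name": "subject.name",                 # fallback node names (assumed paths)
    "output_name": "object.name",
    "edge-attributes": "association.attributes",  # already TRAPI-formatted
}


def get_path(record, dotted):
    """Follow a dotted path such as 'object.HMDB'; return None if any key is missing."""
    value = record
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def apply_mapping(record, mapping):
    """Pull every mapped field out of one record."""
    return {keyword: get_path(record, path) for keyword, path in mapping.items()}


# Made-up record trimmed to the fields requested in parameters.fields.
record = {
    "subject": {"LOINC": "LOINC:13457-7", "name": "example LOINC name"},
    "object": {"HMDB": "HMDB:HMDB07218", "name": "example HMDB name"},
    "association": {
        "attributes": [
            {
                "attribute_type_id": "STATO:0000085",
                "description": "Effect size estimate",
                "value": "0.2510065031728153",
            }
        ]
    },
}

mapped = apply_mapping(record, HMDB_MAPPING)
print(mapped["HMDB"])  # HMDB:HMDB07218
```

The attribute list is passed through untouched, which mirrors the idea that BTE should keep the existing TRAPI edge-attribute formatting rather than rebuild it.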
overrides.json contents

```
{
  "conf": {
    "only_overrides": true
  },
  "apis": {
    "wellness": "file:///Users/colleenxu/Desktop/multiomics_wellness_kp/multiomics_wellness.yaml"
  }
}
```
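If you'd rather generate the override file than hand-edit it, a small Python sketch could build it. `build_overrides` is a hypothetical helper, and the yaml path is the one above (change it to your own checkout); it assumes a macOS/Linux-style absolute path. Prepending `file://` to a path that starts with `/` is what produces the three slashes.

```python
import json
from urllib.parse import quote


def build_overrides(api_name, yaml_path):
    """Build smartapi_overrides.json contents pointing BTE at a local yaml file."""
    if not yaml_path.startswith("/"):
        raise ValueError("use an absolute path so the URI gets three slashes")
    # "file://" + "/Users/..." -> "file:///Users/..."
    uri = "file://" + quote(yaml_path)
    return {"conf": {"only_overrides": True}, "apis": {api_name: uri}}


overrides = build_overrides(
    "wellness",
    "/Users/colleenxu/Desktop/multiomics_wellness_kp/multiomics_wellness.yaml",
)
text = json.dumps(overrides, indent=2)
print(text)
# To save it from the BTE repo root:
# import pathlib; pathlib.Path("src/config/smartapi_overrides.json").write_text(text)
```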
TRAPI query starting from LOINC ID

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "ids": ["LOINC:13457-7"],
          "categories": ["biolink:ClinicalFinding"]
        },
        "n1": {
          "categories": ["biolink:SmallMolecule"]
        }
      },
      "edges": {
        "e0": {
          "subject": "n0",
          "object": "n1"
        }
      }
    }
  }
}
```

I get the [record](https://biothings.ncats.io/multiomics_wellness_kp/query?q=_id:%22WKP-LOINC:13457-7-biolink:correlated_with-HMDB:HMDB07218-gender-male%22)'s data in this Edge:

```
"d008f25451e1aa31b29f3cb2c30c8450": {
  "predicate": "biolink:correlated_with",
  "subject": "LOINC:13457-7",
  "object": "HMDB:HMDB07218",
  "attributes": [
    { "attribute_type_id": "NCIT:C53236", "description": "Spearman Correlation Test was used to compute the p-value for the association", "value": "NCIT:C53249" },
    { "attribute_type_id": "biolink:primary_knowledge_source", "value": "infores:biothings-multiomics-wellness" },
    { "attribute_type_id": "STATO:0000085", "description": "Effect size estimate", "value": "0.2510065031728153" },
    { "attribute_type_id": "biolink:Association", "description": "Predicate id", "value": "RO:0002610" },
    { "attribute_type_id": "GECKO:0000106", "description": "Sample size used to compute the correlation", "value": "653" },
    { "attribute_type_id": "NCIT:C61594", "description": "Bonferroni pvalue", "value": "0.00275092761067993" },
    { "attribute_type_id": "MeSH:D008297", "description": "gender", "value": "male" }
  ]
},
```
TRAPI query starting from HMDB ID

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "ids": ["HMDB:HMDB07218"],
          "categories": ["biolink:SmallMolecule"]
        },
        "n1": {
          "categories": ["biolink:ClinicalFinding"]
        }
      },
      "edges": {
        "e0": {
          "subject": "n0",
          "object": "n1"
        }
      }
    }
  }
}
```

I get the [record](https://biothings.ncats.io/multiomics_wellness_kp/query?q=_id:%22WKP-LOINC:13457-7-biolink:correlated_with-HMDB:HMDB07218-gender-male%22)'s data in this Edge:

```
"e8590b2528b5d7c705cdb377290962f0": {
  "predicate": "biolink:correlated_with",
  "subject": "HMDB:HMDB07218",
  "object": "LOINC:13457-7",
  "attributes": [
    { "attribute_type_id": "NCIT:C53236", "description": "Spearman Correlation Test was used to compute the p-value for the association", "value": "NCIT:C53249" },
    { "attribute_type_id": "biolink:primary_knowledge_source", "value": "infores:biothings-multiomics-wellness" },
    { "attribute_type_id": "STATO:0000085", "description": "Effect size estimate", "value": "0.2510065031728153" },
    { "attribute_type_id": "biolink:Association", "description": "Predicate id", "value": "RO:0002610" },
    { "attribute_type_id": "GECKO:0000106", "description": "Sample size used to compute the correlation", "value": "653" },
    { "attribute_type_id": "NCIT:C61594", "description": "Bonferroni pvalue", "value": "0.00275092761067993" },
    { "attribute_type_id": "MeSH:D008297", "description": "gender", "value": "male" }
  ]
},
```
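Both test queries share one shape, so a small Python helper could build them, send them to the local instance, and pull out the edge for the example record. This is a sketch, assuming the local BTE URL from step 3 and the standard TRAPI response layout (`message.knowledge_graph.edges`); `one_hop_query`, `post_query`, and `edges_between` are hypothetical names, not part of BTE.

```python
import json
import urllib.request

LOCAL_BTE = "http://localhost:3000/v1/smartapi/wellness/query"


def one_hop_query(start_id, start_category, end_category):
    """Build a one-hop TRAPI query shaped like the two examples above."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [start_id], "categories": [start_category]},
                    "n1": {"categories": [end_category]},
                },
                "edges": {"e0": {"subject": "n0", "object": "n1"}},
            }
        }
    }


def post_query(query, url=LOCAL_BTE, timeout=300):
    """POST a TRAPI query and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def edges_between(response, id_a, id_b):
    """Return knowledge-graph edges linking id_a and id_b, in either direction."""
    edges = response["message"]["knowledge_graph"]["edges"]
    return {
        key: edge
        for key, edge in edges.items()
        if {edge["subject"], edge["object"]} == {id_a, id_b}
    }


forward = one_hop_query("LOINC:13457-7", "biolink:ClinicalFinding", "biolink:SmallMolecule")
reverse = one_hop_query("HMDB:HMDB07218", "biolink:SmallMolecule", "biolink:ClinicalFinding")
# With the local BTE instance running:
# hits = edges_between(post_query(forward), "LOINC:13457-7", "HMDB:HMDB07218")
```

Matching on the unordered pair of IDs lets the same check work for the forward and reverse queries, since the reverse Edge above has subject and object swapped.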

Note: an issue was uncovered during this process: records that differ only by their edge-attributes (aka associations from analyzing different cohorts) aren't all being returned by BTE. I'll raise this as an issue for BTE to fix.
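One way to spot the missing records while that issue is open is to count BTE's edges per (subject, predicate, object) and compare against how many BioThings records exist for the same pair (for example, one record per cohort, like the `gender-male` record above). This is a hypothetical Python sketch: the expected counts have to come from querying the BioThings API yourself, and the count in the example is made up.

```python
from collections import Counter


def edge_counts(response):
    """Count TRAPI knowledge-graph edges per (subject, predicate, object)."""
    edges = response["message"]["knowledge_graph"]["edges"]
    return Counter((e["subject"], e["predicate"], e["object"]) for e in edges.values())


def short_triples(response, expected):
    """Return {triple: (expected_records, returned_edges)} where BTE returned fewer."""
    counts = edge_counts(response)
    return {
        triple: (n_records, counts[triple])  # Counter gives 0 for unseen triples
        for triple, n_records in expected.items()
        if counts[triple] < n_records
    }


triple = ("LOINC:13457-7", "biolink:correlated_with", "HMDB:HMDB07218")
response = {"message": {"knowledge_graph": {"edges": {
    "e1": {"subject": triple[0], "predicate": triple[1], "object": triple[2]},
}}}}
expected = {triple: 2}  # made-up record count, for illustration only
print(short_triples(response, expected))
```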

colleenXu commented 1 year ago

Reviewing @gglusman's TRAPI 1.4 yaml:

Important edits for SmartAPI spec/Translator info

x-bte annotation: x-bte-kgs-operations section

(unless otherwise specified, the points apply to all operations)

A. `inputs` and `outputs` should be one-element arrays (yes, I know it's illogical). Here's what the `UniProtKB-CAS` operation's should look like:

        inputs:
          - id: UniProtKB
            semantic: Protein
        outputs:
          - id: CAS
            semantic: SmallMolecule
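To check this across all operations, a small Python function could run over each operation after the yaml is loaded (for example, with a YAML parser). The function name and messages are hypothetical; it only checks the one-element shape and the `id`/`semantic` keys shown above.

```python
def io_shape_problems(name, operation):
    """List problems with an x-bte operation's inputs/outputs shape."""
    problems = []
    for key in ("inputs", "outputs"):
        value = operation.get(key)
        if not isinstance(value, list) or len(value) != 1:
            problems.append(f"{name}: {key} should be a one-element list")
        elif not isinstance(value[0], dict) or not {"id", "semantic"}.issubset(value[0]):
            problems.append(f"{name}: {key}[0] needs 'id' and 'semantic'")
    return problems


good = {
    "inputs": [{"id": "UniProtKB", "semantic": "Protein"}],
    "outputs": [{"id": "CAS", "semantic": "SmallMolecule"}],
}
bad = {
    "inputs": {"id": "UniProtKB", "semantic": "Protein"},  # not wrapped in a list
    "outputs": [{"id": "CAS"}],                            # missing semantic
}
print(io_shape_problems("UniProtKB-CAS", good))  # []
print(io_shape_problems("UniProtKB-CAS", bad))
```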

B. as shown above, the `outputs` should be changed to match what the object will be.

C. for "-Rev" operations only: the `requestBody.body` scopes section and `parameters.fields` should be adjusted to query the API data by its object field and retrieve its subject field. For `UniProtKB-CAS-Rev`, it would look like this:

        requestBody:
          body: >-
            {"q": [ {{ queryInputs | replPrefix('CAS') | wrap( '["', '","biolink:Protein"]' ) }} ],
            "scopes": ["object.CAS", "subject.type"]}

        parameters:
          fields: >-
            subject.UniProtKB,
            association.attributes,
            association.sources,
            subject.name,
            object.name
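To see what the `-Rev` body turns into, here is a rough Python emulation of that template for `UniProtKB-CAS-Rev`. `repl_prefix` and `wrap` are hypothetical stand-ins for BTE's `replPrefix` and `wrap` filters (assumed to swap the CURIE prefix and to add text before/after), and joining several inputs with commas is an assumption about how a list gets rendered. The point is the shape: each `q` entry is a `[CAS ID, subject type]` pair, presumably matched against the two `scopes` fields in order.

```python
import json


def repl_prefix(curie, prefix):
    """Hypothetical replPrefix stand-in: put `prefix` in front of the local ID."""
    head, sep, tail = curie.partition(":")
    local_id = tail if sep else head
    return f"{prefix}:{local_id}"


def wrap(text, before, after):
    """Hypothetical wrap stand-in: surround text with two strings."""
    return f"{before}{text}{after}"


def render_rev_body(query_inputs):
    """Render the UniProtKB-CAS-Rev requestBody.body for a list of CAS IDs."""
    pairs = ",".join(
        wrap(repl_prefix(curie, "CAS"), '["', '","biolink:Protein"]')
        for curie in query_inputs
    )
    return '{"q": [ ' + pairs + ' ],\n"scopes": ["object.CAS", "subject.type"]}'


body = render_rev_body(["CAS:50-00-0"])
print(json.loads(body))
# {'q': [['CAS:50-00-0', 'biolink:Protein']], 'scopes': ['object.CAS', 'subject.type']}
```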

D. as shown above, remove the `## TRAPI 1.4` comment from `parameters.fields`. Because this is a wrapped (folded) text block, the comment will be parsed as text rather than ignored...
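A rough illustration of why the comment causes trouble: inside a `>-` folded block, lines are joined with spaces, so a `#` line is just more text and ends up glued to a field name. This Python sketch approximates folding with a simple join (real YAML folding has more rules), and the comment's position in the example is hypothetical.

```python
def fold(lines):
    """Approximate YAML '>-' folding for same-indentation lines: join with spaces."""
    return " ".join(line.strip() for line in lines if line.strip())


def suspicious_fields(fields_value):
    """Split a comma-separated fields string and flag entries containing '#'."""
    fields = [f.strip() for f in fields_value.split(",")]
    return [f for f in fields if "#" in f]


block = [
    "subject.UniProtKB,",
    "association.attributes,",
    "association.sources,",
    "## TRAPI 1.4",  # meant as a comment, but it's text inside the block
    "subject.name,",
    "object.name",
]
folded = fold(block)
print(suspicious_fields(folded))  # ['## TRAPI 1.4 subject.name']
```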

E. the `response_mapping` should match the outputs and point to the field of the response that holds each one.

F. For PUBCHEM.COMPOUND and KEGG.COMPOUND: these prefixes are spelled this way in the API data and in the biolink-model specification. However, you are correct in having no periods in the operation and response-mapping names. So I suggest the following replacements to fix the handling of these prefixes:

x-bte annotation: x-bte-response-mapping section

For all objects, replace `edge_attributes` with `edge-attributes`. I know it's not the same format as other keywords like `trapi_sources`, `input_name`, `output_name`...oops >.<...

colleenXu commented 1 year ago

@gglusman and completely optional extras...

completely optional tweaks

* `info.contact`: can change the contact info to Gwênlyn or another member of the Multiomics team
* `info.description`: if you want, you can add more text and links to pages explaining what the KP is
* `info.x-translator.infores`: not required. Just in case, I always add quotation marks when strings (as keys or as values) have colons in them, like `infores: "infores:biothings-multiomics-wellness"`
* `servers`: can remove the extra production entry (keep the encrypted one)
* `components.parameters`:
    * lines 35 and 84: change "chemical object" -> "object". Related to a [BioThings API spec tweak](https://github.com/NCATS-Tangerine/translator-api-registry/commit/3c076bf123ab8e0429fdbeec1bb4e14ac6a25253)
    * lines 51 and 101: change "chemical hits" -> "hits". Related to that same BioThings API spec tweak
* notes:
    * in some descriptions, there are two spaces instead of one where the original spec had a wrapped line break
    * in some descriptions, the original spec had two newlines that aren't there now. These were only for readability: they direct a tool reading and displaying the SmartAPI spec to add a paragraph break between two blocks of text.

gglusman commented 1 year ago

@colleenXu All done, I think!

colleenXu commented 1 year ago

@gglusman

Yes, it looks good except for 1 thing! https://github.com/Hadlock-Lab/multiomics_wellness_kp/commit/ad5dc9d89711158ab280662bd3880cb6d4b45b4b#r121229603

I can see how my wording above is confusing >.<.

The `info.x-translator.infores` line is needed. My earlier comment was meant to say: "I use double quotation marks around strings with characters like periods or colons, out of an abundance of caution. It's not necessary, because everything seems to be parsed fine by all the tools involved, so this is just an FYI."

colleenXu commented 8 months ago

This issue seems to have been addressed: