biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0

For biothings apis, bte might require that some IDs have prefixes and others don't? #423

Closed colleenXu closed 1 year ago

colleenXu commented 2 years ago

Currently we need to write custom transformers that take the raw responses from the sub-queried APIs and convert them into a standard format ("records").

However, there's likely an issue with this process for biothings APIs (the core ones such as MyGene, and the pending APIs).

The request: a dev needs to look into the transformer code, maybe starting here: https://github.com/biothings/api-respone-transform.js/blob/main/src/transformers/biothings_transformer.ts, and check that the assumptions below are correct. If they are, we will want to discuss refactoring to untangle or remove the code that is based on them.

I've been assuming that BTE will not correctly process output IDs when...:


This is causing issues for new developers making biothings APIs, because it's unclear whether they need to follow the previous convention of using specific prefix spellings for some IDs and no prefixes for other IDs.

colleenXu commented 2 years ago

My suggestion for biothings APIs is that IDs don't have prefixes whenever possible (use the field/key name to say what kind of ID it is), because the prefix spellings are different inside and outside of Translator, and Translator changes its desired spellings fairly often...

colleenXu commented 2 years ago

Hmmm....

For a biothings API (dgidb) with NCBIGene and CHEMBL.COMPOUND IDs, BTE seemed fine when given CURIEs (prefixed IDs)...

Changes to dgidb smartapi yaml to test this: replaced activator and activator-rev operations in x-bte-kgs-operations with this, which uses subject.id and object.id for the API response. Those fields have prefixes on the IDs.

```
activator:
  ## https://biothings.ncats.io/dgidb/query?q=association.relation_name:activator
  ## 311 records
  - supportBatch: true
    useTemplating: true  ## flag to say templating is being used below
    inputs:
      - id: "CHEMBL.COMPOUND"
        semantic: SmallMolecule
    requestBodyType: object
    requestBody:
      body: >-
        {
          "q": [
            {{ queryInputs | wrap( '["CHEMBL.COMPOUND:' , '","activator"]') }}
          ],
          "scopes": ["object.CHEMBL_COMPOUND", "association.relation_name"]
        }
    parameters:
      fields: >-
        subject.id,association.interaction_group_score,
        association.provided_by,association.pubmed,association.relation_name
      size: 1000
    outputs:
      - id: NCBIGene
        semantic: Gene
    ## for biolink 2.2.8, DGIdb:activator and CHEMBL.MECHANISM:activator map to the mixin positively_regulates (under regulates)
    ## vs GAMMA:activator maps to increases_activity_of (under affects)
    ## CX decided to use the entity-entity version of positively_regulates, after reading some triples' linked pubmed papers
    predicate: entity_positively_regulates_entity
    response_mapping:
      "$ref": "#/components/x-bte-response-mapping/gene-subject-id"
    ## Example:
    ##   paper: https://pubmed.ncbi.nlm.nih.gov/23828908/
    #  - CHEMBL.COMPOUND:CHEMBL2252949 ((-)-CAMPHOR) -> NCBIGene:79054 (TRPM8)
activator-rev:
  - supportBatch: true
    useTemplating: true  ## flag to say templating is being used below
    inputs:
      - id: NCBIGene
        semantic: Gene
    requestBodyType: object
    requestBody:
      body: >-
        {
          "q": [
            {{ queryInputs | wrap( '["' , '","activator"]') }}
          ],
          "scopes": ["subject.NCBIGene", "association.relation_name"]
        }
    parameters:
      fields: >-
        object.id,association.interaction_group_score,
        association.provided_by,association.pubmed,association.relation_name
      size: 1000
    outputs:
      - id: "CHEMBL.COMPOUND"
        semantic: SmallMolecule
    ## for biolink 2.2.8, DGIdb:activator and CHEMBL.MECHANISM:activator map to the mixin positively_regulates (under regulates)
    ## vs GAMMA:activator maps to increases_activity_of (under affects)
    ## CX decided to use the entity-entity version of positively_regulates, after reading some triples' linked pubmed papers
    predicate: entity_positively_regulated_by_entity
    response_mapping:
      "$ref": "#/components/x-bte-response-mapping/chem-object-id"
    ## Examples:
    ## - NCBIGene:8856 (NR1I2) -> CHEMBL.COMPOUND:CHEMBL104 (CLOTRIMAZOLE)
```

ALSO ADDED this to the x-bte-response-mapping section:

```
chem-object-id:
  "CHEMBL.COMPOUND": object.id
  pubmed: association.pubmed
  source: association.provided_by
  relation: association.relation_name
  ## need to look up what this score means...
  dgidb_interaction_group_score: association.interaction_group_score
gene-subject-id:
  NCBIGene: subject.id
  pubmed: association.pubmed
  source: association.provided_by
  relation: association.relation_name
  dgidb_interaction_group_score: association.interaction_group_score
```

EDIT: also seemed fine for another biothings api (idisk) and UMLS IDs

Changes to idisk smartapi yaml to test this: replaced has_adverse_effect_on-rev operations in x-bte-kgs-operations with this, which uses _id for the API response. That field has prefixes on the IDs.

```
- supportBatch: true
  inputSeparator: ","
  inputs:
    - id: MEDDRA
      semantic: Disease
  requestBody:
    body:
      q: '{inputs[0]}'  ## no prefix
      scopes: has_adverse_effect_on.meddra
    header: application/x-www-form-urlencoded
  outputs:
    - id: UMLS
      semantic: SmallMolecule
  parameters:
    fields: _id  ## no prefix
    size: 1000  ## note size limit
  predicate: adverse_event_caused_by
  source: "infores:idisk"
  response_mapping:
    $ref: '#/components/x-bte-response-mapping/rev-id'
```

ALSO ADDED this to the x-bte-response-mapping section:

```
rev-id:
  UMLS: _id
```

colleenXu commented 2 years ago

From lab meeting:

ericz1803 commented 2 years ago

@colleenXu I was looking into this issue and it looks like the api-response-transform module can handle both prefixed and non-prefixed IDs already (https://github.com/biothings/api-respone-transform.js/blob/main/src/utils.ts#L1-L9). Is there something I am misunderstanding or missing?
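For illustration, accepting both prefixed and non-prefixed IDs could look roughly like the sketch below. This is not the code at the linked lines: the function name `toCurie` and the rule "an ID that contains a colon is already prefixed" are assumptions made only for this example.

```typescript
// Hypothetical helper, NOT copied from api-respone-transform.js.
// Assumption: an ID that contains ":" is treated as already prefixed.
function toCurie(prefix: string, id: string | number): string {
  const value = String(id);
  // Leave IDs that already look like CURIEs untouched.
  if (value.indexOf(":") !== -1) {
    return value;
  }
  // Otherwise attach the prefix declared for this output field.
  return `${prefix}:${value}`;
}

// Both forms of the same gene ID end up as the same CURIE.
const fromBareId = toCurie("NCBIGene", 79054);
const fromPrefixedId = toCurie("NCBIGene", "NCBIGene:79054");
console.log(fromBareId === fromPrefixedId);
```

Note that a check like this only looks for a colon. It does not verify that an existing prefix is spelled the way BTE expects.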

colleenXu commented 2 years ago

Hmmm...I think there may be issues when the ID prefix is not something that BTE recognizes / knows how to process? For example, if the prefix isn't in the format that Translator expects (chembl_compound vs CHEMBL.COMPOUND), or isn't a prefix that has been in Translator for a while with consensus on how to use it (HMS_LINCS_ID comes to mind...)

However, I'd be open to closing this issue for now if things are seeming fine so far...
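To make the prefix-spelling concern concrete, here is a minimal sketch of normalizing a prefix before the ID is used. The alias table and the name `normalizeCurie` are hypothetical. The spellings come only from the examples in this thread, and nothing here reflects how BTE actually resolves prefixes.

```typescript
// Hypothetical alias table: lower-cased spelling -> spelling Translator expects.
// The entries are illustrative only, taken from examples in this thread.
const PREFIX_ALIASES: { [alias: string]: string } = {
  "chembl_compound": "CHEMBL.COMPOUND",
  "chembl.compound": "CHEMBL.COMPOUND",
  "ncbigene": "NCBIGene",
};

function normalizeCurie(curie: string): string {
  const sep = curie.indexOf(":");
  // No prefix at all: nothing to normalize.
  if (sep === -1) {
    return curie;
  }
  const key = curie.slice(0, sep).toLowerCase();
  const local = curie.slice(sep + 1);
  // Unknown prefixes (for example HMS_LINCS_ID) pass through unchanged
  // rather than being guessed at.
  if (!Object.prototype.hasOwnProperty.call(PREFIX_ALIASES, key)) {
    return curie;
  }
  return `${PREFIX_ALIASES[key]}:${local}`;
}

console.log(normalizeCurie("chembl_compound:CHEMBL104"));
```

A prefix that is missing from the table is returned as-is, which is the case where downstream processing might not recognize the ID.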

colleenXu commented 1 year ago

Closing for now. I haven't seen issues, and if they come up, they can be addressed with the templating method or the upcoming jq method.