biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://api.bte.ncats.io
Apache License 2.0

add `max research phase` to `treatsChembl` edges from mychem.info #813

Open: andrewsu opened this issue 2 months ago

andrewsu commented 2 months ago

Per a request from @mbrush, I'm creating this issue to add a `max research phase` attribute to edges based on `treatsChembl` and `treatsChembl-rev` in the mychem.info OpenAPI annotation file. This info will be used in this CQS query template. Note that the attribute constraint will be applied within the CQS, but we need to make sure that BTE returns the attribute in the response.

Test query:

```
{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01",
          "predicates": [
            "biolink:in_clinical_trials_for"
          ]
        }
      },
      "nodes": {
        "n00": {
          "categories": [
            "biolink:ChemicalEntity"
          ]
        },
        "n01": {
          "categories": [
            "biolink:Disease"
          ],
          "ids": [
            "MONDO:0004979"
          ]
        }
      }
    }
  }
}
```
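For anyone reproducing this, here is a minimal Python sketch (standard library only) that builds the same query graph and POSTs it to the CI endpoint named in the next paragraph. The helper names `build_test_query` and `post_query` are hypothetical, and the 300-second timeout is an arbitrary assumption, not a BTE setting.

```python
import json
import urllib.request

# CI endpoint quoted in this issue; other BTE instances use different hosts.
BTE_CI_QUERY_URL = "https://bte.ci.transltr.io/v1/query"


def build_test_query(disease_curie: str = "MONDO:0004979") -> dict:
    """One-hop query: ChemicalEntity -[in_clinical_trials_for]-> the given Disease."""
    return {
        "message": {
            "query_graph": {
                "edges": {
                    "e00": {
                        "subject": "n00",
                        "object": "n01",
                        "predicates": ["biolink:in_clinical_trials_for"],
                    }
                },
                "nodes": {
                    "n00": {"categories": ["biolink:ChemicalEntity"]},
                    "n01": {"categories": ["biolink:Disease"], "ids": [disease_curie]},
                },
            }
        }
    }


def post_query(query: dict, url: str = BTE_CI_QUERY_URL, timeout: float = 300.0) -> dict:
    """POST a TRAPI message as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `post_query(build_test_query())` sends the query; it needs network access to the CI instance.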

Example edge returned from https://bte.ci.transltr.io/v1/query:

```
"d29ff5f4006b1523fea5a64ef3b36292": {
    "predicate": "biolink:in_clinical_trials_for",
    "subject": "PUBCHEM.COMPOUND:1993",
    "object": "MONDO:0004979",
    "attributes": [
        {
            "attribute_type_id": "biolink:publications",
            "value": [
                "https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=e8983096-65e6-41f2-8b6f-bbf0a7227307",
                "https://clinicaltrials.gov/search?id=%22NCT02584257%22",
                "https://clinicaltrials.gov/search?id=%22NCT05292976%22",
                "https://clinicaltrials.gov/search?id=%22NCT02097537%22",
                "https://clinicaltrials.gov/search?id=%22NCT01907334%22",
                "clinicaltrials:NCT02584257",
                "clinicaltrials:NCT05292976",
                "clinicaltrials:NCT02097537",
                "clinicaltrials:NCT01907334",
                "https://clinicaltrials.gov/ct2/results?id=%22NCT03505489%22",
                "clinicaltrials:NCT03505489"
            ],
            "value_type_id": "linkml:Uriorcurie"
        }
    ],
    "sources": [
        ...
    ]
},
```

The fix for this issue would add a new entry under `attributes` for max research phase. As noted in https://github.com/NCATS-Tangerine/translator-api-registry/blob/biolink-4-update/mychem.info/openapi_full.yml#L1069, this info is available from mychem.info under `chembl.drug_indications.max_phase_for_ind`.
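A rough Python sketch of what such an attribute could look like when built from a mychem.info record. It assumes the record nests `chembl.drug_indications` as a list (or a single object) of entries carrying `max_phase_for_ind`, and that each entry names its disease under a key such as `mesh_id`; that key, the record shape, and the helper `max_phase_attribute` are assumptions for illustration, not BTE's actual code.

```python
from typing import Any, List, Optional


def _as_list(value: Any) -> List[Any]:
    # BioThings APIs often return a bare object instead of a one-element list.
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def max_phase_attribute(
    mychem_hit: dict, disease_id: str, id_key: str = "mesh_id"
) -> Optional[dict]:
    """Collect max_phase_for_ind for one indication and wrap it as a TRAPI attribute.

    ASSUMPTION: each chembl.drug_indications entry identifies its disease via
    `id_key`; "mesh_id" is a guess, not something stated in this issue.
    """
    indications = _as_list(mychem_hit.get("chembl", {}).get("drug_indications"))
    values: List[str] = []
    for indication in indications:
        if indication.get(id_key) != disease_id:
            continue
        phase = indication.get("max_phase_for_ind")
        if phase is None:
            continue
        # Render as a float string ("4.0", "0.5", "-1.0"), like the raw values later in the thread.
        text = str(float(phase))
        if text not in values:
            values.append(text)
    if not values:
        return None
    return {"attribute_type_id": "biolink:max_research_phase", "value": values}
```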

colleenXu commented 2 months ago

The changes should be live on Dev/CI within 10 min of the linked commit. BTE/Service Provider now create a `biolink:max_research_phase` edge attribute from the MyChem `chembl.drug_indications.max_phase_for_ind` info.

However, the values returned right now don't match the biolink-model spec for "max research phase"...

If we needed to return values in the biolink-model spec, I imagine we'd need to figure out all current possible values and what they mean, map them to the biolink-model's values, and do a BTE JQ/post-processing step with those mappings.
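As a rough illustration of that post-processing step, here is a Python sketch that walks the `edges` of a TRAPI knowledge graph and rewrites `biolink:max_research_phase` values through a lookup table. BTE's real step would live in its own code base (the comment mentions JQ); the function `remap_max_phase` is hypothetical, and the mapping is passed in because the actual values were still being decided at this point.

```python
from typing import Any, Dict

MAX_PHASE_TYPE = "biolink:max_research_phase"


def remap_max_phase(knowledge_graph: Dict[str, Any], mapping: Dict[str, str]) -> int:
    """Rewrite max_research_phase attribute values in place using `mapping`.

    Values not found in `mapping` are kept unchanged so they stay visible.
    Returns how many individual values were rewritten.
    """
    rewritten = 0
    for edge in knowledge_graph.get("edges", {}).values():
        for attribute in edge.get("attributes") or []:
            if attribute.get("attribute_type_id") != MAX_PHASE_TYPE:
                continue
            raw = attribute.get("value")
            values = raw if isinstance(raw, list) else [raw]
            new_values = []
            for value in values:
                if value in mapping:
                    new_values.append(mapping[value])
                    rewritten += 1
                else:
                    new_values.append(value)
            # Preserve the original shape: a list stays a list, a scalar stays a scalar.
            attribute["value"] = new_values if isinstance(raw, list) else new_values[0]
    return rewritten
```

Leaving unmapped values in place, rather than dropping them, makes any new upstream value easy to spot in responses.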

andrewsu commented 2 months ago

https://mychem.info/v1/query?q=_exists_:chembl.drug_indications&fields=chembl.drug_indications&facets=chembl.drug_indications.max_phase_for_ind

```
{
    "took": 12,
    "total": 8462,
    "max_score": 1,
    "facets": {
        "chembl.drug_indications.max_phase_for_ind": {
            "_type": "terms",
            "terms": [
                {
                    "count": 4554,
                    "term": 2
                },
                {
                    "count": 3996,
                    "term": 1
                },
                {
                    "count": 3142,
                    "term": 3
                },
                {
                    "count": 2707,
                    "term": 4
                },
                {
                    "count": 692,
                    "term": 0
                },
                {
                    "count": 492,
                    "term": -1
                }
            ],
            "other": 0,
            "missing": 0,
            "total": 15583
        }
    },
```

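A quick check on the facet numbers above: the bucket counts sum to 4554 + 3996 + 3142 + 2707 + 692 + 492 = 15583, the facet's reported `total`. That is larger than the 8462 matching documents, which is consistent with one drug document falling into several phase buckets (a drug can have many indications, each with its own phase). The Python sketch below re-does that tally from the facet block; `phase_counts` is a hypothetical helper, and the `total == sum + other` check is an assumption about how the facet totals are computed.

```python
# Facet block copied from the mychem.info response above.
facet = {
    "terms": [
        {"count": 4554, "term": 2},
        {"count": 3996, "term": 1},
        {"count": 3142, "term": 3},
        {"count": 2707, "term": 4},
        {"count": 692, "term": 0},
        {"count": 492, "term": -1},
    ],
    "other": 0,
    "missing": 0,
    "total": 15583,
}


def phase_counts(facet: dict) -> dict:
    """Return {term: count}, ordered by term value."""
    return {t["term"]: t["count"] for t in sorted(facet["terms"], key=lambda t: t["term"])}


counts = phase_counts(facet)
print(counts)  # {-1: 492, 0: 692, 1: 3996, 2: 4554, 3: 3142, 4: 2707}
# Assumes `total` counts the listed buckets plus anything folded into `other`.
print(sum(counts.values()) + facet["other"] == facet["total"])  # True
```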
colleenXu commented 2 months ago

Mapping:


colleenXu commented 2 months ago

@tokebe

I don't think "0.0" actually exists in the data, so let's remove that handling/mapping. I've edited my post above.

There have been discussions in Slack with Chunlei and Dylan. They discovered an issue with the API that they're addressing, but it won't affect the actual values we're working with (lab Slack links).

colleenXu commented 2 months ago

@mbrush

In the above post, I wrote mappings between ChEMBL's "max phase for indication" values (specific to each drug-indication pair) and the biolink-model's MaxResearchPhaseEnum values.

However, I'm not sure about "-1.0", "0.5", and "4.0", because their actual definitions in ChEMBL don't quite match the available options.

I'm wondering if you have advice/opinions on this.
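For reference, a Python sketch of the mapping as it appears in the before/after examples from the PR test further down this thread (-1.0 → not_provided, 0.5 → pre_clinical_research_phase, 1.0 through 4.0 → clinical_trial_phase_1 through clinical_trial_phase_4). The "-1.0", "0.5", and "4.0" rows are exactly the ones flagged as uncertain above, and "0.0" is omitted because it was dropped from the mapping. `MAX_PHASE_TO_BIOLINK` and `to_biolink_phase` are hypothetical names, not BTE's code.

```python
from typing import Optional

# Inferred from this thread's before/after examples; rows for -1.0, 0.5 and 4.0 are still under discussion.
MAX_PHASE_TO_BIOLINK = {
    "-1.0": "not_provided",
    "0.5": "pre_clinical_research_phase",
    "1.0": "clinical_trial_phase_1",
    "2.0": "clinical_trial_phase_2",
    "3.0": "clinical_trial_phase_3",
    "4.0": "clinical_trial_phase_4",
}


def to_biolink_phase(raw: object) -> Optional[str]:
    """Map a ChEMBL max_phase_for_ind value (number or numeric string) to a MaxResearchPhaseEnum value.

    Returns None for values outside the table (e.g. 0.0) or non-numeric input.
    """
    try:
        key = str(float(raw))  # normalizes 4, "4", and "4.0" to "4.0"
    except (TypeError, ValueError):
        return None
    return MAX_PHASE_TO_BIOLINK.get(key)
```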

colleenXu commented 2 months ago

I've tested the linked PR https://github.com/biothings/bte_trapi_query_graph_handler/pull/192 and it works as intended!

Example query:

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "ids": ["MEDDRA:10012374"],
          "categories": ["biolink:Disease"]
        },
        "n1": {
          "categories": ["biolink:SmallMolecule"]
        }
      },
      "edges": {
        "e1": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:tested_by_clinical_trials_of"]
        }
      }
    }
  }
}
```

Here's the before-after for some edges from chembl-treats operations:

max phase for ind = -1.0

Before:

```
"efeef17260b03e60a9266d6d860f2ad1": {
  "predicate": "biolink:tested_by_clinical_trials_of",
  "subject": "MONDO:0002050",
  "object": "CHEBI:135061",
  "attributes": [
    {
      "attribute_type_id": "biolink:max_research_phase",
      "value": [
        "-1.0"
      ]
    },
```

After:

```
"efeef17260b03e60a9266d6d860f2ad1": {
  "predicate": "biolink:tested_by_clinical_trials_of",
  "subject": "MONDO:0002050",
  "object": "CHEBI:135061",
  "attributes": [
    {
      "attribute_type_id": "biolink:max_research_phase",
      "value": [
        "not_provided"
      ]
    },
```

max phase for ind = 0.5

Before:

```
"5e9942a90c62660599c90a4e6700fe73": {
  "predicate": "biolink:tested_by_clinical_trials_of",
  "subject": "MONDO:0002050",
  "object": "CHEBI:28939",
  "attributes": [
    {
      "attribute_type_id": "biolink:max_research_phase",
      "value": [
        "0.5"
      ]
    },
```

After:

```
"5e9942a90c62660599c90a4e6700fe73": {
  "predicate": "biolink:tested_by_clinical_trials_of",
  "subject": "MONDO:0002050",
  "object": "CHEBI:28939",
  "attributes": [
    {
      "attribute_type_id": "biolink:max_research_phase",
      "value": [
        "pre_clinical_research_phase"
      ]
    },
```

max phase for ind = 1.0

Before:

```
"6082fe0e1b9e1681ce88b2cf94abf3e0": {
  "predicate": "biolink:tested_by_clinical_trials_of",
  "subject": "MONDO:0002050",
  "object": "PUBCHEM.COMPOUND:24802842",
  "attributes": [
    {
      "attribute_type_id": "biolink:max_research_phase",
      "value": [
        "1.0"
      ]
    },
```

After:

```
"6082fe0e1b9e1681ce88b2cf94abf3e0": {
  "predicate": "biolink:tested_by_clinical_trials_of",
  "subject": "MONDO:0002050",
  "object": "PUBCHEM.COMPOUND:24802842",
  "attributes": [
    {
      "attribute_type_id": "biolink:max_research_phase",
      "value": [
        "clinical_trial_phase_1"
      ]
    },
```

max phase for ind = 2.0

Before:

```
"9b2dcc4dad93c04c227d80147a766d76": {
  "predicate": "biolink:tested_by_clinical_trials_of",
  "subject": "MONDO:0002050",
  "object": "PUBCHEM.COMPOUND:4118151",
  "attributes": [
    {
      "attribute_type_id": "biolink:max_research_phase",
      "value": [
        "2.0"
      ]
    },
```

After:

```
"9b2dcc4dad93c04c227d80147a766d76": {
  "predicate": "biolink:tested_by_clinical_trials_of",
  "subject": "MONDO:0002050",
  "object": "PUBCHEM.COMPOUND:4118151",
  "attributes": [
    {
      "attribute_type_id": "biolink:max_research_phase",
      "value": [
        "clinical_trial_phase_2"
      ]
    },
```

max phase for ind = 3.0 and 4.0 (disease MedDRA IDs treated as the same entity)

Before:

```
"87f29deccfa0f51de58d9ee3f1f98963": {
  "predicate": "biolink:tested_by_clinical_trials_of",
  "subject": "MONDO:0002050",
  "object": "CHEBI:36791",
  "attributes": [
    {
      "attribute_type_id": "biolink:max_research_phase",
      "value": [
        "4.0",
        "3.0"
      ]
    },
```

After:

```
"87f29deccfa0f51de58d9ee3f1f98963": {
  "predicate": "biolink:tested_by_clinical_trials_of",
  "subject": "MONDO:0002050",
  "object": "CHEBI:36791",
  "attributes": [
    {
      "attribute_type_id": "biolink:max_research_phase",
      "value": [
        "clinical_trial_phase_4",
        "clinical_trial_phase_3"
      ]
    },
```
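When disease IDs are merged like this, one edge carries several phase values. The issue says the constraint itself is applied in the CQS, so BTE returning the full list is the expected behavior. If a consumer did want a single "max" value, one option is to rank the enum values and keep the highest, as in this Python sketch. The ordering and the choice to exclude `not_provided` from the ranking are assumptions, and `highest_phase` is a hypothetical helper, not part of BTE or the CQS.

```python
from typing import Iterable, Optional

# Assumed ordering, lowest to highest. "not_provided" is deliberately left out
# so it never outranks a real phase.
PHASE_ORDER = [
    "pre_clinical_research_phase",
    "clinical_trial_phase_1",
    "clinical_trial_phase_2",
    "clinical_trial_phase_3",
    "clinical_trial_phase_4",
]


def highest_phase(values: Iterable[str]) -> Optional[str]:
    """Return the most advanced ranked phase among `values`, or None if none are ranked."""
    ranked = [v for v in values if v in PHASE_ORDER]
    if not ranked:
        return None
    return max(ranked, key=PHASE_ORDER.index)
```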