biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://api.bte.ncats.io
Apache License 2.0

filter on node attributes #174

Closed andrewsu closed 2 years ago

andrewsu commented 3 years ago

Node attributes on chemical compounds indicate drug approval status. Clearly we could filter based on these attributes post-query, and a helper tool to do this may be quite useful in the short term. But it would also be good to allow filtering on these node attributes as part of the query. Filters would presumably be based on EITHER the value of a certain attribute OR the existence of an attribute.
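As a rough illustration of the short-term, post-query approach, the sketch below filters the nodes of a response either on the existence of an attribute or on its value. The function name `filter_nodes` and the second sample node are hypothetical; the attribute layout (`name` / `value`, with `value` often a list) and the attribute names follow the result shown in this issue. This is a minimal sketch under those assumptions, not BTE code.

```python
# Hypothetical post-query helper: filter knowledge-graph nodes on a node attribute.
_MISSING = object()  # sentinel meaning "no value given, only test existence"


def filter_nodes(nodes, attr_name, value=_MISSING):
    """Keep nodes that carry `attr_name`; if `value` is given, also require a match."""
    kept = {}
    for node_id, node in nodes.items():
        for attr in node.get("attributes") or []:
            if attr.get("name") != attr_name:
                continue
            if value is _MISSING:
                kept[node_id] = node  # existence is enough
                break
            attr_value = attr.get("value")
            # Attribute values are frequently lists, so compare against each element.
            values = attr_value if isinstance(attr_value, list) else [attr_value]
            if value in values:
                kept[node_id] = node
                break
    return kept


nodes = {
    "CHEBI:40279": {
        "name": "ALLOPURINOL",
        "attributes": [
            {"name": "chembl_max_phase", "value": ["4"]},
            {"name": "drugbank_groups", "value": ["approved"]},
        ],
    },
    # Made-up node without attributes, for illustration only.
    "EXAMPLE:0001": {"name": "hypothetical compound", "attributes": []},
}

approved = filter_nodes(nodes, "drugbank_groups", "approved")  # filter on value
has_phase = filter_nodes(nodes, "chembl_max_phase")            # filter on existence
```

Both calls keep only the allopurinol node in this sample; the same two modes (value or existence) are what an in-query filter would need to support.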

More details

This query gives all compounds related to SLC2A1 / GLUT1 (Updated to TRAPI v1.1 2021-06-08):

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:Gene"],
                    "ids": ["NCBIGene:6513"]
                },
                "n1": {
                    "categories": ["biolink:ChemicalSubstance"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicate": "biolink:physically_interacts_with"
                }
            }
        }
    }
}

One of the results is pasted below. Several attributes seem to indicate FDA approval status, including chembl_max_phase, drugbank_groups, and fda_epc_pharmacology_class.

               "CHEBI:40279": {
                    "category": "biolink:ChemicalSubstance",
                    "name": "ALLOPURINOL",
                    "attributes": [
                        {
                            "name": "equivalent_identifiers",
                            "value": [
                                "CHEMBL.COMPOUND:CHEMBL1467",
                                "CHEMBL.COMPOUND:CHEMBL1200477",
                                "DRUGBANK:DB00437",
                                "PUBCHEM.COMPOUND:135401907",
                                "PUBCHEM.COMPOUND:2094",
                                "CHEBI:40279",
                                "UMLS:C0002144",
                                "MESH:D000493",
                                "UNII:428673RC2Z",
                                "UNII:63CZ7GJN5I",
                                "INCHIKEY:OFCNXPDARWKPPY-UHFFFAOYSA-N",
                                "INCHI:InChI=1S/C5H4N4O/c10-5-3-1-8-9-4(3)6-2-7-5/h1-2H,(H2,6,7,8,9,10)",
                                "LINCS:LSM-5919",
                                "name:ALLOPURINOL",
                                "name:Allopurinol",
                                "name:allopurinol"
                            ],
                            "type": "biolink:id"
                        },
                        {
                            "name": "num_source_nodes",
                            "value": 1,
                            "type": "bts:num_source_nodes"
                        },
                        {
                            "name": "num_target_nodes",
                            "value": 0,
                            "type": "bts:num_target_nodes"
                        },
                        {
                            "name": "source_qg_nodes",
                            "value": [
                                "n0"
                            ],
                            "type": "bts:source_qg_nodes"
                        },
                        {
                            "name": "target_qg_nodes",
                            "value": [],
                            "type": "bts:target_qg_nodes"
                        },
                        {
                            "name": "chembl_max_phase",
                            "value": [
                                "4"
                            ],
                            "type": "bts:chembl_max_phase"
                        },
                        {
                            "name": "chembl_molecule_type",
                            "value": [
                                "Small molecule"
                            ],
                            "type": "bts:chembl_molecule_type"
                        },
                        {
                            "name": "drugbank_drug_category",
                            "value": [
                                "Antigout Preparations",
                                "Antimetabolites",
                                "Antioxidants",
                                "BCRP/ABCG2 Substrates",
                                "Drugs that are Mainly Renally Excreted",
                                "Enzyme Inhibitors",
                                "Free Radical Scavengers",
                                "Heterocyclic Compounds, Fused-Ring",
                                "Musculo-Skeletal System",
                                "OAT3/SLC22A8 Substrates",
                                "Preparations Inhibiting Uric Acid Production",
                                "Protective Agents",
                                "Purines",
                                "Uricosuric Agents",
                                "Xanthine Oxidase Inhibitors"
                            ],
                            "type": "bts:drugbank_drug_category"
                        },
                        {
                            "name": "drugbank_taxonomy_class",
                            "value": [
                                "Pyrazolopyrimidines"
                            ],
                            "type": "bts:drugbank_taxonomy_class"
                        },
                        {
                            "name": "drugbank_groups",
                            "value": [
                                "approved"
                            ],
                            "type": "bts:drugbank_groups"
                        },
                        {
                            "name": "drugbank_kingdom",
                            "value": [
                                "Organic compounds"
                            ],
                            "type": "bts:drugbank_kingdom"
                        },
                        {
                            "name": "drugbank_superclass",
                            "value": [
                                "Organoheterocyclic compounds"
                            ],
                            "type": "bts:drugbank_superclass"
                        },
                        {
                            "name": "contraindications",
                            "value": [
                                "Disease of liver",
                                "Chronic heart failure",
                                "Hypersensitivity angiitis",
                                "Dehydration",
                                "Impaired renal function disorder"
                            ],
                            "type": "bts:contraindications"
                        },
                        {
                            "name": "indications",
                            "value": [
                                "Calcium renal calculus",
                                "Gout",
                                "Uric Acid Nephropathy Gout",
                                "Articular gout",
                                "Hyperuricemia",
                                "Chemotherapy-Induced Hyperuricemia",
                                "Uric acid renal calculus",
                                "Recurrent Calcium Renal Calculi"
                            ],
                            "type": "bts:indications"
                        },
                        {
                            "name": "mesh_pharmacology_class",
                            "value": [
                                "Antimetabolites",
                                "Antioxidants",
                                "Antirheumatic Agents",
                                "Enzyme Inhibitors",
                                "Free Radical Scavengers",
                                "Gout Suppressants",
                                "Noxae"
                            ],
                            "type": "bts:mesh_pharmacology_class"
                        },
                        {
                            "name": "fda_epc_pharmacology_class",
                            "value": [
                                "Xanthine Oxidase Inhibitor"
                            ],
                            "type": "bts:fda_epc_pharmacology_class"
                        }
                    ]
andrewsu commented 3 years ago

In TRAPI 1.1, message.query_graph is of type QueryGraph, which has properties for nodes (of type QNode) and edges (of type QEdge). Both QNodes and QEdges allow a constraints property (a list of QueryConstraint objects). The constraint described above (filter for FDA-approved drugs) would look like this (taken from https://github.com/NCATSTranslator/minihackathons/blob/main/2021-12_demo/workflowA/EGFR_advanced.json and referenced in https://github.com/NCATSTranslator/minihackathons/issues/164):

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": [
                        "biolink:ChemicalSubstance"
                    ],
                    "name": "Chemical Substance",
                    "constraints": [
                        {
                            "id": "biolink:highest_FDA_approval_status",
                            "name": "highest FDA approval status",
                            "operator": "==",
                            "value": "regular approval"
                        }
                    ]
                },
                "n1": {
                    "name": "EGFR",
                    "ids": [
                        "NCBIGene:1956"
                    ]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": [
                        "biolink:decreases_abundance_of",
                        "biolink:decreases_activity_of",
                        "biolink:decreases_expression_of",
                        "biolink:decreases_synthesis_of",
                        "biolink:increases_degradation_of",
                        "biolink:disrupts",
                        "biolink:entity_negatively_regulates_entity"
                    ]
                }
            }
        }
    }
}

Based on that, it appears we need to change bts:chembl_max_phase to biolink:highest_FDA_approval_status in our smartAPI mapping file?

colleenXu commented 2 years ago

Update for TRAPI v1.2: I think the query constraints work the same as specified above. I updated the query below (maybe use SmallMolecule or ChemicalEntity?):

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": [
                        "biolink:ChemicalEntity"
                    ],
                    "constraints": [
                        {
                            "id": "biolink:highest_FDA_approval_status",
                            "name": "highest FDA approval status",
                            "operator": "==",
                            "value": "regular approval"
                        }
                    ]
                },
                "n1": {
                    "categories": [
                        "biolink:Gene"
                    ],
                    "ids": [
                        "NCBIGene:1956"
                    ]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        }
    }
}
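For testing constrained and unconstrained versions side by side, a constraint of this shape can be attached to an existing query programmatically. The helper name `with_constraint` is hypothetical; the constraint fields (`id`, `name`, `operator`, `value`) are the ones used in the queries in this thread. A minimal sketch:

```python
import copy
import json


def with_constraint(query, node_id, attr_id, value, name=None, operator="=="):
    """Return a copy of `query` with one constraint appended to node `node_id`."""
    result = copy.deepcopy(query)  # leave the unconstrained query untouched
    node = result["message"]["query_graph"]["nodes"][node_id]
    constraint = {
        "id": attr_id,
        "name": name or attr_id,
        "operator": operator,
        "value": value,
    }
    node.setdefault("constraints", []).append(constraint)
    return result


base_query = {
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {"categories": ["biolink:ChemicalEntity"]},
                "n1": {"categories": ["biolink:Gene"], "ids": ["NCBIGene:1956"]},
            },
            "edges": {"e0": {"subject": "n0", "object": "n1"}},
        }
    }
}

constrained = with_constraint(
    base_query,
    "n0",
    "biolink:highest_FDA_approval_status",
    "regular approval",
    name="highest FDA approval status",
)
print(json.dumps(constrained, indent=4))
```

Because the helper returns a deep copy, `base_query` can still be sent as the unconstrained baseline for comparison.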
colleenXu commented 2 years ago

This is dependent on #302 and its PRs. I think it's addressed by this PR: https://github.com/biothings/bte_trapi_query_graph_handler/pull/45

colleenXu commented 2 years ago

@marcodarko some feedback on https://github.com/biothings/bte_trapi_query_graph_handler/pull/45:

I only took a quick look based on this query, which worked as expected compared to its unconstrained version:

Click for query

```
{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1"
                }
            },
            "nodes": {
                "n0": {
                    "ids": ["MONDO:0019391"],
                    "categories": ["biolink:Disease"]
                },
                "n1": {
                    "categories": ["biolink:Gene"],
                    "constraints": [
                        {
                            "name": "type_of_gene",
                            "operator": "==",
                            "value": "protein-coding"
                        }
                    ]
                }
            }
        }
    }
}
```

Also, there is some complicated and unclear behavior in the specification regarding the operators and values...I think this involves some discussion with @andrewsu...

andrewsu commented 2 years ago

Example query: https://github.com/NCATSTranslator/minihackathons/blob/main/2021-12_demo/workflowA/future/A.2_RHOBTB2_twohop_constrained.json

andrewsu commented 2 years ago

For now, let's only deal with single-value constraints. We also need to deal with mismatch between chembl.max_phase in mychem.info and FDA_approval_status_enum at https://github.com/biolink/biolink-model/blob/master/biolink-model.yaml#L239. So instead of the example query posted immediately above, for now, test on this (replacing "regular approval" with "4"):

{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "object": "n0",
                    "subject": "n1",
                    "predicates": [
                        "biolink:entity_regulates_entity",
                        "biolink:genetically_interacts_with"
                    ]
                },
                "e02": {
                    "object": "n1",
                    "subject": "n2",
                    "predicates": [
                        "biolink:related_to"
                    ]
                }
            },
            "nodes": {
                "n0": {
                    "ids": [
                        "NCBIGene:23221"
                    ],
                    "categories": [
                        "biolink:Gene"
                    ]
                },
                "n1": {
                    "categories": [
                        "biolink:Gene"
                    ]
                },
                "n2": {
                    "categories": [
                        "biolink:SmallMolecule"
                    ],
                    "constraints": [
                        {
                            "id": "biolink:highest_FDA_approval_status",
                            "name": "highest FDA approval status",
                            "operator": "==",
                            "value": "4"
                        }
                    ]
                }
            }
        }
    }
}
colleenXu commented 2 years ago

Note that the following queries are working as expected. However, I am currently using a local environment that excludes all pending BioThings APIs except the clinical risk KP API / multiomics wellness API...

Modified version of A.2_RHOBTB2_twohop_constrained

Modified from [this](https://github.com/NCATSTranslator/minihackathons/blob/main/2021-12_demo/workflowA/future/A.2_RHOBTB2_twohop_constrained.json)

```
{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "subject": "n1",
                    "object": "n0",
                    "predicates": [
                        "biolink:entity_regulates_entity",
                        "biolink:genetically_interacts_with"
                    ]
                },
                "e02": {
                    "subject": "n2",
                    "object": "n1",
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                "n0": {
                    "ids": ["NCBIGene:23221"],
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "categories": ["biolink:Gene"]
                },
                "n2": {
                    "categories": ["biolink:SmallMolecule"],
                    "constraints": [
                        {
                            "id": "biolink:highest_FDA_approval_status",
                            "name": "highest FDA approval status",
                            "operator": "==",
                            "value": 4
                        }
                    ]
                }
            }
        }
    }
}
```

Modified version of A.9_EGFR_advanced

Modified from [this](https://github.com/NCATSTranslator/minihackathons/blob/main/2021-12_demo/workflowA/backup/A.9_EGFR_advanced.json)

```
{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:SmallMolecule"],
                    "name": "Small Molecule",
                    "constraints": [
                        {
                            "id": "biolink:highest_FDA_approval_status",
                            "name": "highest FDA approval status",
                            "operator": "==",
                            "value": 4
                        }
                    ]
                },
                "n1": {
                    "name": "EGFR",
                    "ids": ["NCBIGene:1956"]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": [
                        "biolink:decreases_abundance_of",
                        "biolink:decreases_activity_of",
                        "biolink:decreases_expression_of",
                        "biolink:decreases_synthesis_of",
                        "biolink:increases_degradation_of",
                        "biolink:disrupts",
                        "biolink:entity_negatively_regulates_entity"
                    ]
                }
            }
        }
    }
}
```

With the pending APIs removed, we currently cannot run the following query (there are no metaKG edges for the negatively-regulates operation):

Modified version of A.2a_expanded_RHOBTB2_twohop_constrained

Modified from [this](https://github.com/NCATSTranslator/minihackathons/blob/main/2021-12_demo/workflowA/future/A.2a_expanded_RHOBTB2_twohop_constrained.json)

```
{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "object": "n0",
                    "subject": "n1",
                    "predicates": [
                        "biolink:entity_negatively_regulates_entity"
                    ]
                },
                "e02": {
                    "object": "n1",
                    "subject": "n2",
                    "predicates": [
                        "biolink:increases_abundance_of",
                        "biolink:increases_expression_of",
                        "biolink:increases_stability_of",
                        "biolink:increases_uptake_of",
                        "biolink:decreases_degradation_of",
                        "biolink:increases_secretion_of",
                        "biolink:increases_metabolic_processing_of",
                        "biolink:increases_folding_of",
                        "biolink:increases_localization_of",
                        "biolink:increases_synthesis_of",
                        "biolink:increases_response_to",
                        "biolink:increases_splicing_of",
                        "biolink:increases_mutation_rate_of",
                        "biolink:increases_transport_of",
                        "biolink:increases_activity_of",
                        "biolink:increases_molecular_modification_of",
                        "biolink:increases_molecular_interaction"
                    ]
                }
            },
            "nodes": {
                "n0": {
                    "ids": ["NCBIGene:23221"],
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "categories": ["biolink:Gene"]
                },
                "n2": {
                    "categories": ["biolink:SmallMolecule"],
                    "constraints": [
                        {
                            "id": "biolink:highest_FDA_approval_status",
                            "name": "highest FDA approval status",
                            "operator": "==",
                            "value": 4
                        }
                    ]
                }
            }
        }
    }
}
```

colleenXu commented 2 years ago

@marcodarko note that we may want to change the node attribute name from "biolink:highest_FDA_approval_status" to "biolink:drug_regulatory_status_world_wide".

@andrewsu please confirm whether we want to make this change or not...

colleenXu commented 2 years ago

@marcodarko also FYI the discussion here https://github.com/NCATSTranslator/ReasonerAPI/issues/298. It's difficult to know how to implement constraints when the node attribute's value is actually a list (not an int or string)....which is the case for most of the node attributes described above.
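One possible reading of `==` for list-valued attributes is "at least one element equals the constraint value". That interpretation is an assumption made here for illustration, not something the TRAPI specification is known to prescribe, and the function name `matches_equals` is hypothetical. The sketch also compares values as strings so that `"4"` (as in the chembl_max_phase attribute above) matches `4`, which is a second assumption.

```python
def matches_equals(attr_value, wanted):
    """True if `wanted` equals the attribute value, or any element of it when it is a list."""
    candidates = attr_value if isinstance(attr_value, list) else [attr_value]
    # Compare string forms so "4" and 4 are treated as the same value (assumption).
    return any(str(candidate) == str(wanted) for candidate in candidates)


print(matches_equals(["4"], 4))                                     # list with one element
print(matches_equals(["approved"], "approved"))                     # string inside a list
print(matches_equals(["Antimetabolites", "Antioxidants"], "Purines"))  # no element matches
print(matches_equals("Small molecule", "Small molecule"))           # scalar value
```

Other readings are possible (for example, requiring every element to match, or requiring the whole list to equal a list-valued constraint), which is why the semantics need to be agreed on before implementing them.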

andrewsu commented 2 years ago

> @marcodarko note that we may want to change the node attribute name from "biolink:highest_FDA_approval_status" to "biolink:drug_regulatory_status_world_wide".
>
> @andrewsu please confirm whether we want to make this change or not...

Yes, the mapping of chembl.max_phase should now go to biolink:drug_regulatory_status_world_wide

colleenXu commented 2 years ago

Just tried again with a full api list and the following queries ran and looked like they worked compared to the queries without the constraint:

marcodarko commented 2 years ago

@colleenXu @andrewsu great thank you both, I'll update the PRs

colleenXu commented 2 years ago

Updating the queries to use biolink:drug_regulatory_status_world_wide:

Modified version of A.2_RHOBTB2_twohop_constrained

Modified from [this](https://github.com/NCATSTranslator/minihackathons/blob/main/2021-12_demo/workflowA/future/A.2_RHOBTB2_twohop_constrained.json)

```
{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "subject": "n1",
                    "object": "n0",
                    "predicates": [
                        "biolink:entity_regulates_entity",
                        "biolink:genetically_interacts_with"
                    ]
                },
                "e02": {
                    "subject": "n2",
                    "object": "n1",
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                "n0": {
                    "ids": ["NCBIGene:23221"],
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "categories": ["biolink:Gene"]
                },
                "n2": {
                    "categories": ["biolink:SmallMolecule"],
                    "constraints": [
                        {
                            "id": "biolink:drug_regulatory_status_world_wide",
                            "name": "max phase",
                            "operator": "==",
                            "value": 4
                        }
                    ]
                }
            }
        }
    }
}
```

Modified version of A.9_EGFR_advanced

Modified from [this](https://github.com/NCATSTranslator/minihackathons/blob/main/2021-12_demo/workflowA/backup/A.9_EGFR_advanced.json)

```
{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:SmallMolecule"],
                    "name": "Small Molecule",
                    "constraints": [
                        {
                            "id": "biolink:drug_regulatory_status_world_wide",
                            "name": "max phase",
                            "operator": "==",
                            "value": 4
                        }
                    ]
                },
                "n1": {
                    "name": "EGFR",
                    "ids": ["NCBIGene:1956"]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": [
                        "biolink:decreases_abundance_of",
                        "biolink:decreases_activity_of",
                        "biolink:decreases_expression_of",
                        "biolink:decreases_synthesis_of",
                        "biolink:increases_degradation_of",
                        "biolink:disrupts",
                        "biolink:entity_negatively_regulates_entity"
                    ]
                }
            }
        }
    }
}
```

With the pending APIs removed, we currently cannot run the following query (there are no metaKG edges for the negatively-regulates operation):

Modified version of A.2a_expanded_RHOBTB2_twohop_constrained

Modified from [this](https://github.com/NCATSTranslator/minihackathons/blob/main/2021-12_demo/workflowA/future/A.2a_expanded_RHOBTB2_twohop_constrained.json)

```
{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "object": "n0",
                    "subject": "n1",
                    "predicates": [
                        "biolink:entity_negatively_regulates_entity"
                    ]
                },
                "e02": {
                    "object": "n1",
                    "subject": "n2",
                    "predicates": [
                        "biolink:increases_abundance_of",
                        "biolink:increases_expression_of",
                        "biolink:increases_stability_of",
                        "biolink:increases_uptake_of",
                        "biolink:decreases_degradation_of",
                        "biolink:increases_secretion_of",
                        "biolink:increases_metabolic_processing_of",
                        "biolink:increases_folding_of",
                        "biolink:increases_localization_of",
                        "biolink:increases_synthesis_of",
                        "biolink:increases_response_to",
                        "biolink:increases_splicing_of",
                        "biolink:increases_mutation_rate_of",
                        "biolink:increases_transport_of",
                        "biolink:increases_activity_of",
                        "biolink:increases_molecular_modification_of",
                        "biolink:increases_molecular_interaction"
                    ]
                }
            },
            "nodes": {
                "n0": {
                    "ids": ["NCBIGene:23221"],
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "categories": ["biolink:Gene"]
                },
                "n2": {
                    "categories": ["biolink:SmallMolecule"],
                    "constraints": [
                        {
                            "id": "biolink:drug_regulatory_status_world_wide",
                            "name": "max phase",
                            "operator": "==",
                            "value": 4
                        }
                    ]
                }
            }
        }
    }
}
```

colleenXu commented 2 years ago

Closing because it's been deployed on prod.

Next step: @andrewsu to discuss whether the demo queries can use our node constraint modifications: https://github.com/biothings/BioThings_Explorer_TRAPI/issues/174#issuecomment-954972343

colleenXu commented 2 years ago

Noting that only == was implemented (so I can't do <= or >=)... #380
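For reference, supporting additional operators could amount to a lookup table from operator string to comparison function. The sketch below is an illustration only: the names `COMPARATORS` and `compare` are hypothetical, and the choice to compare numerically when both sides parse as numbers (falling back to plain equality for `==` otherwise) is an assumption, not the behavior of the deployed code.

```python
import operator

# Hypothetical mapping from constraint operator strings to comparison functions.
COMPARATORS = {
    "==": operator.eq,
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}


def compare(attr_value, op, wanted):
    """Apply `op` to a single attribute value and the constraint value."""
    if op not in COMPARATORS:
        raise ValueError(f"unsupported operator: {op!r}")
    try:
        # Numeric comparison when both sides look like numbers ("4" vs 2).
        return COMPARATORS[op](float(attr_value), float(wanted))
    except (TypeError, ValueError):
        # Non-numeric values: only equality is meaningful here.
        return op == "==" and attr_value == wanted


print(compare("4", ">=", 2))
print(compare(2, "<=", 1))
print(compare("approved", "==", "approved"))
print(compare("approved", ">=", 2))
```

Ordering comparisons on non-numeric values simply return False in this sketch; rejecting such constraints with an error would be an equally reasonable design.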

colleenXu commented 2 years ago

I'm not sure if this is still working...the following query is hitting a TypeError:

Query:

{
    "message": {
        "query_graph": {
            "nodes": {
                "disease": {
                    "ids":["MONDO:0004975"],
                    "categories":["biolink:Disease"],
                    "names": "Alzheimers"
               },
                "nA": {
                    "categories":["biolink:PhenotypicFeature"]
                },
                "drug": {
                    "categories":["biolink:ChemicalEntity"],
                    "constraints": [
                        {
                            "id": "biolink:drug_regulatory_status_world_wide",
                            "name": "max phase",
                            "operator": "==",
                            "value": 2
                        }
                    ]
                }
            },
            "edges": {
                "eA": {
                    "subject": "disease",
                    "object": "nA",
                    "predicates": ["biolink:has_phenotype"]
                },
                "eB": {
                    "subject": "nA",
                    "object": "drug",
                    "predicates": ["biolink:treated_by"]
                }
            }
        }
    }
}

Console logs:

  bte:call-apis:query id annotation completes +3s
  bte:call-apis:query qEdge queries complete in 6s +0ms
  bte:biothings-explorer-trapi:batch_edge_query APIEdges are successfully queried.... +9s
  bte:biothings-explorer-trapi:batch_edge_query Filtering out any "undefined" items in (1555) records +0ms
  bte:biothings-explorer-trapi:batch_edge_query Total number of records is (1496) +0ms
  bte:biothings-explorer-trapi:batch_edge_query Start to update nodes... +0ms
  bte:biothings-explorer-trapi:batch_edge_query Update nodes completed! +0ms
  bte:biothings-explorer-trapi:QueryExecutionEdge (6) Storing records... +9s
  bte:biothings-explorer-trapi:QueryExecutionEdge (6) Applying Node Constraints to 1496 records. +0ms
  bte:biothings-explorer-trapi:QueryExecutionEdge Node (object) constraints: [{"id":"biolink:drug_regulatory_status_world_wide","name":"max phase","operator":"==","value":2}] +0ms
  bte:biothings-explorer-trapi:error_handler TypeError: Cannot read properties of undefined (reading '0')
  bte:biothings-explorer-trapi:error_handler     at QueryExecutionEdge.meetsConstraint (/Users/colleenxu/Desktop/bte-trapi-workspace/packages/@biothings-explorer/query_graph_handler/built/query_execution_edge.js:294:43)
  bte:biothings-explorer-trapi:error_handler     at QueryExecutionEdge.applyNodeConstraints (/Users/colleenxu/Desktop/bte-trapi-workspace/packages/@biothings-explorer/query_graph_handler/built/query_execution_edge.js:274:33)
  bte:biothings-explorer-trapi:error_handler     at QueryExecutionEdge.storeRecords (/Users/colleenxu/Desktop/bte-trapi-workspace/packages/@biothings-explorer/query_graph_handler/built/query_execution_edge.js:426:14)
  bte:biothings-explorer-trapi:error_handler     at TRAPIQueryHandler.query (/Users/colleenxu/Desktop/bte-trapi-workspace/packages/@biothings-explorer/query_graph_handler/built/index.js:245:27)
  bte:biothings-explorer-trapi:error_handler     at runMicrotasks (<anonymous>)
  bte:biothings-explorer-trapi:error_handler     at processTicksAndRejections (node:internal/process/task_queues:96:5)
  bte:biothings-explorer-trapi:error_handler     at async task (/Users/colleenxu/Desktop/bte-trapi-workspace/packages/@biothings-explorer/bte-trapi/src/routes/v1/query_v1.js:34:13)
  bte:biothings-explorer-trapi:error_handler     at async /Users/colleenxu/Desktop/bte-trapi-workspace/packages/@biothings-explorer/bte-trapi/src/controllers/threading/threadHandler.js:93:38 +3m
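The message "Cannot read properties of undefined (reading '0')" suggests that meetsConstraint indexed into a value that was not there. One plausible cause, which is an assumption because the source of meetsConstraint is not shown here, is a record whose node lacks the constrained attribute altogether. The sketch below shows the kind of guard that would turn that case into "does not meet the constraint" instead of an exception. It is written in Python rather than the project's JavaScript, and the function name and the dict-of-attributes layout are hypothetical.

```python
def meets_constraint(node_attributes, constraint):
    """Return False, rather than raising, when the attribute is absent or empty."""
    if not node_attributes:
        return False  # node carries no attributes at all
    value = node_attributes.get(constraint["id"])
    if value is None:
        return False  # the constrained attribute is missing on this node
    values = value if isinstance(value, list) else [value]
    # An empty list yields False because any() over nothing is False.
    return any(v == constraint["value"] for v in values)


constraint = {
    "id": "biolink:drug_regulatory_status_world_wide",
    "name": "max phase",
    "operator": "==",
    "value": 2,
}

print(meets_constraint({"biolink:drug_regulatory_status_world_wide": [2]}, constraint))
print(meets_constraint({"some_other_attribute": ["x"]}, constraint))
print(meets_constraint(None, constraint))
```

With a guard of this kind, nodes without the attribute are filtered out quietly, which matches the "existence of an attribute" filtering mentioned at the top of this issue.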