Closed colleenXu closed 2 months ago
Summary: (1) the `biolink:Cell` (cell types) and (2) the `biolink:BiologicalProcessOrActivity` association data (includes MolecularActivity, PhysiologicalProcess, PathologicalProcess, Pathway) mostly connects to Genes. Very little connects to other categories like the Disease / Chemical in these use cases.
Should go through c-kit, mast cells, and immune cell activation.
I got these IDs from the SRI Name Resolver: `PUBCHEM.COMPOUND:5291` for imatinib, `MONDO:0004979` for asthma.
First, I ran an Explain-query w/o any intermediates (1 QEdge connecting them):
* it runs quickly, only 14 s
* 4 Edges found:
  * treats: text-mining targeted
  * associated_with: multiomics ehr risk. With qualifiers, the statement is "imatinib is associated with decreased likelihood of asthma"
  * has_adverse_event: from automat drugcentral (faers) and mychem drugcentral (likely the same original data)
full response: [imatinib-direct-asthma.json](https://github.com/biothings/biothings_explorer/files/13791320/imatinib-direct-asthma.json)
```
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:5291"],
"name": "imatinib"
},
"n1": {
"ids":["MONDO:0004979"],
"name": "asthma"
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1"
}
}
}
}
}
```
Then, I ran an Explain-query w/ 1 intermediate QNode. full response here: [imatinib-inter-asthma-4.json](https://github.com/biothings/biothings_explorer/files/13791475/imatinib-inter-asthma-4.json)
```
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:5291"],
"name": "imatinib"
},
"n1": {
"categories":["biolink:NamedThing"]
},
"n2": {
"ids":["MONDO:0004979"],
"categories":["biolink:DiseaseOrPhenotypicFeature"],
"name": "asthma"
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1",
"predicates": [
"biolink:related_to_at_instance_level",
"biolink:disease_has_location", "biolink:location_of_disease",
"biolink:composed_primarily_of", "biolink:primarily_composed_of",
"biolink:has_chemical_role",
"biolink:has_member", "biolink:member_of"
]
},
"e1": {
"subject": "n2",
"object": "n1",
"predicates": [
"biolink:related_to_at_instance_level",
"biolink:disease_has_location", "biolink:location_of_disease",
"biolink:composed_primarily_of", "biolink:primarily_composed_of",
"biolink:has_chemical_role",
"biolink:has_member", "biolink:member_of"
]
}
}
}
}
}
```
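Both QEdges in the query above use the same predicate list and both point at the intermediate node. A minimal Python sketch of building that query graph so the predicate list is only written once; the helper name `build_one_intermediate_query` is hypothetical and not part of BTE:

```python
import json

# Predicate list copied from the query above.
PATHFINDER_PREDICATES = [
    "biolink:related_to_at_instance_level",
    "biolink:disease_has_location", "biolink:location_of_disease",
    "biolink:composed_primarily_of", "biolink:primarily_composed_of",
    "biolink:has_chemical_role",
    "biolink:has_member", "biolink:member_of",
]


def build_one_intermediate_query(start_id, end_id):
    """Hypothetical helper: build start_id -> n1 <- end_id as a TRAPI query."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [start_id]},
                    "n1": {"categories": ["biolink:NamedThing"]},
                    "n2": {"ids": [end_id]},
                },
                "edges": {
                    # Both edges have the intermediate node n1 as their object.
                    "e0": {"subject": "n0", "object": "n1",
                           "predicates": list(PATHFINDER_PREDICATES)},
                    "e1": {"subject": "n2", "object": "n1",
                           "predicates": list(PATHFINDER_PREDICATES)},
                },
            }
        }
    }


query = build_one_intermediate_query("PUBCHEM.COMPOUND:5291", "MONDO:0004979")
print(json.dumps(query, indent=2))
```

Swapping the second ID for `MONDO:0011996` would give the matching imatinib / CML query.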
After getting the intersection of intermediate nodes, the final console log for the categories was:
```
bte:biothings-explorer-trapi:QEdge Collected entity ids in records:
["Disease","PhenotypicFeature","PhysiologicalProcess","BiologicalProcess",
"SmallMolecule","ChemicalExposure","Drug","Gene",
"ChemicalEntity","Procedure","MolecularMixture"] +76ms
```
But before then, during the imatinib hop, the categories were:
```
bte:biothings-explorer-trapi:QEdge Collected entity ids in records:
["SmallMolecule","PhysiologicalProcess","Disease","PhenotypicFeature",
"Gene","Protein","PathologicalProcess","Procedure",
"ChemicalEntity","OrganismTaxon","Polypeptide","Cell",
"Phenomenon","Drug","MolecularActivity","DiseaseOrPhenotypicFeature",
"CellularComponent","GrossAnatomicalStructure","AnatomicalEntity","Plant",
"MolecularMixture","ComplexMolecularMixture"] +58ms
```
And during the asthma hop (before the intersecting began), the categories were:
```
bte:biothings-explorer-trapi:QEdge Collected entity ids in records:
["Disease","PhenotypicFeature","Device","ClinicalIntervention",
"Procedure","Event","ClinicalAttribute","PhysiologicalProcess",
"BiologicalProcess","Activity","ComplexMolecularMixture","Gene",
"Protein","ChemicalExposure","SmallMolecule","Drug",
"Publication","InformationContentEntity","PopulationOfIndividualOrganisms","EnvironmentalExposure",
"MolecularMixture","ChemicalEntity","SequenceVariant","OrganismAttribute"] +365ms
```
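To compare the per-hop logs, the category lists can be treated as sets. This is a rough sketch only: the logs list categories of every entity in the records, and BTE intersects on node IDs rather than categories, so the set intersection below is a proxy and won't necessarily equal the final logged list. The variable names are made up for illustration; the values are copied from the two per-hop logs above.

```python
# Categories copied from the imatinib-hop log above.
imatinib_hop = {
    "SmallMolecule", "PhysiologicalProcess", "Disease", "PhenotypicFeature",
    "Gene", "Protein", "PathologicalProcess", "Procedure",
    "ChemicalEntity", "OrganismTaxon", "Polypeptide", "Cell",
    "Phenomenon", "Drug", "MolecularActivity", "DiseaseOrPhenotypicFeature",
    "CellularComponent", "GrossAnatomicalStructure", "AnatomicalEntity", "Plant",
    "MolecularMixture", "ComplexMolecularMixture",
}

# Categories copied from the asthma-hop log above.
asthma_hop = {
    "Disease", "PhenotypicFeature", "Device", "ClinicalIntervention",
    "Procedure", "Event", "ClinicalAttribute", "PhysiologicalProcess",
    "BiologicalProcess", "Activity", "ComplexMolecularMixture", "Gene",
    "Protein", "ChemicalExposure", "SmallMolecule", "Drug",
    "Publication", "InformationContentEntity", "PopulationOfIndividualOrganisms",
    "EnvironmentalExposure", "MolecularMixture", "ChemicalEntity",
    "SequenceVariant", "OrganismAttribute",
}

shared = imatinib_hop & asthma_hop          # categories seen on both hops
only_imatinib = imatinib_hop - asthma_hop   # categories with nothing on the asthma side

print("shared:", sorted(shared))
print("imatinib hop only:", sorted(only_imatinib))
```

In these logs `Cell` and `MolecularActivity` only show up on the imatinib side, which fits the summary's point that this data mostly connects to Genes rather than to the disease.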
Should go through BCR-ABL and cell cycle
I got these IDs from the SRI Name Resolver: `PUBCHEM.COMPOUND:5291` for imatinib, `MONDO:0011996` for CML.
First, I ran an Explain-query w/o any intermediates (1 QEdge connecting them):
* it runs quickly, only 13 s
* lots of direct edges, including "treats"
* also some edges to descendants
full response: [imatinib-direct-cml.json](https://github.com/biothings/biothings_explorer/files/13814614/imatinib-direct-cml.json)
```
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:5291"],
"name": "imatinib"
},
"n1": {
"ids":["MONDO:0011996"],
"name": "cml"
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1"
}
}
}
}
}
```
Then, I ran an Explain-query w/ 1 intermediate QNode. full response here: [imatinib-inter-cml-2.json](https://github.com/biothings/biothings_explorer/files/13814623/imatinib-inter-cml-2.json)
```
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:5291"],
"name": "imatinib"
},
"n1": {
"categories":["biolink:NamedThing"]
},
"n2": {
"ids":["MONDO:0011996"],
"name": "cml"
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1",
"predicates": [
"biolink:related_to_at_instance_level",
"biolink:disease_has_location", "biolink:location_of_disease",
"biolink:composed_primarily_of", "biolink:primarily_composed_of",
"biolink:has_chemical_role",
"biolink:has_member", "biolink:member_of"
]
},
"e1": {
"subject": "n2",
"object": "n1",
"predicates": [
"biolink:related_to_at_instance_level",
"biolink:disease_has_location", "biolink:location_of_disease",
"biolink:composed_primarily_of", "biolink:primarily_composed_of",
"biolink:has_chemical_role",
"biolink:has_member", "biolink:member_of"
]
}
}
}
}
}
```
After getting the intersection of intermediate nodes, the final console log for the categories was:
```
bte:biothings-explorer-trapi:QEdge Collected entity ids in records:
["Disease","Gene","SmallMolecule","Drug",
"MolecularMixture", "PhenotypicFeature","Procedure","ChemicalEntity",
"Polypeptide","PhysiologicalProcess","Protein","PathologicalProcess",
"AnatomicalEntity","GrossAnatomicalStructure","Cell","MolecularActivity"] +37ms
```
But before then, during the imatinib hop, the categories were basically the same as for the imatinib-asthma testing:
```
bte:biothings-explorer-trapi:QEdge Collected entity ids in records:
["SmallMolecule","PhenotypicFeature","PhysiologicalProcess","Disease",
"Gene","Protein","PathologicalProcess","Procedure",
"ChemicalEntity","OrganismTaxon","Polypeptide","Cell",
"Phenomenon","Drug","MolecularActivity","DiseaseOrPhenotypicFeature",
"CellularComponent","GrossAnatomicalStructure","AnatomicalEntity","Plant",
"MolecularMixture","ComplexMolecularMixture"] +94ms
```
And during the CML hop (before the intersecting began), the categories were:
```
bte:biothings-explorer-trapi:QEdge Collected entity ids in records: ["Disease","Gene","Procedure","SmallMolecule",
"Drug","ChemicalEntity","MolecularMixture","Polypeptide",
"PhenotypicFeature","SequenceVariant","Protein","Pathway",
"CellularComponent","NucleicAcidEntity","PhysiologicalProcess","PathologicalProcess",
"BiologicalEntity","Cohort","OrganismTaxon","Virus",
"Device","GrossAnatomicalStructure","AnatomicalEntity","Cell",
"PopulationOfIndividualOrganisms","MolecularActivity","ComplexMolecularMixture"] +103ms
```
EDIT: after discussion with Andrew 1/8.
I think having no QEdge predicate would also make sense, but our current creative-mode won't run when I don't specify a predicate: I get 0 results and the warning log `bte:biothings-explorer-trapi:inferred-mode Inferred Mode edge must specify a predicate. Your query terminates. +0ms`.
This is for imatinib ➡️ CML (Chronic myelogenous leukemia):
```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "ids": ["PUBCHEM.COMPOUND:5291"]
        },
        "n1": {
          "ids": ["MONDO:0011996"]
        }
      },
      "edges": {
        "e0": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:related_to"],
          "knowledge_type": "inferred"
        }
      }
    }
  }
}
```
Notes:
* The templates don't set categories for the `creativeQuerySubject` / `creativeQueryObject` because I think it'd be best if BTE plugged in the categories from the NodeNorm ID-lookup. Right now, BTE doesn't seem to be doing this and raises an error (also noted in the next "Issues" section).
* I set a predicate list for QEdges to exclude ones that didn't seem useful or may create unhelpful self-edges: superclass_of, subclass_of, broad_match / narrow_match, close_match / exact_match / same_as
* I'm assuming that the intermediates are what matters, so it's better for the 1-intermediate template to be startingID1 ➡️ intermediate ⬅️ startingID2 (rather than a one-direction path from startingID1 ➡️ intermediate ➡️ startingID2)

#### First template: 0 intermediates
```
{
  "message": {
    "query_graph": {
      "nodes": {
        "creativeQuerySubject": {},
        "creativeQueryObject": {}
      },
      "edges": {
        "eA": {
          "subject": "creativeQuerySubject",
          "object": "creativeQueryObject",
          "predicates": [
            "biolink:related_to_at_instance_level",
            "biolink:disease_has_location", "biolink:location_of_disease",
            "biolink:composed_primarily_of", "biolink:primarily_composed_of",
            "biolink:has_chemical_role",
            "biolink:has_member", "biolink:member_of"
          ]
        }
      }
    }
  }
}
```

#### Second template: 1 intermediate
```
{
  "message": {
    "query_graph": {
      "nodes": {
        "creativeQuerySubject": {},
        "nA": {
          "categories": ["biolink:NamedThing"]
        },
        "creativeQueryObject": {}
      },
      "edges": {
        "eA": {
          "subject": "creativeQuerySubject",
          "object": "nA",
          "predicates": [
            "biolink:related_to_at_instance_level",
            "biolink:disease_has_location", "biolink:location_of_disease",
            "biolink:composed_primarily_of", "biolink:primarily_composed_of",
            "biolink:has_chemical_role",
            "biolink:has_member", "biolink:member_of"
          ]
        },
        "eB": {
          "subject": "creativeQueryObject",
          "object": "nA",
          "predicates": [
            "biolink:related_to_at_instance_level",
            "biolink:disease_has_location", "biolink:location_of_disease",
            "biolink:composed_primarily_of", "biolink:primarily_composed_of",
            "biolink:has_chemical_role",
            "biolink:has_member", "biolink:member_of"
          ]
        }
      }
    }
  }
}
```
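For illustration, here is roughly how the starting IDs could be dropped into one of these templates. This is a sketch in Python under my own assumptions: `fill_template` is a hypothetical name, BTE's actual template-filling code lives in query-handler and may work differently, and the predicate list is shortened to keep the example small.

```python
import copy


def fill_template(template, subject_id, object_id):
    """Hypothetical helper: return a copy of the template with the two starting IDs set."""
    filled = copy.deepcopy(template)  # leave the original template untouched
    nodes = filled["message"]["query_graph"]["nodes"]
    nodes["creativeQuerySubject"]["ids"] = [subject_id]
    nodes["creativeQueryObject"]["ids"] = [object_id]
    return filled


# The 0-intermediate template from above, with the predicate list shortened.
template = {
    "message": {
        "query_graph": {
            "nodes": {
                "creativeQuerySubject": {},
                "creativeQueryObject": {},
            },
            "edges": {
                "eA": {
                    "subject": "creativeQuerySubject",
                    "object": "creativeQueryObject",
                    "predicates": ["biolink:related_to_at_instance_level"],
                }
            },
        }
    }
}

filled = fill_template(template, "PUBCHEM.COMPOUND:5291", "MONDO:0011996")
print(filled["message"]["query_graph"]["nodes"])
```

The same function would work for the 1-intermediate template, since it only touches the two placeholder nodes and leaves `nA` alone.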
I think Pathfinder queries should only use Pathfinder templates (when both QNodes have 1 ID and `knowledge_type: inferred`), and vice-versa:
* The Pathfinder templates wouldn't work for MVPs 1-2 because they're too general
* The MVP1-2 templates won't work for Pathfinder because they use `is_set: true` for intermediate nodes, so each result represents a unique "answer" (the QNode at the open end of the Predict-type query). But for Pathfinder/Explain-type where we set both starting QNodes to specific IDs, using `is_set: true` for the intermediate nodes will collapse all the "answers" into 1 giant result - which I assume we don't want.
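A minimal sketch of the check described above (both QNodes have exactly 1 ID, and the QEdge is `knowledge_type: inferred`). The function name `is_pathfinder_query` is hypothetical, and requiring exactly 2 QNodes and 1 QEdge is my assumption about what a Pathfinder-style creative query looks like; none of this is BTE's current code.

```python
def is_pathfinder_query(query_graph):
    """Hypothetical check: should this creative query use the Pathfinder templates?"""
    nodes = query_graph.get("nodes", {})
    edges = query_graph.get("edges", {})
    # Assumption: a Pathfinder-style query is 2 QNodes joined by 1 QEdge.
    if len(nodes) != 2 or len(edges) != 1:
        return False
    edge = next(iter(edges.values()))
    if edge.get("knowledge_type") != "inferred":
        return False
    # Both QNodes must be pinned to exactly one ID.
    return all(len(node.get("ids") or []) == 1 for node in nodes.values())


pathfinder_style = {
    "nodes": {
        "n0": {"ids": ["PUBCHEM.COMPOUND:5291"]},
        "n1": {"ids": ["MONDO:0011996"]},
    },
    "edges": {
        "e0": {"subject": "n0", "object": "n1",
               "predicates": ["biolink:related_to"],
               "knowledge_type": "inferred"},
    },
}

# Predict-style (MVP1-like) query: one end is open, so it should not match.
predict_style = {
    "nodes": {
        "n0": {"categories": ["biolink:ChemicalEntity"]},
        "n1": {"ids": ["MONDO:0011996"]},
    },
    "edges": {
        "e0": {"subject": "n0", "object": "n1",
               "predicates": ["biolink:treats"],
               "knowledge_type": "inferred"},
    },
}

print(is_pathfinder_query(pathfinder_style))
print(is_pathfinder_query(predict_style))
```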
I'm not sure if this will be a problem if we adjust the template-group matching and test again with a Pathfinder-template-group + queries. But before winter break, I noticed that:
* BTE's creative-mode currently doesn't allow > 1 QNode ID total, across all QNodes
* Jackson set up a query-handler branch [inferred_explain](https://suwulab.slack.com/archives/D029DAYUKS6/p1703098649420589) that should've removed that check, but [something odd seemed to happen when none of the retrieved edges actually connected the two starting IDs](https://suwulab.slack.com/archives/CC218TEKC/p1703167200721069).
With query-handler's `inferred_explain` branch checked out, I tried setting up a Pathfinder-template-group with the two generic templates I included above. But I encountered issues (also see my "recreating the problems" section below this list):
* I thought I could set the templateGroup's subject / object to `NamedThing` so it'd work no matter what the starting-ID's category was. But when I tried this, BTE wouldn't use the template-group. If I set the subject / object to every biolink category, then BTE would use the template-group.
* But then, BTE has an error `bte:biothings-explorer-trapi:error_handler TypeError: queryGraph.nodes.creativeQuerySubject.categories is not iterable`. I think it's because the generic Pathfinder-templates don't have QNode categories. I intended for BTE to plug in the categories from the NodeNorm ID-lookup...but this doesn't seem to happen here.
  * could be mitigated by having different templateGroups for different starting QNode categories and then setting those categories in the templates...but it's kinda redundant having multiple first templates (the 0-intermediate one)
  * So the Chem -> DiseaseOrPheno has direct edge, 1 intermediate, and specific ones. Put the starting IDs into the Chem + DoP QNodes so they start w/ those categories, then add the ones from NodeNorm
* I'm not sure if either of these are intended behavior...for example, is there some problem we avoid by not expanding the subject / object categories?
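The `categories is not iterable` error above suggests the code iterates over a QNode's `categories` without checking that it exists. A minimal sketch of the behavior I was hoping for, written in Python for illustration only (BTE itself is not Python): `ensure_categories` is a hypothetical name, and the `lookup` dict stands in for a NodeNorm ID-lookup with illustrative values rather than real NodeNorm output.

```python
def ensure_categories(qnode, lookup_categories):
    """Hypothetical helper: return a copy of qnode whose "categories" is always a list.

    Categories already on the QNode are kept first; categories found for the
    QNode's IDs in the lookup are appended without duplicates.
    """
    merged = list(qnode.get("categories") or [])
    for curie in qnode.get("ids") or []:
        for category in lookup_categories.get(curie, []):
            if category not in merged:
                merged.append(category)
    return {**qnode, "categories": merged}


# Stand-in for a NodeNorm lookup (illustrative values, not real output).
lookup = {
    "PUBCHEM.COMPOUND:5291": ["biolink:SmallMolecule"],
    "MONDO:0011996": ["biolink:Disease"],
}

# A template QNode with no categories gets them from the lookup.
subject = ensure_categories({"ids": ["PUBCHEM.COMPOUND:5291"]}, lookup)
# A QNode that starts with a template category keeps it and gains the looked-up one.
obj = ensure_categories(
    {"ids": ["MONDO:0011996"], "categories": ["biolink:DiseaseOrPhenotypicFeature"]},
    lookup,
)
print(subject["categories"])
print(obj["categories"])
```

The second call mirrors the "start w/ those categories, then add the ones from NodeNorm" idea in the list above.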
First, check out query-handler's `inferred_explain` branch and replace the contents of query-handler's templateGroups.json file with this (`pnpm build` after!):
```
[
{
"name": "Pathfinder: find paths between two entities",
"subject": ["NamedThing"],
"predicate": ["related_to"],
"object": ["NamedThing"],
"templates": [
"pathfinder-direct.json",
"pathfinder-1intermediate.json"
]
}
]
```
Second, query BTE with this Pathfinder-style query:
```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "ids": ["PUBCHEM.COMPOUND:5291"],
          "name": "imatinib"
        },
        "n1": {
          "ids": ["MONDO:0011996"],
          "name": "cml"
        }
      },
      "edges": {
        "e0": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:related_to"],
          "knowledge_type": "inferred"
        }
      }
    }
  }
}
```
D. Other implementation issues
`startingID1 -> descendant1 -> intermediate <- descendant2 <- startingID2`.
Data-source note from Andrew: perhaps a Cell marker database (gene <-> cell type and gene <-> tissue) like http://xteam.xbio.top/CellMarker/search.jsp?quickSearchInfo=c-kit would be helpful to add...
@colleenXu Do you have the data from these queries so we can play with it?
@Genomewide
We don't have a full TRAPI response (running all the templates and merging the results into 1 set). The paths in the result sub-graphs may also be too long for the current UI to handle (> 3 edges, 4 nodes?).
You could try working with some of BTE's responses for the individual template-runs (you can ignore the extra notes, that's for our team):
Queries ran locally for prototype presentation: https://docs.google.com/presentation/d/1gFFGJGumtHU_ktHKM2FKauTpC-0bvAh-H_ZI49-qDsI/edit?usp=sharing
all queries set imatinib as ChemicalEntity, disease as DiseaseOrPhenotypicFeature. I'm assuming BTE's templates would set the template placeholder nodes to these categories - which is how our templates/implementation currently work. We could adjust this to have no "template categories" in the future maybe?
(Ran on local instance, main branches + fix-776 branches for workspace/api-response-transform for #776. Also w/o threading or caching.)
[imatinib-inter-asthma-latest.json](https://github.com/biothings/biothings_explorer/files/13985140/imatinib-inter-asthma-latest.json)
* 6 min 47 s, 949 results
* top results are still KIT, PDGFRA

![Screen Shot 2024-01-18 at 10 27 55 PM](https://github.com/biothings/biothings_explorer/assets/43731687/a28e6181-0159-4f39-98e0-7a1fbdb170da)

[imatinib-inter-cml-latest.json](https://github.com/biothings/biothings_explorer/files/13985238/imatinib-inter-cml-latest.json)
* 5 min 37 s, 1546 results
* top results are still BCR, ABL1

![Screen Shot 2024-01-18 at 10 38 44 PM](https://github.com/biothings/biothings_explorer/assets/43731687/fab3f256-6e06-4d27-8cf6-bee6f2741814)
See previous post for imatinib → Gene → Cell ← asthma
[imatinib-gene-cell-cml.json](https://github.com/biothings/biothings_explorer/files/13985481/imatinib-gene-cell-cml.json)
* 1 min 45 s, 1419 results
* 15 unique entities: interesting ones are Hematopoietic stem cells, Blast Cell, Bone Marrow Cells, granulocyte, Pluripotent Stem Cells
  * others: cultured cell line, t-lymphocyte, stem cells, Lymphocyte, Neoplastic Cell, Clone Cells, K-562, Leukemic Cell, lymphoblast, Blood Cells
* results w/ gene BCR:
  * BCR → blast cell is result 8 (also connected to hematopoietic stem cells in result 215, bone marrow cells in result 642)
  * fusion proteins, bcr-abl → hematopoietic stem cells is result 206
* results w/ gene ABL1:
  * ABL1 → hematopoietic stem cells in result 180 (bone marrow cells in result 606)
[imatinib-gene-physiopath-asthma.json](https://github.com/biothings/biothings_explorer/files/13986041/imatinib-gene-physiopath-asthma.json)
* 6 min 35 s, 1254 results
* Doing this because
  * BiologicalProcess takes too long to run (>13 min w/ 21 unique intermediates) - these are the most promising children terms
    * most interesting is "IgE responsiveness, atopic"
    * KIT connected to edema (HP:0000969) and cardiac rhythm disease (MONDO:0007263), anaphylaxis (MONDO:0100053), respiratory arrest (HP:0005943)
  * also, MolecularActivity wasn't interesting (see previous post)
* no exact matches for "immune cell activation", but some stuff is close
* no pathways found
* 38 physiologicalprocess terms, most were generic. Some interesting ones were:
  * immune response: results 166-195
  * bronchoconstriction: result 234
  * histamine release
  * t-cell activation
  * neutrophil infiltration
  * immune cell processes
  * host defense
  * antiviral response
  * cytokine production: results 236 - 245
  * Negative Regulation of Inflammatory Response Process
[imatinib-gene-physiopath-cml.json](https://github.com/biothings/biothings_explorer/files/13986274/imatinib-gene-physiopath-cml.json)
* 3 min 35 s, 3398 results
* Doing this because BiologicalActivity would probably take too long to run
* no exact matches for "cell cycle", but some stuff is close
* 29 Pathways found! some interesting ones:
  * Cyclin D associated events in G1 (Homo sapiens) - reactome
  * pathways in cancer - bioplanet
  * Inhibition of cellular proliferation by Gleevec - bioplanet
  * Chronic myeloid leukemia - bioplanet
* 12 physiologicalprocess terms. Some interesting ones:
  * cell proliferation: results 8-282. BCR in 102, ABL1 in 279.
  * lymphocyte activation (results 1-7)
  * mitotic metaphase
  * negative regulation of g2 phase
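If you want to poke at these response files, here is a small Python sketch for counting which intermediates show up across results. It assumes the standard TRAPI layout (`message.results[].node_bindings` and `message.knowledge_graph.nodes`); the function name `count_intermediates`, the default QNode key `n1`, and the toy response below are made up for illustration, so check the `query_graph` in each file for the actual intermediate QNode key.

```python
from collections import Counter


def count_intermediates(response, qnode_id="n1"):
    """Hypothetical helper: count how many results bind each node to qnode_id."""
    message = response["message"]
    kg_nodes = message.get("knowledge_graph", {}).get("nodes", {})
    counts = Counter()
    for result in message.get("results", []):
        for binding in result.get("node_bindings", {}).get(qnode_id, []):
            node_id = binding["id"]
            # Fall back to the ID when the knowledge graph has no name for it.
            name = kg_nodes.get(node_id, {}).get("name", node_id)
            counts[name] += 1
    return counts


# Toy response for illustration only (not taken from the files above).
response = {
    "message": {
        "knowledge_graph": {
            "nodes": {
                "NCBIGene:3815": {"name": "KIT"},
                "NCBIGene:5156": {"name": "PDGFRA"},
            }
        },
        "results": [
            {"node_bindings": {"n1": [{"id": "NCBIGene:3815"}]}},
            {"node_bindings": {"n1": [{"id": "NCBIGene:5156"}]}},
            {"node_bindings": {"n1": [{"id": "NCBIGene:3815"}]}},
        ],
    }
}

counts = count_intermediates(response)
print(counts.most_common())
```

For a real file, load it with `json.load` and pass the resulting dict in place of the toy `response`.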
I did not respond to one of your previous comments, but 3 edges is fine. That is the max though. I look forward to seeing this!
Closing, pathfinder efforts are now in https://github.com/biothings/biothings_explorer/issues/794
For the Jan 2024 Relay, Translator teams are supposed to bring prototypes for the next-creative-mode choices: Pathfinder and multi-curie. Our team has chosen to focus on Pathfinder.
This issue is for discussing / working on this prototype.
Current assumptions: