biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://api.bte.ncats.io
Apache License 2.0

creative-mode "Explain-style" prototype for Pathfinder option (Translator Jan 2024 Relay) #771

Closed colleenXu closed 2 months ago

colleenXu commented 6 months ago

For the Jan 2024 Relay, Translator teams are supposed to bring prototypes for the next-creative-mode choices: Pathfinder and multi-curie. Our team has chosen to focus on Pathfinder.

This issue is for discussing / working on this prototype.


Current assumptions:

colleenXu commented 6 months ago

Running Explain-type queries for the imatinib use cases

Summary:

imatinib ➡️ asthma

Should go through c-kit, mast cells, and immune cell activation.

I used these IDs using SRI Name Resolver: PUBCHEM.COMPOUND:5291 for imatinib, MONDO:0004979 for asthma.

There are direct edges for imatinib ➡️ asthma

First, I ran an Explain-query w/o any intermediates (1 QEdge connecting them):

* it runs quickly, only 14 s
* 4 Edges found:
  * treats: text-mining targeted
  * associated_with: multiomics ehr risk. with qualifiers, the statement is "imatinib is associated with decreased likelihood of asthma"
  * has_adverse_event: from automat drugcentral (faers) and mychem drugcentral (likely the same original data)

full response: [imatinib-direct-asthma.json](https://github.com/biothings/biothings_explorer/files/13791320/imatinib-direct-asthma.json)

click to see query-graph

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": { "ids": ["PUBCHEM.COMPOUND:5291"], "name": "imatinib" },
        "n1": { "ids": ["MONDO:0004979"], "name": "asthma" }
      },
      "edges": {
        "e0": { "subject": "n0", "object": "n1" }
      }
    }
  }
}
```
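A minimal sketch of building this kind of direct (0-intermediate) query graph in code, so the same shape can be reused for other ID pairs. The helper name `build_direct_query` is hypothetical and not part of BTE; only the query-graph layout comes from the query shown here.

```python
import json


def build_direct_query(subject_id, object_id, subject_name=None, object_name=None):
    """Build a TRAPI-style query graph with one QEdge and no intermediate QNode."""

    def qnode(curie, name):
        node = {"ids": [curie]}
        if name:
            node["name"] = name
        return node

    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": qnode(subject_id, subject_name),
                    "n1": qnode(object_id, object_name),
                },
                # No predicates or categories: the direct Explain query leaves them open.
                "edges": {"e0": {"subject": "n0", "object": "n1"}},
            }
        }
    }


query = build_direct_query("PUBCHEM.COMPOUND:5291", "MONDO:0004979", "imatinib", "asthma")
print(json.dumps(query, indent=2))
```

The printed JSON can be saved to a file and sent to a local BTE instance in whatever way you normally submit TRAPI queries.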

imatinib ➡️ 1 intermediate ⬅️ asthma

Then, I ran an Explain-query w/ 1 intermediate QNode. full response here: [imatinib-inter-asthma-4.json](https://github.com/biothings/biothings_explorer/files/13791475/imatinib-inter-asthma-4.json)

click to see query-graph

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": { "ids": ["PUBCHEM.COMPOUND:5291"], "name": "imatinib" },
        "n1": { "categories": ["biolink:NamedThing"] },
        "n2": {
          "ids": ["MONDO:0004979"],
          "categories": ["biolink:DiseaseOrPhenotypicFeature"],
          "name": "asthma"
        }
      },
      "edges": {
        "e0": {
          "subject": "n0",
          "object": "n1",
          "predicates": [
            "biolink:related_to_at_instance_level",
            "biolink:disease_has_location",
            "biolink:location_of_disease",
            "biolink:composed_primarily_of",
            "biolink:primarily_composed_of",
            "biolink:has_chemical_role",
            "biolink:has_member",
            "biolink:member_of"
          ]
        },
        "e1": {
          "subject": "n2",
          "object": "n1",
          "predicates": [
            "biolink:related_to_at_instance_level",
            "biolink:disease_has_location",
            "biolink:location_of_disease",
            "biolink:composed_primarily_of",
            "biolink:primarily_composed_of",
            "biolink:has_chemical_role",
            "biolink:has_member",
            "biolink:member_of"
          ]
        }
      }
    }
  }
}
```
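Since both QEdges repeat the same predicate list, a small builder keeps the two copies in sync. This is only a sketch: `build_one_intermediate_query` and `PREDICATES` are hypothetical names, and the predicate values are copied from the query above.

```python
# Predicate list copied from the 1-intermediate query in this comment.
PREDICATES = [
    "biolink:related_to_at_instance_level",
    "biolink:disease_has_location",
    "biolink:location_of_disease",
    "biolink:composed_primarily_of",
    "biolink:primarily_composed_of",
    "biolink:has_chemical_role",
    "biolink:has_member",
    "biolink:member_of",
]


def build_one_intermediate_query(start_id, end_id, end_categories=None):
    """Build start -> n1 <- end, where n1 is an open NamedThing intermediate."""
    end_node = {"ids": [end_id]}
    if end_categories:
        # Optional override, e.g. the DiseaseOrPhenotypicFeature caveat noted below.
        end_node["categories"] = list(end_categories)
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [start_id]},
                    "n1": {"categories": ["biolink:NamedThing"]},
                    "n2": end_node,
                },
                "edges": {
                    # Both edges have the intermediate as their object.
                    "e0": {"subject": "n0", "object": "n1", "predicates": list(PREDICATES)},
                    "e1": {"subject": "n2", "object": "n1", "predicates": list(PREDICATES)},
                },
            }
        }
    }


query = build_one_intermediate_query(
    "PUBCHEM.COMPOUND:5291",
    "MONDO:0004979",
    end_categories=["biolink:DiseaseOrPhenotypicFeature"],
)
```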

### Overall

* it took **7 min** to run locally (w/o threading or caching)
* **718 results**
* Big Caveat: I also set the QNode with the asthma ID to the category `DiseaseOrPhenotypicFeature`. If I removed this category so BTE solely uses the top category from SRI NodeNorm, then KIT isn't in the results (see this response: [imatinib-inter-asthma-3.json](https://github.com/biothings/biothings_explorer/files/13791987/imatinib-inter-asthma-3.json)).
  * I suspect that the asthma - KIT edge from biolink-api/monarch is retrieved by the PhenotypicFeature (HP ID) -> Gene operations, not the Disease (MONDO ID) -> Gene operations.
* Note: I set a predicate list for both QEdges to exclude ones that didn't seem useful or may create unhelpful self-edges: superclass_of, subclass_of, broad_match / narrow_match, close_match / exact_match / same_as

### Expected results

* KIT (Gene) is the top result! Note: only 1 edge for asthma ➡️ KIT from biolink-api (monarch) VS lots of imatinib ➡️ KIT edges from multiple sources
* "mast cells": no exact match. There were some intermediate diseases that seemed related:
  * systemic mastocytosis - rank 21
  * mastocytosis - rank 34
  * aggressive systemic mastocytosis - rank 674
* "immune cell activation": no exact match (I was looking for a related process / activity / pathway). But there are disease and gene intermediates that seem to be related to the immune system and to cell proliferation.
  * Genes PDGFRA (rank 2) and PDGFB (rank 55) seem related to [asthma, immune regulation, and imatinib](https://pubmed.ncbi.nlm.nih.gov/32116722/)
  * immune system disorder intermediates like idiopathic hypereosinophilic syndrome (rank 33), eosinophilic pneumonia (rank 138)

### intermediate node categories analysis

I searched the response using the console logs for the intermediate node categories, and got this list:

* Disease, PhenotypicFeature
* Gene
* Chem: SmallMolecule, ChemicalEntity, MolecularMixture
* 1 PhysiologicalProcess, pregnancy (imatinib contraindicated_for pregnancy and asthma correlated_with pregnancy)
* 1 Procedure, liver transplantation (edges from multiomics ehr risk)

Interestingly, Pathway didn't show up at all - BiologicalProcess, MolecularActivity, PhysiologicalProcess, PathologicalProcess all did, before the intersecting of intermediate nodes.

Console logs for intermediate node categories

After getting the intersection of intermediate nodes, the final console log for the categories was:

```
bte:biothings-explorer-trapi:QEdge Collected entity ids in records: ["Disease","PhenotypicFeature","PhysiologicalProcess","BiologicalProcess",
"SmallMolecule","ChemicalExposure","Drug","Gene",
"ChemicalEntity","Procedure","MolecularMixture"] +76ms
```

But before then, during the imatinib hop, the categories were:

```
bte:biothings-explorer-trapi:QEdge Collected entity ids in records: ["SmallMolecule","PhysiologicalProcess","Disease","PhenotypicFeature",
"Gene","Protein","PathologicalProcess","Procedure",
"ChemicalEntity","OrganismTaxon","Polypeptide","Cell",
"Phenomenon","Drug","MolecularActivity","DiseaseOrPhenotypicFeature",
"CellularComponent","GrossAnatomicalStructure","AnatomicalEntity","Plant",
"MolecularMixture","ComplexMolecularMixture"] +58ms
```

And during the asthma hop (before the intersecting began), the categories were:

```
bte:biothings-explorer-trapi:QEdge Collected entity ids in records: ["Disease","PhenotypicFeature","Device","ClinicalIntervention",
"Procedure","Event","ClinicalAttribute","PhysiologicalProcess",
"BiologicalProcess","Activity","ComplexMolecularMixture","Gene",
"Protein","ChemicalExposure","SmallMolecule","Drug",
"Publication","InformationContentEntity","PopulationOfIndividualOrganisms","EnvironmentalExposure",
"MolecularMixture","ChemicalEntity","SequenceVariant","OrganismAttribute"] +365ms
```
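A quick way to compare these three category lists is with plain set operations. This is only a sketch over the category labels copied from the logs above; BTE intersects node IDs rather than category names, so the label-level comparison is a rough sanity check and not a reproduction of what BTE does.

```python
# Category labels copied from the three console logs (imatinib / asthma run).
final = {
    "Disease", "PhenotypicFeature", "PhysiologicalProcess", "BiologicalProcess",
    "SmallMolecule", "ChemicalExposure", "Drug", "Gene",
    "ChemicalEntity", "Procedure", "MolecularMixture",
}
imatinib_hop = {
    "SmallMolecule", "PhysiologicalProcess", "Disease", "PhenotypicFeature",
    "Gene", "Protein", "PathologicalProcess", "Procedure",
    "ChemicalEntity", "OrganismTaxon", "Polypeptide", "Cell",
    "Phenomenon", "Drug", "MolecularActivity", "DiseaseOrPhenotypicFeature",
    "CellularComponent", "GrossAnatomicalStructure", "AnatomicalEntity", "Plant",
    "MolecularMixture", "ComplexMolecularMixture",
}
asthma_hop = {
    "Disease", "PhenotypicFeature", "Device", "ClinicalIntervention",
    "Procedure", "Event", "ClinicalAttribute", "PhysiologicalProcess",
    "BiologicalProcess", "Activity", "ComplexMolecularMixture", "Gene",
    "Protein", "ChemicalExposure", "SmallMolecule", "Drug",
    "Publication", "InformationContentEntity", "PopulationOfIndividualOrganisms",
    "EnvironmentalExposure", "MolecularMixture", "ChemicalEntity",
    "SequenceVariant", "OrganismAttribute",
}

# Labels seen on both hops before intersecting.
shared = imatinib_hop & asthma_hop

print("in final log but not shared by label:", sorted(final - shared))
print("shared by label but not in final log:", sorted(shared - final))
print("Pathway seen on either hop:", "Pathway" in (imatinib_hop | asthma_hop))
```

The two difference sets are non-empty, which suggests the same node can be logged under different category labels on each hop.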

imatinib ➡️ CML (Chronic myelogenous leukemia)

Should go through BCR-ABL and cell cycle

I used these IDs using SRI Name Resolver: PUBCHEM.COMPOUND:5291 for imatinib, MONDO:0011996 for CML.

There are direct edges for imatinib ➡️ CML and its descendants

First, I ran an Explain-query w/o any intermediates (1 QEdge connecting them):

* it runs quickly, only 13 s
* lots of direct edges, including "treats"
* also some edges to descendants

full response: [imatinib-direct-cml.json](https://github.com/biothings/biothings_explorer/files/13814614/imatinib-direct-cml.json)

click to see query-graph

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": { "ids": ["PUBCHEM.COMPOUND:5291"], "name": "imatinib" },
        "n1": { "ids": ["MONDO:0011996"], "name": "cml" }
      },
      "edges": {
        "e0": { "subject": "n0", "object": "n1" }
      }
    }
  }
}
```

imatinib ➡️ 1 intermediate ⬅️ CML

Then, I ran an Explain-query w/ 1 intermediate QNode. full response here: [imatinib-inter-cml-2.json](https://github.com/biothings/biothings_explorer/files/13814623/imatinib-inter-cml-2.json)

click to see query-graph

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": { "ids": ["PUBCHEM.COMPOUND:5291"], "name": "imatinib" },
        "n1": { "categories": ["biolink:NamedThing"] },
        "n2": { "ids": ["MONDO:0011996"], "name": "cml" }
      },
      "edges": {
        "e0": {
          "subject": "n0",
          "object": "n1",
          "predicates": [
            "biolink:related_to_at_instance_level",
            "biolink:disease_has_location",
            "biolink:location_of_disease",
            "biolink:composed_primarily_of",
            "biolink:primarily_composed_of",
            "biolink:has_chemical_role",
            "biolink:has_member",
            "biolink:member_of"
          ]
        },
        "e1": {
          "subject": "n2",
          "object": "n1",
          "predicates": [
            "biolink:related_to_at_instance_level",
            "biolink:disease_has_location",
            "biolink:location_of_disease",
            "biolink:composed_primarily_of",
            "biolink:primarily_composed_of",
            "biolink:has_chemical_role",
            "biolink:has_member",
            "biolink:member_of"
          ]
        }
      }
    }
  }
}
```

### Overall

* it took **6 min 46 s** to run locally (w/o threading or caching)
* **1661 results**
* Note: I set a predicate list for both QEdges to exclude ones that didn't seem useful or may create unhelpful self-edges: superclass_of, subclass_of, broad_match / narrow_match, close_match / exact_match / same_as

### Expected results

* BCR-ABL: separately, BCR and ABL1 are the top gene results (rank 2 and 3), with lots of edges to imatinib and to CML.
  * There's also related results like "Fusion Proteins, bcr-abl" (rank 770, umls id, from biothings semmeddb) and "Tyrosine-protein kinase ABL1 (ABL)" (not in top 1000, TTD.TARGET id, from biothings ttd)
* "cell cycle": no exact match.
  * There may be Gene intermediates related to the cell cycle, like CCND1
  * There were some PhysiologicalProcess intermediates that seem related but they're general concepts, from biothings semmeddb only, and didn't score highly (not in top 1000 results unless otherwise noted):
    * cell proliferation (rank 992)
    * autophagy (rank 994)
    * apoptosis (rank 998)
    * growth
    * cell growth
    * cell survival
    * signal transduction
    * cell death

### intermediate node categories analysis

I searched the response using the console logs for the intermediate node categories, and got this list:

* Disease, PhenotypicFeature
* Gene, Protein
* Chem: SmallMolecule, Drug, MolecularMixture, ChemicalEntity
* PhysiologicalProcess
* Procedure
* Cell (but entities and edges aren't helpful or interesting. Ex: both imatinib and CML are located in "bone marrow cells", edges from biothings semmeddb)
* 1 AnatomicalEntity: blood (both imatinib and CML are located in the blood, edges from biothings semmeddb)
* 1 MolecularActivity: "Down-Regulation" but the edges weren't helpful - imatinib causes down-regulation and CML includes down-regulation (edges from biothings semmeddb)

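Ranks like the ones quoted above can be looked up in a saved response with a small helper. This is a sketch that assumes the response follows the TRAPI layout (`message.results[*].node_bindings`, each binding carrying an `id`) and that results are already sorted by score; `ranks_for_node` is a hypothetical name and the CURIEs in the demo are made up.

```python
def ranks_for_node(response, curie):
    """Return 1-based positions of results whose node bindings mention `curie`."""
    ranks = []
    for position, result in enumerate(response["message"]["results"], start=1):
        bound_ids = {
            binding["id"]
            for bindings in result["node_bindings"].values()
            for binding in bindings
        }
        if curie in bound_ids:
            ranks.append(position)
    return ranks


# Tiny made-up response, only to show the expected shape.
demo_response = {
    "message": {
        "results": [
            {"node_bindings": {"n0": [{"id": "EX:drug"}], "n1": [{"id": "EX:geneA"}], "n2": [{"id": "EX:disease"}]}},
            {"node_bindings": {"n0": [{"id": "EX:drug"}], "n1": [{"id": "EX:geneB"}], "n2": [{"id": "EX:disease"}]}},
        ]
    }
}

print(ranks_for_node(demo_response, "EX:geneB"))
```

For a real file, load it with `json.load` and pass the ID that appears in the response's knowledge graph for the node of interest.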
Console logs for intermediate node categories

After getting the intersection of intermediate nodes, the final console log for the categories was:

```
bte:biothings-explorer-trapi:QEdge Collected entity ids in records: ["Disease","Gene","SmallMolecule","Drug",
"MolecularMixture","PhenotypicFeature","Procedure","ChemicalEntity",
"Polypeptide","PhysiologicalProcess","Protein","PathologicalProcess",
"AnatomicalEntity","GrossAnatomicalStructure","Cell","MolecularActivity"] +37ms
```

But before then, during the imatinib hop, the categories were (basically the same as for the imatinib-asthma testing):

```
bte:biothings-explorer-trapi:QEdge Collected entity ids in records: ["SmallMolecule","PhenotypicFeature","PhysiologicalProcess","Disease",
"Gene","Protein","PathologicalProcess","Procedure",
"ChemicalEntity","OrganismTaxon","Polypeptide","Cell",
"Phenomenon","Drug","MolecularActivity","DiseaseOrPhenotypicFeature",
"CellularComponent","GrossAnatomicalStructure","AnatomicalEntity","Plant",
"MolecularMixture","ComplexMolecularMixture"] +94ms
```

And during the CML hop (before the intersecting began), the categories were:

```
bte:biothings-explorer-trapi:QEdge Collected entity ids in records: ["Disease","Gene","Procedure","SmallMolecule",
"Drug","ChemicalEntity","MolecularMixture","Polypeptide",
"PhenotypicFeature","SequenceVariant","Protein","Pathway",
"CellularComponent","NucleicAcidEntity","PhysiologicalProcess","PathologicalProcess",
"BiologicalEntity","Cohort","OrganismTaxon","Virus",
"Device","GrossAnatomicalStructure","AnatomicalEntity","Cell",
"PopulationOfIndividualOrganisms","MolecularActivity","ComplexMolecularMixture"] +103ms
```

colleenXu commented 6 months ago

Basic implementation ideas

EDIT: after discussion with Andrew 1/8.

1. I think the creative-mode query would be like this (click to expand): QNodes aren't set with any biolink-category, the QEdge predicate is set to "related_to"

I think having no QEdge predicate would also make sense, but our current creative-mode won't run when I don't specify a predicate: I get 0 results and the warning log `bte:biothings-explorer-trapi:inferred-mode Inferred Mode edge must specify a predicate. Your query terminates. +0ms`.

This is for imatinib ➡️ CML (Chronic myelogenous leukemia)

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": { "ids": ["PUBCHEM.COMPOUND:5291"] },
        "n1": { "ids": ["MONDO:0011996"] }
      },
      "edges": {
        "e0": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:related_to"],
          "knowledge_type": "inferred"
        }
      }
    }
  }
}
```
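Because creative-mode terminates when an inferred edge has no predicate, a pre-flight check on the query graph can catch this before sending. A minimal sketch; `inferred_edges_missing_predicates` is a hypothetical helper and not part of BTE.

```python
def inferred_edges_missing_predicates(query_graph):
    """Return IDs of inferred QEdges that have no (or an empty) predicate list."""
    return [
        edge_id
        for edge_id, edge in query_graph["edges"].items()
        if edge.get("knowledge_type") == "inferred" and not edge.get("predicates")
    ]


# The imatinib -> CML creative-mode query from this comment.
query_graph = {
    "nodes": {
        "n0": {"ids": ["PUBCHEM.COMPOUND:5291"]},
        "n1": {"ids": ["MONDO:0011996"]},
    },
    "edges": {
        "e0": {
            "subject": "n0",
            "object": "n1",
            "predicates": ["biolink:related_to"],
            "knowledge_type": "inferred",
        }
    },
}

print(inferred_edges_missing_predicates(query_graph))
```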

  1. BTE could look up the starting IDs with NodeNorm and retrieve the QNode categories early, before picking the matching templateGroups. Does BTE currently do this (I think it does)?
  2. BTE should then match the query to the matching Pathfinder templateGroups. All templateGroups will at least have the Explain-query w/ 0 intermediates (see if they or their descendants are directly connected).
click to see generic templates for 0 and 1 intermediates

Notes:

* The templates don't set categories for the `creativeQuerySubject` / `creativeQueryObject` because I think it'd be best if BTE plugged in the categories from the NodeNorm ID-lookup. Right now, BTE doesn't seem to be doing this and raises an error (also noted in the next "Issues" section).
* I set a predicate list for QEdges to exclude ones that didn't seem useful or may create unhelpful self-edges: superclass_of, subclass_of, broad_match / narrow_match, close_match / exact_match / same_as
* I'm assuming that the intermediates are what matters, so it's better for the 1-intermediate template to be startingID1 ➡️ intermediate ⬅️ startingID2 (rather than a one-direction path from startingID1 ➡️ intermediate ➡️ startingID2)

#### First template: 0 intermediates

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "creativeQuerySubject": { },
        "creativeQueryObject": { }
      },
      "edges": {
        "eA": {
          "subject": "creativeQuerySubject",
          "object": "creativeQueryObject",
          "predicates": [
            "biolink:related_to_at_instance_level",
            "biolink:disease_has_location",
            "biolink:location_of_disease",
            "biolink:composed_primarily_of",
            "biolink:primarily_composed_of",
            "biolink:has_chemical_role",
            "biolink:has_member",
            "biolink:member_of"
          ]
        }
      }
    }
  }
}
```

#### Second template: 1 intermediate

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "creativeQuerySubject": { },
        "nA": { "categories": ["biolink:NamedThing"] },
        "creativeQueryObject": { }
      },
      "edges": {
        "eA": {
          "subject": "creativeQuerySubject",
          "object": "nA",
          "predicates": [
            "biolink:related_to_at_instance_level",
            "biolink:disease_has_location",
            "biolink:location_of_disease",
            "biolink:composed_primarily_of",
            "biolink:primarily_composed_of",
            "biolink:has_chemical_role",
            "biolink:has_member",
            "biolink:member_of"
          ]
        },
        "eB": {
          "subject": "creativeQueryObject",
          "object": "nA",
          "predicates": [
            "biolink:related_to_at_instance_level",
            "biolink:disease_has_location",
            "biolink:location_of_disease",
            "biolink:composed_primarily_of",
            "biolink:primarily_composed_of",
            "biolink:has_chemical_role",
            "biolink:has_member",
            "biolink:member_of"
          ]
        }
      }
    }
  }
}
```

  1. For the first template (0 intermediates), BTE will have 1 result if there are edges between the two starting IDs. But BTE could return a lot of results when there's 1 intermediate (1 per unique intermediate).
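For the placeholder nodes in the generic templates, plugging in the starting IDs plus categories from an ID-lookup might look roughly like this. It is a sketch of the idea only: `fill_template` is hypothetical, it is not how BTE's inferred-mode code is written, and the categories in the demo are stand-ins for whatever NodeNorm actually returns.

```python
import copy


def fill_template(template, subject_id, subject_categories, object_id, object_categories):
    """Return a copy of `template` with the creative-mode placeholders filled in."""
    filled = copy.deepcopy(template)  # leave the shared template untouched
    nodes = filled["message"]["query_graph"]["nodes"]
    nodes["creativeQuerySubject"].update(
        ids=[subject_id], categories=list(subject_categories)
    )
    nodes["creativeQueryObject"].update(
        ids=[object_id], categories=list(object_categories)
    )
    return filled


# Skeleton of the 0-intermediate template (predicate list omitted for brevity).
direct_template = {
    "message": {
        "query_graph": {
            "nodes": {"creativeQuerySubject": {}, "creativeQueryObject": {}},
            "edges": {
                "eA": {"subject": "creativeQuerySubject", "object": "creativeQueryObject"}
            },
        }
    }
}

filled = fill_template(
    direct_template,
    "PUBCHEM.COMPOUND:5291", ["biolink:SmallMolecule"],  # assumed category
    "MONDO:0011996", ["biolink:Disease"],  # assumed category
)
```

Copying the template first matters because the same template object would be reused for every incoming query.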

Issues for implementation

A. Template-group matching -> Andrew agrees with this

I think Pathfinder queries should only use Pathfinder templates (when both QNodes have 1 ID and `knowledge_type: inferred`), and vice-versa

* The Pathfinder templates wouldn't work for MVPs 1-2 because they're too general
* The MVP1-2 templates won't work for Pathfinder because they use `is_set: true` for intermediate nodes, so each result represents a unique "answer" (the QNode at the open end of the Predict-type query). But for Pathfinder/Explain-type where we set both starting QNodes to specific IDs, using `is_set: true` for the intermediate nodes will collapse all the "answers" into 1 giant result - which I assume we don't want.
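The routing rule described here (both QNodes have exactly 1 ID and the edge is `knowledge_type: inferred`) can be written as a small predicate. A sketch only: `is_pathfinder_query` is a hypothetical name, and the single-QEdge requirement is my assumption based on the example queries in this issue.

```python
def is_pathfinder_query(query_graph):
    """True when one inferred QEdge connects two QNodes that each pin exactly 1 ID."""
    edges = query_graph.get("edges", {})
    nodes = query_graph.get("nodes", {})
    if len(edges) != 1:  # assumption: Pathfinder queries use a single QEdge
        return False
    edge = next(iter(edges.values()))
    if edge.get("knowledge_type") != "inferred":
        return False
    endpoints = (nodes.get(edge["subject"], {}), nodes.get(edge["object"], {}))
    return all(len(node.get("ids") or []) == 1 for node in endpoints)


pathfinder = {
    "nodes": {
        "n0": {"ids": ["PUBCHEM.COMPOUND:5291"]},
        "n1": {"ids": ["MONDO:0011996"]},
    },
    "edges": {
        "e0": {
            "subject": "n0",
            "object": "n1",
            "predicates": ["biolink:related_to"],
            "knowledge_type": "inferred",
        }
    },
}

print(is_pathfinder_query(pathfinder))
```

A Predict-type MVP query, where one end has categories but no IDs, fails the last check and would stay with the MVP template groups.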

B. Odd bug(?) noticed before winter break

I'm not sure if this will be a problem if we adjust the template-group matching and test again with a Pathfinder-template-group + queries. But before winter break, I noticed that:

* BTE's creative-mode currently doesn't allow > 1 QNode ID total, across all QNodes
* Jackson set up a query-handler branch [inferred_explain](https://suwulab.slack.com/archives/D029DAYUKS6/p1703098649420589) that should've removed that check, but [something odd seemed to happen when none of the retrieved edges actually connected the two starting IDs](https://suwulab.slack.com/archives/CC218TEKC/p1703167200721069).

C. Problems setting up Pathfinder template-groups

With query-handler's `inferred_explain` branch checked out, I tried setting up a Pathfinder-template-group with the two generic templates I included above. But I encountered issues (also see my "recreating the problems" section below this list):

* I thought I could set the templateGroup's subject / object to `NamedThing` so it'd work no matter what the starting-ID's category was. But when I tried this, BTE wouldn't use the template-group. If I set the subject / object to every biolink category, then BTE would use the template-group.
* But then, BTE has an error `bte:biothings-explorer-trapi:error_handler TypeError: queryGraph.nodes.creativeQuerySubject.categories is not iterable`. I think it's because the generic Pathfinder-templates don't have QNode categories. I intended for BTE to plug in the categories from the NodeNorm ID-lookup...but this doesn't seem to happen here.
  * could be mitigated by having different templateGroups for different starting QNode categories and then setting those categories in the templates...but it's kinda redundant having multiple first templates (the 0-intermediate one)
  * So the Chem -> DiseaseOrPheno has direct edge, 1 intermediate, and specific ones. Put the starting IDs into the Chem + DoP QNodes so they start w/ those categories, then add the ones from NodeNorm
* I'm not sure if either of these are intended behavior...for example, is there some problem we avoid by not expanding the subject / object categories?
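One way the `NamedThing` case could behave is as a wildcard during template-group matching, so the group doesn't have to list every biolink category. This is a hypothetical sketch of that idea, not BTE's current matching logic; `group_matches` and its arguments are made-up names, and category names are written without the `biolink:` prefix to mirror templateGroups.json.

```python
def group_matches(group, subject_categories, object_categories, predicate):
    """Match a template group, treating NamedThing as 'any category'."""

    def side_matches(allowed, actual):
        return "NamedThing" in allowed or bool(set(allowed) & set(actual))

    return (
        side_matches(group["subject"], subject_categories)
        and side_matches(group["object"], object_categories)
        and predicate in group["predicate"]
    )


pathfinder_group = {
    "name": "Pathfinder: find paths between two entities",
    "subject": ["NamedThing"],
    "predicate": ["related_to"],
    "object": ["NamedThing"],
    "templates": ["pathfinder-direct.json", "pathfinder-1intermediate.json"],
}

print(group_matches(pathfinder_group, ["SmallMolecule"], ["Disease"], "related_to"))
```

A full version would probably expand categories through the biolink hierarchy instead of special-casing one name, which may be what the open question about expanding subject / object categories comes down to.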

recreating the problems

First, check out query-handler's `inferred_explain` branch and replace the contents of query-handler's templateGroups.json file with this (`pnpm build` after!):

```
[
  {
    "name": "Pathfinder: find paths between two entities",
    "subject": ["NamedThing"],
    "predicate": ["related_to"],
    "object": ["NamedThing"],
    "templates": [
      "pathfinder-direct.json",
      "pathfinder-1intermediate.json"
    ]
  }
]
```

Second, query BTE with this Pathfinder-style query

```
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": { "ids": ["PUBCHEM.COMPOUND:5291"], "name": "imatinib" },
        "n1": { "ids": ["MONDO:0011996"], "name": "cml" }
      },
      "edges": {
        "e0": {
          "subject": "n0",
          "object": "n1",
          "predicates": ["biolink:related_to"],
          "knowledge_type": "inferred"
        }
      }
    }
  }
}
```

For me, BTE returned no results and the console log said `bte:biothings-explorer-trapi:inferred-mode No Templates matched your inferred-mode query. Your query terminates. +0ms`

Third, I changed the templateGroups.json contents to include every biolink category (`pnpm build` after!).

```
[
  {
    "name": "Pathfinder: find paths between two entities",
    "subject": ["Attribute","ChemicalRole","BiologicalSex","PhenotypicSex","GenotypicSex","SeverityValue","OrganismAttribute","PhenotypicQuality","Zygosity","ClinicalAttribute","ClinicalMeasurement","ClinicalModifier","ClinicalCourse","Onset","SocioeconomicAttribute","GenomicBackgroundExposure","PathologicalProcessExposure","PathologicalAnatomicalExposure","DiseaseOrPhenotypicFeatureExposure","ChemicalExposure","DrugExposure","DrugToGeneInteractionExposure","ComplexChemicalExposure","BioticExposure","EnvironmentalExposure","GeographicExposure","BehavioralExposure","SocioeconomicExposure","OrganismTaxon","Event","AdministrativeEntity","Agent","InformationContentEntity","StudyResult","ConceptCountAnalysisResult","ObservedExpectedFrequencyAnalysisResult","RelativeFrequencyAnalysisResult","TextMiningResult","ChiSquaredAnalysisResult","LogOddsAnalysisResult","StudyVariable","CommonDataElement","Dataset","DatasetDistribution","DatasetVersion","DatasetSummary","ConfidenceLevel","EvidenceType","Publication","Book","BookChapter","Serial","Article","JournalArticle","Patent","WebPage","PreprintPublication","DrugLabel","RetrievalSource","PhysicalEntity","MaterialSample","Activity","Study","Procedure","Phenomenon","Device","DiagnosticAid","PlanetaryEntity","EnvironmentalProcess","EnvironmentalFeature","GeographicLocation","GeographicLocationAtTime","BiologicalEntity","RegulatoryRegion","AccessibleDnaRegion","TranscriptionFactorBindingSite","BiologicalProcessOrActivity","MolecularActivity","BiologicalProcess","Pathway","PhysiologicalProcess","Behavior","PathologicalProcess","GeneticInheritance","OrganismalEntity","Bacterium","Virus","CellularOrganism","Mammal","Human","Plant","Invertebrate","Vertebrate","Fungus","LifeStage","IndividualOrganism","Case","PopulationOfIndividualOrganisms","StudyPopulation","Cohort","AnatomicalEntity","CellularComponent","Cell","GrossAnatomicalStructure","PathologicalAnatomicalStructure","CellLine","DiseaseOrPhenotypicFeature","Disease","PhenotypicFeature","BehavioralFeature","ClinicalFinding","Gene","MacromolecularComplex","NucleosomeModification","Genome","Polypeptide","Protein","ProteinIsoform","ProteinDomain","PosttranslationalModification","ProteinFamily","NucleicAcidSequenceMotif","GeneFamily","Genotype","Haplotype","SequenceVariant","Snv","ReagentTargetedGene","ChemicalEntity","MolecularEntity","SmallMolecule","NucleicAcidEntity","Exon","Transcript","RnaProduct","RnaProductIsoform","NoncodingRnaProduct","MicroRna","SiRna","CodingSequence","ChemicalMixture","MolecularMixture","Drug","ComplexMolecularMixture","ProcessedMaterial","Food","EnvironmentalFoodContaminant","FoodAdditive","ClinicalEntity","ClinicalTrial","ClinicalIntervention","Hospitalization","Treatment","NamedThing"],
    "predicate": ["related_to"],
    "object": ["Attribute","ChemicalRole","BiologicalSex","PhenotypicSex","GenotypicSex","SeverityValue","OrganismAttribute","PhenotypicQuality","Zygosity","ClinicalAttribute","ClinicalMeasurement","ClinicalModifier","ClinicalCourse","Onset","SocioeconomicAttribute","GenomicBackgroundExposure","PathologicalProcessExposure","PathologicalAnatomicalExposure","DiseaseOrPhenotypicFeatureExposure","ChemicalExposure","DrugExposure","DrugToGeneInteractionExposure","ComplexChemicalExposure","BioticExposure","EnvironmentalExposure","GeographicExposure","BehavioralExposure","SocioeconomicExposure","OrganismTaxon","Event","AdministrativeEntity","Agent","InformationContentEntity","StudyResult","ConceptCountAnalysisResult","ObservedExpectedFrequencyAnalysisResult","RelativeFrequencyAnalysisResult","TextMiningResult","ChiSquaredAnalysisResult","LogOddsAnalysisResult","StudyVariable","CommonDataElement","Dataset","DatasetDistribution","DatasetVersion","DatasetSummary","ConfidenceLevel","EvidenceType","Publication","Book","BookChapter","Serial","Article","JournalArticle","Patent","WebPage","PreprintPublication","DrugLabel","RetrievalSource","PhysicalEntity","MaterialSample","Activity","Study","Procedure","Phenomenon","Device","DiagnosticAid","PlanetaryEntity","EnvironmentalProcess","EnvironmentalFeature","GeographicLocation","GeographicLocationAtTime","BiologicalEntity","RegulatoryRegion","AccessibleDnaRegion","TranscriptionFactorBindingSite","BiologicalProcessOrActivity","MolecularActivity","BiologicalProcess","Pathway","PhysiologicalProcess","Behavior","PathologicalProcess","GeneticInheritance","OrganismalEntity","Bacterium","Virus","CellularOrganism","Mammal","Human","Plant","Invertebrate","Vertebrate","Fungus","LifeStage","IndividualOrganism","Case","PopulationOfIndividualOrganisms","StudyPopulation","Cohort","AnatomicalEntity","CellularComponent","Cell","GrossAnatomicalStructure","PathologicalAnatomicalStructure","CellLine","DiseaseOrPhenotypicFeature","Disease","PhenotypicFeature","BehavioralFeature","ClinicalFinding","Gene","MacromolecularComplex","NucleosomeModification","Genome","Polypeptide","Protein","ProteinIsoform","ProteinDomain","PosttranslationalModification","ProteinFamily","NucleicAcidSequenceMotif","GeneFamily","Genotype","Haplotype","SequenceVariant","Snv","ReagentTargetedGene","ChemicalEntity","MolecularEntity","SmallMolecule","NucleicAcidEntity","Exon","Transcript","RnaProduct","RnaProductIsoform","NoncodingRnaProduct","MicroRna","SiRna","CodingSequence","ChemicalMixture","MolecularMixture","Drug","ComplexMolecularMixture","ProcessedMaterial","Food","EnvironmentalFoodContaminant","FoodAdditive","ClinicalEntity","ClinicalTrial","ClinicalIntervention","Hospitalization","Treatment","NamedThing"],
    "templates": [
      "pathfinder-direct.json",
      "pathfinder-1intermediate.json"
    ]
  }
]
```

Then I tried the same query again. I got status 500 and these console logs:

```
bte:biothings-explorer-trapi:inferred-mode Query proceeding in Inferred Mode. +0ms
bte:biothings-explorer-trapi:inferred-mode Looking up query Templates +0ms
bte:biothings-explorer-trapi:inferred-mode Got 2 inferred query templates. +10ms
bte:biothings-explorer-trapi:error_handler TypeError: queryGraph.nodes.creativeQuerySubject.categories is not iterable
bte:biothings-explorer-trapi:error_handler at /Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/inferred_mode/inferred_mode.js:168:70
bte:biothings-explorer-trapi:error_handler at Array.map ()
bte:biothings-explorer-trapi:error_handler at InferredQueryHandler.createQueries (/Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/inferred_mode/inferred_mode.js:166:38)
bte:biothings-explorer-trapi:error_handler at async InferredQueryHandler.query (/Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/inferred_mode/inferred_mode.js:402:28)
bte:biothings-explorer-trapi:error_handler at async TRAPIQueryHandler._handleInferredEdges (/Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/index.js:505:39)
bte:biothings-explorer-trapi:error_handler at async TRAPIQueryHandler.query (/Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/index.js:566:13)
bte:biothings-explorer-trapi:error_handler at async task (/Users/colleenxu/Desktop/biothings_explorer/packages/bte-server/built/routes/v1/query_v1.js:34:13)
bte:biothings-explorer-trapi:error_handler at async runTask (/Users/colleenxu/Desktop/biothings_explorer/packages/bte-server/built/controllers/threading/threadHandler.js:265:26)
bte:biothings-explorer-trapi:error_handler at async /Users/colleenxu/Desktop/biothings_explorer/packages/bte-server/built/routes/v1/query_v1.js:18:34 +0ms
```
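The `categories is not iterable` error is what you'd expect when code loops over a template node's `categories` and the template leaves them out. A defensive fallback could look like the sketch below. It is written in Python only to illustrate the logic (BTE's handler is JavaScript), and `template_categories` is a hypothetical helper, not a fix taken from the codebase.

```python
def template_categories(template_node, nodenorm_categories=None):
    """Pick categories for a template placeholder without assuming the key exists."""
    categories = template_node.get("categories")
    if categories:
        return list(categories)  # template sets them explicitly
    if nodenorm_categories:
        return list(nodenorm_categories)  # fall back to the ID-lookup result
    return ["biolink:NamedThing"]  # last resort: most general category


# A generic Pathfinder placeholder has no categories at all.
print(template_categories({}, ["biolink:SmallMolecule"]))
print(template_categories({}))
```

The fallback order (template first, then ID-lookup, then `NamedThing`) is an assumption. The alternative discussed in the issue is to keep categories in the templates and add the NodeNorm ones on top.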

D. Other implementation issues

colleenXu commented 5 months ago

Data-source note from Andrew: perhaps a Cell marker database (gene <-> cell type and gene <-> tissue) like http://xteam.xbio.top/CellMarker/search.jsp?quickSearchInfo=c-kit would be helpful to add...

Genomewide commented 5 months ago

@colleenXu Do you have the data from these queries so we can play with it?

colleenXu commented 5 months ago

@Genomewide

We don't have a full TRAPI response (running all the templates and merging the results into 1 set). The paths in the result sub-graphs may also be too long for the current UI to handle (> 3 edges, 4 nodes?).

You could try working with some of BTE's responses for the individual template-runs (you can ignore the extra notes, that's for our team):

colleenXu commented 5 months ago

Queries ran locally for prototype presentation: https://docs.google.com/presentation/d/1gFFGJGumtHU_ktHKM2FKauTpC-0bvAh-H_ZI49-qDsI/edit?usp=sharing

All queries set imatinib as ChemicalEntity and the disease as DiseaseOrPhenotypicFeature. I'm assuming BTE's templates would set the template placeholder nodes to these categories, which is how our templates/implementation currently work. Maybe we could adjust this in the future so the templates have no "template categories"?

(Ran on local instance, main branches + fix-776 branches for workspace/api-response-transform for #776. Also w/o threading or caching.)

1 intermediate

imatinib (ChemicalEntity) → NamedThing ← asthma (DiseaseOrPheno)

[imatinib-inter-asthma-latest.json](https://github.com/biothings/biothings_explorer/files/13985140/imatinib-inter-asthma-latest.json)

* 6 min 47 s, 949 results
* top results are still KIT, PDGFRA

![Screen Shot 2024-01-18 at 10 27 55 PM](https://github.com/biothings/biothings_explorer/assets/43731687/a28e6181-0159-4f39-98e0-7a1fbdb170da)

imatinib (ChemicalEntity) → NamedThing ← CML (DiseaseOrPheno)

[imatinib-inter-cml-latest.json](https://github.com/biothings/biothings_explorer/files/13985238/imatinib-inter-cml-latest.json)

* 5 min 37 s, 1546 results
* top results are still BCR, ABL1

![Screen Shot 2024-01-18 at 10 38 44 PM](https://github.com/biothings/biothings_explorer/assets/43731687/fab3f256-6e06-4d27-8cf6-bee6f2741814)

Gene → Cell

See previous post for imatinib → Gene → Cell ← asthma

imatinib (ChemicalEntity) → Gene → Cell ← CML (DiseaseOrPheno)

[imatinib-gene-cell-cml.json](https://github.com/biothings/biothings_explorer/files/13985481/imatinib-gene-cell-cml.json)

* 1 min 45 s, 1419 results
* 15 unique entities: interesting ones are Hematopoietic stem cells, Blast Cell, Bone Marrow Cells, granulocyte, Pluripotent Stem Cells
  * others: cultured cell line, t-lymphocyte, stem cells, Lymphocyte, Neoplastic Cell, Clone Cells, K-562, Leukemic Cell, lymphoblast, Blood Cells
* results w/ gene BCR:
  * BCR → blast cell is result 8 (also connected to hematopoietic stem cells in result 215, bone marrow cells in result 642)
  * fusion proteins, bcr-abl → hematopoietic stem cells is result 206
* results w/ gene ABL1:
  * ABL1 → hematopoietic stem cells in result 180 (bone marrow cells in result 606)

Gene → PhysiologicalProcess,Pathway

imatinib (ChemicalEntity) → Gene → PhysiologicalProcess,Pathway ← Asthma (DiseaseOrPheno)

[imatinib-gene-physiopath-asthma.json](https://github.com/biothings/biothings_explorer/files/13986041/imatinib-gene-physiopath-asthma.json)

* 6 min 35 s, 1254 results
* Doing this because
  * BiologicalProcess takes too long to run (>13 min w/ 21 unique intermediates) - these are the most promising children terms
    * most interesting is "IgE responsiveness, atopic"
    * KIT connected to edema (HP:0000969) and cardiac rhythm disease (MONDO:0007263), anaphylaxis (MONDO:0100053), respiratory arrest (HP:0005943)
  * also, MolecularActivity wasn't interesting (see previous post)
* no exact matches for "immune cell activation", but some stuff is close
  * no pathways found
  * 38 physiologicalprocess terms, most were generic. Some interesting ones were:
    * immune response: results 166-195
    * bronchoconstriction: result 234
    * histamine release
    * t-cell activation
    * neutrophil infiltration
    * immune cell processes
    * host defense
    * antiviral response
    * cytokine production: results 236 - 245
    * Negative Regulation of Inflammatory Response Process

imatinib (ChemicalEntity) → Gene → PhysiologicalProcess, Pathway ← CML (DiseaseOrPheno)

[imatinib-gene-physiopath-cml.json](https://github.com/biothings/biothings_explorer/files/13986274/imatinib-gene-physiopath-cml.json)

* 3 min 35 s, 3398 results
* Doing this because BiologicalProcess would probably take too long to run
* no exact matches for "cell cycle", but some stuff is close
  * 29 Pathways found! some interesting ones:
    * Cyclin D associated events in G1 (Homo sapiens) - reactome
    * pathways in cancer - bioplanet
    * Inhibition of cellular proliferation by Gleevec - bioplanet
    * Chronic myeloid leukemia - bioplanet
  * 12 physiologicalprocess terms. Some interesting ones:
    * cell proliferation: results 8-282. BCR in 102, ABL1 in 279.
    * lymphocyte activation (results 1-7)
    * mitotic metaphase
    * negative regulation of g2 phase

Genomewide commented 5 months ago

I did not respond to one of your previous comments, but 3 edges is fine. That is the max though. I look forward to seeing this!

colleenXu commented 2 months ago

Closing, pathfinder efforts are now in https://github.com/biothings/biothings_explorer/issues/794