Open karafecho opened 1 year ago
Thanks for these identifiers! I've added them to the brand new Automat-CAM-KP test suite (https://github.com/ExposuresProvider/cam-pipeline/pull/111), and here are the results I have:
CURIE | Normalized to | How many unique CURIEs is this connected to in Automat-CAM-KP? |
---|---|---|
PUBCHEM.COMPOUND:5865 | Normalized | 30 |
CHEMBL.COMPOUND:CHEMBL1256818 | PUBCHEM.COMPOUND:5462351 | None |
PUBCHEM.COMPOUND:165363555 | Normalized | None |
HMDB:HMDB0252416 | PUBCHEM.COMPOUND:2462 | None |
PUBCHEM.COMPOUND:123600 | Normalized | None |
HMDB:HMDB0242500 | PUBCHEM.COMPOUND:2462 | None |
CHEBI:5147 | PUBCHEM.COMPOUND:3410 | None |
CHEMBL.COMPOUND:CHEMBL158 | PUBCHEM.COMPOUND:5742832 | 9 |
PUBCHEM.COMPOUND:145068 | Normalized | 258 |
PUBCHEM.COMPOUND:281 | Normalized | 64 |
@balhoff Do you have thoughts on how to plug the gaps in node coverage that we see here? I'm guessing we need new data sources.
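For anyone who wants to track the coverage gap as the test suite evolves, here is a minimal sketch that tallies the table above. The rows are copied from the table; the function name `split_by_coverage` is just an illustrative helper, not part of the test suite.

```python
# Rows copied from the table above: (CURIE, number of connected unique CURIEs or None).
ROWS = [
    ("PUBCHEM.COMPOUND:5865", 30),
    ("CHEMBL.COMPOUND:CHEMBL1256818", None),
    ("PUBCHEM.COMPOUND:165363555", None),
    ("HMDB:HMDB0252416", None),
    ("PUBCHEM.COMPOUND:123600", None),
    ("HMDB:HMDB0242500", None),
    ("CHEBI:5147", None),
    ("CHEMBL.COMPOUND:CHEMBL158", 9),
    ("PUBCHEM.COMPOUND:145068", 258),
    ("PUBCHEM.COMPOUND:281", 64),
]


def split_by_coverage(rows):
    """Return (covered, missing): CURIEs with and without connections."""
    covered = [curie for curie, count in rows if count]
    missing = [curie for curie, count in rows if not count]
    return covered, missing


covered, missing = split_by_coverage(ROWS)
print(f"{len(covered)} of {len(ROWS)} CURIEs have connections in Automat-CAM-KP")
print("No connections:", ", ".join(missing))
```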
Thanks, @gaurav! While we don't have a 1:1 match between CURIEs, the matches that we do have are representative, with two drugs and two chemical exposures, and will allow us to move this effort along.
This Swagger example query runs successfully but returns 0 results. If I replace the input CURIEs with PUBCHEM.COMPOUND:5865 from the table above, the query again runs successfully but returns 0 results. I think the Automat example queries are standardized and not tailored to the underlying KGs, so perhaps you can send me an example query that returns results from CAM KP? Thanks!
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"categories": [
"biolink:ChemicalEntity"
],
"ids": [
"CHEMBL.COMPOUND:CHEMBL3234626",
"CHEMBL.COMPOUND:CHEMBL3234633"
]
},
"n1": {
"categories": [
"biolink:GeneOrGeneProduct"
],
"ids": [
"NCBIGene:2099"
]
}
},
"edges": {
"e01": {
"subject": "n0",
"object": "n1",
"predicates": [
"biolink:affects"
],
"qualifier_constraints": [
{
"qualifier_set": [
{
"qualifier_type_id": "biolink:object_aspect_qualifier",
"qualifier_value": "activity"
},
{
"qualifier_type_id": "biolink:object_direction_qualifier",
"qualifier_value": "increased"
},
{
"qualifier_type_id": "biolink:qualified_predicate",
"qualifier_value": "biolink:causes"
}
]
}
]
}
}
}
},
"workflow": [
{
"id": "lookup"
}
]
}
Hi Kara! Sorry about the confusion: that Swagger example query can't currently be configured for individual platers, so we share a single Swagger with all the platers on Automat. That one isn't relevant to us, and has two main problems:
NCBIGene:2099.
So the following query will work:
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"categories": [
"biolink:ChemicalEntity"
]
},
"n1": {
"categories": [
"biolink:GeneOrGeneProduct"
],
"ids": [
"NCBIGene:2099"
]
}
},
"edges": {
"e01": {
"subject": "n0",
"object": "n1",
"predicates": [
"biolink:affects"
]
}
}
}
},
"workflow": [
{
"id": "lookup"
}
]
}
No confusion, I was aware that the Swagger examples aren't really "examples" for most of the Automats, including cam-kp and icees-kg. Thanks for an actual example query!
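Since the Swagger examples aren't tailored to each Automat, it may be handy to generate one-hop queries programmatically. The sketch below builds a query with the same shape as the working example above; the helper name `one_hop_query` and its parameters are made up for illustration and are not part of any Automat or TRAPI library.

```python
import json


def one_hop_query(subject_categories, object_categories, predicates,
                  subject_ids=None, object_ids=None):
    """Build a one-hop TRAPI lookup query shaped like the examples in this thread."""

    def node(categories, ids):
        # Only pin the node to specific CURIEs when ids are given.
        n = {"categories": list(categories)}
        if ids:
            n["ids"] = list(ids)
        return n

    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": node(subject_categories, subject_ids),
                    "n1": node(object_categories, object_ids),
                },
                "edges": {
                    "e01": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": list(predicates),
                    }
                },
            }
        },
        "workflow": [{"id": "lookup"}],
    }


# Reproduces the working query: any ChemicalEntity that affects NCBIGene:2099.
query = one_hop_query(
    ["biolink:ChemicalEntity"],
    ["biolink:GeneOrGeneProduct"],
    ["biolink:affects"],
    object_ids=["NCBIGene:2099"],
)
print(json.dumps(query, indent=2))
```

Leaving `subject_ids` unset keeps `n0` unpinned, which is the main difference from the Swagger example.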
This query returns results when sent directly to automat-icees-kg at https://automat.renci.org/#/.
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"categories": [
"biolink:DiseaseOrPhenotypicFeature"
],
"ids": [
"MONDO:0009061"
]
},
"n1": {
"categories": [
"biolink:ChemicalEntity"
]
}
},
"edges": {
"e01": {
"subject": "n0",
"object": "n1",
"predicates": [
"biolink:correlated_with"
]
}
}
}
},
"workflow": [
{
"id": "lookup"
}
]
}
And this query returns results when sent directly to automat-cam-kp at https://automat.renci.org/#/.
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"categories": [
"biolink:ChemicalEntity"
],
"ids": [
"PUBCHEM.COMPOUND:5865"
]
},
"n1": {
"categories": [
"biolink:GeneOrGeneProduct"
]
}
},
"edges": {
"e01": {
"subject": "n0",
"object": "n1",
"predicates": [
"biolink:affects"
]
}
}
}
},
"workflow": [
{
"id": "lookup"
}
]
}
But this query, while able to run successfully, returns an empty response when sent to WFR at https://translator-workflow-runner.renci.org/docs#/trapi/run_workflow_query_post.
{
"workflow": [
{
"id": "lookup"
},
{
"id":"score"
}
],
"message": {
"query_graph": {
"edges": {
"e0": {
"predicates": [
"biolink:correlated_with"
],
"subject": "n0",
"object": "n1",
"provided_by": {
"allowlist": [
"infores:automat-icees-kg"
]
}
},
"e1": {
"subject": "n1",
"object": "n2",
"predicates": [
"biolink:affects"
],
"provided_by": {
"allowlist": [
"infores:automat-cam-kp"
]
}
}
},
"nodes": {
"n0": {
"ids": [
"MONDO:0009061"
],
"is_set": false
},
"n1": {
"categories": [
"biolink:ChemicalEntity"
],
"is_set": false
},
"n2": {
"categories": [
"biolink:GeneOrGeneProduct"
],
"is_set": false
}
}
}
}
}
This comes from going through ARAs that have strict KP timeouts versus sending queries directly to KPs. I also wasn't able to get any results from the WFR, but sending the query directly to Aragorn with an extended timeout returns a 16.6 MB response with 12k results in total. ICEES-KG took 35 seconds to respond to the first hop (the normal timeout is 10 seconds) and returned 106 results, and then CAM-KP took 90 seconds to respond with the 12k results. If you want, I can share the entire response.
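For testing a KP directly, the client also has to wait long enough for slow responses like the ones above. Here is a rough sketch using only the Python standard library; the endpoint URL is left as a parameter because the exact query paths are not given in this thread, and the helper names are hypothetical. Note that this only controls the client-side timeout, not the timeout an ARA applies to its KPs.

```python
import json
import urllib.request


def build_trapi_request(url, query):
    """Wrap a TRAPI query dict in a JSON POST request (nothing is sent yet)."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


def post_trapi(url, query, timeout_s=120):
    """Send the query and wait up to timeout_s seconds for the JSON response."""
    request = build_trapi_request(url, query)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


def count_results(response):
    """Number of results in a TRAPI response; 0 if the message or results are missing."""
    return len((response.get("message") or {}).get("results") or [])


# Usage (the URL is a placeholder, not a real endpoint):
# response = post_trapi("https://example.org/some-kp/query", query, timeout_s=120)
# print(count_results(response))
```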
Thanks, Max.
Given your findings, the revised query below should return results when sent to WFR. However, while it runs successfully, it returns an empty KG.
{
"workflow": [
{
"id": "lookup",
"runner_parameters": {
"allowlist": ["infores:aragorn"]
}
},
{
"id":"score"
}
],
"message": {
"query_graph": {
"edges": {
"e0": {
"predicates": [
"biolink:correlated_with"
],
"subject": "n0",
"object": "n1",
"provided_by": {
"allowlist": [
"infores:automat-icees-kg"
]
}
},
"e1": {
"subject": "n1",
"object": "n2",
"predicates": [
"biolink:affects"
],
"provided_by": {
"allowlist": [
"infores:automat-cam-kp"
]
}
}
},
"nodes": {
"n0": {
"ids": [
"MONDO:0009061"
],
"is_set": false
},
"n1": {
"categories": [
"biolink:ChemicalEntity"
],
"is_set": false
},
"n2": {
"categories": [
"biolink:GeneOrGeneProduct"
],
"is_set": false
}
}
}
}
}
Your query doesn't have the extended timeout that I'm able to set directly in Aragorn, so WFR returns nothing because icees-kg times out on the first hop. This is a performance issue, and I'm only able to get results back because I can peek behind the curtain and turn some hidden knobs.
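To make the failure mode concrete, here is a toy model of the two-hop query using the timings reported earlier in this thread (35 seconds for ICEES-KG, 90 seconds for CAM-KP, 10-second default timeout). It is only an illustration of the reasoning, not how Aragorn or WFR is implemented, and the function name is made up.

```python
def first_timed_out_hop(hops, timeout_s):
    """Return the first KP whose response time exceeds timeout_s, or None if all hops fit."""
    for kp, seconds in hops:
        if seconds > timeout_s:
            return kp
    return None


# (KP, observed response time in seconds), in hop order.
HOPS = [("infores:automat-icees-kg", 35), ("infores:automat-cam-kp", 90)]

default_failure = first_timed_out_hop(HOPS, timeout_s=10)
extended_failure = first_timed_out_hop(HOPS, timeout_s=120)
print("10 s timeout, first hop to fail:", default_failure)
print("120 s timeout, first hop to fail:", extended_failure)
```

With the default timeout the first hop already fails, so the second hop has nothing to expand and the result set is empty.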
Oh, I see. That makes sense.
In that case, perhaps you can send me the full response?
Just so everyone is clear, the goal of this effort is three-fold:
Also see [this GitHub folder](https://github.com/NCATSTranslator/Clinical-Data-Committee-Tracking-Voting/tree/main/GetCreative()_DrugDiscoveryRepurposing_RarePulmonaryDisease/MVP2_Path_A) and slide 9 in this slide deck.
Per decision on 01.03.2024: Max will rerun the above queries with extended timeouts in ARAGORN and cache the results. Kara will then test.
From Meisha, 01/17/2024:
Title: Peptide Oxidation Leading to Hypertension
Description from the wiki:
Here we present the supporting information on an AOP describing how vascular endothelial peptide oxidation leads to hypertension via perturbation of endothelial nitric oxide (NO) bioavailability. The molecular initiating event is oxidation of amino acid (AA) residues on critical peptides of the NO pathway, notably protein kinase B (AKT), guanosine triphosphate cyclohydrolase-1 (GTPCH-1), endothelial nitric oxide synthase (eNOS), and also the cellular ROS scavenger, glutathione. Oxidation of the enzymic components of the pathway leads to reduced expression of the phosphorylated proteins, and protein loss via proteasomal degradation. Oxidation of reduced glutathione to GSSG promotes bonding of GSSG to critical AA residues on eNOS, and the reduced expression of GTPCH-1 reduces bioavailability of tetrahydrobiopterin (BH4), both of which lead to uncoupling of eNOS (reduced NO production, increased superoxide production). The combination of these molecular events leads to reduced bioavailability of NO, which in turn reduces the potential for vasodilation and shifts the balance of vascular tone towards vasoconstriction. Repeated perturbation of this pathway via chronic exposure to toxicants ultimately increases vascular resistance and contributes towards the development of hypertension.
From Max, 02/05/2024: cam_kp_integration_response.json - CF - ChemicalEntity - GeneOrGeneProduct
ChemicalEntity = propranolol
https://pubmed.ncbi.nlm.nih.gov/23539159/
https://www.uspharmacist.com/article/advances-in-the-management-of-cystic-fibrosis
https://www.journal-of-hepatology.eu/article/S0168-8278(15)00349-9/fulltext
I took another stab at the CURIEs I couldn't figure out previously, and found three more of them in CAM-KP. Most of these are NodeNorm issues in one way or another, but at least one of them could be fixed by turning on drug conflation when processing CAM-KP. I propose we use the alternate CURIEs I listed below while I try to figure out the NodeNorm issues.
CURIE | Normalized to | Should actually be normalized to | How many unique CURIEs is this connected to in Automat-CAM-KP? |
---|---|---|---|
CHEMBL.COMPOUND:CHEMBL1256818 | PUBCHEM.COMPOUND:5462351 ("Dextromethorphan hydrobromide monohydrate") | PUBCHEM.COMPOUND:5360696 ("Dextromethorphan") | None, but should exist (see Dextromethorphan on CTD) |
PUBCHEM.COMPOUND:165363555 ("Trifacta") | Normalized | N/A | None |
HMDB:HMDB0252416 ("Fluticasone") | PUBCHEM.COMPOUND:4659387 ("Fluticasona [Spanish]") | PUBCHEM.COMPOUND:5311101 ("Fluticasone") | 88 |
PUBCHEM.COMPOUND:123600 ("Levalbuterol") | Normalized | N/A | None |
HMDB:HMDB0242500 ("Budesonide") | PUBCHEM.COMPOUND:5281004 | N/A | 167 |
CHEBI:5147 ("Formoterol") | PUBCHEM.COMPOUND:3410 ("Formoterol") | PUBCHEM.COMPOUND:45358055 ("Foradil Certihaler"), but cliques to 3410 with drug_conflation turned on | 53 |
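To re-check these normalizations with and without drug conflation, something like the sketch below might help. It assumes the NodeNorm `get_normalized_nodes` endpoint accepts a JSON body with `curies` and `drug_chemical_conflate` fields and returns a dict keyed by input CURIE with an `id.identifier` entry; please treat those field names as assumptions to verify against the NodeNorm docs. The sample response is hand-written from the Formoterol row above, not real API output, and `EXAMPLE:unknown` is a made-up CURIE.

```python
def normalization_payload(curies, drug_conflation=False):
    """Build a request body for NodeNorm (field names are assumptions, see above)."""
    return {"curies": list(curies), "drug_chemical_conflate": bool(drug_conflation)}


def preferred_ids(response):
    """Map each input CURIE to its preferred identifier, or None if it did not normalize."""
    result = {}
    for curie, entry in response.items():
        result[curie] = entry["id"]["identifier"] if entry else None
    return result


payload = normalization_payload(["CHEBI:5147"], drug_conflation=True)

# Hand-written response in the assumed shape, using the table row for Formoterol.
sample_response = {
    "CHEBI:5147": {"id": {"identifier": "PUBCHEM.COMPOUND:3410", "label": "Formoterol"}},
    "EXAMPLE:unknown": None,  # made-up CURIE standing in for one that fails to normalize
}
mapping = preferred_ids(sample_response)
print(payload)
print(mapping)
```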
This issue is to report that CAM KP does not respond to any of the ICEES KG-derived CURIEs in this sheet, which are also appended below. Is this expected behavior? Is this a normalization issue? Is this something else?