Open andrewsu opened 2 years ago
Update: I used CASE 5 to test a bunch of queries and came up with some results. I ran the queries in ARAX using "Post to Other" and specifying https://api.bte.ncats.io/v1 as the external API. The same queries in https://biothings.io/explorer/advanced did not always work or give the same results, and I have some general questions about that interface (will ask separately).
CASE 5 results:
Query:
{
"message": {
"query_graph": {
"nodes": {
"n1": {
"ids": ["HP:0002076", "HP:0001250"],
"is_set": true,
"categories": [
"biolink:PhenotypicFeature"
]
},
"n2": {
"categories": [
"biolink:SmallMolecule"
]
},
"n3": {
"ids": [
"NCBIGene:6513"
],
"categories": [
"biolink:Gene"
]
}
},
"edges": {
"e1": {
"subject": "n2",
"object": "n1",
"predicates": [
"biolink:treats"
]
},
"e2": {
"subject": "n2",
"object": "n3"
}
}
}
}
}
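For anyone who wants to run a query like the one above outside the ARAX UI, here is a minimal Python sketch. The base URL is the one mentioned in this comment; the `/query` path is an assumption based on the usual TRAPI endpoint name, and `build_request` / `run_query` are hypothetical helper names, not part of BTE.

```python
import json
import urllib.request

# Base URL from the comment above. The "/query" suffix is an assumption
# (the usual TRAPI endpoint name), not something confirmed in this thread.
BTE_BASE_URL = "https://api.bte.ncats.io/v1"


def build_request(query: dict, base_url: str = BTE_BASE_URL) -> urllib.request.Request:
    """Wrap a TRAPI query dict in a JSON POST request (nothing is sent yet)."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query: dict, timeout: float = 300.0) -> dict:
    """Send the query and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `run_query(...)` with the query dict would send it; `build_request(...)` alone is enough to check the URL and body before sending anything.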
https://docs.google.com/document/d/1f6R3x4DiYrCQkIhjDBavoHlLnswSY7gwT-9XqKWxUgo/edit?usp=sharing
For CASE 3: The individual likely has Argininosuccinic aciduria (MONDO:0008815), caused by the c.706C>T mutation. The disease causes lethargy, hyperammonemia and developmental delays. Known treatments are arginine and citric acid.
Results summary:
https://docs.google.com/document/d/1RMvySsf_eQsmWCNyxnB7hL0V497aBzBZxlvrAhlI8N8/edit?usp=sharing
@khanspers the Google doc for case 3 is great. Seeing the sequence of queries is very useful. You might also think about putting in snippets of the results just to show the provenance for the assertions. Or you can execute the queries through the ARS and then save the arax link. (Let me know if that doesn't make sense and I can write more details.) Nice work!
Thanks @andrewsu! I added a similar doc for CASE 5, although it's not as clean. I will add some results/provenance to both docs.
CASE 2: Likely Niemann-Pick disease, type C1, which the variant is known to cause. Known treatment is Acetylcysteine.
Results:
https://docs.google.com/document/d/1exbo3wPc0jJZiePXdughpDyq2MVCHWulYsQKAUx-_3c/edit?usp=sharing
looks cool! I'm happy to see another person using BTE and finding useful info for questions.
@khanspers I've found this useful for putting queries and lots of info in github: https://gist.github.com/pierrejoubert73/902cc94d79424356a8d20be2b382e1ab
Not sure what to do with Google Docs. Generally I've been saving the JSON responses or the ARS/ARAX links. I can show you how to do this, if you need it.
Thanks @colleenXu. Originally the Google docs were started for my own organizational purposes, to be able to track and summarize better than in Postman, then I decided to share them. I realize they are not ideal. I figured out how to get the ARS links now, but they are only in one of the docs so far.
For the GitHub markdown example you linked to, the idea would be to replace each google doc with one of those?
@khanspers The google docs are fine! I notice that Github comments can do some things that I don't know how to do in Google docs, like attaching txt files (the JSON responses) and collapsible sections.
I noticed a query here and thought it would be easier to read if it was in a code block, in a collapsible section like this: https://github.com/biothings/BioThings_Explorer_TRAPI/issues/446#issuecomment-1125606184
CASE 1:
https://docs.google.com/document/d/1_CMYJfWBq6V4xQc41VR0Kfuv2Xem4FwqW-lFWRKIBfk/edit?usp=sharing
@khanspers I think for all these cases (and particularly case 1), you should treat the phenotype list as a partial list. Use the presence of a phenotype to help prioritize any leads, but I wouldn't use the absence of a phenotype as any sort of signal. With that additional information, can you have an additional look at Case 1?
CASE 4:
Results from BTE:
Existing knowledge on diseases/treatments (from Wikipedia):
=> Known therapy was found in BTE
https://docs.google.com/document/d/1w9whZGJHn5t1Vbvkdyy5Yr-e0Q9YU_SpVx2RxM_uurg/edit#
CASE 1 update:
Results from BTE:
Existing knowledge on diseases/treatments (from Wikipedia):
=> Known therapy was found in BTE
https://docs.google.com/document/d/1_CMYJfWBq6V4xQc41VR0Kfuv2Xem4FwqW-lFWRKIBfk/edit?usp=sharing
CASE 3 update:
Results:
Known treatments (from Wikipedia and rarediseases.org):
=> Some known therapies were found in BTE
https://docs.google.com/document/d/1RMvySsf_eQsmWCNyxnB7hL0V497aBzBZxlvrAhlI8N8/edit?usp=sharing
CASE 2 update:
Results from BTE:
Known treatments (from Wikipedia):
=> Known treatments were NOT found in BTE. BTE finds only one treatment, with evidence from a 2013 study in mouse.
https://docs.google.com/document/d/1exbo3wPc0jJZiePXdughpDyq2MVCHWulYsQKAUx-_3c/edit?usp=sharing
Some of my notes (not saying any of this is something to do):
Querying:
SmallMolecule can be a bit specific -> ChemicalEntity is the parent term that includes more semantic types like Drug/MolecularMixture (like "sodium benzoate" is classified as a mixture of sodium and benzoate).
Polypeptide).
Procedure, but it's hard because our ID-resolution service (Translator's Node Normalizer) doesn't retrieve labels for the IDs. I have to go to outside resources (UMLS browser with an account, MESH, etc.) to look up the IDs.
DiseaseOrPhenotypicFeature as a semantic type because resources might classify a term using this broader/parent term when it's unclear where to put it. This is also equivalent to asking for both Disease and PhenotypicFeature.
NamedEntity aka "anything" was useful to use here. It looks like there was sometimes not much information...
treats predicate. It depends on what info a resource has and how we can retrieve it (for some resources it's difficult to get separate relationship types in separate queries).
On searching results:
Missing info:
Thanks @colleenXu, those are great tips! Some of the things you mention I've realized while working on these cases (how to use treats, how to find non-drug treatments, etc.). The strategy used for these cases is not optimal and could be improved.
For the non-drug treatments, I was able to find those by not specifying the node type and using the treats predicate. The results then include lots of UMLS IDs (without labels, as you mention), but for most of them I could see what they were in the ARAX UI by clicking on them.
As a separate use case, Alex asked me to check if known treatments for AML were found in BTE, based on a publication. Results are in this spreadsheet: Several treatments were found, but some were not.
Perhaps it may be useful to discuss / review...
Maybe discuss this after figuring out the next point? But I think the answer is yes...
A question though... Was the scoring of results helpful at all, or problematic (helpful diseases and treatments pushed to the bottom)?
I think so? See my next comment, perhaps Kristina can try this approach on the other cases?
Also, my stab at Case 1 (the first steps are probably the same as Kristina's, I just wrote them out in great detail here :P)
CASE 1
VARIANTS
TH NM_199292.2. c.541C>T; p.Gln181Ter. chr11:2189760
TH NM_199292.2 c.785C>G; p.Thr262Ser. chr11:2188668
HPO: delayed speech and language development
How to find variant IDs from this?
Can I go anywhere from the variants (dbSNP IDs are the best / most reliably annotated for operations right now)? Nope. Running SequenceVariant "DBSNP:rs1590169710", "DBSNP:rs1590168246" -> NamedThing, all I get is that both are variants of the Gene TH.
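The variant query itself isn't shown here, so this is only a sketch of what a SequenceVariant -> NamedThing one-hop query could look like. `one_hop_query` is a hypothetical helper and the node/edge keys (`n0`, `n1`, `e0`) are arbitrary choices; the dbSNP IDs are the two mentioned above.

```python
import json
from typing import List


def one_hop_query(ids: List[str], start_category: str,
                  end_category: str = "biolink:NamedThing") -> dict:
    """Build a one-hop TRAPI query from known IDs to any related node."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": list(ids), "categories": [start_category]},
                    "n1": {"categories": [end_category]},
                },
                # No predicate is set, so any relationship type can match.
                "edges": {"e0": {"subject": "n0", "object": "n1"}},
            }
        }
    }


variant_query = one_hop_query(
    ["DBSNP:rs1590169710", "DBSNP:rs1590168246"], "biolink:SequenceVariant"
)
print(json.dumps(variant_query, indent=2))
```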
How to find phenotype IDs from this? Well....it's not hard, search HPO. Found HP:0000750 "delayed speech and language development".
So it looks like the starting points are going to be Genes and PhenotypicFeatures. The SequenceVariants / genomic features are only sometimes going to work as starting points...
Response as of 2022-07-19:
Use creative mode? Note that only 1 disease at a time can be run in creative mode
As of 2022-07-19, it'll find only levodopa as a result (from the first template; the second template gives 1966 results, which is over the max, so they are not included in the result set).
When running creative mode for BH4-deficient hyperphenylalaninemia A (MONDO:0009863), the two results were pramipexol and Dopamine Agonists from semmeddb. The pramipexol result came from an article that seems interesting.
CASE 2 - 6011
NPC1
(NM_000271.3)
chr 18: 21119857G>A
c.2713C>T; pGLN905* - Homozygous
CLINICAL PRESENTATION: Cholestasis, hyperbilirubinemia, hepatosplenomegaly
(this was before HPO terms used)
Can I go anywhere from the variant? -> The only two results are the NPC1 gene and Niemann-Pick disease, type C1.
Since we know the disease from the variant, we could just use it directly in a query to look for treatments. But let's try Colleen's explain-style query to find the disease first, using the HPO terms:
{
"message": {
"query_graph": {
"edges": {
"e00": {
"subject": "n0",
"object": "n1"
},
"e01": {
"subject": "n2",
"object": "n1",
"predicates": ["biolink:phenotype_of"]
}
},
"nodes": {
"n0": {
"ids": ["NCBIGene:4864"],
"categories": ["biolink:Gene"],
"is_set": true,
"name": "NPC1"
},
"n1": {
"categories": ["biolink:Disease"]
},
"n2": {
"ids": ["HP:0001396", "HP:0002904", "HP:0001433"],
"categories": ["biolink:PhenotypicFeature"],
"is_set": true
}
}
}
}
}
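The explain-style query above has a fixed shape (Gene -> Disease <- PhenotypicFeature), so it can be generated from a gene ID and a list of HPO IDs. A minimal sketch, where `explain_query` is a hypothetical helper name and the node/edge keys are copied from the query above:

```python
import json
from typing import List, Optional


def explain_query(gene_id: str, phenotype_ids: List[str],
                  gene_name: Optional[str] = None) -> dict:
    """Build the Gene -> Disease <- PhenotypicFeature query shown above."""
    gene_node = {"ids": [gene_id], "categories": ["biolink:Gene"], "is_set": True}
    if gene_name:
        gene_node["name"] = gene_name  # human-readable label, as in the query above
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": gene_node,
                    "n1": {"categories": ["biolink:Disease"]},
                    "n2": {
                        "ids": list(phenotype_ids),
                        "categories": ["biolink:PhenotypicFeature"],
                        "is_set": True,
                    },
                },
                "edges": {
                    # Gene-disease edge: no predicate, any relationship matches.
                    "e00": {"subject": "n0", "object": "n1"},
                    "e01": {
                        "subject": "n2",
                        "object": "n1",
                        "predicates": ["biolink:phenotype_of"],
                    },
                },
            }
        }
    }


npc1_query = explain_query(
    "NCBIGene:4864", ["HP:0001396", "HP:0002904", "HP:0001433"], gene_name="NPC1"
)
print(json.dumps(npc1_query, indent=2))
```

Generating the query this way keeps the gene ID and the `name` label next to each other in one call, which makes copy-paste mismatches between cases easier to spot.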
Results as of 2022-08-11:
Results as of 2022-08-29:
{
"message": {
"query_graph": {
"nodes": {
"disease": {
"ids": ["MONDO:0018982"]
},
"chemical": {
"categories": ["biolink:ChemicalEntity"]
}
},
"edges": {
"t_edge": {
"object": "disease",
"subject": "chemical",
"predicates": ["biolink:treats"],
"knowledge_type": "inferred"
}
}
}
}
}
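The treatment queries in this thread all follow this same template, with only the disease ID (and sometimes the treatment category) changing. A rough sketch of generating one query per disease, since creative mode takes one disease at a time; `treats_query` is a hypothetical helper, and the MONDO IDs are the ones used in the treatment queries in this thread:

```python
from typing import Optional


def treats_query(disease_id: str,
                 category: Optional[str] = "biolink:ChemicalEntity",
                 subject_key: str = "chemical",
                 inferred: bool = True) -> dict:
    """Build a '<subject> treats <disease>' query like the one above."""
    # Passing category=None leaves the treatment node untyped.
    subject_node = {"categories": [category]} if category else {}
    edge = {
        "object": "disease",
        "subject": subject_key,
        "predicates": ["biolink:treats"],
    }
    if inferred:
        edge["knowledge_type"] = "inferred"  # requests creative mode
    return {
        "message": {
            "query_graph": {
                "nodes": {"disease": {"ids": [disease_id]}, subject_key: subject_node},
                "edges": {"t_edge": edge},
            }
        }
    }


# Disease IDs taken from the treatment queries in this thread.
DISEASE_IDS = [
    "MONDO:0018982", "MONDO:0008815", "MONDO:0020741",
    "MONDO:0009945", "MONDO:0012805",
]
queries = {disease_id: treats_query(disease_id) for disease_id in DISEASE_IDS}
```

Calling `treats_query("MONDO:0012805", category="biolink:Procedure", subject_key="treatment")` would give the non-drug variant of the same template.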
Results as of 2022-08-11:
Results as of 2022-08-29:
CASE 3 - 6094
ASL
c.706C>T, p.Arg236Trp (homozygous)
CLINICAL PRESENTATION: 7 day old female presented with lethargy and altered mental status found to have hyperammonemia. Concern for underlying diagnosis urea cycle disorder – likely citrullinemia. Parents are first cousins.
(this was before HPO terms used)
Can I go anywhere from the variant? -> The only two results are the ASL gene and argininosuccinic aciduria (MONDO:0008815).
We have a potential disease from the variant query (and from the initial information from Radys), but let's try the explain-style query using the two HP terms.
{
"message": {
"query_graph": {
"edges": {
"e00": {
"subject": "n0",
"object": "n1"
},
"e01": {
"subject": "n2",
"object": "n1",
"predicates": ["biolink:phenotype_of"]
}
},
"nodes": {
"n0": {
"ids": ["NCBIGene:435"],
"categories": ["biolink:Gene"],
"is_set": true,
"name": "ASL"
},
"n1": {
"categories": ["biolink:Disease"]
},
"n2": {
"ids": ["HP:0001987", "HP:0001254"],
"categories": ["biolink:PhenotypicFeature"],
"is_set": true
}
}
}
}
}
Results as of 2022-08-11:
Results as of 2022-08-29:
{
"message": {
"query_graph": {
"nodes": {
"disease": {
"ids": ["MONDO:0008815"]
},
"chemical": {
"categories": ["biolink:ChemicalEntity"]
}
},
"edges": {
"t_edge": {
"object": "disease",
"subject": "chemical",
"predicates": ["biolink:treats"],
"knowledge_type": "inferred"
}
}
}
}
}
Results as of 2022-08-11:
Results as of 2022-08-29:
CASE 4 - 243
ALDH7A1
c.328C>T, p.Arg110Ter
HPO TERMS: Jaundice;Poor appetite;Ventriculomegaly;Seizure
Can I go anywhere from the variant?
Since we have a pretty good hunch on the disease, let's try going straight for the treatment query.
{
"message": {
"query_graph": {
"nodes": {
"disease": {
"ids": ["MONDO:0020741"]
},
"chemical": {
"categories": ["biolink:ChemicalEntity"]
}
},
"edges": {
"t_edge": {
"object": "disease",
"subject": "chemical",
"predicates": ["biolink:treats"],
"knowledge_type": "inferred"
}
}
}
}
}
Results as of 2022-08-12:
Results as of 2022-08-29:
{
"message": {
"query_graph": {
"nodes": {
"disease": {
"ids": ["MONDO:0009945"]
},
"chemical": {
"categories": ["biolink:ChemicalEntity"]
}
},
"edges": {
"t_edge": {
"object": "disease",
"subject": "chemical",
"predicates": ["biolink:treats"],
"knowledge_type": "inferred"
}
}
}
}
}
Results as of 2022-08-12:
Results as of 2022-08-29:
CASE 5 - 3081
SLC2A1
c.1202C>T p.Pro401Leu
HPO TERMS: Migraine, Seizure
Can I go anywhere from the variant? -> Nope, 0 results. So proceed with HP terms and the gene.
{
"message": {
"query_graph": {
"edges": {
"e00": {
"subject": "n0",
"object": "n1"
},
"e01": {
"subject": "n2",
"object": "n1",
"predicates": ["biolink:phenotype_of"]
}
},
"nodes": {
"n0": {
"ids": ["NCBIGene:6513"],
"categories": ["biolink:Gene"],
"is_set": true,
"name": "SLC2A1"
},
"n1": {
"categories": ["biolink:Disease"]
},
"n2": {
"ids": ["HP:0002076", "HP:0001250"],
"categories": ["biolink:PhenotypicFeature"],
"is_set": true
}
}
}
}
}
Results as of 2022-08-12:
Results as of 2022-08-29:
The first result is encephalopathy due to GLUT1 deficiency. Note that it seems encephalopathy is a severe outcome of GLUT1 deficiency syndrome, and given the limited and vague symptoms described in the patient, perhaps this is not the right focus. Let's try the other result first, childhood onset GLUT1 deficiency syndrome 2.
{
"message": {
"query_graph": {
"nodes": {
"disease": {
"ids": ["MONDO:0012805"]
},
"chemical": {
"categories": ["biolink:ChemicalEntity"]
}
},
"edges": {
"t_edge": {
"object": "disease",
"subject": "chemical",
"predicates": ["biolink:treats"],
"knowledge_type": "inferred"
}
}
}
}
}
Results as of 2022-08-12:
Results as of 2022-08-29:
BTE has undergone some changes (I think these have been deployed or will be deployed soon), and that affects the creative-mode queries. Would we want to consider re-running? @andrewsu ?
I think rerunning with the latest version of creative mode would be nice-but-not-necessary...
Update for CASE 5
The CURIE for ketogenic diet is UMLS:C0259972 (biolink:Procedure).
Query 3 using biolink:Procedure instead of biolink:ChemicalEntity:
{
"message": {
"query_graph": {
"nodes": {
"disease": {
"ids": ["MONDO:0012805"]
},
"treatment": {
"categories": ["biolink:Procedure"]
}
},
"edges": {
"t_edge": {
"object": "disease",
"subject": "treatment",
"predicates": ["biolink:treats"],
"knowledge_type": "inferred"
}
}
}
}
}
Results as of 2022-09-01:
Can we go anywhere from the "ketogenic diet" curie?
{
"message": {
"query_graph": {
"nodes": {
"n1": {
"ids": [
"UMLS:C0259972"
]
},
"n2": {
"categories": [
"biolink:NamedEntity"
]
}
},
"edges": {
"e1": {
"subject": "n1",
"object": "n2"
}
}
}
}
}
Results as of 2022-09-01:
EDIT in progress
Note about the above queries:
Replaced NamedEntity with NamedThing (error) and set the category for the UMLS ketogenic-diet ID to Procedure and Treatment, since Node Normalizer likely doesn't recognize this ID (so we have to set the category). No results after it did all its sub-queries.
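The adjusted query isn't shown in the thread, so the following is only a guess at what it might have looked like, based on the note above: NamedThing on the open node, and the categories for the ketogenic-diet ID set by hand. The node and edge keys are copied from the earlier ketogenic-diet query.

```python
import json

# Sketch of the adjusted query described in the note above (an assumption,
# not the exact query that was run).
ketogenic_diet_query = {
    "message": {
        "query_graph": {
            "nodes": {
                "n1": {
                    "ids": ["UMLS:C0259972"],
                    # Categories set explicitly, since Node Normalizer
                    # likely doesn't recognize this ID.
                    "categories": ["biolink:Procedure", "biolink:Treatment"],
                },
                # NamedThing is the Biolink root class ("anything").
                "n2": {"categories": ["biolink:NamedThing"]},
            },
            "edges": {"e1": {"subject": "n1", "object": "n2"}},
        }
    }
}

print(json.dumps(ketogenic_diet_query, indent=2))
```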
The response has encephalopathy due to GLUT1 deficiency (MONDO:0011724, result 13), and 209 results total. You can paste the contents of the text file (which is JSON) into ARAX's import-response section to view it in their UI.
Replacing the ID with encephalopathy due to GLUT1 deficiency (MONDO:0011724, result 13) gets two results. One is the ketogenic diet treatment, and one is a diagnostic test. You can paste the contents of the text file (which is JSON) into ARAX's import-response section to view it in their UI.
glut1deficiency.txt
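Besides importing the file into ARAX, a saved response can also be checked with a few lines of Python, for example to count results or find where a given ID ranks. This sketch assumes the file holds a standard TRAPI response (`message.results`, each with `node_bindings`); `load_response` and `find_result_ranks` are hypothetical helper names, and the small `sample` response uses made-up IDs purely for illustration.

```python
import json
from typing import List


def load_response(path: str) -> dict:
    """Load a saved TRAPI response (e.g. the attached text file, which is JSON)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_result_ranks(response: dict, curie: str) -> List[int]:
    """Return the 1-based positions of results that bind the given ID."""
    results = response.get("message", {}).get("results") or []
    ranks = []
    for position, result in enumerate(results, start=1):
        bound_ids = {
            binding["id"]
            for bindings in result.get("node_bindings", {}).values()
            for binding in bindings
        }
        if curie in bound_ids:
            ranks.append(position)
    return ranks


# Made-up miniature response, only to show the expected structure.
sample = {
    "message": {
        "results": [
            {"node_bindings": {"n1": [{"id": "EX:start"}], "n2": [{"id": "EX:a"}]}},
            {"node_bindings": {"n1": [{"id": "EX:start"}], "n2": [{"id": "EX:b"}]}},
        ]
    }
}
print(len(sample["message"]["results"]), find_result_ranks(sample, "EX:b"))
```

With the real file, `find_result_ranks(load_response("glut1deficiency.txt"), "MONDO:0011724")` would be the equivalent check, assuming the file sits in the working directory.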
The doc that was presented had some ARAX links from Case 1 from more recent runs:
From our colleagues at RCIGM, we have several example use cases related to genetic diseases. The goal is to take info about the variants and the phenotypes, and to propose candidate therapies. For these five cases, there are known therapies that we should be able to get. The goal is to assess the ability of a skilled analyst to identify those therapies. Good set of starter cases for @khanspers to work on... (cc @AlexanderPico)
CASE 1 VARIANTS TH NM_199292.2. c.541C>T; p.Gln181Ter. chr11:2189760 TH NM_199292.2 c.785C>G; p.Thr262Ser. chr11:2188668 HPO: delayed speech and language development
CASE 2 - 6011 NPC1 (NM_000271.3) chr 18: 21119857G>A c.2713C>T; pGLN905* - Homozygous
CLINICAL PRESENTATION: Cholestasis, hyperbilirubinemia, hepatosplenomegaly (this was before HPO terms used)
CASE 3 - 6094 ASL c.706C>T, p.Arg236Trp (homozygous) CLINICAL PRESENTATION: 7 day old female presented with lethargy and altered mental status found to have hyperammonemia. Concern for underlying diagnosis urea cycle disorder – likely citrullinemia. Parents are first cousins. (this was before HPO terms used)
CASE 4 - 243 ALDH7A1 c.328C>T, p.Arg110Ter HPO TERMS: Jaundice;Poor appetite;Ventriculomegaly;Seizure
CASE 5 - 3081 SLC2A1 c.1202C>T p.Pro401Leu HPO TERMS: Migraine, Seizure