NCATSTranslator / testing

Materials and tools for testing Translator components

Question of the Month #4: SRI-37330 #206

karafecho closed this issue 11 months ago

karafecho commented 2 years ago

Submitting team

UI Team (Andy C.)

SME

Anath Shalev, MD: Professor of Medicine – Endocrinology, Diabetes, and Metabolism, Department of Medicine; Director, Comprehensive Diabetes Center; Senior Scientist, Comprehensive Diabetes Center; Senior Scientist, Cystic Fibrosis Research Center; Senior Scientist, Center for Clinical and Translational Science, General Clinical Research Center; and Senior Scientist, Center for Exercise Medicine, Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham

Challenge Question

What is the molecular target of SRI-37330?

Background

A cell assay screen for potential drug treatments for diabetes yielded a number of drugs with unknown targets. One candidate, SRI-37330, appeared to be beneficial in a murine model of diabetes and had an added benefit of improving fatty liver. Dr. Shalev has been unable to identify the exact target of the candidate (even after two years of in silico structural biology modeling) and cannot file a patent without one. See https://pubmed.ncbi.nlm.nih.gov/32726606/ for additional information.

Important comment: In tackling this QotM, Translator team members should be mindful of intellectual property, given that Dr. Shalev is hoping to patent Drug X.

Timeline for QotM Challenge #4

July 8, 2022 - First Friday of the Month Standup

July 15, 2022 - Second Friday of the Month Standup

July 21, 2022 - Third Thursday of the Month Mini-hackathon

July 29, 2022 - Fourth Friday of the Month

colleenXu commented 2 years ago

[EDITING over time; I will only include info in published works (so info that is only in slides marked confidential will not be included here as starting points)]

I found these curies for starting points:

specific compounds:

more general chemical terms:

Genes from Dr. Shalev's paper, Figure 1i:

Diseases:


After 7/8 call:

Looking for transcriptional regulators of TXNIP (see literature on it here):

something to do with dna methylation? articles here

colleenXu commented 2 years ago

A. I tried running SRI-37330, its free-base version, and the initial hit compound (PUBCHEM.COMPOUND ids) and I didn't find connections to anything in BTE (or other tools). https://arax.ncats.io/?r=44c1df7a-6249-49c5-b0a1-308912c5e7c2

first query

```
{
  "message": {
    "query_graph": {
      "edges": {
        "e00": { "subject": "n0", "object": "n1" }
      },
      "nodes": {
        "n0": {
          "ids": ["PUBCHEM.COMPOUND:153319520", "PUBCHEM.COMPOUND:42960360", "PUBCHEM.COMPOUND:142582737"],
          "categories": ["biolink:SmallMolecule"],
          "name": "SRI-37330 and initial hit"
        },
        "n1": { "categories": ["biolink:NamedThing"] }
      }
    }
  }
}
```

B. I tried running the Quinazolines curie to find related things. https://arax.ncats.io/?r=71925704-0279-41ee-9416-55fb611d2241

I did a quick set of CTRL-F searches on the response from BTE, and one "glucokinase" result seemed interesting.

It's from semmeddb and links to a review about the glucokinase activation approach to finding anti-diabetic compounds. The review has a multi-page table of compounds, which includes one quinazoline derivative. This quinazoline derivative's paper contains structures that look (to my untrained eye) somewhat similar to SRI-37330...

second query

```
{
  "message": {
    "query_graph": {
      "edges": {
        "e00": { "subject": "n0", "object": "n1" }
      },
      "nodes": {
        "n0": {
          "ids": ["UMLS:C0034407"],
          "categories": ["biolink:SmallMolecule"],
          "name": "Quinazolines"
        },
        "n1": {
          "categories": ["biolink:Gene", "biolink:DiseaseOrPhenotypicFeature", "biolink:BiologicalProcessOrActivity"]
        }
      }
    }
  }
}
```

C. I tried finding genes in common with the Quinazolines curie and diabetes. https://arax.ncats.io/?r=cc4018fd-414c-406e-b61b-1eb5be09dd11

I did a quick look at BTE's response, and some results may be interesting (although they may be "expected"): TCF7L2, EGF and EGFR, CDK2.

third query

```
{
  "message": {
    "query_graph": {
      "edges": {
        "e00": { "subject": "n0", "object": "n1" },
        "e01": { "subject": "n2", "object": "n1" }
      },
      "nodes": {
        "n0": {
          "ids": ["UMLS:C0034407"],
          "categories": ["biolink:SmallMolecule"],
          "name": "Quinazolines"
        },
        "n1": { "categories": ["biolink:Gene"] },
        "n2": {
          "ids": ["HP:0011015", "MONDO:0005015", "MONDO:0005147", "MONDO:0005148"],
          "categories": ["biolink:DiseaseOrPhenotypicFeature"],
          "is_set": true,
          "name": "diabetes terms"
        }
      }
    }
  }
}
```
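
The two-edge "common neighbor" shape used in the third query (two pinned nodes pointing at one unpinned node) recurs throughout this thread. Below is a minimal Python sketch of how such a query graph could be built programmatically. The function name `common_neighbor_query` and its parameters are hypothetical and not part of any Translator tool; the CURIEs are the ones posted above.

```python
import json


def common_neighbor_query(left_ids, left_category, right_ids, right_category,
                          middle_categories, right_is_set=True):
    """Build a TRAPI-style query graph of the form left -> middle <- right.

    Hypothetical helper for illustration only; it just assembles the same
    JSON structure as the hand-written queries in this thread.
    """
    return {
        "message": {
            "query_graph": {
                "edges": {
                    "e00": {"subject": "n0", "object": "n1"},
                    "e01": {"subject": "n2", "object": "n1"},
                },
                "nodes": {
                    "n0": {"ids": list(left_ids), "categories": [left_category]},
                    "n1": {"categories": list(middle_categories)},
                    "n2": {
                        "ids": list(right_ids),
                        "categories": [right_category],
                        "is_set": right_is_set,
                    },
                },
            }
        }
    }


# Quinazolines -> Gene <- diabetes terms (same CURIEs as the third query).
query = common_neighbor_query(
    left_ids=["UMLS:C0034407"],
    left_category="biolink:SmallMolecule",
    right_ids=["HP:0011015", "MONDO:0005015", "MONDO:0005147", "MONDO:0005148"],
    right_category="biolink:DiseaseOrPhenotypicFeature",
    middle_categories=["biolink:Gene"],
)
print(json.dumps(query, indent=2))
```

Swapping `left_ids` for another chemical CURIE, or changing `middle_categories`, gives the other variants of this query shape without editing JSON by hand.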

I tried replacing Quinazolines with Sulfonamides in the above query structure but I didn't find the results interesting. Also I tried the query structure (chemical -> DiseaseOrPhenotypicFeature or BiologicalProcessOrActivity <- genes TXNIP, NLRP1, MLXIPL, CTGF, MAFA, IGF1R, BCL2L1) below but I didn't find the results interesting.

query I didn't run through ARS

```
{
  "message": {
    "query_graph": {
      "edges": {
        "e00": { "subject": "n0", "object": "n1" },
        "e01": { "subject": "n2", "object": "n1" }
      },
      "nodes": {
        "n0": {
          "ids": ["UMLS:C0038760"],
          "categories": ["biolink:SmallMolecule"],
          "name": "Sulfonamides"
        },
        "n1": {
          "categories": ["biolink:DiseaseOrPhenotypicFeature", "biolink:BiologicalProcessOrActivity"]
        },
        "n2": {
          "ids": ["NCBIGene:10628", "NCBIGene:22861", "NCBIGene:51085", "NCBIGene:1490", "NCBIGene:389692", "NCBIGene:3480", "NCBIGene:598"],
          "categories": ["biolink:Gene"],
          "is_set": true,
          "name": "TXNIP, NLRP1, MLXIPL, CTGF, MAFA, IGF1R, BCL2L1"
        }
      }
    }
  }
}
```

GregHydeDartmouth commented 2 years ago

Hi all. I'm pasting the queries I showed in the session this morning. Both of these queries were motivated by CHP's new edge, which exposes tissue-gene specificity analysis using the GTEx data. The specificity ranking lets us rank-order genes' specificity to a tissue according to their expression profiles in each tissue. We think this approach might be a meaningful way to assist with "what drugs may treat disease X" questions, because a significant number of disease-causing genes are highly specific in expression to the tissue in which the disease manifests. This lacks a complete picture of disease pathology, but you could think of it as exploring the disease neighborhood even if you don't have a map of the streets. Regardless, the queries are organized as follows:

(I actually encode this first hop explicitly for my own debugging purposes. I extracted tissues related to Type I and Type II Diabetes ahead of time and seeded the relevant UBERON curies explicitly.)

Disease (Type I and Type II Diabetes) --related_to--> Tissues? --expresses--> Gene? --related_to--> Gene(TXNIP) https://arax.ncats.io/?r=51900 Results 1 and 2 seemed to interest the SME, as they themselves have looked at NKX6-1.

Disease (Type I and Type II Diabetes) --related_to--> Tissues? --expresses--> Gene? --related_to--> Disease (Fatty Liver Disease) https://arax.ncats.io/?r=51896 I used fatty liver disease here because the prompt indicated that the drug helped with fatty liver. It's possible that this approach is ill-suited.

Two disclaimers on these results. I'm ONLY using CHP and ARAX to return on these edges for the following reasons:

  1. For the tissue-gene relationship, the only other team that returns results on these tissues is ARAX, which provides a broad range of genes that are expressed in the tissues. However, I view CHP's returns as a more constrained set of genes, in that there is a rank ordering of specificity and only the most specific genes are returned.
  2. Using workflows to specify an allow list (e.g., "allowlist": ["infores:connections-hypothesis"] for e1) seems to have unexpected behavior: I have to write an allow list for each hop, and it crashes if I use a list of more than one source. Given that I had to specify an allow list for CHP to answer the tissue-gene relationship, I also had to specify an allow list for all other hops. It's possible I misunderstand how to correctly specify workflows. I will expand this workflow to the whole consortium when I am able and update those results here.

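
For readers unfamiliar with the per-hop allow lists described above, here is a minimal Python sketch of how a workflow with one `fill` operation per query edge might be assembled. It assumes the `fill` operation accepts `allowlist` and `qedge_keys` parameters, which is my reading of the Translator operations-and-workflows schema and should be checked against the current spec. The helper name `fill_workflow`, the edge keys `e0`/`e1`/`e2`, and the `infores:arax` identifier are illustrative assumptions; only `infores:connections-hypothesis` appears in this thread.

```python
import json


def fill_workflow(allowlists):
    """Return a list of 'fill' operations, one per query edge.

    allowlists maps a query-edge key to the list of knowledge sources
    allowed to answer that edge. Hypothetical helper; the parameter names
    'allowlist' and 'qedge_keys' are assumed from the workflow schema.
    """
    return [
        {
            "id": "fill",
            "parameters": {
                "allowlist": list(sources),
                "qedge_keys": [edge_key],
            },
        }
        for edge_key, sources in allowlists.items()
    ]


# Assumed edge keys for a three-hop query; e1 is the tissue-gene hop.
workflow = fill_workflow({
    "e0": ["infores:arax"],
    "e1": ["infores:connections-hypothesis"],
    "e2": ["infores:arax"],
})
print(json.dumps(workflow, indent=2))
```

The resulting list would go under a top-level `workflow` key next to `message` in the request body.
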
dkoslicki commented 2 years ago

@GregHydeDartmouth the bug about crashing when there is more than one source has been fixed and deployed to https://arax.transltr.io/ (ITRB endpoint), so feel free to try again if you'd like. You shouldn't need to specify an allow_list for subsequent fill operations if they were used in previous ones. Operations are intended to be stateless.

MarkDWilliams commented 2 years ago

I'd love to hear from our SME about which of the possible identifiers is the best match for SRI-37330

colleenXu commented 2 years ago

The queries I showed today:

quinazoline -> Gene <- TXNIP-related gene set

```
{
  "message": {
    "query_graph": {
      "edges": {
        "e00": { "subject": "n0", "object": "n1" },
        "e01": { "subject": "n2", "object": "n1" }
      },
      "nodes": {
        "n0": {
          "ids": ["UMLS:C0034407"],
          "categories": ["biolink:SmallMolecule"],
          "name": "Quinazolines"
        },
        "n1": { "categories": ["biolink:Gene"] },
        "n2": {
          "ids": ["NCBIGene:10628", "NCBIGene:22861", "NCBIGene:51085", "NCBIGene:1490", "NCBIGene:389692", "NCBIGene:3480", "NCBIGene:598", "NCBIGene:2308", "NCBIGene:22877", "NCBIGene:2033"],
          "categories": ["biolink:Gene"],
          "is_set": true,
          "name": "TXNIP, NLRP1, MLXIPL, CTGF, MAFA, IGF1R, BCL2L1, FOXO1, MLXIP, EP300"
        }
      }
    }
  }
}
```

karafecho commented 2 years ago

Thanks, @colleenXu, for posting the queries and walking Dr. Shalev through the results, which I suspect she found very useful, in terms of understanding what Translator can and cannot do.

colleenXu commented 2 years ago

Another query that may have interesting results, but it's a lot to go through....trying to find genes related to both diabetes and the TXNIP-related genes (aka looking for a potential target without looking for a connection to a quinazoline):

Tools returned a lot of results for it. BTE got 1880. https://arax.ncats.io/?r=0b8c4789-6b0b-468e-99de-30d7688dcf3a

Diabetes terms -> Genes <- the full gene set (TXNIP, NLRP1, MLXIPL, CTGF, MAFA, IGF1R, BCL2L1, FOXO1, MLXIP, EP300)

```
{
  "message": {
    "query_graph": {
      "edges": {
        "e00": { "subject": "n0", "object": "n1" },
        "e01": { "subject": "n2", "object": "n1" }
      },
      "nodes": {
        "n0": {
          "ids": ["HP:0011015", "MONDO:0005015", "MONDO:0005147", "MONDO:0005148"],
          "categories": ["biolink:DiseaseOrPhenotypicFeature"],
          "is_set": true,
          "name": "diabetes terms"
        },
        "n1": { "categories": ["biolink:Gene"] },
        "n2": {
          "ids": ["NCBIGene:10628", "NCBIGene:22861", "NCBIGene:51085", "NCBIGene:1490", "NCBIGene:389692", "NCBIGene:3480", "NCBIGene:598", "NCBIGene:2308", "NCBIGene:22877", "NCBIGene:2033"],
          "categories": ["biolink:Gene"],
          "is_set": true,
          "name": "TXNIP, NLRP1, MLXIPL, CTGF, MAFA, IGF1R, BCL2L1, FOXO1, MLXIP, EP300"
        }
      }
    }
  }
}
```

GregHydeDartmouth commented 2 years ago

Hi all, Here is my update from yesterday’s QOTM session. Because the prompt indicated tissue related differences in the regulation of TXNIP, I thought this would be a good way to use our Tissue-Gene specificity ranking. However, showing how genes are differentially expressed in tissues using this tool required a little bit of background analysis. That is, we currently expose a specificity of tissues to a gene (and vice versa), but do not currently expose a tissue-gene differential based on specificity. Therefore I used the following pseudocode:

differences = []
for gene in genes:
    diff = specificity_pancreas(gene) - specificity_liver(gene)
    differences.append((diff, gene))  # keep the gene with its score
differences.sort(reverse=True)  # most pancreas-specific first

I used this to extract the top 1000 genes that were specific to the pancreas. This is only half of the battle though because the QOTM prompt indicated differences in the alpha cells vs the beta cells of the pancreas. Regardless I constructed two queries around this:
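
A minimal runnable sketch of the ranking step described above, assuming the per-tissue specificity scores are already available as plain dictionaries keyed by gene. The function name `top_differential_genes`, the gene labels, and the scores are placeholders for illustration, not values from CHP or GTEx.

```python
def top_differential_genes(spec_a, spec_b, top_n):
    """Rank genes by specificity in tissue A minus specificity in tissue B.

    spec_a and spec_b map gene -> specificity score. Only genes scored in
    both tissues are ranked. Returns the top_n genes, most A-specific first.
    """
    shared = spec_a.keys() & spec_b.keys()
    ranked = sorted(shared, key=lambda gene: spec_a[gene] - spec_b[gene],
                    reverse=True)
    return ranked[:top_n]


# Placeholder scores, made up purely to exercise the function.
pancreas = {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.7}
liver = {"GENE_A": 0.1, "GENE_B": 0.5, "GENE_C": 0.6, "GENE_D": 0.2}

top_genes = top_differential_genes(pancreas, liver, top_n=2)
print(top_genes)
```

With real scores, `top_n=1000` would correspond to the gene list used in the two queries below.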

Query 1

(1000 genes) – related to -> (GCK, TXNIP and other TXNIP related genes) https://arax.ncats.io/?r=1f9e7c78-67bd-4db0-afa4-354eaccd30ca

I’m going to provide my more detailed findings to Andy to pass along to Dr. Shalev. But this query shows genes that are differentially expressed in the pancreas, that have some connection to genes of interest for the prompt. This includes glucokinase, TXNIP, and TXNIP related genes. This query now includes the top 1000 most specific genes to the pancreas, rather than the 500 I presented in the QOTM hacking session.

Query 2

(Quinazoline) - affects -> (1000 genes) - affects -> (GCK, TXNIP and other TXNIP related genes) https://arax.ncats.io/?r=b99f9de9-c873-43f1-8f23-47e13acd6b2d

This query shows the relationships between Quinazoline, the 1000 most specific genes to the pancreas, and the genes of interest from the prompt. This, again, includes glucokinase, TXNIP, and TXNIP-related genes. This query includes 1000 genes rather than the 500 I presented in the QOTM hacking session. Unfortunately, with 1000 genes I still only see EGF as a mediator between node 1 and node 3, just as with the 500 genes.

I'm still interpreting more of the results from Query 1. I will update here or offline to Andy if I find anything else.

colleenXu commented 2 years ago

@GregHydeDartmouth Perhaps it would be helpful to change your query, to put is_set:true on the QNode with the TXNIP/GCK genes. That way, each result will show how one of those 1000 genes relates to the entire set of the TXNIP/GCK genes...
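
A minimal Python sketch of that change, assuming the query is held as a dictionary with the same structure as the queries posted above. The helper name `with_set_node` and the node keys are hypothetical; the two gene CURIEs are taken from the gene set earlier in this thread.

```python
import copy


def with_set_node(query, node_key):
    """Return a copy of a TRAPI-style query with is_set=True on one QNode."""
    updated = copy.deepcopy(query)  # leave the caller's query untouched
    updated["message"]["query_graph"]["nodes"][node_key]["is_set"] = True
    return updated


# Small stand-in query: unpinned genes -> a pinned set of genes of interest.
base_query = {
    "message": {
        "query_graph": {
            "edges": {"e00": {"subject": "n0", "object": "n1"}},
            "nodes": {
                "n0": {"categories": ["biolink:Gene"]},
                "n1": {
                    "ids": ["NCBIGene:10628", "NCBIGene:22861"],
                    "categories": ["biolink:Gene"],
                },
            },
        }
    }
}

set_query = with_set_node(base_query, "n1")
print(set_query["message"]["query_graph"]["nodes"]["n1"])
```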

karafecho commented 11 months ago

Closing with comment in #233 ...