Closed: edeutsch closed this issue 4 months ago
I see lots of SemMedDB edges. But I don't see any ChEMBL edges. Shouldn't we expect to see ChEMBL edges?
I see PTGS1 and PTGS2 in there, and that’s the extent of my bio knowledge about what the answer should be (without looking into it further).
Re: ChEMBL edges, @saramsey would know, but I do see provided_by: identifiers_org_registry:chembl.compound, so maybe ChEMBL is being ingested from identifiers.org?
Ah, yes, I see it now, thanks. identifiers_org_registry:chembl.compound is the CURIE that is being used to mean "provided by ChEMBL". Looks odd, but makes sense. There's no single namespace handle for ChEMBL as a whole, I guess.
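A quick way to check which sources contribute edges is to tally the provided_by values. The following is a minimal sketch, assuming each edge is a dict whose provided_by field is either a string or a list of strings; the example edges and the EXAMPLE:semmeddb placeholder are made up for illustration and are not taken from message 3234.

```python
from collections import Counter


def count_edges_by_source(edges):
    """Count edges per provided_by value (accepts a string or a list of strings)."""
    counts = Counter()
    for edge in edges:
        sources = edge.get("provided_by") or ["(unknown)"]
        if isinstance(sources, str):
            sources = [sources]
        counts.update(sources)
    return counts


# Hypothetical edges for illustration only. "EXAMPLE:semmeddb" is a placeholder,
# not the CURIE the knowledge graph actually uses for SemMedDB.
example_edges = [
    {"id": "e1", "provided_by": "EXAMPLE:semmeddb"},
    {"id": "e2", "provided_by": "identifiers_org_registry:chembl.compound"},
    {"id": "e3", "provided_by": ["EXAMPLE:semmeddb"]},
    {"id": "e4"},  # no provided_by at all
]

counts = count_edges_by_source(example_edges)
for source, n in counts.most_common():
    print(f"{n:3d}  {source}")
```

With a tally like this, ChEMBL-derived edges would show up under identifiers_org_registry:chembl.compound rather than under a key named "ChEMBL".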
Alright, I'll bite. What's the question? When I click on https://arax.ncats.io/beta/?m=3234 I get this screen:
You have requested ARAX message id = 3234
Retrieving ARAX message id = 3234
Normal completion
Rendering message...done.
So although the answer to the question might be sensible, I am not sure the question itself is sensible.
The question is: which proteins is acetaminophen connected to or associated with, i.e., which proteins does it show the most association with in the literature?
Subject: Re: [RTXteam/RTX] Is the answer to the default question sensible? (#1131), in reply to Jared Roach, Saturday, November 21, 2020
Poking a bit further, it looks like the question might be, "What proteins does acetaminophen interact with?"
I think the answer list is reasonable. I have to wonder whether #3 refers to the sense of "Ache" that is synonymous with dull pain, because if AChE means acetylcholinesterase, it is harder to track down the relationship. I wonder if the main knowledge driving this link is that if one overdoses on acetaminophen, then a treatment for the resultant liver injury is acetylcholinesterase inhibitors.
Some of the interactions are with CYP genes that detoxify acetaminophen, so these are very reasonable.
The #1 hit to ALT is almost certainly due to the literature on acetaminophen toxicity. "While acetaminophen overdose has been recognized as a cause of alanine aminotransferase (ALT) elevations for over 40 years..." So this is a pretty indirect mechanism. Tylenol kills the liver cells, which then leak a whole bunch of proteins, including ALT. So maybe it makes a ton of sense to return this as a #1 hit from Translator, but if one is expecting something along the lines of an interaction related to drug development thought processes, this seems distracting, not #1 important.
The above logic is also the source of the AST (aspartate aminotransferase) link, #3.
I don't think #7 is a protein. How did it end up as a protein node? gamma-glutamylcysteinylacetaminophen: maybe it is a dipeptide. Perhaps that counts as a protein. So my bad; let's call this a really good hit. But it comes back to the gray areas that were discussed in the node classification call we had a few weeks ago with the Data Representation committee.
And if we can call a dipeptide a protein, we might as well call individual amino acids proteins. L-cysteine zwitterion: I am not sure how this ends up on the list. Maybe as a treatment for acetaminophen overdose? Yup. Google search confirms.
TL/DR: It sure would be nice to divide the results into two categories:
@jaredroach Re: the first part of your assessment: the “interacts with” flavor to this question is probably an artifact of the ranking. The question doesn’t actually specify an edge type (indicated below). I assume that if you don’t filter to the top 50, you might find other kinds of relationships at the bottom of the results.
{
  "edges": [
    {
      "id": "qg2",
      "source_id": "qg1",
      "target_id": "qg0"
    },
    {
      "id": "N1",
      "relation": "N1",
      "source_id": "qg0",
      "target_id": "qg1",
      "type": "has_normalized_google_distance_with"
    }
  ],
  "nodes": [
    {
      "curie": "CHEMBL.COMPOUND:CHEMBL112",
      "id": "qg0"
    },
    {
      "id": "qg1",
      "type": "protein"
    }
  ]
}
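To make the "no edge type" point concrete, here is a small sketch that loads the query graph above and lists the edges that carry no type constraint. The helper name untyped_edges is made up for this example and is not part of ARAX.

```python
import json

# The query graph from this thread, copied verbatim.
query_graph = json.loads("""
{
  "edges": [
    {"id": "qg2", "source_id": "qg1", "target_id": "qg0"},
    {"id": "N1", "relation": "N1", "source_id": "qg0", "target_id": "qg1",
     "type": "has_normalized_google_distance_with"}
  ],
  "nodes": [
    {"curie": "CHEMBL.COMPOUND:CHEMBL112", "id": "qg0"},
    {"id": "qg1", "type": "protein"}
  ]
}
""")


def untyped_edges(qg):
    """Return the ids of query-graph edges that do not constrain the edge type."""
    return [edge["id"] for edge in qg["edges"] if "type" not in edge]


print(untyped_edges(query_graph))  # ['qg2']
```

Edge qg2 has no type, so any relationship between acetaminophen and a protein can satisfy it; only the N1 edge is pinned to has_normalized_google_distance_with.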
So I think the code is working brilliantly, as intended by us programmers. Not to say the result is wrong, but just to give the user's perspective. It is sort of like asking for literature relationships to "frog". If you are a biologist, you are anticipating results like "amphibian". So you get a little thrown when the best hits in English literature are to "Louis XIV".
I think we could improve in two ways initially. 1) Fix the node synonymizer to not conflate ache and AChE. I can do that pretty easily. 2) Rank known knowledge base interactions higher than SemMedDB associations so that they appear at the top of the list. I am uncertain how best to make that happen, but I think we should try to figure out a way.
I was (I think) kidding a bit when I suggested AChE might be conflated with ache. I don't think that actually happened. It doesn't hurt to check the code though.
Changing rankings based on knowledge source makes sense, and it has the advantage of being tunable/configurable. If someone were studying acetaminophen overdosing, they could in theory elevate those results.
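One way the tunable idea could look, as a rough sketch only: multiply each result's score by a per-source weight and sort. Everything here is an assumption for illustration (the field names source and confidence, the weight values, and the example scores); it is not how the ARAX ranker is implemented.

```python
# Hypothetical per-source weights: curated knowledge bases outrank SemMedDB.
DEFAULT_WEIGHTS = {"curated_kb": 1.0, "semmeddb": 0.5}
FALLBACK_WEIGHT = 0.75  # used for any source not listed in the weights


def rank_results(results, weights=DEFAULT_WEIGHTS):
    """Sort results by confidence scaled by the weight of their knowledge source."""
    def weighted_score(result):
        return result["confidence"] * weights.get(result["source"], FALLBACK_WEIGHT)
    return sorted(results, key=weighted_score, reverse=True)


# Made-up confidences, for illustration only.
results = [
    {"name": "ALT", "source": "semmeddb", "confidence": 0.9},
    {"name": "PTGS2", "source": "curated_kb", "confidence": 0.7},
]

# Default weights: the curated interaction comes first.
print([r["name"] for r in rank_results(results)])

# Someone studying overdose could flip the weights to elevate literature hits.
overdose_weights = {"curated_kb": 0.5, "semmeddb": 1.0}
print([r["name"] for r in rank_results(results, overdose_weights)])
```

Keeping the weights in a plain mapping means a user or a configuration file could override them without touching the ranking code.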
I understood you were kidding, but you were right! The node synonymizer did erroneously merge these. There are probably other cases.
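To look for other cases, one rough approach is to scan for names that are distinct as written but collapse to the same key once case is ignored. This sketch assumes the merge came from case-insensitive matching, which the thread does not confirm, and it is not the node synonymizer's actual logic.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Return groups of distinct names that become identical when lowercased."""
    groups = defaultdict(set)
    for name in names:
        groups[name.lower()].add(name)
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Names taken from this thread; "Ache" (dull pain) and "AChE" (acetylcholinesterase)
# differ only in capitalization.
names = ["Ache", "AChE", "ALT", "AST", "PTGS1", "PTGS2"]
print(find_case_collisions(names))  # [['AChE', 'Ache']]
```

Each group flagged this way would still need a manual look, since some case variants are true synonyms.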
closing ancient history.
Our default question for the JSON example has not changed, but the QueryGraphInterpreter now sends it to KG2 by default instead of KG1, and I forcibly limit it to the top 50 results to avoid bloat.
The question is: is the set of responses and rankings that are returned for this default question sensible? Or are there glaring errors?
https://arax.ncats.io/beta/?m=3234