Open sstemann opened 1 year ago
Based on our team's investigation, the validation error in the ARAX TRAPI result (the error, not the warnings) is caused by two result nodes, UMLS:C1709820 ("ROS1 wt Allele") and UMLS:C3890397 ("PRKAA1 wt Allele"), both of which are annotated in the TRAPI result as having a category of biolink:GenomicEntity. That is not allowed because GenomicEntity is a "mixin" per the Biolink YAML model.
So, where is the "genomic entity" category coming from? Based on our investigation, we do not think it is coming from the current production RTX-KG2 graph (i.e., RTX-KG2.8.4c). But I notice that when I query the SRI Node Normalization service with this request:
https://nodenormalization-sri.renci.org/1.4/get_normalized_nodes?curie=UMLS%3AC1709820&conflate=true&drug_chemical_conflate=false&description=false
I get back "Genomic Entity" in the response:
{
  "UMLS:C1709820": {
    "id": {
      "identifier": "UMLS:C1709820",
      "label": "ROS1 wt Allele"
    },
    "equivalent_identifiers": [
      {
        "identifier": "UMLS:C1709820",
        "label": "ROS1 wt Allele"
      }
    ],
    "type": [
      "biolink:GenomicEntity"
    ]
  }
}
The same goes for when I query with the other CURIE:
https://nodenormalization-sri.renci.org/1.4/get_normalized_nodes?curie=UMLS%3AC3890397&conflate=true&drug_chemical_conflate=false&description=false
I get back a response with "Genomic Entity":
{
  "UMLS:C3890397": {
    "id": {
      "identifier": "UMLS:C3890397",
      "label": "PRKAA1 wt Allele"
    },
    "equivalent_identifiers": [
      {
        "identifier": "UMLS:C3890397",
        "label": "PRKAA1 wt Allele"
      }
    ],
    "type": [
      "biolink:GenomicEntity"
    ]
  }
}
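To make the check above repeatable, here is a minimal Python sketch that rebuilds the query URL used in the two requests and flags any CURIE whose "type" list contains nothing but mixins. The function names and the `KNOWN_MIXINS` set are assumptions made for illustration: only biolink:GenomicEntity is listed because it is the only mixin named in this thread, and a real check would derive the set from the Biolink YAML model. The sample data is the ROS1 response copied from above, so the sketch makes no network call.

```python
from urllib.parse import urlencode

NODENORM_URL = "https://nodenormalization-sri.renci.org/1.4/get_normalized_nodes"

# Assumption: only the mixin named in this thread is listed here; a real
# check would read the full set of mixins from the Biolink YAML model.
KNOWN_MIXINS = {"biolink:GenomicEntity"}


def build_query_url(curie):
    """Build the same GET URL as the requests shown in this thread."""
    params = {
        "curie": curie,
        "conflate": "true",
        "drug_chemical_conflate": "false",
        "description": "false",
    }
    return NODENORM_URL + "?" + urlencode(params)


def mixin_only_curies(response, mixins=KNOWN_MIXINS):
    """Return the CURIEs whose 'type' list has no non-mixin category."""
    flagged = []
    for curie, entry in response.items():
        if entry is None:  # defensive: skip entries with no normalization data
            continue
        types = entry.get("type", [])
        if types and all(t in mixins for t in types):
            flagged.append(curie)
    return flagged


# Response body copied from the ROS1 query in this thread.
sample = {
    "UMLS:C1709820": {
        "id": {"identifier": "UMLS:C1709820", "label": "ROS1 wt Allele"},
        "equivalent_identifiers": [
            {"identifier": "UMLS:C1709820", "label": "ROS1 wt Allele"}
        ],
        "type": ["biolink:GenomicEntity"],
    }
}

print(build_query_url("UMLS:C1709820"))
print(mixin_only_curies(sample))
```

Passing the parsed JSON from a live request to `mixin_only_curies` should work the same way, assuming the service keeps the response shape shown above.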
Tagging and assigning @gaurav and the SRI crew on the node normalization issue.
Since all signs point to the SRI Node Normalizer service at this point (at least, as concerns the validation error), I'm deassigning myself for now. Also marking as bug. If something else comes up that needs our team's input, please LMK.
Thanks so much for the detailed analysis, @saramsey! Yup, this is definitely a NodeNorm issue: the problem is that UMLS semantic type T028 "Gene or Genome" is mapped to the mixin biolink:GenomicEntity in the Biolink model. As long as a clique has a non-mixin semantic type as well, all will be well, but if a clique (like the ones in this ticket) only has T028 as a semantic type, we will give it a type of biolink:GenomicEntity and it will not have any parent types. I've opened a ticket in the Babel repository to figure out how widespread this problem is and whether simply modifying the Biolink mapping will fix it, or if a more sophisticated fix is necessary (https://github.com/TranslatorSRI/Babel/issues/196).
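The failure mode described above can be illustrated with a toy Python sketch. This is not Babel's actual code: the mapping table, the `T_CONCRETE` placeholder semantic type, and the function names are all hypothetical, and only the T028 to biolink:GenomicEntity mapping comes from this thread.

```python
# Toy illustration only; not taken from Babel or NodeNorm.
SEMTYPE_TO_BIOLINK = {
    "T028": "biolink:GenomicEntity",  # mapping described in this thread
    "T_CONCRETE": "biolink:Gene",     # hypothetical placeholder semantic type
}

# Assumption: only the mixin named in this thread is listed.
MIXINS = {"biolink:GenomicEntity"}


def assign_types(semantic_types):
    """Map a clique's semantic types to Biolink categories, skipping unknowns."""
    return [
        SEMTYPE_TO_BIOLINK[s] for s in semantic_types if s in SEMTYPE_TO_BIOLINK
    ]


def has_concrete_type(types):
    """True if at least one assigned category is not a mixin."""
    return any(t not in MIXINS for t in types)


# A clique with only T028 ends up with the mixin and nothing else.
only_t028 = assign_types(["T028"])
print(only_t028, has_concrete_type(only_t028))

# A clique with an additional non-mixin semantic type is fine.
mixed = assign_types(["T028", "T_CONCRETE"])
print(mixed, has_concrete_type(mixed))
```

A scan along these lines over all cliques is one possible way to estimate how widespread the problem is, though the real data model may differ.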
Thanks @gaurav. Maybe consider opening an issue in the github:biolink/biolink-model github project area. My $0.02: we need a concrete class that is a reasonable approximation of T028. The simple solution here is to just promote GenomicEntity to be a concrete class, and not a mixin. Seems like a pragmatic solution to me.
That's a good idea! I've opened that issue at https://github.com/biolink/biolink-model/issues/1405
@sierra-moxon @gaurav do you guys think this will change for lobster1?
what about Octopus2? @sierra-moxon @gaurav
This is a fairly difficult change to the model as it represents a break in the ChemicalEntity/BiologicalEntity hierarchy. Moving to Hammerhead.
I don't think this made it into Hammerhead, so I think we should push this into the next phase.
Used the Test UI to run "What genes may be downregulated by:Metformin"
PK: 4cb4ba22-c60e-4652-ac9f-b2e8b7a2306a UI: https://ui.test.transltr.io/main/results?l=Metformin&i=PUBCHEM.COMPOUND:4091&t=4&q=4cb4ba22-c60e-4652-ac9f-b2e8b7a2306a
ARAGORN:
ARAX:
BTE: