Closed by saramsey 1 year ago
Here is the TRAPI query graph. It is of type "inferred":
{
  "edges": {
    "t_edge": {
      "attribute_constraints": [],
      "knowledge_type": "inferred",
      "object": "on",
      "predicates": [
        "biolink:affects"
      ],
      "qualifier_constraints": [
        {
          "qualifier_set": [
            {
              "qualifier_type_id": "biolink:object_aspect_qualifier",
              "qualifier_value": "activity_or_abundance"
            },
            {
              "qualifier_type_id": "biolink:object_direction_qualifier",
              "qualifier_value": "decreased"
            }
          ]
        }
      ],
      "subject": "sn"
    }
  },
  "nodes": {
    "on": {
      "categories": [
        "biolink:Gene"
      ],
      "constraints": [],
      "is_set": false,
      "fulltextname": "on"
    },
    "sn": {
      "categories": [
        "biolink:ChemicalEntity"
      ],
      "constraints": [],
      "ids": [
        "PUBCHEM.COMPOUND:4091"
      ],
      "is_set": false,
      "fulltextname": "sn"
    }
  }
}
Looks like the ARAX query result includes two result nodes of category biolink:GenomicEntity:
https://arax.ncats.io/beta/?r=173466
See result 267, "ROS1 wt Allele":
Here is the TRAPI object for that result:
"UMLS:C1709820": {
  "attributes": [
    {
      "attribute_source": null,
      "attribute_type_id": "biolink:xref",
      "attributes": null,
      "description": null,
      "original_attribute_name": null,
      "value": [
        "UMLS:C1709820"
      ],
      "value_type_id": null,
      "value_url": null
    },
    {
      "attribute_source": null,
      "attribute_type_id": "biolink:synonym",
      "attributes": null,
      "description": null,
      "original_attribute_name": null,
      "value": [
        "ROS1 wt Allele"
      ],
      "value_type_id": null,
      "value_url": null
    },
    {
      "attribute_source": null,
      "attribute_type_id": "biolink:IriType",
      "attributes": null,
      "description": null,
      "original_attribute_name": null,
      "value": "https://identifiers.org/umls:C1709820",
      "value_type_id": "metatype:Uri",
      "value_url": "https://identifiers.org/umls:C1709820"
    },
    {
      "attribute_source": null,
      "attribute_type_id": "biolink:description",
      "attributes": null,
      "description": null,
      "original_attribute_name": null,
      "value": "Human ROS1 wild-type allele is located within 6q22 and is approximately 137 kb in length. This allele, which encodes proto-oncogene tyrosine-protein kinase ROS protein, is involved in receptor tyrosine phosphorylation signal transduction.; UMLS Semantic Type: STY:T028",
      "value_type_id": "metatype:String",
      "value_url": null
    },
    {
      "attribute_source": null,
      "attribute_type_id": "biolink:category",
      "attributes": null,
      "description": "Categories of all nodes in this synonym set in RTX-KG2.",
      "original_attribute_name": null,
      "value": [
        "biolink:GenomicEntity"
      ],
      "value_type_id": "metatype:Uriorcurie",
      "value_url": null
    }
  ],
  "categories": [
    "biolink:GenomicEntity"
  ],
  "name": "ROS1 wt Allele"
},
See also result 408, "PRKAA1 wt Allele":
Here is the TRAPI object for that result:
"UMLS:C3890397": {
  "attributes": [
    {
      "attribute_source": null,
      "attribute_type_id": "biolink:xref",
      "attributes": null,
      "description": null,
      "original_attribute_name": null,
      "value": [
        "UMLS:C3890397"
      ],
      "value_type_id": null,
      "value_url": null
    },
    {
      "attribute_source": null,
      "attribute_type_id": "biolink:synonym",
      "attributes": null,
      "description": null,
      "original_attribute_name": null,
      "value": [
        "PRKAA1 wt Allele"
      ],
      "value_type_id": null,
      "value_url": null
    },
    {
      "attribute_source": null,
      "attribute_type_id": "biolink:IriType",
      "attributes": null,
      "description": null,
      "original_attribute_name": null,
      "value": "https://identifiers.org/umls:C3890397",
      "value_type_id": "metatype:Uri",
      "value_url": "https://identifiers.org/umls:C3890397"
    },
    {
      "attribute_source": null,
      "attribute_type_id": "biolink:description",
      "attributes": null,
      "description": null,
      "original_attribute_name": null,
      "value": "Human PRKAA1 wild-type allele is located in the vicinity of 5p12 and is approximately 39 kb in length. This allele, which encodes 5'-AMP-activated protein kinase catalytic subunit alpha-1 protein, plays a role in the modulation of many cellular processes through the phosphorylation of metabolic enzymes and transcription regulatory proteins.; UMLS Semantic Type: STY:T028",
      "value_type_id": "metatype:String",
      "value_url": null
    },
    {
      "attribute_source": null,
      "attribute_type_id": "biolink:category",
      "attributes": null,
      "description": "Categories of all nodes in this synonym set in RTX-KG2.",
      "original_attribute_name": null,
      "value": [
        "biolink:GenomicEntity"
      ],
      "value_type_id": "metatype:Uriorcurie",
      "value_url": null
    }
  ],
  "categories": [
    "biolink:GenomicEntity"
  ],
  "name": "PRKAA1 wt Allele"
},
Looking in the ARAX UI "Synonyms" tool on arax.ncats.io/beta, I see that there is in fact a node with category biolink:GenomicEntity in the Synonyms tool results:
and same goes for "PRKAA1 wt Allele":
Looks like this is coming from the SRI NodeNormalizer, so maybe we can bounce this issue to the SRI NodeNormalizer folks?
In KG2.8.6pre, this is already changed to biolink:Gene, as seen here:
and here:
In KG2.8.4c/Neo4j, we have the incorrect (biolink:GenomicEntity) category:
In KG2.8.4pre/Neo4j, we have biolink:BiologicalEntity (which is an abstract class and therefore also not allowed as a category, but that's a different problem from biolink:GenomicEntity and definitely not the cause):
Similarly for the other node:
This makes me think that the appearance of biolink:GenomicEntity in KG2.8.4c for these two nodes may be due to the node normalizer. Tagging @amykglen.
To summarize, one issue that is causing a validation error in the originally referenced ARAX TRAPI result is that there are two result nodes, UMLS:C1709820 ("ROS1 wt Allele") and UMLS:C3890397 ("PRKAA1 wt Allele"), both of which are annotated in the TRAPI result as having a category of biolink:GenomicEntity (which is not allowed because GenomicEntity is a "mixin" per the Biolink YAML model).
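For anyone who wants to spot this kind of problem without reading the result JSON by hand, here is a rough Python sketch that scans a TRAPI-style node dictionary for categories that should not appear. The `DISALLOWED_CATEGORIES` set and the function name are my own assumptions for illustration (it lists only the two categories discussed in this thread); a real check would work out mixin and abstract status from the Biolink Model YAML itself. The third example node is hypothetical.

```python
# Categories flagged in this thread as invalid for a node's "categories" list.
# ASSUMPTION: hard-coded for illustration; a real validator would derive
# mixin/abstract status from the Biolink Model YAML.
DISALLOWED_CATEGORIES = {
    "biolink:GenomicEntity",     # mixin
    "biolink:BiologicalEntity",  # abstract class
}


def find_disallowed_categories(nodes):
    """Return {curie: sorted disallowed categories} for offending nodes only."""
    offenders = {}
    for curie, node in nodes.items():
        categories = node.get("categories") or []
        bad = sorted(set(categories) & DISALLOWED_CATEGORIES)
        if bad:
            offenders[curie] = bad
    return offenders


# Trimmed-down node objects; the first two mirror the results in this issue,
# the third is a made-up node included only to show a passing case.
example_nodes = {
    "UMLS:C1709820": {
        "name": "ROS1 wt Allele",
        "categories": ["biolink:GenomicEntity"],
    },
    "UMLS:C3890397": {
        "name": "PRKAA1 wt Allele",
        "categories": ["biolink:GenomicEntity"],
    },
    "EXAMPLE:0001": {
        "name": "hypothetical gene node",
        "categories": ["biolink:Gene"],
    },
}

for curie, bad in find_disallowed_categories(example_nodes).items():
    print(curie, bad)
```

In a full TRAPI response the node dictionary would normally be found under `message.knowledge_graph.nodes`.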
So, where is the "genomic entity" category coming from? Based on my investigation, I do not think it is coming from the current production RTX-KG2 graph (i.e., RTX-KG2.8.4c). But I notice that when I query the SRI Node Normalization service with this request:
https://nodenormalization-sri.renci.org/1.4/get_normalized_nodes?curie=UMLS%3AC1709820&conflate=true&drug_chemical_conflate=false&description=false
I get back "Genomic Entity" in the response:
{
  "UMLS:C1709820": {
    "id": {
      "identifier": "UMLS:C1709820",
      "label": "ROS1 wt Allele"
    },
    "equivalent_identifiers": [
      {
        "identifier": "UMLS:C1709820",
        "label": "ROS1 wt Allele"
      }
    ],
    "type": [
      "biolink:GenomicEntity"
    ]
  }
}
The same goes for when I query with the other CURIE:
https://nodenormalization-sri.renci.org/1.4/get_normalized_nodes?curie=UMLS%3AC3890397&conflate=true&drug_chemical_conflate=false&description=false
I get back a response with "Genomic Entity":
{
  "UMLS:C3890397": {
    "id": {
      "identifier": "UMLS:C3890397",
      "label": "PRKAA1 wt Allele"
    },
    "equivalent_identifiers": [
      {
        "identifier": "UMLS:C3890397",
        "label": "PRKAA1 wt Allele"
      }
    ],
    "type": [
      "biolink:GenomicEntity"
    ]
  }
}
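The two lookups above could be scripted along these lines. This is only a sketch: the endpoint and query parameters are copied from the URLs in this comment, but the helper names (`build_url`, `extract_types`, `fetch_types`) are made up for illustration, and the service may return something different today. The parsing step is demonstrated offline against a trimmed copy of the response shown above, so nothing here touches the network unless `fetch_types` is called.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the request URLs in this thread.
BASE_URL = "https://nodenormalization-sri.renci.org/1.4/get_normalized_nodes"


def build_url(curie):
    """Build the same GET URL used in the manual queries above."""
    params = {
        "curie": curie,
        "conflate": "true",
        "drug_chemical_conflate": "false",
        "description": "false",
    }
    return BASE_URL + "?" + urlencode(params)


def extract_types(response, curie):
    """Pull the "type" list for one CURIE out of a parsed response.

    Returns an empty list if the CURIE is missing or its entry is null.
    """
    entry = response.get(curie) or {}
    return entry.get("type", [])


def fetch_types(curie, timeout=30):
    """Query the live service (network access required; not called below)."""
    with urlopen(build_url(curie), timeout=timeout) as resp:
        return extract_types(json.load(resp), curie)


# Offline demonstration using a trimmed copy of the response shown above.
sample_response = {
    "UMLS:C1709820": {
        "id": {"identifier": "UMLS:C1709820", "label": "ROS1 wt Allele"},
        "type": ["biolink:GenomicEntity"],
    }
}

print(build_url("UMLS:C1709820"))
print(extract_types(sample_response, "UMLS:C1709820"))
```

Calling `fetch_types("UMLS:C3890397")` would repeat the second lookup against the live service.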
So in NCATSTranslator/Feedback issue 586, I tagged and assigned @gaurav on the node normalization issue. Since that's the only validation error (leaving aside the warnings) from this query, I've deassigned myself from the NCATSTranslator/Feedback issue (i.e., the parent issue).
> Looks like this is coming from the SRI NodeNormalizer, so maybe we can bounce this issue to the SRI NodeNormalizer folks?
Ah, now I see how you did that. Apparently if I had just studied the output of the ARAX-UI Synonyms tool more closely, I could have avoided a bunch of querying Neo4j endpoints. Thank you!
This has been traced to an SRI Node Normalizer issue. I've updated the original NCATSTranslator/Feedback issue with the results of our investigation. Seems like nothing left to do here. (Yes, biolink:BiologicalEntity is an abstract class, but when we roll out KG2.8.6c, this will get fixed.)
See NCATSTranslator/Feedback issue 586.