RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License

Validation errors reported in Metformin query #2167

Closed: saramsey closed this issue 1 year ago

saramsey commented 1 year ago

See NCATSTranslator/Feedback issue 586.

saramsey commented 1 year ago

Here is the TRAPI query graph. Its edge has knowledge_type "inferred":

{
  "edges": {
    "t_edge": {
      "attribute_constraints": [],
      "knowledge_type": "inferred",
      "object": "on",
      "predicates": [
        "biolink:affects"
      ],
      "qualifier_constraints": [
        {
          "qualifier_set": [
            {
              "qualifier_type_id": "biolink:object_aspect_qualifier",
              "qualifier_value": "activity_or_abundance"
            },
            {
              "qualifier_type_id": "biolink:object_direction_qualifier",
              "qualifier_value": "decreased"
            }
          ]
        }
      ],
      "subject": "sn"
    }
  },
  "nodes": {
    "on": {
      "categories": [
        "biolink:Gene"
      ],
      "constraints": [],
      "is_set": false,
      "fulltextname": "on"
    },
    "sn": {
      "categories": [
        "biolink:ChemicalEntity"
      ],
      "constraints": [],
      "ids": [
        "PUBCHEM.COMPOUND:4091"
      ],
      "is_set": false,
      "fulltextname": "sn"
    }
  }
}

Looks like the ARAX query result includes two result nodes of category biolink:GenomicEntity: https://arax.ncats.io/beta/?r=173466

See result 267, "ROS1 wt Allele": [Screenshot 2023-10-17 at 10:23:09 AM]

Here is the TRAPI knowledge-graph node object for that result:

        "UMLS:C1709820": {
          "attributes": [
            {
              "attribute_source": null,
              "attribute_type_id": "biolink:xref",
              "attributes": null,
              "description": null,
              "original_attribute_name": null,
              "value": [
                "UMLS:C1709820"
              ],
              "value_type_id": null,
              "value_url": null
            },
            {
              "attribute_source": null,
              "attribute_type_id": "biolink:synonym",
              "attributes": null,
              "description": null,
              "original_attribute_name": null,
              "value": [
                "ROS1 wt Allele"
              ],
              "value_type_id": null,
              "value_url": null
            },
            {
              "attribute_source": null,
              "attribute_type_id": "biolink:IriType",
              "attributes": null,
              "description": null,
              "original_attribute_name": null,
              "value": "https://identifiers.org/umls:C1709820",
              "value_type_id": "metatype:Uri",
              "value_url": "https://identifiers.org/umls:C1709820"
            },
            {
              "attribute_source": null,
              "attribute_type_id": "biolink:description",
              "attributes": null,
              "description": null,
              "original_attribute_name": null,
              "value": "Human ROS1 wild-type allele is located within 6q22 and is approximately 137 kb in length. This allele, which encodes proto-oncogene tyrosine-protein kinase ROS protein, is involved in receptor tyrosine phosphorylation signal transduction.; UMLS Semantic Type: STY:T028",
              "value_type_id": "metatype:String",
              "value_url": null
            },
            {
              "attribute_source": null,
              "attribute_type_id": "biolink:category",
              "attributes": null,
              "description": "Categories of all nodes in this synonym set in RTX-KG2.",
              "original_attribute_name": null,
              "value": [
                "biolink:GenomicEntity"
              ],
              "value_type_id": "metatype:Uriorcurie",
              "value_url": null
            }
          ],
          "categories": [
            "biolink:GenomicEntity"
          ],
          "name": "ROS1 wt Allele"
        },

See also result 408, "PRKAA1 wt Allele": [Screenshot 2023-10-17 at 10:24:33 AM]

Here is the TRAPI knowledge-graph node object for that result:

        "UMLS:C3890397": {
          "attributes": [
            {
              "attribute_source": null,
              "attribute_type_id": "biolink:xref",
              "attributes": null,
              "description": null,
              "original_attribute_name": null,
              "value": [
                "UMLS:C3890397"
              ],
              "value_type_id": null,
              "value_url": null
            },
            {
              "attribute_source": null,
              "attribute_type_id": "biolink:synonym",
              "attributes": null,
              "description": null,
              "original_attribute_name": null,
              "value": [
                "PRKAA1 wt Allele"
              ],
              "value_type_id": null,
              "value_url": null
            },
            {
              "attribute_source": null,
              "attribute_type_id": "biolink:IriType",
              "attributes": null,
              "description": null,
              "original_attribute_name": null,
              "value": "https://identifiers.org/umls:C3890397",
              "value_type_id": "metatype:Uri",
              "value_url": "https://identifiers.org/umls:C3890397"
            },
            {
              "attribute_source": null,
              "attribute_type_id": "biolink:description",
              "attributes": null,
              "description": null,
              "original_attribute_name": null,
              "value": "Human PRKAA1 wild-type allele is located in the vicinity of 5p12 and is approximately 39 kb in length. This allele, which encodes 5'-AMP-activated protein kinase catalytic subunit alpha-1 protein, plays a role in the modulation of many cellular processes through the phosphorylation of metabolic enzymes and transcription regulatory proteins.; UMLS Semantic Type: STY:T028",
              "value_type_id": "metatype:String",
              "value_url": null
            },
            {
              "attribute_source": null,
              "attribute_type_id": "biolink:category",
              "attributes": null,
              "description": "Categories of all nodes in this synonym set in RTX-KG2.",
              "original_attribute_name": null,
              "value": [
                "biolink:GenomicEntity"
              ],
              "value_type_id": "metatype:Uriorcurie",
              "value_url": null
            }
          ],
          "categories": [
            "biolink:GenomicEntity"
          ],
          "name": "PRKAA1 wt Allele"
        },

saramsey commented 1 year ago

In the ARAX UI "Synonyms" tool on arax.ncats.io/beta, I see that the results for "ROS1 wt Allele" do in fact include a node with category biolink:GenomicEntity: [Screenshot 2023-10-17 at 10:27:11 AM]

The same goes for "PRKAA1 wt Allele": [Screenshot 2023-10-17 at 10:28:27 AM]

edeutsch commented 1 year ago

Looks like this is coming from the SRI NodeNormalizer, so maybe we can bounce this issue to the SRI NodeNormalizer folks?

saramsey commented 1 year ago

In KG2.8.6pre, the category has already been changed to biolink:Gene, as seen here: [Screenshot 2023-10-17 at 10:31:01 AM]

and here: [Screenshot 2023-10-17 at 10:30:06 AM]

saramsey commented 1 year ago

In KG2.8.4c/Neo4j, we have the incorrect category (biolink:GenomicEntity): [Screenshot 2023-10-17 at 10:36:05 AM]

[Screenshot 2023-10-17 at 10:37:02 AM]

saramsey commented 1 year ago

In KG2.8.4pre/Neo4j, these nodes have the category biolink:BiologicalEntity. That is an abstract class and therefore also not allowed as a category, but it is a different problem from biolink:GenomicEntity and definitely not the cause of this error:

[Screenshot 2023-10-17 at 10:42:48 AM]

Similarly for the other node: [Screenshot 2023-10-17 at 10:43:52 AM]

This makes me think that the appearance of biolink:GenomicEntity in KG2.8.4c for these two nodes may be due to the node normalizer. Tagging @amykglen .

saramsey commented 1 year ago

To summarize: one cause of a validation error in the originally referenced ARAX TRAPI result is that two result nodes, UMLS:C1709820 ("ROS1 wt Allele") and UMLS:C3890397 ("PRKAA1 wt Allele"), are both annotated in the TRAPI result with the category biolink:GenomicEntity. That is not allowed, because GenomicEntity is a "mixin" per the Biolink YAML model.
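For anyone wanting to catch this kind of problem before submitting a result, here is a minimal sketch of the check, assuming the TRAPI message has already been loaded into a Python dict. The set of disallowed categories is hard-coded for illustration only (biolink:GenomicEntity as a mixin and biolink:BiologicalEntity as an abstract class, both per this thread); a real validator would derive it from the Biolink model YAML. The function name `find_disallowed_categories` is hypothetical and not part of ARAX or any Translator library.

```python
from typing import Dict, List, Set

# Hypothetical, hard-coded for illustration: categories identified in this
# thread as not allowed on nodes. A real check would read the mixin/abstract
# flags from the Biolink model YAML instead of listing them by hand.
DISALLOWED_CATEGORIES: Set[str] = {
    "biolink:GenomicEntity",     # mixin
    "biolink:BiologicalEntity",  # abstract class
}


def find_disallowed_categories(message: dict) -> Dict[str, List[str]]:
    """Map each knowledge-graph node ID to the disallowed categories it carries."""
    nodes = message.get("knowledge_graph", {}).get("nodes", {})
    flagged: Dict[str, List[str]] = {}
    for node_id, node in nodes.items():
        # "categories" may be missing or null on a TRAPI node, so default to [].
        bad = [c for c in node.get("categories") or [] if c in DISALLOWED_CATEGORIES]
        if bad:
            flagged[node_id] = bad
    return flagged


if __name__ == "__main__":
    # Trimmed-down versions of the two result nodes shown above,
    # plus an illustrative chemical node that should not be flagged.
    message = {
        "knowledge_graph": {
            "nodes": {
                "UMLS:C1709820": {"categories": ["biolink:GenomicEntity"], "name": "ROS1 wt Allele"},
                "UMLS:C3890397": {"categories": ["biolink:GenomicEntity"], "name": "PRKAA1 wt Allele"},
                "PUBCHEM.COMPOUND:4091": {"categories": ["biolink:ChemicalEntity"]},
            }
        }
    }
    print(find_disallowed_categories(message))
```

Checking membership against an explicit set keeps the sketch dependency-free; swapping in a set built from the Biolink YAML would not change the function itself.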

So, where is the "genomic entity" category coming from? Based on my investigation, I do not think it is coming from the current production RTX-KG2 graph (i.e., RTX-KG2.8.4c). But I notice that when I query the SRI Node Normalization service with this request:

https://nodenormalization-sri.renci.org/1.4/get_normalized_nodes?curie=UMLS%3AC1709820&conflate=true&drug_chemical_conflate=false&description=false

I get back biolink:GenomicEntity as the type in the response:

{
  "UMLS:C1709820": {
    "id": {
      "identifier": "UMLS:C1709820",
      "label": "ROS1 wt Allele"
    },
    "equivalent_identifiers": [
      {
        "identifier": "UMLS:C1709820",
        "label": "ROS1 wt Allele"
      }
    ],
    "type": [
      "biolink:GenomicEntity"
    ]
  }
}

The same happens when I query with the other CURIE:

https://nodenormalization-sri.renci.org/1.4/get_normalized_nodes?curie=UMLS%3AC3890397&conflate=true&drug_chemical_conflate=false&description=false

Again, the response lists biolink:GenomicEntity as the type:

{
  "UMLS:C3890397": {
    "id": {
      "identifier": "UMLS:C3890397",
      "label": "PRKAA1 wt Allele"
    },
    "equivalent_identifiers": [
      {
        "identifier": "UMLS:C3890397",
        "label": "PRKAA1 wt Allele"
      }
    ],
    "type": [
      "biolink:GenomicEntity"
    ]
  }
}
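The two lookups above can be scripted. This is a minimal sketch, assuming only the base URL and query parameters that appear in the requests above; `build_node_norm_url` and `types_for` are hypothetical helper names, and the example parses a trimmed copy of the response shown above rather than calling the live service.

```python
import json
from typing import List
from urllib.parse import urlencode

# Base URL copied from the requests shown above.
NODE_NORM_URL = "https://nodenormalization-sri.renci.org/1.4/get_normalized_nodes"


def build_node_norm_url(curie: str) -> str:
    """Build a GET URL for one CURIE, using the same parameters as above."""
    params = {
        "curie": curie,
        "conflate": "true",
        "drug_chemical_conflate": "false",
        "description": "false",
    }
    # urlencode percent-encodes the colon in the CURIE (":" -> "%3A").
    return f"{NODE_NORM_URL}?{urlencode(params)}"


def types_for(curie: str, response: dict) -> List[str]:
    """Return the Biolink types reported for a CURIE; empty if the entry is missing or null."""
    entry = response.get(curie)
    return list(entry.get("type", [])) if entry else []


if __name__ == "__main__":
    url = build_node_norm_url("UMLS:C1709820")
    print(url)
    # To query the live service instead (requires `import urllib.request`):
    #   response = json.load(urllib.request.urlopen(url))
    body = (
        '{"UMLS:C1709820": {"id": {"identifier": "UMLS:C1709820", '
        '"label": "ROS1 wt Allele"}, "type": ["biolink:GenomicEntity"]}}'
    )
    response = json.loads(body)
    print(types_for("UMLS:C1709820", response))
```

Running the same two helpers over both CURIEs and feeding the types into a mixin/abstract check is one way to confirm that a bad category originates upstream of the knowledge graph.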

So in NCATSTranslator/Feedback issue 586, I tagged and assigned @gaurav on the node normalization issue. Since that is the only validation error (leaving aside the warnings) from this query, I have unassigned myself from the NCATSTranslator/Feedback issue (i.e., the parent issue).

saramsey commented 1 year ago

Replying to @edeutsch: "Looks like this is coming from the SRI NodeNormalizer, so maybe we can bounce this issue to the SRI NodeNormalizer folks?"

Ah, now I see how you did that. Apparently, if I had just studied the output of the ARAX UI Synonyms tool more closely, I could have avoided querying a bunch of Neo4j endpoints. Thank you!

saramsey commented 1 year ago

This has been traced to an SRI Node Normalizer issue. I've updated the original NCATSTranslator/Feedback issue with the results of our investigation. There seems to be nothing left to do here. (Yes, biolink:BiologicalEntity is an abstract class, but that will be fixed when we roll out KG2.8.6c.)