NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

NG-NITROARGININE METHYL ESTER is not a gene #753

Open TranslatorIssueCreator opened 2 months ago

TranslatorIssueCreator commented 2 months ago

Type: Bug Report

URL: https://ui.test.transltr.io/main/results?l=Acetylcholine&i=PUBCHEM.COMPOUND:187&t=3&r=0&q=27730ac2-b19e-4edd-b146-7a79551965b0

ARS PK: 27730ac2-b19e-4edd-b146-7a79551965b0

Steps to reproduce:

search for NG-NITROARGININE METHYL ESTER result

Screenshots:

sandrine-muller commented 2 months ago

What gene may be upregulated by acetylcholine? I get an organic compound name with UMLS CURIE UMLS:C0083536. The source card, though, explicitly mentions that it is an organic chemical, while it is returned by the UI as a gene. It is likely that the UMLS entity class returned by SemmedDB is incorrect (to be verified).

sierra-moxon commented 2 months ago

@sandrine-muller - do you know which KP that ingests SEMMEDDB is returning this result? RTX-KG2 or a BTE implementation?

sandrine-muller commented 2 months ago

The reported source is ARAX.

sierra-moxon commented 1 month ago

@dnsmith124 - does this view (where there are two categories, Protein and Drug, in the left-hand filter menu that make the result disappear when either of them is "-"'ed from the results) mean that there are two categories on the NG-NITROARGININE METHYL ESTER node (Protein and Drug)?


@andrewsu - would you be able to tell if the "Drug" category from this node should be removed -- it looks like a SEMMEDDB edge via BTE?

andrewsu commented 1 month ago

So semmeddb is reporting that NG-Nitroarginine Methyl Ester (UMLS:C0083536) is of semantic type Gene or Genome (abbreviation gngm), so it makes sense that this record is returned in response to the original query. One can argue whether this modified amino acid should have this semantic type, of course. But I don't see an easy/obvious way in which this would be fixed by semmeddb cleaning.

Once it is returned, @sierra-moxon points out that the UI shows it as having types Drug and Protein. I understand why it's shown as Protein based on the NodeNorm results, but I'm not sure where the Drug classification is coming from....

https://nodenorm.test.transltr.io/get_normalized_nodes?curie=UMLS:C0083536&conflate=true&drug_chemical_conflate=true:

{
    "UMLS:C0083536": {
        "id": {
            "identifier": "UMLS:C0083536",
            "label": "NG-Nitroarginine Methyl Ester"
        },
        "equivalent_identifiers": [
            {
                "identifier": "UMLS:C0083536",
                "label": "NG-Nitroarginine Methyl Ester"
            }
        ],
        "type": [
            "biolink:Protein",
            "biolink:Polypeptide",
            "biolink:BiologicalEntity",
            "biolink:NamedThing",
            "biolink:GeneProductMixin",
            "biolink:ChemicalEntityOrGeneOrGeneProduct",
            "biolink:ChemicalEntityOrProteinOrPolypeptide",
            "biolink:ThingWithTaxon",
            "biolink:GeneOrGeneProduct",
            "biolink:MacromolecularMachineMixin"
        ]
    }
}
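
As a rough way to double-check the observation above, the following sketch tests whether a given Biolink type appears in a NodeNorm response. The `has_type` helper is hypothetical, and the `response` dict is a trimmed copy of the JSON quoted above rather than a live call to the service.

```python
# Trimmed copy of the NodeNorm response quoted above (type list shortened).
response = {
    "UMLS:C0083536": {
        "id": {
            "identifier": "UMLS:C0083536",
            "label": "NG-Nitroarginine Methyl Ester",
        },
        "type": [
            "biolink:Protein",
            "biolink:Polypeptide",
            "biolink:BiologicalEntity",
            "biolink:NamedThing",
        ],
    }
}


def has_type(resp: dict, curie: str, biolink_type: str) -> bool:
    """Hypothetical helper: True if `biolink_type` is listed for `curie`."""
    # Treat a missing or null entry as "no types known".
    entry = resp.get(curie) or {}
    return biolink_type in entry.get("type", [])


print(has_type(response, "UMLS:C0083536", "biolink:Protein"))  # True
print(has_type(response, "UMLS:C0083536", "biolink:Drug"))     # False
```

With the response shown here, Protein is present and Drug is not, which is consistent with the Drug label not originating from NodeNorm for this CURIE.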

dnsmith124 commented 1 month ago

@sierra-moxon @andrewsu The node type facet only checks for the presence of a given biolink type within any of the paths of a result. This result in particular shows up with both Protein and Drug selected not because NG-NITROARGININE has both types, but because the end node of that result, Acetylcholine, is a Drug.

It's also worth noting that the facets function in an OR fashion when considering facets from the same filter type (so multiple Node Type facets selected show all results that apply to either, not just those that apply to both), whereas facets from different filter types function in an AND fashion (requiring true for both filters in order to show a result).
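
The behavior described in this comment can be sketched roughly as follows. This is an illustration only, not the UI's actual code: the function names, the data layout, and the second filter type (`source`) are all assumptions made for the example.

```python
from typing import Dict, List, Set


def node_types_in_result(paths: List[List[dict]]) -> Set[str]:
    """Collect every type found on any node in any path of a result."""
    return {t for path in paths for node in path for t in node["types"]}


def passes_filters(facets: Dict[str, Set[str]],
                   selected: Dict[str, Set[str]]) -> bool:
    """OR within a filter type, AND across filter types."""
    for filter_type, wanted in selected.items():
        if not wanted:
            continue  # nothing selected for this filter type
        # OR: any overlap between selected and present values is enough.
        if not (facets.get(filter_type, set()) & wanted):
            return False  # AND: every active filter type must match
    return True


# One result with one path; node types here are assumed for illustration.
result_paths = [[
    {"name": "NG-Nitroarginine Methyl Ester", "types": ["Protein"]},
    {"name": "Acetylcholine", "types": ["Drug"]},
]]
facets = {
    "node_type": node_types_in_result(result_paths),
    "source": {"ARAX"},  # hypothetical second filter type
}

print(passes_filters(facets, {"node_type": {"Drug"}}))          # True
print(passes_filters(facets, {"node_type": {"Gene", "Drug"}}))  # True
print(passes_filters(facets, {"node_type": {"Drug"},
                              "source": {"BTE"}}))              # False
```

The first call matches only because Acetylcholine, the end node, carries the Drug type, even though the NG-Nitroarginine Methyl Ester node does not. The sketch covers selecting facets only; excluding a facet is not modeled.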