TranslatorSRI / NodeNormalization

Service that produces Translator compliant nodes given a curie
MIT License

HP:0001274 incorrectly mapped to MONDO:0009022 #169

Open kevinschaper opened 1 year ago

kevinschaper commented 1 year ago

I got a question about a Translator QA issue that included this edge, reported as coming from the SRI Reference KG:

MONDO:0009477----biolink:has_phenotype----MONDO:0009022

I was surprised to see that the object was a Disease ID rather than a PhenotypicFeature ID, and couldn't find that edge in any version I have of the KG.

I checked node normalizer, and found that https://nodenormalization-sri.renci.org/1.3/get_normalized_nodes?curie=HP%3A0001274 returns:

{
  "HP:0001274": {
    "id": {
      "identifier": "MONDO:0009022",
      "label": "corpus callosum, agenesis of"
    },
    "equivalent_identifiers": [
      {
        "identifier": "MONDO:0009022",
        "label": "corpus callosum, agenesis of"
      },
      {
        "identifier": "OMIM:217990"
      },
      {
        "identifier": "UMLS:C0175754",
        "label": "Agenesis of corpus callosum"
      },
      {
        "identifier": "MESH:D061085",
        "label": "Agenesis of Corpus Callosum"
      },
      {
        "identifier": "MEDDRA:10063756"
      },
      {
        "identifier": "NCIT:C98905",
        "label": "Corpus Callosum Agenesis"
      },
      {
        "identifier": "SNOMEDCT:5102002"
      },
      {
        "identifier": "HP:0001274",
        "label": "Agenesis of corpus callosum"
      }
    ],
    "type": [
      "biolink:Disease",
      "biolink:DiseaseOrPhenotypicFeature",
      "biolink:ThingWithTaxon",
      "biolink:BiologicalEntity",
      "biolink:NamedThing",
      "biolink:Entity"
    ],
    "information_content": 100
  },
  "": null
}

This appears to have remapped a MONDO:0009477----biolink:has_phenotype----HP:0001274 edge that I do have.

Even though they're obviously a darn good lexical match, I don't think it's safe to make an equivalence mapping between a biolink:Disease and a biolink:PhenotypicFeature.
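For anyone who wants to look for other cases like this, here is a minimal sketch of a check over a response shaped like the one above. It assumes that any `HP:`-prefixed CURIE is meant to be a phenotypic feature; the function name `find_type_conflicts` and the trimmed-down `example` dict are made up for illustration and are not part of the Node Normalizer API.

```python
# Sketch: flag HP identifiers that ended up in a clique typed as biolink:Disease.
# Assumption: HP-prefixed CURIEs denote phenotypic features, and the input has
# the same shape as the get_normalized_nodes response quoted above.

def find_type_conflicts(response):
    """Return (hp_curie, preferred_curie) pairs for HP ids merged into Disease cliques."""
    conflicts = []
    for query_curie, node in response.items():
        if not node:  # entries such as "": null carry no clique
            continue
        if "biolink:Disease" not in node.get("type", []):
            continue
        preferred = node["id"]["identifier"]
        for equivalent in node.get("equivalent_identifiers", []):
            identifier = equivalent["identifier"]
            if identifier.startswith("HP:"):
                conflicts.append((identifier, preferred))
    return conflicts


# Trimmed copy of the response above (hypothetical test fixture).
example = {
    "HP:0001274": {
        "id": {"identifier": "MONDO:0009022", "label": "corpus callosum, agenesis of"},
        "equivalent_identifiers": [
            {"identifier": "MONDO:0009022"},
            {"identifier": "OMIM:217990"},
            {"identifier": "HP:0001274", "label": "Agenesis of corpus callosum"},
        ],
        "type": ["biolink:Disease", "biolink:DiseaseOrPhenotypicFeature"],
    },
    "": None,
}

print(find_type_conflicts(example))  # [('HP:0001274', 'MONDO:0009022')]
```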

tagging @matentzn to follow the issue, and perhaps correct me!

matentzn commented 1 year ago

It is important to understand that NodeNormaliser/Babelon (after @gaurav's great presentation last month) does not do any mapping conflict resolution. It is a powerful tool for accessing all mappings for the use case of cross-resource discovery. If you want to use the resulting mappings for knowledge graph merging, you first have to run them through a reconciliation system like boomer, which may not scale to the enormous size of the NodeNormaliser mapping index.

This issue is almost certainly a consequence of pulling in oboInOwl:hasDbXref mappings, which should never be used for knowledge graph merging (but are fine for discovery and search).
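To illustrate the distinction, here is a rough sketch of separating mappings by predicate before merging. The allow-list `MERGE_SAFE_PREDICATES`, the helper `split_mappings`, and the predicates attached to the two example pairs are all assumptions for illustration; they are not taken from Babel or from the actual ontology files.

```python
# Sketch: keep only mappings whose predicate is considered safe for merging.
# Assumption: this allow-list is illustrative, not a Babel/NodeNormalization setting.
MERGE_SAFE_PREDICATES = {"skos:exactMatch", "owl:equivalentClass"}


def split_mappings(mappings):
    """Split (subject, predicate, object) triples into merge-safe and discovery-only lists."""
    merge_safe, discovery_only = [], []
    for subject, predicate, obj in mappings:
        if predicate in MERGE_SAFE_PREDICATES:
            merge_safe.append((subject, predicate, obj))
        else:
            discovery_only.append((subject, predicate, obj))
    return merge_safe, discovery_only


# Hypothetical input: the predicates on these pairs are invented for the example.
mappings = [
    ("MONDO:0009022", "skos:exactMatch", "OMIM:217990"),
    ("MONDO:0009022", "oboInOwl:hasDbXref", "HP:0001274"),
]

merge_safe, discovery_only = split_mappings(mappings)
print(merge_safe)
print(discovery_only)
```

With a split like this, the xref-only pair stays available for search but never contributes to an equivalence clique.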

cbizon commented 1 year ago

This is a very good point, and something that we have struggled with in Babel/NN. There are quite a few cases where both MONDO and HP claim an entity. Alzheimer Disease is another one. At one point I checked, and I think there are something like 100 of these. Now, if you want to, you can view the HP term as the complete phenotype that is caused by, and defines, the disease. In my opinion, this is better viewed as a mistake on the part of one ontology or the other. So the choice was made that we would merge these entities and call them diseases.

Since that choice was made, we have implemented the idea of conflation. That's where we say "OK, I know that these two things are really different (say a disease and its 100% phenotype), but for the purposes of what I'm doing right now, I don't care". We have the ability in NN to conflate or not on the fly. So what I think we really should do is implement a disease/phenotype conflation, much as we have done for gene/protein.
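As a toy sketch of what that could look like (this is not how NN is implemented; the `normalize` function, the two lookup tables, and their contents are hypothetical), the disease and phenotype stay in separate cliques and are only joined when the caller asks for conflation:

```python
# Sketch of on-the-fly conflation, assuming two lookup tables:
#   cliques:     member CURIE -> preferred CURIE of its equivalence clique
#   conflations: preferred CURIE -> preferred CURIE of the conflated group
# Both tables are hypothetical stand-ins for whatever NN stores internally.

def normalize(curie, cliques, conflations, conflate=False):
    """Return the preferred CURIE, optionally collapsing conflated cliques."""
    preferred = cliques.get(curie, curie)
    if conflate:
        preferred = conflations.get(preferred, preferred)
    return preferred


cliques = {
    "HP:0001274": "HP:0001274",        # phenotype keeps its own clique
    "MONDO:0009022": "MONDO:0009022",  # disease clique
    "OMIM:217990": "MONDO:0009022",
}
conflations = {"HP:0001274": "MONDO:0009022"}  # disease/phenotype conflation

print(normalize("HP:0001274", cliques, conflations))                 # HP:0001274
print(normalize("HP:0001274", cliques, conflations, conflate=True))  # MONDO:0009022
```

Under this arrangement the has_phenotype edge from the original report keeps its HP object by default, and only callers who opt in see it collapsed onto the MONDO ID.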

DnlRKorn commented 2 months ago

Another example of this is MONDO:0007699/Hashimoto thyroiditis mapping to HP:0000872. This issue was brought to my attention again recently.