TranslatorSRI / Babel

Babel creates cliques of equivalent identifiers across many biomedical vocabularies.

Some MONDO cliques are unexpectedly not cliqued with other identifiers #369

Open gaurav opened 3 weeks ago

gaurav commented 3 weeks ago

Three examples from #360 (originally reported at https://github.com/TranslatorSRI/Babel/issues/333):

All three of these xrefs are present in both the MONDO ontology and in our concords, so it's not clear why they didn't make it into the cliques. For these three examples, I created a manual concord, which appears to be working correctly.

gaurav commented 3 weeks ago

Example from 2024aug18:

$ pwd
/projects/babel/babel-outputs/2024aug18/compendia
$ grep MONDO:0015517 Disease.txt 
{"type": "biolink:Disease", "ic": "80.9446391445248", "identifiers": [{"i": "MONDO:0015517", "l": "common variable immunodeficiency", "d": ["Common variable immunodeficiency (CVID) comprises a heterogeneous group of diseases characterized by a significant hypogammaglobulinemia of unknown cause, failure to produce specific antibodies after immunizations and susceptibility to bacterial infections, predominantly caused by encapsulated bacteria."], "t": []}], "preferred_name": "common variable immunodeficiency", "taxa": []}
$ grep DOID:12177 Disease.txt 
{"type": "biolink:Disease", "ic": null, "identifiers": [{"i": "DOID:12177", "l": "common variable immunodeficiency", "d": [], "t": []}, {"i": "OMIM:607594", "d": [], "t": []}, {"i": "UMLS:C0009447", "l": "Common Variable Immunodeficiency", "d": [], "t": []}, {"i": "UMLS:C3149378", "l": "IMMUNODEFICIENCY, COMMON VARIABLE, 1", "d": [], "t": []}, {"i": "MESH:D017074", "l": "Common Variable Immunodeficiency", "d": [], "t": []}, {"i": "MEDDRA:10010112", "d": [], "t": []}, {"i": "MEDDRA:10021449", "d": [], "t": []}, {"i": "MEDDRA:10036670", "d": [], "t": []}, {"i": "MEDDRA:10036671", "d": [], "t": []}, {"i": "SNOMEDCT:191010004", "d": [], "t": []}, {"i": "ICD10:D83", "d": [], "t": []}, {"i": "ICD9:279.06", "d": [], "t": []}], "preferred_name": "common variable immunodeficiency", "taxa": []}
$ cd ../intermediate/disease/concords/
$ pwd
/projects/babel/babel-outputs/2024aug18/intermediate/disease/concords
$ grep DOID:12177 *
DOID:DOID:12177 xref    GARD:6140
DOID:DOID:12177 xref    ICD10:D83
DOID:DOID:12177 xref    ICD9:279.06
DOID:DOID:12177 xref    MESH:D017074
DOID:DOID:12177 xref    MIM:PS607594
DOID:DOID:12177 xref    ORDO:1572
DOID:DOID:12177 xref    SNOMEDCT_US_2023_03_01:191010004
DOID:DOID:12177 xref    UMLS:C0009447
MONDO:MONDO:0015517 oio:exactMatch  DOID:12177
MONDO:MONDO:0015517 oio:exactMatch  DOID:12177
$ grep MONDO:0015517 *
MONDO:MONDO:0015517 oio:exactMatch  OMIM.PS:607594
MONDO:MONDO:0015517 oio:exactMatch  NCIT:C26725
MONDO:MONDO:0015517 oio:exactMatch  MESH:D017074
MONDO:MONDO:0015517 oio:exactMatch  SNOMEDCT:23238000
MONDO:MONDO:0015517 oio:exactMatch  DOID:12177
MONDO:MONDO:0015517 oio:exactMatch  MEDGEN:40407
MONDO:MONDO:0015517 oio:exactMatch  UMLS:C0009447
MONDO:MONDO:0015517 oio:exactMatch  orphanet:1572
MONDO:MONDO:0015517 oio:exactMatch  OMIM.PS:607594
MONDO:MONDO:0015517 oio:exactMatch  NCIT:C26725
MONDO:MONDO:0015517 oio:exactMatch  MESH:D017074
MONDO:MONDO:0015517 oio:exactMatch  SNOMEDCT:23238000
MONDO:MONDO:0015517 oio:exactMatch  DOID:12177
MONDO:MONDO:0015517 oio:exactMatch  MEDGEN:40407
MONDO:MONDO:0015517 oio:exactMatch  UMLS:C0009447
MONDO:MONDO:0015517 oio:exactMatch  orphanet:1572
MONDO_close:MONDO:0015517   oio:closeMatch  MEDDRA:10021449
MONDO_close:MONDO:0015517   oio:closeMatch  MEDDRA:10021449
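
As a sanity check on the expectation that these concords should merge the two cliques, here is a minimal sketch that groups identifiers with a naive union-find. It is not Babel's real clique-building code, which may apply additional constraints that this sketch ignores. The function name `build_cliques` is hypothetical, and the input assumes whitespace-separated `subject predicate object` lines, i.e. the grep output above with the leading filename prefix removed. Only a few of the lines above are used, and the `MONDO_close` lines are left out.

```python
from collections import defaultdict


def build_cliques(concord_lines):
    """Group identifiers linked by any concord line (naive union-find).

    Hypothetical helper for illustration only; not Babel's implementation.
    """
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for line in concord_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        subj, _pred, obj = parts
        parent[find(subj)] = find(obj)  # merge the two groups

    cliques = defaultdict(set)
    for ident in list(parent):
        cliques[find(ident)].add(ident)
    return list(cliques.values())


# A subset of the concord lines shown above, filename prefix removed.
lines = [
    "DOID:12177 xref MESH:D017074",
    "DOID:12177 xref UMLS:C0009447",
    "MONDO:0015517 oio:exactMatch DOID:12177",
    "MONDO:0015517 oio:exactMatch MESH:D017074",
]
cliques = build_cliques(lines)
together = any({"MONDO:0015517", "DOID:12177"} <= c for c in cliques)
print(together)  # True
```

Under this naive grouping the two identifiers land in the same clique, which matches the expectation in this issue that something later in the pipeline is keeping them apart.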
gaurav commented 3 weeks ago

For 2024oct1, I'm not seeing any MONDO identifiers in the existing clique:

{
  "DOID:10017": {
    "id": {
      "identifier": "DOID:10017",
      "label": "multiple endocrine neoplasia type 1"
    },
    "equivalent_identifiers": [
      {
        "identifier": "DOID:10017",
        "label": "multiple endocrine neoplasia type 1"
      },
      {
        "identifier": "OMIM:131100"
      },
      {
        "identifier": "UMLS:C0025267",
        "label": "Multiple Endocrine Neoplasia Type 1"
      },
      {
        "identifier": "UMLS:C3149237",
        "label": "MEN1 SOMATIC MUTATIONS"
      },
      {
        "identifier": "MESH:D018761",
        "label": "Multiple Endocrine Neoplasia Type 1"
      },
      {
        "identifier": "MEDDRA:10026979"
      },
      {
        "identifier": "MEDDRA:10027180"
      },
      {
        "identifier": "MEDDRA:10028190"
      },
      {
        "identifier": "MEDDRA:10028194"
      },
      {
        "identifier": "MEDDRA:10073150"
      },
      {
        "identifier": "NCIT:C3225",
        "label": "Multiple Endocrine Neoplasia Type 1"
      },
      {
        "identifier": "SNOMEDCT:30664006"
      },
      {
        "identifier": "ICD10:E31.21"
      },
      {
        "identifier": "ICD9:258.01"
      }
    ],
    "type": [
      "biolink:Disease",
      "biolink:DiseaseOrPhenotypicFeature",
      "biolink:BiologicalEntity",
      "biolink:ThingWithTaxon",
      "biolink:NamedThing"
    ],
    "information_content": 88.2
  },
  "MONDO:0007540": {
    "id": {
      "identifier": "MONDO:0007540",
      "label": "multiple endocrine neoplasia type 1"
    },
    "equivalent_identifiers": [
      {
        "identifier": "MONDO:0007540",
        "label": "multiple endocrine neoplasia type 1"
      }
    ],
    "type": [
      "biolink:Disease",
      "biolink:DiseaseOrPhenotypicFeature",
      "biolink:BiologicalEntity",
      "biolink:ThingWithTaxon",
      "biolink:NamedThing"
    ],
    "information_content": 100
  }
}
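
To gauge how many MONDO identifiers end up alone like `MONDO:0007540` above, one option is to scan a compendium file for cliques whose only member is a MONDO identifier. The sketch below assumes the line-delimited JSON layout shown in the 2024aug18 `Disease.txt` example (an `identifiers` list whose entries carry the CURIE under `"i"`). The function name `singleton_mondo_cliques` is hypothetical, and the two sample records are trimmed versions of the ones in that example.

```python
import json


def singleton_mondo_cliques(lines):
    """Yield MONDO identifiers whose clique contains no other identifier.

    Hypothetical helper; assumes one JSON clique per line with an
    "identifiers" list of objects keyed by "i".
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        clique = json.loads(line)
        ids = [entry["i"] for entry in clique.get("identifiers", [])]
        if len(ids) == 1 and ids[0].startswith("MONDO:"):
            yield ids[0]


# Trimmed sample records based on the 2024aug18 Disease.txt output.
sample = [
    '{"type": "biolink:Disease", "identifiers": [{"i": "MONDO:0015517"}]}',
    '{"type": "biolink:Disease", "identifiers": '
    '[{"i": "DOID:12177"}, {"i": "OMIM:607594"}, {"i": "MESH:D017074"}]}',
]
found = list(singleton_mondo_cliques(sample))
print(found)  # ['MONDO:0015517']
```

To run it against a real compendium, pass an open file handle for `Disease.txt` in place of `sample`. Not every singleton it reports is necessarily a bug, since some MONDO terms may have no exact matches to begin with.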