RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License

It seems like the labels of some nodes in KG2.5.1 are unreasonable #1243

Closed chunyuma closed 3 years ago

chunyuma commented 3 years ago

Hi @saramsey and @kvarforl, I found that the labels of some nodes in KG2.5.1 are unreasonable. For example, UMLS:C0030660 is categorized as biolink:Disease, but it is a pathological process or function, which I think should be categorized as biolink:BiologicalProcess. UMLS:C0014412 is also categorized as biolink:Disease, but its name is "environmental exposure". I don't think it should be categorized as a disease.

Do you have any ideas for this?

saramsey commented 3 years ago

For this kind of issue report, it is always helpful to paste in the full set of Neo4j properties for the node:

{
  "iri": "https://identifiers.org/umls:C0030660",
  "category_label": "disease",
  "deprecated": "False",
  "name": "Pathological process",
  "description": "A biologic function or a process having an abnormal or deleterious effect at the subcellular, cellular, multicellular, or organismal level.; The abnormal mechanisms and forms involved in the dysfunctions of tissues and organs.",
  "provided_by": "identifiers_org_registry:umls",
  "id": "UMLS:C0030660",
  "category": "biolink:Disease",
  "update_date": "2017"
}

In many cases, that provides a helpful first step toward debugging the issue.

In this case, it is interesting that no UMLS semantic types are annotated in the description field.

saramsey commented 3 years ago

By inspecting the file kg2-versions.md, I can see that KG2.5.1 was built on the EC2 instance kg2lindsey.rtx.ai. I am starting it now so that I can inspect some relevant build artifacts in the kg2-build subdirectory on that instance.

saramsey commented 3 years ago

I see that, as far as the KG2.5.1 build is concerned, UMLS CUI C0030660 originates from the NCI Thesaurus. There is a declaration in umls-nci.ttl:

<http://purl.bioontology.org/ontology/NCI/C16956> a owl:Class ;
        skos:prefLabel """Pathologic Process"""@en ;
        skos:notation """C16956"""^^xsd:string ;
        skos:definition """A biologic function or a process having an abnormal or deleterious effect at the subcellular, cellular, multicellular, or organismal level."""@en ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C93328> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C106296> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C17747> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C18206> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C96644> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C25751> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C96608> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C19151> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C19131> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C17609> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C37192> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C19686> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C21186> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C25655> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C118511> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C120271> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C20625> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C20741> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C118512> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C19987> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C19387> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C139674> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C153176> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C153404> ;
        <http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C160653> ;
        rdfs:subClassOf <http://purl.bioontology.org/ontology/NCI/C17828> ;
        UMLS:has_cui """C0030660"""^^xsd:string ;
        UMLS:has_tui """T046"""^^xsd:string ;
        UMLS:has_sty <http://purl.bioontology.org/ontology/STY/T046> ;

saramsey commented 3 years ago

So, we can see that in the NCI Thesaurus, CUI C0030660 is assigned UMLS semantic type T046.
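For a quick check like this one, the CUI and semantic type code can be pulled out of a Turtle excerpt with a regular expression. The sketch below is illustrative only and is not how the KG2 build reads these files: `extract_umls_annotations` is a hypothetical helper, the excerpt is abbreviated from the declaration quoted above, and a proper RDF parser would be the more robust choice for real use.

```python
import re

# Abbreviated excerpt of the umls-nci.ttl declaration quoted above.
TTL_EXCERPT = '''
<http://purl.bioontology.org/ontology/NCI/C16956> a owl:Class ;
        skos:prefLabel """Pathologic Process"""@en ;
        UMLS:has_cui """C0030660"""^^xsd:string ;
        UMLS:has_tui """T046"""^^xsd:string ;
'''

# Matches lines such as: UMLS:has_tui """T046"""^^xsd:string ;
ANNOTATION_RE = re.compile(r'UMLS:has_(cui|tui)\s+"""([^"]+)"""')


def extract_umls_annotations(ttl_text):
    """Return a dict mapping 'cui' and 'tui' to the lists of values found."""
    found = {"cui": [], "tui": []}
    for kind, value in ANNOTATION_RE.findall(ttl_text):
        found[kind].append(value)
    return found


annotations = extract_umls_annotations(TTL_EXCERPT)
print(annotations)
```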

From the master map of UMLS semantic types we see that UMLS semantic type code T046 is labeled pathologic function:

patf|T046|Pathologic Function
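For anyone repeating this lookup, the row above is pipe-delimited (abbreviation, code, label), so a few lines of Python are enough to index such rows by semantic type code. This is a minimal sketch and not part of the KG2 build code; the function name `parse_semantic_types` is hypothetical, and only the single row quoted above is used as input.

```python
def parse_semantic_types(lines):
    """Index 'abbreviation|TUI|label' rows by TUI."""
    index = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        abbreviation, tui, label = line.split("|", 2)
        index[tui] = {"abbreviation": abbreviation, "label": label}
    return index


# The single row quoted above; a real run would read every line of the map file.
semantic_types = parse_semantic_types(["patf|T046|Pathologic Function"])
print(semantic_types["T046"]["label"])
```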

In the Biolink metamodel on line 6145, we see that the UMLS semantic type code UMLSSC:T046 is mapped to Biolink category biolink:Disease:

  disease:
    aliases: ['condition', 'disorder', 'medical condition']
    is_a: disease or phenotypic feature
    exact_mappings:
      - MONDO:0000001
      - DOID:4
      - NCIT:C2991
      - WIKIDATA:Q12136
      - SIO:010299
      # UMLS Semantic Group "Disorders"
      - UMLSSG:DISO
      # Disease or Syndrome
      - UMLSSC:T047
      - UMLSST:dsyn
    narrow_mappings:
      # Congenital Abnormality
      - UMLSSC:T019
      - UMLSST:cgab
      # Acquired Abnormality
      - UMLSSC:T020
      - UMLSST:acab
      # Injury or Poisoning
      - UMLSSC:T037
      - UMLSST:inpo
      # Pathologic Function
      - UMLSSC:T046

permalink here: https://github.com/biolink/biolink-model/blob/097e7f6f0f62d66f22e8cff7c47692a98c837c7a/biolink-model.yaml#L6145
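The lookup described above (find the Biolink class whose mappings list a given code, and note whether the mapping is exact or narrow) can be sketched in Python. This is only an illustration: the `BIOLINK_EXCERPT` dict is a hand-transcribed subset of the YAML shown above, not the parsed biolink-model.yaml, and `find_mapping` is a hypothetical helper.

```python
# Hand-transcribed subset of the "disease" class from the excerpt above.
BIOLINK_EXCERPT = {
    "disease": {
        "exact_mappings": ["MONDO:0000001", "DOID:4", "UMLSSG:DISO",
                           "UMLSSC:T047", "UMLSST:dsyn"],
        "narrow_mappings": ["UMLSSC:T019", "UMLSSC:T020",
                            "UMLSSC:T037", "UMLSSC:T046"],
    },
}

MAPPING_KINDS = ("exact_mappings", "narrow_mappings")


def find_mapping(model, curie):
    """Return (class_name, mapping_kind) for the first class listing curie, else None."""
    for class_name, slots in model.items():
        for kind in MAPPING_KINDS:
            if curie in slots.get(kind, []):
                return class_name, kind
    return None


print(find_mapping(BIOLINK_EXCERPT, "UMLSSC:T046"))
print(find_mapping(BIOLINK_EXCERPT, "UMLSSC:T037"))
```

Both codes come back under `narrow_mappings` rather than `exact_mappings`, which matches the excerpt: Biolink lists them as narrower in meaning than disease, not as equivalent to it.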

saramsey commented 3 years ago

Now, I suppose that the Biolink people could have mapped UMLS semantic type code UMLSSC:T046 to the biolink category biolink:PathologicalProcess. But let's look at what UMLS has to say on the matter. From the UMLS Reference Manual, Figure 1,

[Screenshot: Figure 1 of the UMLS Reference Manual]

which seems to bolster the narrow-mapping connection between UMLSSC:T046 and biolink:Disease.

In any event, the KG2 build code seems to be working as designed. I have opened an issue (629) in the Biolink repo to ask Richard B. whether biolink:PathologicalProcess would be a better mapping for UMLSSC:T046.

saramsey commented 3 years ago

For UMLS CUI UMLS:C0014412, here is the relevant data from MeSH (umls-msh.ttl):

<http://purl.bioontology.org/ontology/MSH/D004781> a owl:Class ;
        skos:prefLabel """Environmental Exposure"""@en ;
        skos:notation """D004781"""^^xsd:string ;
        skos:altLabel """Environmental Exposures"""@en , """Exposure, Environmental"""@en , """Exposures, Environmental"""@en ;
        skos:definition """The exposure to potentially harmful chemical, physical, or biological agents in the environment or to environmental factors that may include ionizing radiation, pathogenic organisms, or toxic chemicals."""@en ;
        rdfs:subClassOf <http://purl.bioontology.org/ontology/MSH/D004787> ;
        <http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D057219> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000009> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000032> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000145> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000191> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000266> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000331> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000517> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000592> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000706> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000941> ;
        <http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D059392> ;
        <http://purl.bioontology.org/ontology/MSH/RO> <http://purl.bioontology.org/ontology/MSH/D000072739> ;
        <http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D000076764> ;
        <http://purl.bioontology.org/ontology/MSH/RO> <http://purl.bioontology.org/ontology/MSH/D016273> ;
        <http://purl.bioontology.org/ontology/MSH/RO> <http://purl.bioontology.org/ontology/MSH/D018777> ;
        <http://purl.bioontology.org/ontology/MSH/RO> <http://purl.bioontology.org/ontology/MSH/D018876> ;
        <http://purl.bioontology.org/ontology/MSH/RO> <http://purl.bioontology.org/ontology/MSH/D018923> ;
        <http://purl.bioontology.org/ontology/MSH/RO> <http://purl.bioontology.org/ontology/MSH/D004785> ;
        <http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D001822> ;
        <http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D014876> ;
        <http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D005506> ;
        <http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D052918> ;
        <http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D009622> ;
        <http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D014866> ;
        <http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D000397> ;
        <http://purl.bioontology.org/ontology/MSH/MN> """N06.850.460.350"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TH> """NLM (1967)"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/MMR> """20130708"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TERMUI> """T014611"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TH> """NLM (1996)"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/AN> """coordinate with specific substance"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/AQL> """AE AN CL EC ES HI LJ PC SN ST"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/DC> """1"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/DX> """19740101"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/FX> """D018876"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/FX> """D018923"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/FX> """D018777"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/FX> """D016273"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/HN> """74(67)"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/MDA> """19990101"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/PM> """74"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TERMUI> """T014612"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TERMUI> """T014611"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TERMUI> """T014611"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/FX> """D004785"""^^xsd:string ;
        UMLS:has_cui """C0014412"""^^xsd:string ;
        UMLS:has_tui """T037"""^^xsd:string ;
        UMLS:has_sty <http://purl.bioontology.org/ontology/STY/T037> ;

saramsey commented 3 years ago

So, UMLS CUI C0014412 (environmental exposure) has UMLS semantic type T037 (injury or poisoning). In the Biolink model, for whatever reason, that semantic type is also mapped to the Biolink category disease (see excerpt from biolink-model.yaml, a few comments above; here is the permalink). I have edited Biolink GitHub repo issue 629 to also inquire about T037.

saramsey commented 3 years ago

OK, while we are waiting for official word back from the Biolink team, I have gone ahead and changed curies-to-categories.yaml so that both UMLS semantic types T037 and T046 are mapped to biolink:PathologicalProcess:

 UMLS_STY:T037: pathological process
 UMLS_STY:T046: pathological process

since that doesn't seem to break validate_curies_to_categories_yaml.py. That validation script uses biolink-model.owl.ttl, which doesn't have the external CURIE ID mappings to Biolink categories that biolink-model.yaml does, so it cannot currently detect that the two changes above introduce an inconsistency with the Biolink mappings. I will mark this issue as "blocked" pending review of our request by the Biolink team. If the Biolink team ultimately decides that those two UMLS semantic types really should be mapped to biolink:Disease, then I can always revert the two edits above to the KG2 curies-to-categories.yaml file.
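The kind of cross-check that the validation script cannot currently perform might look like the sketch below. Everything in it is illustrative: `find_conflicts` is a hypothetical helper, the two dicts are transcribed by hand from the snippets in this thread, and treating the `UMLS_STY:` prefix in curies-to-categories.yaml as equivalent to Biolink's `UMLSSC:` prefix is an assumption.

```python
# The two edits to curies-to-categories.yaml quoted above.
KG2_OVERRIDES = {
    "UMLS_STY:T037": "pathological process",
    "UMLS_STY:T046": "pathological process",
}

# Biolink's mappings for the same codes, per the biolink-model.yaml excerpt.
BIOLINK_MAPPINGS = {
    "UMLSSC:T037": "disease",
    "UMLSSC:T046": "disease",
}


def find_conflicts(kg2_map, biolink_map):
    """List (code, kg2_category, biolink_category) where the two maps disagree."""
    # Assumption: the prefixes differ but the local code (e.g. T046) is the same.
    biolink_by_code = {
        curie.split(":", 1)[1]: category for curie, category in biolink_map.items()
    }
    conflicts = []
    for curie, kg2_category in sorted(kg2_map.items()):
        code = curie.split(":", 1)[1]
        biolink_category = biolink_by_code.get(code)
        if biolink_category is not None and biolink_category != kg2_category:
            conflicts.append((code, kg2_category, biolink_category))
    return conflicts


conflicts = find_conflicts(KG2_OVERRIDES, BIOLINK_MAPPINGS)
for conflict in conflicts:
    print(conflict)
```

With the two overrides in place, both T037 and T046 are reported, which is the known and intentional inconsistency while the Biolink issue is pending.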

saramsey commented 3 years ago

Thank you @chunyuma for bringing this issue to my attention.

saramsey commented 3 years ago

To clarify for @chunyuma: you won't see the above changes actually show up in KG2 until the 2.5.2 build.

saramsey commented 3 years ago

Tagging @kvarforl so she is aware of the above and can include this fix in the next KG2 build.

chunyuma commented 3 years ago

Thanks so much for your detailed explanation @saramsey. It seems like biolink:PathologicalProcess might be a better mapping than biolink:Disease for UMLS:C0030660 and UMLS:C0014412. But let's see how the Biolink team replies.

I'm curious about these two nodes because I'm modifying KG2.5.1C locally to train an explainable drug-treats-disease model. I'm trying to integrate more biological domain knowledge into the graph and make the graph more reasonable so that the model can better understand the underlying biological knowledge.

saramsey commented 3 years ago

Yes, I agree with you. In the next build, those two CUIs should be re-annotated as biolink:PathologicalProcess. I will mark this issue also as "verify in next build".

kvarforl commented 3 years ago

looks better in kg2.5.2

match (n) where n.id in ["UMLS:C0030660","UMLS:C0014412"] return n.id, n.category