Closed chunyuma closed 3 years ago
For this kind of issue report, it is always helpful to paste in the full Neo4j set of properties for the node:
{
"iri": "https://identifiers.org/umls:C0030660",
"category_label": "disease",
"deprecated": "False",
"name": "Pathological process",
"description": "A biologic function or a process having an abnormal or deleterious effect at the subcellular, cellular, multicellular, or organismal level.; The abnormal mechanisms and forms involved in the dysfunctions of tissues and organs.",
"provided_by": "identifiers_org_registry:umls",
"id": "UMLS:C0030660",
"category": "biolink:Disease",
"update_date": "2017"
}
In many cases, that provides a helpful first step towards debugging the issue.
In this case, it is interesting that no UMLS semantic types are annotated in the description field.
By inspecting the file kg2-versions.md, I can see that KG2.5.1 was built on the EC2 instance kg2lindsey.rtx.ai; I am starting it now so that I can inspect some relevant build artifacts in the kg2-build subdirectory on that instance.
I see that, as far as the KG2.5.1 build is concerned, UMLS CUI C0030660 originates from the NCI Thesaurus. There is a declaration in umls-nci.ttl:
<http://purl.bioontology.org/ontology/NCI/C16956> a owl:Class ;
skos:prefLabel """Pathologic Process"""@en ;
skos:notation """C16956"""^^xsd:string ;
skos:definition """A biologic function or a process having an abnormal or deleterious effect at the subcellular, cellular, multicellular, or organismal level."""@en ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C93328> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C106296> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C17747> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C18206> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C96644> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C25751> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C96608> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C19151> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C19131> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C17609> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C37192> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C19686> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C21186> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C25655> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C118511> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C120271> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C20625> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C20741> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C118512> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C19987> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C19387> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C139674> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C153176> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C153404> ;
<http://purl.bioontology.org/ontology/NCI/process_initiates_biological_process> <http://purl.bioontology.org/ontology/NCI/C160653> ;
rdfs:subClassOf <http://purl.bioontology.org/ontology/NCI/C17828> ;
UMLS:has_cui """C0030660"""^^xsd:string ;
UMLS:has_tui """T046"""^^xsd:string ;
UMLS:has_sty <http://purl.bioontology.org/ontology/STY/T046> ;
So, we can see that in the NCI Thesaurus, the CUI C0030660 is given UMLS semantic type T046.
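The lookup described above (find the class block that declares a CUI, then read off its TUI) can be sketched in Python. This is only an illustrative regex-based sketch, not part of the KG2 build code; the function name tuis_by_cui is hypothetical, and it assumes each class block starts with a line of the form "<...> a owl:Class", as in the excerpt. A real Turtle parser would be more robust.

```python
import re

# Patterns for the UMLS annotation lines seen in the excerpt, e.g.
#   UMLS:has_cui """C0030660"""^^xsd:string ;
CUI_RE = re.compile(r'UMLS:has_cui\s+"""(C\d+)"""')
TUI_RE = re.compile(r'UMLS:has_tui\s+"""(T\d+)"""')
# Assumption: every class block begins with "<IRI> a owl:Class".
SUBJECT_RE = re.compile(r'^<[^>]+>\s+a\s+owl:Class\b', re.MULTILINE)


def tuis_by_cui(turtle_text):
    """Map each CUI to the set of TUIs declared in the same class block."""
    result = {}
    starts = [m.start() for m in SUBJECT_RE.finditer(turtle_text)]
    starts.append(len(turtle_text))
    for begin, end in zip(starts, starts[1:]):
        block = turtle_text[begin:end]
        tuis = set(TUI_RE.findall(block))
        for cui in CUI_RE.findall(block):
            result.setdefault(cui, set()).update(tuis)
    return result


# Abbreviated copy of the umls-nci.ttl declaration quoted above.
SAMPLE = '''<http://purl.bioontology.org/ontology/NCI/C16956> a owl:Class ;
skos:prefLabel """Pathologic Process"""@en ;
UMLS:has_cui """C0030660"""^^xsd:string ;
UMLS:has_tui """T046"""^^xsd:string ;
UMLS:has_sty <http://purl.bioontology.org/ontology/STY/T046> ;
'''

print(tuis_by_cui(SAMPLE))
```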
From the master map of UMLS semantic types, we see that UMLS semantic type code T046 is labeled "Pathologic Function":
patf|T046|Pathologic Function
In the Biolink metamodel on line 6145, we see that the UMLS semantic type code UMLSSC:T046 is mapped to Biolink category biolink:Disease:
disease:
  aliases: ['condition', 'disorder', 'medical condition']
  is_a: disease or phenotypic feature
  exact_mappings:
    - MONDO:0000001
    - DOID:4
    - NCIT:C2991
    - WIKIDATA:Q12136
    - SIO:010299
    # UMLS Semantic Group "Disorders"
    - UMLSSG:DISO
    # Disease or Syndrome
    - UMLSSC:T047
    - UMLSST:dsyn
  narrow_mappings:
    # Congenital Abnormality
    - UMLSSC:T019
    - UMLSST:cgab
    # Acquired Abnormality
    - UMLSSC:T020
    - UMLSST:acab
    # Injury or Poisoning
    - UMLSSC:T037
    - UMLSST:inpo
    # Pathologic Function
    - UMLSSC:T046
permalink here: https://github.com/biolink/biolink-model/blob/097e7f6f0f62d66f22e8cff7c47692a98c837c7a/biolink-model.yaml#L6145
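To make the distinction between exact and narrow mappings concrete, here is a small Python sketch that looks up which class and which mapping list mention a given CURIE. The mapping lists are copied from the excerpt above; the helper name find_mapping and the dictionary layout are hypothetical and do not reflect how Biolink tooling or the KG2 build code actually loads biolink-model.yaml.

```python
# Mapping lists copied from the "disease" excerpt of biolink-model.yaml above.
# Only this one class is modeled here; the real file contains many more.
CLASSES = {
    "disease": {
        "exact_mappings": [
            "MONDO:0000001", "DOID:4", "NCIT:C2991", "WIKIDATA:Q12136",
            "SIO:010299", "UMLSSG:DISO", "UMLSSC:T047", "UMLSST:dsyn",
        ],
        "narrow_mappings": [
            "UMLSSC:T019", "UMLSST:cgab", "UMLSSC:T020", "UMLSST:acab",
            "UMLSSC:T037", "UMLSST:inpo", "UMLSSC:T046",
        ],
    },
}


def find_mapping(curie, classes):
    """Return (class_name, mapping_kind) pairs whose lists contain curie."""
    hits = []
    for class_name, mappings in classes.items():
        for mapping_kind, curies in mappings.items():
            if curie in curies:
                hits.append((class_name, mapping_kind))
    return hits


print(find_mapping("UMLSSC:T046", CLASSES))  # T046 appears under narrow_mappings
print(find_mapping("UMLSSC:T047", CLASSES))  # T047 appears under exact_mappings
```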
Now, I suppose that the Biolink people could have mapped UMLS semantic type code UMLSSC:T046 to the Biolink category biolink:PathologicalProcess. But let's look at what UMLS has to say on the matter. See Figure 1 of the UMLS Reference Manual, which seems to bolster the narrow mappings connection between UMLSSC:T046 and biolink:Disease.
In any event, the KG2 build code seems to be working as designed. I have put in an issue (629) in the biolink repo to ask Richard B. about whether biolink:PathologicalProcess would be a better mapping for UMLSSC:T046.
For UMLS CUI UMLS:C0014412, here is the relevant data from MeSH (umls-msh.ttl):
<http://purl.bioontology.org/ontology/MSH/D004781> a owl:Class ;
skos:prefLabel """Environmental Exposure"""@en ;
skos:notation """D004781"""^^xsd:string ;
skos:altLabel """Environmental Exposures"""@en , """Exposure, Environmental"""@en , """Exposures, Environmental"""@en ;
skos:definition """The exposure to potentially harmful chemical, physical, or biological agents in the environment or to environmental factors that may include ionizing radiation, pathogenic organisms, or toxic chemicals."""@en ;
rdfs:subClassOf <http://purl.bioontology.org/ontology/MSH/D004787> ;
<http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D057219> ;
<http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000009> ;
<http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000032> ;
<http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000145> ;
<http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000191> ;
<http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000266> ;
<http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000331> ;
<http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000517> ;
<http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000592> ;
<http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000706> ;
<http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000941> ;
<http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D059392> ;
<http://purl.bioontology.org/ontology/MSH/RO> <http://purl.bioontology.org/ontology/MSH/D000072739> ;
<http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D000076764> ;
<http://purl.bioontology.org/ontology/MSH/RO> <http://purl.bioontology.org/ontology/MSH/D016273> ;
<http://purl.bioontology.org/ontology/MSH/RO> <http://purl.bioontology.org/ontology/MSH/D018777> ;
<http://purl.bioontology.org/ontology/MSH/RO> <http://purl.bioontology.org/ontology/MSH/D018876> ;
<http://purl.bioontology.org/ontology/MSH/RO> <http://purl.bioontology.org/ontology/MSH/D018923> ;
<http://purl.bioontology.org/ontology/MSH/RO> <http://purl.bioontology.org/ontology/MSH/D004785> ;
<http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D001822> ;
<http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D014876> ;
<http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D005506> ;
<http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D052918> ;
<http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D009622> ;
<http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D014866> ;
<http://purl.bioontology.org/ontology/MSH/SIB> <http://purl.bioontology.org/ontology/MSH/D000397> ;
<http://purl.bioontology.org/ontology/MSH/MN> """N06.850.460.350"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/TH> """NLM (1967)"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/MMR> """20130708"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/TERMUI> """T014611"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/TH> """NLM (1996)"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/AN> """coordinate with specific substance"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/AQL> """AE AN CL EC ES HI LJ PC SN ST"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/DC> """1"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/DX> """19740101"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/FX> """D018876"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/FX> """D018923"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/FX> """D018777"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/FX> """D016273"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/HN> """74(67)"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/MDA> """19990101"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/PM> """74"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/TERMUI> """T014612"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/TERMUI> """T014611"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/TERMUI> """T014611"""^^xsd:string ;
<http://purl.bioontology.org/ontology/MSH/FX> """D004785"""^^xsd:string ;
UMLS:has_cui """C0014412"""^^xsd:string ;
UMLS:has_tui """T037"""^^xsd:string ;
UMLS:has_sty <http://purl.bioontology.org/ontology/STY/T037> ;
So, UMLS CUI C0014412 (environmental exposure) has UMLS semantic type T037 (injury or poisoning). In the Biolink model, for whatever reason, that semantic type is also mapped to Biolink category disease (see excerpt from biolink-model.yaml, a few comments above; here is the permalink). I have edited Biolink GitHub repo issue 629 to also inquire about T037.
OK, while we are waiting for official word back from the Biolink team, I have gone ahead and changed curies-to-categories.yaml so that both UMLS categories T037 and T046 are mapped to biolink:PathologicalProcess:
UMLS_STY:T037: pathological process
UMLS_STY:T046: pathological process
since that doesn't seem to break validate_curies_to_categories_yaml.py. That validation script uses biolink-model.owl.ttl, which, unlike biolink-model.yaml, doesn't have external CURIE ID mappings to Biolink categories, so it cannot currently detect that the two changes above introduce an inconsistency with the Biolink mappings. I will mark this issue as "blocked" pending review of our request by the Biolink team. If the Biolink team ultimately decides that those two UMLS semantic types really should be mapped to biolink:Disease, then I can always revert the above two edits to the KG2 curies-to-categories.yaml file.
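For illustration only, the kind of consistency check that the validation script cannot currently perform might look like the Python sketch below. It is not taken from validate_curies_to_categories_yaml.py; the function name find_inconsistencies and both dictionaries are hypothetical. It also assumes that the KG2 prefix UMLS_STY and the Biolink prefix UMLSSC refer to the same semantic type codes, so entries are compared on the part after the colon.

```python
# Biolink side: the category that each semantic type code is mapped to in the
# biolink-model.yaml excerpt quoted earlier (both are narrow mappings of "disease").
BIOLINK_CATEGORY_BY_TUI = {
    "T037": "disease",
    "T046": "disease",
}

# KG2 side: the two edited entries in curies-to-categories.yaml.
KG2_CURIES_TO_CATEGORIES = {
    "UMLS_STY:T037": "pathological process",
    "UMLS_STY:T046": "pathological process",
}


def find_inconsistencies(kg2_map, biolink_by_tui):
    """List (curie, kg2_category, biolink_category) where the two sources disagree."""
    diffs = []
    for curie, kg2_category in sorted(kg2_map.items()):
        # Assumption: the local ID after the prefix is the shared TUI code.
        tui = curie.split(":", 1)[1]
        biolink_category = biolink_by_tui.get(tui)
        if biolink_category is not None and biolink_category != kg2_category:
            diffs.append((curie, kg2_category, biolink_category))
    return diffs


for diff in find_inconsistencies(KG2_CURIES_TO_CATEGORIES, BIOLINK_CATEGORY_BY_TUI):
    print(diff)
```

A check along these lines would flag both edited entries, which is the expected outcome while the Biolink issue is still open.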
Thank you @chunyuma for bringing this issue to my attention.
To clarify for @chunyuma: you won't see the above changes actually show up in KG2 until the 2.5.2 build.
Tagging @kvarforl so she is aware of the above and can include this fix in the next KG2 build.
Thanks so much for your detailed explanation @saramsey. It seems like biolink:PathologicalProcess might be a better mapping than biolink:Disease for UMLS:C0030660 and UMLS:C0014412. But let's see how the Biolink team replies.
I'm curious about these two nodes because I'm modifying KG2.5.1C locally to train an explainable drug-treats-disease model. I am trying to integrate more biological domain knowledge into the graph and make the graph more reasonable, so that the model can better understand the underlying biological knowledge.
Yes, I agree with you. In the next build, those two CUIs should be re-annotated as biolink:PathologicalProcess. I will mark this issue also as "verify in next build".
Looks better in KG2.5.2:
match (n) where n.id in ["UMLS:C0030660","UMLS:C0014412"] return n.id, n.category
Hi @saramsey and @kvarforl, I found that the labels of some nodes in KG2.5.1 are unreasonable. For example:
UMLS:C0030660 is categorized as biolink:Disease, but it is a pathological process or function, which I think should be categorized as biolink:BiologicalProcess.
UMLS:C0014412 is categorized as biolink:Disease, but its name is environmental exposure. I think it should not belong to disease.
Do you have any ideas for this?