Open kevinschaper opened 1 year ago
It is important to understand that NodeNormaliser/Babelon (after @gaurav's great presentation last month) does not do any mapping conflict resolution. It is a powerful tool for accessing all mappings for the use case of cross-resource discovery. If you want to use the resulting mappings for knowledge graph merging, you first have to run them through a reconciliation system like boomer, which may not scale to the enormous size of the NodeNormaliser mapping index.
This issue is almost certainly a consequence of pulling in oboInOwl:hasDbXref mappings, which should never ever be used for knowledge graph merging (but are fine for discovery and search).
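To make the distinction concrete, a merge pipeline could split mappings by predicate before treating any pair as equivalent. This is only a minimal sketch: the tuple layout, the `MERGE_SAFE_PREDICATES` set, and the sample rows are assumptions for illustration, not NodeNormaliser's actual data model, and even the "merge-safe" bucket may still need reconciliation.

```python
# Assumption: which predicates count as merge-safe is a policy choice made here
# for illustration only.
MERGE_SAFE_PREDICATES = {"skos:exactMatch", "owl:equivalentClass"}


def split_mappings(mappings):
    """Separate (subject, predicate, object) mappings into merge-safe and discovery-only lists."""
    merge_safe, discovery_only = [], []
    for subject, predicate, obj in mappings:
        if predicate in MERGE_SAFE_PREDICATES:
            merge_safe.append((subject, predicate, obj))
        else:
            discovery_only.append((subject, predicate, obj))
    return merge_safe, discovery_only


# Illustrative rows with placeholder identifiers, not real ontology mappings.
sample = [
    ("EX_A:0001", "skos:exactMatch", "EX_B:0001"),
    ("EX_A:0002", "oboInOwl:hasDbXref", "EX_C:0002"),
]

merge_safe, discovery_only = split_mappings(sample)
```

Only `merge_safe` would be passed on to node merging; `discovery_only` stays available for search and cross-resource lookup.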
This is a very good point, and something that we have struggled with in Babel/NN. There are quite a few cases where both MONDO and HP claim an entity. Alzheimer Disease is another one. At one point I checked, and I think there are something like 100 of these. Now, if you want to, you can view the HP term as the complete phenotype that is caused by and defines the disease. In my opinion, this is better viewed as a mistake on the part of one or the other ontology. So the choice was made that we would merge these entities and call them diseases.
Since that choice was made, we have implemented the idea of conflation. That's where we say "OK, I know that these two things are really different (say a disease and its 100% phenotype) but for the purposes of what I'm doing right now, I don't care". We have the ability in NN to conflate or not on the fly. So what I think we really should do is implement a disease/phenotype conflation, much as we have done for gene/protein.
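The idea of conflating on the fly can be sketched with a toy lookup. Everything here is hypothetical: the `PREFERRED` and `DISEASE_PHENOTYPE_CONFLATION` tables, the placeholder identifiers, and the `normalize` function are illustrations of the concept, not how NN implements it.

```python
# Toy clique table: identifier -> preferred identifier of its clique.
# Placeholder identifiers, not Babel output.
PREFERRED = {
    "EX_PHENO:1": "EX_PHENO:1",
    "EX_DISEASE:1": "EX_DISEASE:1",
}

# Hypothetical conflation table: phenotype clique -> disease clique.
# The two cliques stay distinct unless the caller opts in.
DISEASE_PHENOTYPE_CONFLATION = {"EX_PHENO:1": "EX_DISEASE:1"}


def normalize(curie, conflate=False):
    """Return the preferred identifier, applying conflation only when requested."""
    preferred = PREFERRED.get(curie, curie)
    if conflate:
        preferred = DISEASE_PHENOTYPE_CONFLATION.get(preferred, preferred)
    return preferred


print(normalize("EX_PHENO:1"))                 # phenotype kept distinct
print(normalize("EX_PHENO:1", conflate=True))  # collapsed onto the disease
```

With this shape, the phenotype and disease remain separate nodes by default, and a caller who does not care about the difference asks for the merged view explicitly.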
Another example of this is MONDO:0007699/Hashimoto thyroiditis mapping to HP:0000872. This issue was brought to my attention again recently.
I got a question about a Translator QA issue that included this edge as coming from the SRI Reference KG:
MONDO:0009477----biolink:has_phenotype----MONDO:0009022
I was surprised to see that the subject was a Disease ID rather than PhenotypicFeature ID, and couldn't find that edge in any version I have of the KG.
I checked node normalizer, and found that https://nodenormalization-sri.renci.org/1.3/get_normalized_nodes?curie=HP%3A0001274 returns:
Which appears to have remapped a
MONDO:0009477----biolink:has_phenotype----HP:0001274
edge that I do have.
Even though they're obviously a darn good lexical match, I don't think it's safe to make an equivalence mapping between a biolink:Disease and a biolink:PhenotypicFeature.
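One way to catch this kind of remapping before it reaches a KG is to compare identifier prefixes before and after normalization. This is a minimal sketch: it assumes the normalizer's answer has already been reduced to a plain dict from input CURIE to normalized CURIE, and the `normalized` dict below is written by hand from the two edges quoted in this issue, not taken from a live response.

```python
def prefix(curie):
    """Return the namespace prefix of a CURIE, e.g. 'HP' for 'HP:0001274'."""
    return curie.split(":", 1)[0]


def find_prefix_changes(edges, normalized):
    """Yield (original_edge, remapped_edge) pairs where subject or object changed namespace."""
    for subject, predicate, obj in edges:
        new_subject = normalized.get(subject, subject)
        new_obj = normalized.get(obj, obj)
        if prefix(new_subject) != prefix(subject) or prefix(new_obj) != prefix(obj):
            yield (subject, predicate, obj), (new_subject, predicate, new_obj)


# Hand-written from the edges in this issue (assumption: HP:0001274 is the
# identifier that was remapped to MONDO:0009022).
edges = [("MONDO:0009477", "biolink:has_phenotype", "HP:0001274")]
normalized = {"HP:0001274": "MONDO:0009022"}

flagged = list(find_prefix_changes(edges, normalized))
for original, remapped in flagged:
    print(original, "->", remapped)
```

A prefix change is only a heuristic: it would also flag legitimate cross-namespace merges, so the output is better treated as a review list than as a list of errors.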
Tagging @matentzn to follow the issue, and perhaps correct me!