TranslatorSRI / Babel

Babel creates cliques of equivalent identifiers across many biomedical vocabularies.
MIT License

Implement Disease/Phenotype conflation #44

Open cbizon opened 2 years ago

cbizon commented 2 years ago

We currently merge diseases and phenotypes when the same term occurs in both MONDO and HP. But this isn't totally correct, because diseases are not phenotypes (even if they kind of are). Sometimes, for unclear reasons, the mappings don't work out too well (see e.g. asthma).

We should make disease and phenotype another form of conflation and be more careful with it. We can at least partially use MONDO:otherHierarchy to build the conflation tables. The main problem I foresee is when MONDO claims equivalence to (say) a UMLS identifier and HP claims equivalence to the same one, so in that case we'll need some kind of rule about what goes where.
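As a rough illustration of the MONDO:otherHierarchy idea above, here is a minimal sketch that collects MONDO-to-HP pairs from OBO text. It is not Babel code: the function names (`other_hierarchy_pairs`, `build_conflation`), the assumed shape of the `xref:` line, and the identifiers in `SAMPLE` are all made up for illustration, and it does not attempt the "what goes where" rule for shared UMLS identifiers.

```python
import re
from collections import defaultdict

# ASSUMPTION: xref lines in the MONDO OBO file look roughly like
#   xref: HP:1234567 {source="MONDO:otherHierarchy"}
# The exact qualifier syntax should be checked against a real MONDO release.
XREF_RE = re.compile(r'^xref:\s+(\S+)\s*(?:\{(.*)\})?')
SOURCE_RE = re.compile(r'source="([^"]+)"')


def other_hierarchy_pairs(obo_text, target_prefix="HP"):
    """Yield (term_id, xref_id) for xrefs tagged with MONDO:otherHierarchy."""
    current_id = None
    for raw_line in obo_text.splitlines():
        line = raw_line.strip()
        if line == "[Term]":
            current_id = None  # a new stanza starts; forget the previous id
        elif line.startswith("id: "):
            current_id = line[len("id: "):].strip()
        else:
            match = XREF_RE.match(line)
            if match is None or current_id is None:
                continue
            xref_id = match.group(1)
            sources = SOURCE_RE.findall(match.group(2) or "")
            if ("MONDO:otherHierarchy" in sources
                    and xref_id.startswith(target_prefix + ":")):
                yield current_id, xref_id


def build_conflation(obo_text):
    """Map each MONDO id to the HP ids it could be conflated with."""
    table = defaultdict(list)
    for mondo_id, hp_id in other_hierarchy_pairs(obo_text):
        table[mondo_id].append(hp_id)
    return dict(table)


# Hand-written toy input; these identifiers are invented, not real MONDO content.
SAMPLE = """
[Term]
id: MONDO:9000001
name: toy disease
xref: HP:9000001 {source="MONDO:otherHierarchy"}
xref: NCIT:C0000 {source="MONDO:otherHierarchy"}
xref: DOID:0000 {source="MONDO:equivalentTo"}

[Term]
id: MONDO:9000002
name: another toy disease
xref: DOID:0001 {source="MONDO:equivalentTo"}
"""

if __name__ == "__main__":
    print(build_conflation(SAMPLE))
```

Keeping these pairs in a separate table, rather than merging them into the disease clique, is what would let a consumer turn the disease/phenotype conflation on or off.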

cbizon commented 2 years ago

Storing a slack conversation here for future reference:

Chris Bizon: I suspect somebody here knows the answer to this - I was looking at MONDO, and I see that there are some cross references listed with "MONDO:otherHierarchy". For instance linking asthma in MONDO and HP. Does anybody know what that means exactly?

[Sierra Moxon (SRI)] From Chris M: best to file an issue, we should probably not expose these (we want to use the skos mappings in general). From one of the MONDO curators: the x-ref is not in a disease classification (e.g. it is in a phenotype classification). Eg the term is in a branch of an ontology that is not disease (e.g. NCIT or MESH have a lot of branches, not all are diseases) (edited)

[Chris Bizon (SRI, Ranking Agent)] Thanks @Sierra Moxon (SRI). If these were replaced with a skos mapping, which would it be?

[Chris Bizon (SRI, Ranking Agent)] I guess it would have to be something pretty bland

[Sierra Moxon (SRI)] looking thru the MONDO obo file, some of the instances of 'otherHierarchy' also have a "mondo:equivalentTo" source listed. In those cases maybe "skos:exact_match" (the example I looked at, the otherHierarchy NCIT term also had a property_value of "exactMatch" for that NCIT id: see MONDO:0000313). But in the case where it only has the source of otherHierarchy, then I think we would not use this mapping (the example for MONDO:0000313 is a xref to an HP term). (edited)

[Sierra Moxon (SRI)] The curator says "one can not say that a disease maps equivalently to a phenotype, but that being said, there are users that might want this mapping and not care about the distinction being that precise."

[Sierra Moxon (SRI)] if we don't care about this distinction, it might be that we just use something very bland as you mention for those that don't have a skos match type... relatedMatch maybe? (edited)

[Chris Bizon (SRI, Ranking Agent)] Agreed... I think that for eg. in NN we want to be more careful than we are now and make this a case for conflation. relatedMatch is probably the right answer from a correctness point of view, but I don't feel like it gets to what is really going on here, or what we would need in order to feed conflation

[Chris Bizon (SRI, Ranking Agent)] Feels like we need some kind of relationship that is "this is the 100% full phenotype for that disease"
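The tentative rule from the conversation above (an xref that also carries an equivalentTo source might be an exact match; one with only otherHierarchy might at most be a related match) could be sketched as follows. This only restates the options floated in the chat and is not a settled decision. The function name `suggest_predicate` is hypothetical, and the predicate strings use the standard SKOS spellings rather than anything confirmed for Babel.

```python
def suggest_predicate(sources):
    """Suggest a SKOS predicate for a MONDO xref, given its source tags.

    ASSUMPTION: `sources` is the collection of source="..." values attached
    to one xref, e.g. ["MONDO:equivalentTo", "MONDO:otherHierarchy"].
    """
    sources = set(sources)
    if "MONDO:equivalentTo" in sources:
        # MONDO asserts equivalence, even if the target sits in another hierarchy.
        return "skos:exactMatch"
    if "MONDO:otherHierarchy" in sources:
        # Only otherHierarchy: the bland option discussed in the chat.
        return "skos:relatedMatch"
    # No recognised source tag: make no suggestion.
    return None


if __name__ == "__main__":
    print(suggest_predicate(["MONDO:otherHierarchy", "MONDO:equivalentTo"]))
    print(suggest_predicate(["MONDO:otherHierarchy"]))
```

As noted in the chat, relatedMatch is probably correct but does not say "this phenotype is the full phenotype for that disease", so conflation would likely need more than this predicate alone.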

cbizon commented 2 years ago

https://github.com/monarch-initiative/mondo/issues/4794

gaurav commented 2 years ago

This issue can be relatively low priority UNLESS the change in MONDO has changed how disease/phenotype cliques work.

gaurav commented 1 year ago

Another use-case for getting disease-phenotype conflation working: conflating MONDO:0006805 and HP:0001681 will be essential to getting results from https://github.com/NCATSTranslator/Feedback/issues/410 to be returned correctly.