TranslatorSRI / Babel

Babel creates cliques of equivalent identifiers across many biomedical vocabularies.
MIT License

Add Mammalian Phenotype Ontology #300

Open gaurav opened 4 months ago

gaurav commented 4 months ago

This PR adds terms from the Mammalian Phenotype Ontology as requested by CAM-KP (#240). I added all the MP identifiers as well as mappings from https://github.com/mapping-commons/mh_mapping_initiative to connect them to HP and other identifiers. I added sssom as an explicit prerequisite so we can use it to read the SSSOM files in that GitHub repo, but that caused a lot of our other prerequisites to change (hence all the changes to requirements.lock).
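For anyone unfamiliar with the SSSOM/TSV layout, here is a minimal stdlib-only sketch of what reading one of these files involves. It is not the code in this PR (which uses the sssom package) and does not show that package's API. The helper names `read_sssom_tsv` and `mp_mappings`, the sample rows, and the choice to keep only `skos:exactMatch` are all assumptions made for illustration.

```python
import csv
import io


def read_sssom_tsv(text):
    """Parse SSSOM/TSV text into a list of row dicts.

    SSSOM/TSV files start with a metadata block whose lines begin with '#',
    followed by a tab-separated table with a header row.
    """
    data_lines = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    reader = csv.DictReader(io.StringIO("\n".join(data_lines)), delimiter="\t")
    return list(reader)


def mp_mappings(rows, predicates=("skos:exactMatch",)):
    """Return (subject, object) pairs that involve an MP identifier.

    Restricting to skos:exactMatch is an assumption for this sketch,
    not necessarily the filter used in the PR.
    """
    pairs = []
    for row in rows:
        if row["predicate_id"] not in predicates:
            continue
        subject, obj = row["subject_id"], row["object_id"]
        if subject.startswith("MP:") or obj.startswith("MP:"):
            pairs.append((subject, obj))
    return pairs


# Hypothetical sample data: these rows are illustrative only and are not
# taken from the mh_mapping_initiative repository.
SAMPLE = (
    "# curie_map:\n"
    "#   MP: http://purl.obolibrary.org/obo/MP_\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "MP:0000001\tskos:exactMatch\tHP:0000118\tsemapv:ManualMappingCuration\n"
    "MP:0000002\tskos:closeMatch\tHP:0000002\tsemapv:ManualMappingCuration\n"
    "HP:0000003\tskos:exactMatch\tUBERON:0000001\tsemapv:ManualMappingCuration\n"
)

if __name__ == "__main__":
    for subject, obj in mp_mappings(read_sssom_tsv(SAMPLE)):
        print(subject, "<->", obj)
```

Each pair returned this way would then be handed to whatever step builds the cliques; that step is not sketched here.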

We end up with 13,335 cliques that consist only of an MP identifier and 588 cliques that combine MP identifiers with other identifiers. We have no cases where an MP: identifier is chosen over other identifiers; the clique leaders we generate are:

    26 MESH
  8498 EFO
 13335 MP
 16014 HP
 19843 NCIT
315066 UMLS
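A rough sketch of how counts like these could be reproduced, in case anyone wants to re-check them after the mappings change. It assumes each clique is available as a list of CURIEs with the clique leader first. That layout and the function name `summarize_cliques` are assumptions for illustration, not the script actually used to produce the numbers above.

```python
from collections import Counter


def prefix(curie):
    """Return the prefix of a CURIE, e.g. 'MP' for 'MP:0000001'."""
    return curie.split(":", 1)[0]


def summarize_cliques(cliques):
    """Summarize cliques given as lists of CURIEs, leader first (assumed).

    Returns a dict with:
      mp_only        - cliques containing only MP identifiers
      mp_combined    - cliques mixing MP with other prefixes
      mp_leads_mixed - mixed cliques whose leader is an MP identifier
      leaders        - Counter of leader prefixes over all cliques
    """
    summary = {
        "mp_only": 0,
        "mp_combined": 0,
        "mp_leads_mixed": 0,
        "leaders": Counter(),
    }
    for clique in cliques:
        if not clique:
            continue  # skip empty cliques defensively
        prefixes = {prefix(curie) for curie in clique}
        leader_prefix = prefix(clique[0])
        summary["leaders"][leader_prefix] += 1
        if "MP" not in prefixes:
            continue
        if prefixes == {"MP"}:
            summary["mp_only"] += 1
        else:
            summary["mp_combined"] += 1
            if leader_prefix == "MP":
                summary["mp_leads_mixed"] += 1
    return summary


if __name__ == "__main__":
    # Hypothetical cliques, for illustration only.
    example = [
        ["MP:0000001"],
        ["HP:0000118", "MP:0000002"],
        ["UMLS:C0000001", "HP:0000002"],
    ]
    result = summarize_cliques(example)
    for name, count in sorted(result["leaders"].items(), key=lambda kv: kv[1]):
        print(f"{count:>6} {name}")
```

A non-zero `mp_leads_mixed` would flag the case described above as not occurring: an MP identifier chosen as leader over other identifiers.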

There are a bunch of mapping issues, such as:

Ordinarily I would be nervous about including MP without more/better mappings. But since this isn't going to affect autocomplete (where we specifically filter to MONDO|HP), and since MP identifiers aren't (currently) chosen as clique leaders over other identifiers, I think we can merge this in now and fix cliquing issues if anybody runs into them (for now, probably only CAM-KP).

Closes #240.

WIP: see how often this new information merges cliques in ways we don't expect.

Should be merged after PR #279.