TranslatorSRI / NodeNormalization

Service that produces Translator compliant nodes given a curie
MIT License

Missing OMIM/DOID #64

Open cbizon opened 3 years ago

cbizon commented 3 years ago

In the new version of normalization, there are about 25 identifiers that are not showing up in the final output for disease/phenotypic feature.

DOID:0060320
DOID:0060321
DOID:0111946
DOID:11086
DOID:11282
DOID:11772
DOID:12143
DOID:1283
DOID:1392
DOID:14070
DOID:1607
DOID:1920
DOID:2058
DOID:4377
DOID:4379
DOID:9341
OMIM:114580
OMIM:177800
OMIM:212050
OMIM:607644
OMIM:613108
OMIM:613956
OMIM:614162
OMIM:615527
OMIM:616445

This happens as follows: during merge, each of these gets glommed together with an HP identifier. In many cases, this seems entirely legitimate.

During type assignment, the merged clique gets turned into a phenotypic feature, because it contains no MONDO identifier but does contain an HP identifier.

During output, it is recognized that OMIM, DOID are not valid identifier prefixes for phenotypes and they are stripped out.
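The three steps above can be sketched roughly as follows. This is only an illustration of the described behavior, not the actual NodeNormalization code: the function names, the `VALID_PREFIXES` table, and the typing rule are assumptions made for the example, and `HP:0000000` is a placeholder rather than the HP term that any of the listed identifiers was really merged with.

```python
# Assumed prefix lists for illustration; the real ones come from the Biolink Model.
VALID_PREFIXES = {
    "biolink:Disease": {"MONDO", "DOID", "OMIM"},
    "biolink:PhenotypicFeature": {"HP"},
}


def prefix(curie):
    """Return the part of a CURIE before the first colon."""
    return curie.split(":", 1)[0]


def assign_type(clique):
    """Hypothetical typing rule: MONDO wins, otherwise an HP makes it a phenotype."""
    prefixes = {prefix(c) for c in clique}
    if "MONDO" in prefixes:
        return "biolink:Disease"
    if "HP" in prefixes:
        return "biolink:PhenotypicFeature"
    return "biolink:Disease"


def write_clique(clique):
    """Type the merged clique, then strip identifiers whose prefix is not valid for that type."""
    node_type = assign_type(clique)
    allowed = VALID_PREFIXES[node_type]
    kept = [c for c in clique if prefix(c) in allowed]
    return node_type, kept


# A DOID merged with a (placeholder) HP identifier and no MONDO identifier.
merged = ["DOID:1283", "HP:0000000"]
node_type, kept = write_clique(merged)
print(node_type, kept)
```

With these assumptions the clique is typed as a phenotypic feature and the DOID is dropped on output, which is the behavior reported for the identifiers listed above.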


Fundamentally, this represents differing views across vocabularies on the type of a disease/phenotype. DOID really thinks that hypoglycemic coma is a disease, even if everybody else says it's a phenotype.

--

Options:

  1. Update phenotypic feature in the biolink model to include DOID and OMIM as valid prefixes. This is basically saying that we think that DOID and OMIM have misclassified these entities.

  2. Make DOID / OMIM + HP come out as Disease nodes. This is saying that we think uniformly that HP (and possibly UMLS, etc) have misclassified them.

  3. Leave things as is: we say that these are incompatible and forget about them.
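To make the difference between the three options concrete, here is a small sketch of how each one would treat the same merged clique. It is a toy model under stated assumptions, not the project's code: `normalize`, the `option` argument, and the prefix sets are hypothetical, `HP:0000000` is a placeholder identifier, and whether Disease accepts HP as a prefix is simply assumed to be "no" here.

```python
def prefix(curie):
    """Return the part of a CURIE before the first colon."""
    return curie.split(":", 1)[0]


def normalize(clique, option):
    """Return (type, surviving identifiers) under option 1, 2, or 3 (hypothetical model)."""
    # Assumed prefix lists, for illustration only.
    disease_ok = {"MONDO", "DOID", "OMIM"}
    phenotype_ok = {"HP"}
    prefixes = {prefix(c) for c in clique}

    # Option 1: accept DOID and OMIM as valid phenotypic feature prefixes.
    if option == 1:
        phenotype_ok = phenotype_ok | {"DOID", "OMIM"}

    # Option 2: a DOID or OMIM identifier forces the clique to be a Disease.
    forced_disease = option == 2 and bool(prefixes & {"DOID", "OMIM"})

    if "MONDO" in prefixes or forced_disease:
        node_type = "Disease"
    elif "HP" in prefixes:
        node_type = "PhenotypicFeature"
    else:
        node_type = "Disease"

    allowed = disease_ok if node_type == "Disease" else phenotype_ok
    return node_type, [c for c in clique if prefix(c) in allowed]


merged = ["OMIM:177800", "HP:0000000"]
for option in (1, 2, 3):
    print(option, normalize(merged, option))
```

In this model, option 1 keeps both identifiers on a phenotypic feature node, option 2 keeps the OMIM identifier on a disease node (the HP is dropped only because of the assumed Disease prefix list), and option 3 reproduces the current behavior where the OMIM identifier is lost.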

cbizon commented 2 years ago

It may be worth holding off until we understand conflation better.