biolink / biolink-model

Schema and generated objects for biolink data model and upper ontology
https://biolink.github.io/biolink-model/

remove/replace generic id mappings to UMLS? #749

Closed by sierra-moxon 3 years ago

sierra-moxon commented 3 years ago

Throughout the model we have mappings such as "UMLSST:phsf" that use broad UMLS categories (?) as a kind of identifier. I'd like to propose removing them because they don't resolve.

Are they critical for the node normalizer or any other downstream component that would be harmed if we removed them? @cbizon @RichardBruskiewich @mbrush

cmungall commented 3 years ago

We should use the T codes, e.g. T039, not phsf.

BioPortal uses STY, so we could ask for it to be added to bioregistry.io (http://bioregistry.io/registry/sty). Can you make a ticket? https://github.com/bioregistry/bioregistry/issues

We can then make URLs such as:

http://purl.bioontology.org/ontology/STY/T039
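The URL pattern above is ordinary CURIE prefix expansion. A minimal sketch, assuming a prefix map that binds STY to the BioPortal PURL base from the example URL; the `PREFIXES` dict and `expand_curie` helper are hypothetical and not part of Biolink's tooling.

```python
# Hypothetical prefix map: STY expands to the BioPortal PURL base shown above.
PREFIXES = {
    "STY": "http://purl.bioontology.org/ontology/STY/",
}


def expand_curie(curie: str, prefixes: dict = PREFIXES) -> str:
    """Expand a CURIE such as 'STY:T039' into a full URL."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefixes:
        # Unknown prefixes (e.g. the old UMLSST) cannot be turned into URLs.
        raise ValueError(f"unknown or malformed CURIE: {curie!r}")
    return prefixes[prefix] + local_id


print(expand_curie("STY:T039"))  # http://purl.bioontology.org/ontology/STY/T039
```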

But given that there is a hierarchy, I don't see the value in curating broad/narrow mappings that can be inferred.
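To illustrate why broad mappings can be inferred rather than curated: once a class maps to a single semantic type, its broader types follow from walking the is-a tree. This is a minimal sketch; the `PARENT` excerpt below is assumed from the Semantic Network tree and should be checked against the release files, and `ancestors` is a hypothetical helper.

```python
# Assumed excerpt of the Semantic Network is-a tree (child -> parent).
# Verify against the Semantic Network release files before relying on it.
PARENT = {
    "T039": "T038",  # Physiologic Function -> Biologic Function
    "T038": "T070",  # Biologic Function -> Natural Phenomenon or Process
    "T070": "T067",  # Natural Phenomenon or Process -> Phenomenon or Process
}


def ancestors(tui: str, parent: dict = PARENT) -> list:
    """Return every broader semantic type reachable from `tui`, nearest first."""
    result = []
    while tui in parent:
        tui = parent[tui]
        result.append(tui)
    return result


# One curated mapping to T039 is enough; the broader types come from the tree.
print(ancestors("T039"))  # ['T038', 'T070', 'T067']
```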

cmungall commented 3 years ago

Mappings from abbreviations like phsf to T codes are here: https://metamap.nlm.nih.gov/Docs/SemanticTypes_2018AB.txt
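A sketch of using that file to rewrite abbreviation-based CURIEs into T-code CURIEs, assuming each line has the pipe-delimited form `abbreviation|TUI|full name` (e.g. `phsf|T039|Physiologic Function`). A few sample lines are inlined instead of downloading the file; `load_abbrev_to_tui` and `umlsst_to_sty` are hypothetical helpers.

```python
# Assumption: SemanticTypes_2018AB.txt lines look like
# "abbreviation|TUI|full name". A small inline sample stands in for the file.
SAMPLE = """\
aapp|T116|Amino Acid, Peptide, or Protein
dsyn|T047|Disease or Syndrome
phsf|T039|Physiologic Function
"""


def load_abbrev_to_tui(text: str) -> dict:
    """Map semantic type abbreviations (e.g. 'phsf') to T codes (e.g. 'T039')."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        abbrev, tui, _name = line.split("|", 2)
        mapping[abbrev] = tui
    return mapping


def umlsst_to_sty(curie: str, mapping: dict) -> str:
    """Rewrite an abbreviation CURIE such as 'UMLSST:phsf' as 'STY:T039'."""
    prefix, _, abbrev = curie.partition(":")
    if prefix != "UMLSST":
        raise ValueError(f"not a UMLSST CURIE: {curie!r}")
    return "STY:" + mapping[abbrev]


table = load_abbrev_to_tui(SAMPLE)
print(umlsst_to_sty("UMLSST:phsf", table))  # STY:T039
```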

cmungall commented 3 years ago

Request to add STY: https://github.com/bioregistry/bioregistry/issues/83

We should just go ahead and add STY to the Biolink YAML header for now.

cthoyt commented 3 years ago

I've added STY to the Bioregistry and made a new deployment. Here's the new entry: https://bioregistry.io/registry/sty. Please let me know if there's anything else I can add.

cmungall commented 3 years ago

Thanks @cthoyt!

OK, my PR fixes things so that we use STY for semantic types. I also got rid of UMLSST, as it was redundant.

We still have this:

UMLSSG: 'https://metamap.nlm.nih.gov/Docs/SemGroups_2018.txt/group#'

but this URL returns a 404.

How about https://lhncbc.nlm.nih.gov/semanticnetwork/SemanticNetworkArchive.html instead?

It points to https://lhncbc.nlm.nih.gov/semanticnetwork/download/sg_archive/SemGroups-v04.txt
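For reference, a sketch of reading the semantic groups file to see which T codes each group covers. It assumes each line has the form `group abbreviation|group name|TUI|semantic type name`; the sample lines are inlined for illustration, and `load_semantic_groups` is a hypothetical helper. It does not settle which URL the UMLSSG prefix should expand to.

```python
from collections import defaultdict

# Assumption: SemGroups-v04.txt lines look like
# "group abbreviation|group name|TUI|semantic type name".
SAMPLE = """\
ACTI|Activities & Behaviors|T052|Activity
CHEM|Chemicals & Drugs|T116|Amino Acid, Peptide, or Protein
DISO|Disorders|T047|Disease or Syndrome
PHYS|Physiology|T039|Physiologic Function
"""


def load_semantic_groups(text: str) -> dict:
    """Map each semantic group abbreviation to the set of T codes it contains."""
    groups = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        group, _group_name, tui, _type_name = line.split("|")
        groups[group].add(tui)
    return dict(groups)


groups = load_semantic_groups(SAMPLE)
print(groups["PHYS"])  # {'T039'}
```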