I looked at lowercase xref prefixes and counted the number of occurrences in EFO v3.23.0. This allows us to see all prefixes that are in use and identify multiple prefixes for the same resource.
Here's the table head:
xref_prefix | count |
---|---|
umls | 11879 |
icd10 | 9570 |
omim | 8509 |
ncit | 8395 |
mondo | 8368 |
sctid | 6062 |
doid | 5204 |
mesh | 5175 |
msh | 5019 |
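For readers who want to reproduce this kind of tally outside SPARQL, here is a minimal Python sketch of the counting step. It assumes the xrefs are already available as plain strings; the function name `count_xref_prefixes` and the sample values are hypothetical and are not taken from EFO. Unlike `STRBEFORE`, which yields an empty string when there is no colon, this sketch simply skips such values.

```python
from collections import Counter


def count_xref_prefixes(xrefs):
    """Count lowercase prefixes (the text before the first ':') across xref strings."""
    counts = Counter()
    for xref in xrefs:
        prefix, sep, _ = xref.partition(":")
        if not sep:
            # No colon, so there is no prefix to count (assumption: skip these).
            continue
        counts[prefix.lower()] += 1
    return counts


# Hypothetical sample values for illustration only, not real EFO output.
sample_xrefs = [
    "MSH:D003920",
    "MeSH:D003924",
    "UMLS:C0011849",
    "umls:C0011860",
    "DOID:9351",
    "no-prefix-here",
]

prefix_counts = count_xref_prefixes(sample_xrefs)
for prefix, count in prefix_counts.most_common():
    print(f"{prefix} | {count} |")
```

Sorting by count, as `most_common()` does, makes near-duplicate prefixes such as `mesh` and `msh` easy to spot next to each other.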
Currently I'm including the following in my SPARQL query to help standardize xref prefixes:
BIND( LCASE(STRBEFORE( ?xref, ":" )) AS ?xref_prefix_dirty )
# Standardize prefixes. https://github.com/EBISPOT/efo/issues/878
BIND(
  COALESCE(
    # https://blog.semaku.com/post/140876753748/using-coalesce-and-if-in-sparql-for-nested
    # ?error is deliberately left unbound: a non-matching IF raises an error,
    # so COALESCE falls through to the next branch.
    IF(?xref_prefix_dirty = "msh", "mesh", ?error),
    IF(?xref_prefix_dirty = "icd-10", "icd10", ?error),
    IF(?xref_prefix_dirty = "umls_cui", "umls", ?error),
    # Looked at several of these SNOMEDCT_US terms (US Edition of SNOMED CT) and they existed in the International Edition
    IF(?xref_prefix_dirty = "snomedct_us", "snomedct", ?error),
    IF(?xref_prefix_dirty = "snomedct_2010_1_31", "snomedct", ?error),
    IF(?xref_prefix_dirty = "snomedct_us_2018_03_01", "snomedct", ?error),
    # No synonym matched: keep the lowercased prefix unchanged.
    ?xref_prefix_dirty
  ) AS ?xref_prefix
)
Would it make sense to bring something like this upstream so all EFO users can benefit from cleaner and more standard xrefs?
Quick note: this is related to https://github.com/EBISPOT/efo/issues/141.
Thank you for pointing these out, @dhimmel. We have fixed NCIt, MeSH and SNOMEDCT in EFO. If there are any other broken namespace prefixes, please let us know in a new ticket. I'll now move this to done.
Ideally, cross-references (`oboInOwl:hasDbXref` relationships) would use the most standard prefix to identify an external terminology. https://registry.identifiers.org/ is the authority I tend to follow when selecting resource prefixes.
Certain cross-references don't use the standard identifiers.org prefix. For example, MeSH terms are split between using a prefix of `MeSH` (standard, ignoring capitalization) and `MSH` (non-standard).
I imagine many of these cross-references are imported from upstream resources. Nonetheless, it might make sense for EFO to process the xrefs it imports to standardize their prefixes (and do other cleanup, such as stripping whitespace, as per https://github.com/EBISPOT/efo/issues/872).
Will follow up with more examples.
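As a rough illustration of the import-time cleanup suggested above, the following Python sketch strips whitespace and maps synonym prefixes to a single form. The synonym table is copied from the SPARQL snippet in this thread; the names `PREFIX_SYNONYMS` and `clean_xref`, the choice to return a lowercase prefix, and the handling of values without a colon are assumptions for illustration, not how EFO actually processes xrefs.

```python
# Synonym prefixes, taken from the SPARQL snippet in this thread.
PREFIX_SYNONYMS = {
    "msh": "mesh",
    "icd-10": "icd10",
    "umls_cui": "umls",
    "snomedct_us": "snomedct",
    "snomedct_2010_1_31": "snomedct",
    "snomedct_us_2018_03_01": "snomedct",
}


def clean_xref(xref):
    """Return (standardized_prefix, accession), or None if the xref has no colon.

    Hypothetical helper: whitespace is stripped and the prefix is lowercased
    before the synonym lookup.
    """
    prefix, sep, accession = xref.strip().partition(":")
    if not sep:
        return None  # Assumption: values without a prefix are left for manual review.
    prefix = prefix.strip().lower()
    return PREFIX_SYNONYMS.get(prefix, prefix), accession.strip()


# Hypothetical inputs for illustration only.
for raw in [" MSH:D003920 ", "ICD-10:E11", "DOID:9351"]:
    print(repr(raw), "->", clean_xref(raw))
```

Keeping the synonyms in a plain dictionary means new non-standard prefixes can be added in one place as they are reported.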