cthoyt / cthoyt.github.io

My personal website, served at https://cthoyt.com
Creative Commons Attribution 4.0 International

write post on the banana problem #58

Open cthoyt opened 8 months ago

cthoyt commented 8 months ago

A local unique identifier (LUID) is the value within a semantic space. For example, MONDO has the local unique identifier `0005301` for "multiple sclerosis".

If you want to make a URI, you take the MONDO URI prefix (`http://purl.obolibrary.org/obo/MONDO_`) and concatenate the local unique identifier on the end (i.e., `http://purl.obolibrary.org/obo/MONDO_0005301`). Similarly, if you want to make a compact URI (CURIE), you take the MONDO CURIE prefix (`MONDO`) and concatenate a colon (`:`), then the local unique identifier (i.e., `MONDO:0005301`).

Unfortunately, there are a lot of places where people mistakenly write a whole CURIE where a local unique identifier should go. This means someone writes `MONDO:0005301` where they should have written `0005301`. We call this a redundant prefix in the local unique identifier. It is also colloquially called the "banana problem".

Wikidata is one place where this happens. Identifiers.org has also propagated this mistake to many places. MONDO does not appear in Identifiers.org, but the submitter of the Wikidata property might have been influenced by how other properties did it, which were in turn influenced by Identifiers.org.

TL;DR: Wikidata has a lot of wrong ways of writing LUIDs in its properties referring to ontologies, MONDO being one example.
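To make the distinction concrete, here is a minimal Python sketch of building a URI and a CURIE from a local unique identifier, plus a check that strips a redundant prefix. The function names (`make_uri`, `make_curie`, `strip_banana`) are made up for illustration and do not come from any existing library; the case-insensitive prefix match is also an assumption on my part.

```python
# Prefixes for MONDO, taken from the description above.
URI_PREFIX = "http://purl.obolibrary.org/obo/MONDO_"
CURIE_PREFIX = "MONDO"


def make_uri(luid: str) -> str:
    """Concatenate the URI prefix and the local unique identifier."""
    return URI_PREFIX + luid


def make_curie(luid: str) -> str:
    """Concatenate the CURIE prefix, a colon, and the local unique identifier."""
    return f"{CURIE_PREFIX}:{luid}"


def strip_banana(value: str, prefix: str = CURIE_PREFIX) -> str:
    """Remove a redundant "PREFIX:" from the front of a value, if present.

    Hypothetical helper: the match is case-insensitive (an assumption),
    and values without the redundant prefix are returned unchanged.
    """
    banana = prefix + ":"
    if value.upper().startswith(banana.upper()):
        return value[len(banana):]
    return value


if __name__ == "__main__":
    luid = strip_banana("MONDO:0005301")  # someone wrote a CURIE by mistake
    print(luid)
    print(make_uri(luid))
    print(make_curie(luid))
```

Without the stripping step, feeding `MONDO:0005301` into `make_curie` would produce `MONDO:MONDO:0005301`, which is exactly the doubled prefix the banana problem refers to.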