Open cthoyt opened 3 years ago
These look great to me! Why do you think the "New prefixes must start with a letter." constraint is necessary? There are some resources whose name happens to start with a number and in those cases I think it's fine if the prefix does too. In terms of removal of prefixes, if there is no way to resolve a prefix since the underlying resource is unavailable, is there a way to curate that status?
Why do you think the "New prefixes must start with a letter." constraint is necessary? There are some resources whose name happens to start with a number and in those cases I think it's fine if the prefix does too.
@bgyori you're right, there are already three prefixes that start with a number:
I'm not completely against allowing numbers. I was thinking that in programming languages an identifier can't start with a number, but prefixes don't really have the same semantics. However, I think the rule that a prefix cannot start with a dot should remain.
On the other hand, I don't think there's an issue with imposing new standards on future prefixes.
In terms of removal of prefixes, if there is no way to resolve a prefix since the underlying resource is unavailable, is there a way to curate that status?
There is a `deprecated` field that's optional for all prefixes. I guess I should make an enumeration of the optional fields and what they're for, to include in the policy here as well.
I agree with all of that, thanks!
Multiple prefixes will not be issued for multiple versions of a resource
What about ICD9 / ICD10 / ICD11 where multiple prefixes make sense because no identifiers are shared between releases? Does this policy need more nuance on what version refers to?
Excellent point - I would go as far as to say ICD9, ICD10, and ICD11 are completely different resources. However, based on the names alone, it's not fair to assume this is obvious unless people have domain knowledge. What do you think we should call the difference between the ICDs and just different releases of the same database?
Even further, the fact that ICD by itself refers to ICD10 is a bad thing and should be discouraged. This is a remnant of Identifiers.org's curation choices, but it is also pretty consistent across resources like EFO and DOID.
What do you think we should call the difference between the ICDs and just different releases of the same database?
Maybe "disjoint releases" meaning that no identifiers are shared across releases and then give the example of ICDs.
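To make the "disjoint releases" idea concrete, here is a minimal sketch of the check it implies: releases count as disjoint only if no identifier appears in more than one of them. The function name and the toy identifiers are hypothetical and for illustration only; they are not real ICD or database content.

```python
def are_disjoint_releases(releases: dict[str, set[str]]) -> bool:
    """Return True if no identifier appears in more than one release."""
    seen: set[str] = set()
    for identifiers in releases.values():
        if seen & identifiers:
            # At least one identifier was already used by an earlier release.
            return False
        seen |= identifiers
    return True


# Toy identifiers for illustration only (hypothetical, not real content).
disjoint_example = {"release-a": {"001", "002"}, "release-b": {"A00", "A01"}}
versioned_example = {"2012": {"D1", "D2"}, "2013": {"D1", "D2", "D3"}}

print(are_disjoint_releases(disjoint_example))
print(are_disjoint_releases(versioned_example))
```

Under this reading, the first case would justify separate prefixes, while the second looks like ordinary versions of one resource.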
the fact that ICD by itself refers to ICD10 is a bad thing and should be discouraged
Agree. Bound to cause big problems when ICD-11 adoption increases.
- New prefixes must validate against the following regular expression: `^[a-z][a-z0-9]+(\.[a-z][a-z0-9]+?)$`
That pattern is broken. The `?` for the optional subspace needs to be moved outside the group (so it applies to the whole group). The `.` needs to be escaped. How about `^[a-z][a-z0-9]+(\.[a-z][a-z0-9]+)?$`?
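A quick way to see the difference between the two patterns is to run both against a few candidates with Python's `re` module. This is only a sketch: the helper name `check` and the candidate strings are made up for illustration.

```python
import re

# Pattern as quoted from the issue: the "?" sits inside the group, where it
# only makes the last "+" lazy, so the ".subspace" part is mandatory.
ORIGINAL = re.compile(r"^[a-z][a-z0-9]+(\.[a-z][a-z0-9]+?)$")
# Proposed fix: the "?" follows the group, so the whole subspace is optional.
PROPOSED = re.compile(r"^[a-z][a-z0-9]+(\.[a-z][a-z0-9]+)?$")


def check(prefix: str) -> tuple[bool, bool]:
    """Return (matches original pattern, matches proposed pattern)."""
    return bool(ORIGINAL.match(prefix)), bool(PROPOSED.match(prefix))


# Hypothetical candidates chosen to exercise each rule.
for candidate in ["chebi", "hgnc.gene", "mesh.2012", "1abc", ".chebi"]:
    print(candidate, check(candidate))
```

With the original pattern a plain prefix such as `chebi` is rejected because the subspace group is required; the proposed pattern accepts it while still rejecting a subspace that starts with a digit.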
This is a placeholder issue to discuss prefix orthography and typography, which are fancy words describing how prefixes should look and feel.
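Minimum Prefix Requirements
- New prefixes may contain letters `[a-z]`, numbers `[0-9]`, and a single dot `.` if a subspace is requested.
- New prefixes must start with a letter.
- New prefixes must validate against the following regular expression: `^[a-z][a-z0-9]+(\.[a-z][a-z0-9]+?)$`
- … (e.g., `gene`, `covid`) and a list of prefixes that would cause bugs in the web service (e.g., `overview`, `registry`, `downloads`)
Subnamespacing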
See #133
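Taken together, the minimum prefix requirements (including the optional subspace) could be checked roughly as follows. This is a sketch under assumptions: it uses the corrected pattern proposed in the comments rather than the one in the issue text, the two sets contain only the examples named in this issue (the real lists are not given here), and the function name `prefix_problems` is hypothetical.

```python
import re

# Corrected pattern proposed in the comments (assumption: this is the one adopted).
PREFIX_RE = re.compile(r"^[a-z][a-z0-9]+(\.[a-z][a-z0-9]+)?$")
# Only the examples mentioned in the issue; the full lists are not known here.
GENERIC_PREFIXES = {"gene", "covid"}
WEB_SERVICE_PREFIXES = {"overview", "registry", "downloads"}


def prefix_problems(prefix: str) -> list[str]:
    """Return human-readable reasons why a proposed prefix is not acceptable."""
    problems = []
    if not PREFIX_RE.match(prefix):
        problems.append("does not match the prefix pattern")
    if prefix in GENERIC_PREFIXES:
        problems.append("is too generic")
    if prefix in WEB_SERVICE_PREFIXES:
        problems.append("would cause bugs in the web service")
    return problems


for candidate in ["chebi", "hgnc.gene", "gene", "registry", "Gene"]:
    print(candidate, prefix_problems(candidate) or "ok")
```

Returning a list of reasons instead of a single boolean makes it easier for a reviewer to tell the requester exactly which rule a proposed prefix violates.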
Handling Collisions
Prefixes have rarely collided. One example is Gene Expression Omnibus vs. Geographical Entity Ontology (both abbreviated GEO), which are both maintained. Another example is the disease class annotation (a legacy classification from the Disease Ontology) and Dublin Core, where one is obviously more important than the other.
Removal of Prefixes
Typically, prefixes should not be removed from the Bioregistry, even if they correspond to subsumed, abandoned, or dead resources, because it is also a historical archive and reference for anyone who might run into legacy prefixes in legacy resources.
Choosing a Good Prefix
- Multiple prefixes will not be issued for multiple versions of a resource (e.g., the `mesh.2012` and `mesh.2013` prefixes registered in Identifiers.org were a huge mistake and cause massive confusion)
- Prefixes should not be generic entity types like `gene` or `chemical`. Reviewers will use their best judgement since it's hard to list all possible generic entity types. For example, `gene` would be bad while `hgnc.gene` would be better.
- … `chebi.chemical` would be bad while `chebi` would be better.
See also the OBO Foundry policy on choosing a prefix (i.e., IDSPACE) at http://obofoundry.org/id-policy.html
Who can request a prefix
Minimum Metadata Requirements
The required fields are maintained in the new prefix issue template including the following:
Optional Metadata
The Pydantic model `bioregistry.schema.struct.Resource` lists most fields as optional because it relies on dispatch to externally mapped data from Identifiers.org, the OBO Foundry, etc.
TODO: text descriptions of the rest