Closed by rdb 4 years ago
I'm happy to standardize on the conventional capitalization form as it aids readability. Otherwise, I think we should force lowercase to avoid the "I'm not sure if it matters" conundrum.
I don't care which way we swing on this, but I suspect that enforcing mixed case is going to be harder than enforcing lower case (or upper case, although that would be obtuse).
I prefer canonical (mixed) case, because:
Also, if we force a particular casing, we are already ignoring one rule from BCP 47. If we then go for a casing that is different from the form that is "RECOMMENDED" by the standard, we are essentially ignoring a second one.
Could we call our version BCP48 to avoid confusion? Ok, maybe not, let's do it your way.
Agreed to NOT enforce capitalization, but we do want to recommend that tags use the capitalization scheme that BCP 47 itself recommends.
I'm OK with that, as long as we can have at least one document in our validation suite that has funky-cased tags, to make sure people don't rely on the "recommended" behaviour.
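To illustrate what such a test would exercise, here is a minimal sketch of a reader-side comparison that does not depend on casing. The function name `tags_match` and the sample tags are made up for illustration and are not part of the schema or the validation suite.

```python
def tags_match(a: str, b: str) -> bool:
    """Compare two language tags without regard to case.

    BCP 47 tags are ASCII, so lowercasing both sides is sufficient.
    """
    return a.lower() == b.lower()


# Hypothetical "funky-cased" tags, each paired with its conventional casing.
FUNKY_CASED = [
    ("EN-us", "en-US"),
    ("zh-hANT-tw", "zh-Hant-TW"),
    ("SR-LATN-rs", "sr-Latn-RS"),
]

for funky, conventional in FUNKY_CASED:
    assert tags_match(funky, conventional)
```

A reader that compares tags with plain `==` would fail on every pair above, which is the kind of mistake a funky-cased test document is meant to catch.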
The Registry will probably end up canonicalising any tags that pass through it.
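For reference, a rough sketch of what that casing step could look like, following the convention described in section 2.1.1 of RFC 5646: everything lowercase, except that two-letter subtags are uppercase and four-letter subtags are titlecase when they are neither the first subtag nor located after a singleton. The name `canonical_case` is an assumption for illustration; this is not the Registry's actual code, and it only adjusts casing (it does not validate the tag or replace deprecated subtags).

```python
def canonical_case(tag: str) -> str:
    """Apply the casing convention recommended by RFC 5646, section 2.1.1."""
    result = []
    seen_singleton = False  # True once an extension or private-use singleton was seen
    for index, subtag in enumerate(tag.split("-")):
        lowered = subtag.lower()
        if index == 0 or seen_singleton:
            # First subtag, and everything after a singleton, stays lowercase.
            result.append(lowered)
        elif len(lowered) == 2:
            # Two-letter subtag in this position (e.g. a region): uppercase.
            result.append(lowered.upper())
        elif len(lowered) == 4:
            # Four-letter subtag in this position (e.g. a script): titlecase.
            result.append(lowered.capitalize())
        else:
            result.append(lowered)
        if len(lowered) == 1:
            seen_singleton = True
    return "-".join(result)


print(canonical_case("EN-latn-us"))      # en-Latn-US
print(canonical_case("AZ-LATN-x-LATN"))  # az-Latn-x-latn
```

With this in place, a stricter check would simply be `tag == canonical_case(tag)`, if we ever decide to go that way.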
This appears to be a no-op, i.e. what we have in the schema now is good, so let's close this.
Sorry, we still need to document it, although I'm not even sure about that since we're following the spec.
I think this is covered by https://github.com/bible-technology/scripture-burrito/blob/develop/schema/common.schema.json#L52.
In #160, I added a regex that validates BCP 47 more completely. It is, however, case-insensitive, following section 2.2.1 of BCP 47:
Despite this, I would personally favour being a little stricter: require tags to be given in their "canonical" form, and validate that each subtag type is cased appropriately. Requiring readers to handle language codes case-insensitively might invite bugs, since the vast majority of writers will probably write them in canonical form and non-canonical ones will be rare in practice. It also adds an extra implementation burden.
The canonical form, which we are using today in our systems, looks like this:
Thoughts?