Closed pbrisbin closed 2 years ago
As I understand it, the primary value in enforcing known-ness is to prevent use of unknown values through typo. However,
OK, so it's more than this. Country
can parse the 2-alpha, 3-alpha, and 3-digit values to the same concrete value. So, AF
, AFG
and 004
would all become Afghanistan and compare as the same thing. If we moved to an opaque Region
text, these would be 3 distinct values.
I don't think Language
has that issue, so we could still move to an opaque textual value there and (I think) solve some of #21. However, Country
is the more annoying dependency :thinking:
Language
also has this issue. ISO-639-3 added 3 letter language codes so you could represent English as en
or eng
and those should be equivalent.
Ah yeah, I guess things won't be that easy.
This is related to (and may solve) #21, which describes limitations that we inherit by using
iso639
(for what we callLanguage
) andcountry
(for what we callRegion
). I'm motivated to solve this (and in the way I'll describe) becausecountry
is also no longer in Stackage. Bringing it back into Stackage is high-effort because many of its transitive dependents are also no longer in Stackage -- therefore, I'm interested in cutting this dependency (and solving #21 along the way).As I understand it, the primary use-case for bcp47 is the
Trie
type, which is a map-like structure for looking up some value by a language-region/script/etc key (BCP47 ~ (Language, [Region|...])
). To achieve this functionality as stated, the type we use for the language and region components only needs to be a unique, case-insensitive input in such keys.We are using the above libraries because they maintain enumerations of known codes. We use complete parsers to ensure the input strings are valid, then we map the parsed textual values into these enumerations to ensure they're known. This is a double-edged sword, as you can see with #21; we would need these libraries to be up to date and flexible enough to support the widest range of values possible.
I'm starting to think we just don't need that final step. As I understand it, the primary value in enforcing known-ness is to prevent use of unknown values through typo. However,
I think that our parsers are sufficient to ensure users enter a valid identifier, and we can use it opaquely without enforcing it's known. Therefore, my proposal is to stop doing it.
I have a branch locally that fully encapsulates the library-provided types behind our own (which I think we should do regardless of the outcome of this Issue),
My proposal would be to then change it to,
That is, just allow and use any case-insensitive textual value that passes our parsers.
@eborden WDYT?