Open despresc opened 3 years ago
Granted, this would represent a pretty significant rewrite of the package. If there is no appetite for that, I may write one that provides full BCP47 coverage.
Sorry for the delayed reply! For some reason I wasn't notified of this Issue. :thinking:
@eborden, do you have any thoughts here?
@despresc I certainly see no problem with continuing to expand coverage of this library. Some decisions such as leveraging `ISO639_1` from the `iso639` package or `Country` from the `country` package were made for expediency and not full compliance. I'd actually love to see those packages improved to be in greater compliance, but I'd gladly welcome pull requests to this repository to continue expanding coverage.
Thanks for the reply! Actually, between opening this issue and your reply, I started work on my own package. I'm not sure how well it can be ported over, since it happens to represent BCP47 tags and subtags differently than how they are in this package, and the parsing and analysis flow is also a bit different. I think I may just continue working on it, since it's fairly well developed at this point. Sorry for the duplicated effort.
@pbrisbin @eborden this has actually bitten us now - our region parser is overly case-sensitive. We accept `en-GB` (correctly) but not `en-gb` (incorrectly).
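One way to avoid this kind of case sensitivity is to normalize the region subtag before looking it up, since RFC 5646 treats tags as case-insensitive. The following is only a minimal sketch using `base`; `normalizeRegion` is a hypothetical helper, not a function from this package, and it checks syntax only (two ASCII letters or three digits), not whether the code is actually registered.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit, toUpper)

-- | Hypothetical helper (not part of this library): bring a region
-- subtag into its conventional form before any lookup, so that "gb"
-- and "GB" are treated alike. Returns Nothing when the input is not
-- shaped like a region subtag (two ASCII letters or three digits).
normalizeRegion :: String -> Maybe String
normalizeRegion s
  | length s == 2 && all isAsciiLetter s = Just (map toUpper s)
  | length s == 3 && all isDigit s = Just s
  | otherwise = Nothing
  where
    isAsciiLetter c = isAsciiLower c || isAsciiUpper c
```

With a step like this in front of the existing region lookup, `gb`, `Gb`, and `GB` would all reach the lookup as `GB`.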
@despresc if you have time, can you verify that I've characterized these issues correctly in this comment? See also these pending tests.
Yes, the `es-419` and `en-TP` issues stem from `country`, which encodes a distinct but overlapping set of country codes from those in the IANA registry.
The `cmn` issue is due to `iso639`, which doesn't support the other standards in the ISO 639 series. I should mention that the primary language subtags are currently a strict subset of the ISO 639 codes, from what I recall. The registry doesn't contain every ISO 639-3 code, at least.
Are there plans for greater BCP47 compatibility? The tags `es-419`, `en-gb`, `en-TP`, and `cmn`, for instance, are not recognized by the library. That last tag is even the preferred encoding of Mandarin Chinese (over the currently-recognized `zh-guoyu`, and others) according to the IANA registry.
The `Region` component in particular seems odd, since it will fail to recognize any of the three digit region codes (like 419) registered with the IANA, but it will recognize tags with three digit country codes that already have two letter codes, like in `zh-012`. (Such codes must not be used in tags, according to https://tools.ietf.org/html/rfc5646#section-2.2.4, item 4.D).
There is also a small issue that may be a problem in the future, though it is probably unlikely: the primary language tag is not defined to include all of ISO 639-1. The IANA will not register any future ISO 639-1 code that is already covered by a three letter code (see https://tools.ietf.org/html/rfc5646#page-11), so this library would have to switch away from the `iso639` package in that event (assuming that package is updated to include the code) to remain correct.
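To make the region point concrete, here is a rough sketch of validating a region subtag against the IANA registry rather than against a list of ISO 3166 country codes. Everything in it is an assumption for illustration: `sampleRegistry` is a tiny hand-picked excerpt (the real registry contains many more regions), and `isRegisteredRegion` is a hypothetical name, not part of this package.

```haskell
import Data.Char (toUpper)

-- | Illustrative excerpt only, not the full IANA registry.
-- "419" (Latin America and the Caribbean) and "TP" are registered
-- region subtags; "012" is deliberately absent, because that country
-- already has a two-letter code and the numeric form is not registered.
sampleRegistry :: [String]
sampleRegistry = ["GB", "TP", "419"]

-- | Hypothetical check: a region subtag is accepted only when it
-- appears in the registry excerpt, compared case-insensitively.
isRegisteredRegion :: String -> Bool
isRegisteredRegion subtag = map toUpper subtag `elem` sampleRegistry
```

Under this sketch `isRegisteredRegion "419"` and `isRegisteredRegion "gb"` hold, while `isRegisteredRegion "012"` does not, which is the behaviour the paragraphs above ask for.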