freckle / bcp47

BCP-47 for Haskell
https://freckle.github.io/bcp47/
2 stars 1 forks source link

Make BCP47 parsing and Trie lookups case-insensitive #26

Closed cdparks closed 2 years ago

cdparks commented 3 years ago

According to the RFC, language tags and subtags are meant to be handled case-insensitively:

At all times, language tags and their subtags, including private use and extensions, are to be treated as case insensitive: there exist conventions for the capitalization of some of the subtags, but these MUST NOT be taken to carry meaning.

They are also ASCII-only, though they can be encoded in schemes that include ASCII:

Although [RFC5234] refers to octets, the language tags described in this document are sequences of characters from the US-ASCII [ISO646] repertoire. Language tags MAY be used in documents and applications that use other encodings, so long as these encompass the relevant part of the US-ASCII repertoire. An example of this would be an XML document that uses the UTF-16LE [RFC2781] encoding of [Unicode].

This has bitten us recently with our backend failing to accept en-gb. Instead, we should parse tags and subtags without respect to case, and we should also allow lookups to be case-insensitive. Language and Country are already stored in a canonical format, so they don't need special treatment, but all of our subtags were just newtypes around Text. I've added another newtype CIText = CIText { unCIText :: CI Text } and used that instead. This was less noisey than inlining CI Text everywhere.

@eborden I don't fully understand the Trie stuff, but I did add some tests, so let me know if I broke anything.

cdparks commented 3 years ago

This probably needs a major version bump? I changed the type and selector names for exposed newtype constructors, though they were in an Internal module.

cdparks commented 3 years ago

Gonna hold this till @eborden 's back from PTO, curious if I missed anything

pbrisbin commented 2 years ago

@eborden would you be able to push this over the line?

codecov-commenter commented 2 years ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (main@94dd565). Click here to learn what that means. The diff coverage is n/a.

@@           Coverage Diff           @@
##             main      #26   +/-   ##
=======================================
  Coverage        ?   88.13%           
=======================================
  Files           ?       17           
  Lines           ?      236           
  Branches        ?        1           
=======================================
  Hits            ?      208           
  Misses          ?       28           
  Partials        ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 94dd565...fdd227b. Read the comment docs.

eborden commented 2 years ago

:point_up: this will merge on green.