BrianHicks / elm-string-graphemes

Do string operations based on graphemes instead of codepoints or bytes.
https://package.elm-lang.org/packages/BrianHicks/elm-string-graphemes/latest/
BSD 3-Clause "New" or "Revised" License

toLower/toUpper are tricky to define #1

Open drathier opened 5 years ago

drathier commented 5 years ago

String.toUpper only handles a-z. This breaks Swedish, for example. åsa gets uppercased to åSA instead of ÅSA, assuming the å character is a single code point.

Unfortunately, the mapping between lowercase and uppercase characters is locale-dependent, so implementing a non-ASCII version is tricky. https://stackoverflow.com/questions/12537377/in-haskell-how-can-i-uppercase-a-unicode-character-with-respect-to-current-local
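To make the two failure modes concrete, here is a rough sketch in Python (not Elm). The `ascii_upper` helper is hypothetical: it only models the a-z behaviour described above and is not taken from any Elm implementation. Python's built-in `str.upper()` stands in for a locale-independent Unicode mapping.

```python
def ascii_upper(text: str) -> str:
    """Uppercase only a-z and leave every other character untouched.

    Hypothetical helper that models the behaviour described in this issue.
    """
    return "".join(
        chr(ord(ch) - 32) if "a" <= ch <= "z" else ch
        for ch in text
    )


# Precomposed å (U+00E5) is outside a-z, so it is left alone.
ascii_precomposed = ascii_upper("\u00e5sa")   # "åSA"

# Decomposed å ("a" + U+030A combining ring) starts with a plain "a",
# so the ASCII-only version happens to produce the right result.
ascii_decomposed = ascii_upper("a\u030asa")   # "A" + U+030A + "SA"

# The default Unicode mapping handles the precomposed form.
unicode_precomposed = "\u00e5sa".upper()      # "ÅSA"

# The default mapping is still locale-independent: Turkish expects
# "İ" (U+0130) as the uppercase of "i", but this always yields "I".
turkish_i = "i".upper()
```

The decomposed case is why the single-code-point assumption matters, and the Turkish case is why even a full Unicode mapping cannot be correct for every locale.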

I'd suggest dropping these functions and using CSS (text-transform: uppercase) to uppercase/lowercase written strings, hoping that the browser locale is correct most of the time. For case-insensitive string comparisons, there are Unicode normalization algorithms that can be used.
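As an illustration of the comparison side, here is a minimal sketch in Python (not Elm), assuming canonical caseless matching is what's wanted: decompose, case-fold, then decompose again. The `caseless_key` and `caseless_equal` names are made up for this example.

```python
import unicodedata


def caseless_key(text: str) -> str:
    """Build a comparison key: NFD, then case folding, then NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFD", decomposed.casefold())


def caseless_equal(left: str, right: str) -> bool:
    """Compare two strings while ignoring case and composition differences."""
    return caseless_key(left) == caseless_key(right)


# Precomposed "ÅSA" matches decomposed "åsa" ("a" + U+030A + "sa").
same_name = caseless_equal("\u00c5SA", "a\u030asa")

# Case folding also maps "ß" to "ss".
same_street = caseless_equal("Stra\u00dfe", "STRASSE")

# The ring is still significant: "åsa" is not "asa".
different = caseless_equal("\u00e5sa", "asa")
```

Note that this is still locale-independent, so it does not address language-specific rules such as the Turkish dotted and dotless i.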

BrianHicks commented 5 years ago

🤔 this is something we've inherited by making a façade over String, and in fact we're passing directly through to String.

I see two ways forward:

  1. remove the functions, as you suggest (and probably roll these fixes into the next major version)
  2. add a note about this to the docstring

BrianHicks commented 5 years ago

@drathier would you mind moving this issue to elm/core? I don't think we're going to fix it here.