Closed kindaro closed 3 years ago
Does /usr/lib/locale/locale-archive exist? I wonder if arch has some custom locale location. The difference between how Nix and Stack works here is that Nix uses a Nix-built Libc while Stack uses the system's Libc.
You can also do export LANG=C.UTF-8 to get non-localized Unicode support though. See https://github.com/NixOS/nixpkgs/pull/58009 and https://github.com/NixOS/nixpkgs/pull/61202 for info on that.
Yes, this file does exist.
If I set LANG as you say, something strange happens. I cannot quite explain it, so let me show instead.
% export LANG=C.UTF-8
% ghc -e 'putStrLn "<ce><bb>"'
λ
% stack ghc -- -e 'putStrLn "<ce><bb>"'
<interactive>:0:11: error:
lexical error in string/character literal at character '\56526'
% export LANG=en_US.UTF-8
% ghc -e 'putStrLn "λ"'
<interactive>:0:11: error:
lexical error in string/character literal at character '\56526'
% stack ghc -- -e 'putStrLn "λ"'
λ
So, whenever neither Zsh nor Stack can do anything with Unicode, ghc can, and the other way around. (I have no idea why Zsh cannot deal with C.UTF-8, but that is a whole other question.)
To clarify:
- ghc is installed via Nix.
- stack ghc runs another ghc that is installed via Stack, which is itself installed via Nix.
I can live with one terminal set to C.UTF-8 and another to en_US.UTF-8, but it can hardly be called life.
So I gather the bug is in the libc? How can I diagnose it further?
We are running into this at work.
When ghc runs on NixOS, it correctly determines that the encoding should be UTF-8. However, when ghc runs on Ubuntu, it incorrectly thinks the encoding should be ASCII.
Here is an example of running it on NixOS:
$ which locale
/run/current-system/sw/bin/locale
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
$ nix-shell -p ghc --command 'which locale'
/nix/store/22h3f311fjymkvp683kb657jycs7i5pn-glibc-2.27-bin/bin/locale
$ nix-shell -p ghc --command 'locale'
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
$ nix-shell -p ghc --command ghci
> import System.IO
> System.IO.localeEncoding
UTF-8
> import GHC.IO.Encoding.Iconv
> GHC.IO.Encoding.Iconv.localeEncodingName
"UTF-8"
Here is what happens on Ubuntu:
$ which locale
/usr/bin/locale
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=ja_JP.UTF-8
LC_TIME=ja_JP.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=ja_JP.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=ja_JP.UTF-8
LC_NAME=ja_JP.UTF-8
LC_ADDRESS=ja_JP.UTF-8
LC_TELEPHONE=ja_JP.UTF-8
LC_MEASUREMENT=ja_JP.UTF-8
LC_IDENTIFICATION=ja_JP.UTF-8
LC_ALL=
$ nix-shell -p ghc --command 'which locale'
/nix/store/rjsymbdxlwmfbpasn0jik1w97wgfk3qj-glibc-2.27-bin/bin/locale
$ nix-shell -p ghc --command 'locale'
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=ja_JP.UTF-8
LC_TIME=ja_JP.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=ja_JP.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=ja_JP.UTF-8
LC_NAME=ja_JP.UTF-8
LC_ADDRESS=ja_JP.UTF-8
LC_TELEPHONE=ja_JP.UTF-8
LC_MEASUREMENT=ja_JP.UTF-8
LC_IDENTIFICATION=ja_JP.UTF-8
LC_ALL=
$ nix-shell -p ghc --command 'ghci'
> import System.IO
> System.IO.localeEncoding
ASCII
$ nix-shell -p ghc --command 'env LC_ALL=C.UTF-8 ghci'
> import System.IO
> System.IO.localeEncoding
UTF-8
As shown above, when LC_ALL=C.UTF-8 is set explicitly, GHC picks up the encoding correctly. However, be aware that there seems to be some weirdness with locales, and locales you may think exist do not actually exist. On Ubuntu again:
$ nix-shell -p ghc --command 'env LC_ALL=en_US.UTF-8 ghci'
/nix/store/cinw572b38aln37glr0zb8lxwrgaffl4-bash-4.4-p23/bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
> import System.IO
> System.IO.localeEncoding
ASCII
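One way to check whether a locale really is available, rather than assuming it is, is to compare it against the output of locale -a. The following is only a sketch: the function names are made up for illustration, and it assumes that locale -a run inside nix-shell reports what the Nix-built glibc can see.

```shell
# Normalize a locale name so that spelling variants compare equal:
# lowercase everything and drop hyphens ("en_US.UTF-8" -> "en_us.utf8").
normalize_locale() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '-'
}

# Succeed if the locale named in $1 appears in the list read from stdin
# (one locale name per line, as printed by `locale -a`).
locale_in_list() {
  wanted=$(normalize_locale "$1")
  while IFS= read -r name; do
    [ "$(normalize_locale "$name")" = "$wanted" ] && return 0
  done
  return 1
}

# Example invocation (not run here):
#   locale -a | locale_in_list en_US.UTF-8 && echo available || echo missing
```

Running the example invocation both directly and through nix-shell -p ghc --command '...' should show whether the two environments see different sets of locales.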
I haven't done a whole lot of testing of this, but this problem appears to have come about recently.
Here's an old GHC (from 19.03) on Ubuntu again. You can see this appears to be working correctly:
$ cat /nix/var/nix/profiles/per-user/root/channels/nixpkgs/.version
19.03
$ cat /nix/var/nix/profiles/per-user/root/channels/nixpkgs/.git-revision
c2b8270fb8789af290da3f11bd6174a0ba7698f1
$ NIX_PATH=nixpkgs=/nix/var/nix/profiles/per-user/root/channels/nixpkgs nix-shell -p ghc --command 'ghci --version'
The Glorious Glasgow Haskell Compilation System, version 8.6.3
$ NIX_PATH=nixpkgs=/nix/var/nix/profiles/per-user/root/channels/nixpkgs nix-shell -p ghc --command 'ghci'
> import System.IO
> System.IO.localeEncoding
UTF-8
Here's a new GHC from the current nixpkgs-unstable channel, again on Ubuntu. This does not appear to work unless LC_ALL=C.UTF-8 is set explicitly:
$ cat ~/.nix-defexpr/channels/nixpkgs/.version
20.03
$ cat ~/.nix-defexpr/channels/nixpkgs/.git-revision
895874d2145862249df3f78335f4dcf62ef01626
$ NIX_PATH=nixpkgs=$HOME/.nix-defexpr/channels/nixpkgs nix-shell -p ghc --command 'ghci --version'
The Glorious Glasgow Haskell Compilation System, version 8.6.5
$ NIX_PATH=nixpkgs=$HOME/.nix-defexpr/channels/nixpkgs nix-shell -p ghc --command 'ghci'
> import System.IO
> System.IO.localeEncoding
ASCII
$ NIX_PATH=nixpkgs=$HOME/.nix-defexpr/channels/nixpkgs nix-shell -p ghc --command 'env LC_ALL=C.UTF-8 ghci'
> import System.IO
> System.IO.localeEncoding
UTF-8
Basically, if someone is willing to do a git bisect between c2b8270fb8789af290da3f11bd6174a0ba7698f1 (known-working) and 895874d2145862249df3f78335f4dcf62ef01626 (known-failing), we might be able to figure out what the problem is.
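For anyone who wants to try, the bisect could probably be automated with git bisect run. This is an untested sketch: the function names and the script name bisect-encoding.sh are hypothetical, it assumes it is run from a nixpkgs checkout, and each step may have to build or download GHC, so it could be slow.

```shell
# Exit status 0 means "good" (UTF-8), non-zero means "bad" (anything else,
# e.g. ASCII). This matches what `git bisect run` expects from its command.
encoding_is_utf8() {
  [ "$1" = "UTF-8" ]
}

# One bisect step: ask the ghc from the checked-out nixpkgs revision for its
# locale encoding. Returns 125 (git bisect's "skip this commit" status) if
# ghc cannot be built or run at this revision.
bisect_step() {
  enc=$(NIX_PATH=nixpkgs=$PWD nix-shell -p ghc --command \
    "ghc -e 'print System.IO.localeEncoding'") || return 125
  encoding_is_utf8 "$enc"
}

# Example invocation from the nixpkgs checkout (not run here):
#   git bisect start 895874d2145862249df3f78335f4dcf62ef01626 c2b8270fb8789af290da3f11bd6174a0ba7698f1
#   git bisect run sh -c '. ./bisect-encoding.sh; bisect_step'
```

The bisect has to run on a non-NixOS machine such as the Ubuntu one above, since on NixOS both revisions report UTF-8.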
I might do this.
Also, just in case you're curious, here is an explanation of text encoding stuff for GHC:
https://www.stackage.org/haddock/lts-7.14/base-4.9.0.0/System-IO.html#g:23
Here's the localeEncoding function I use above:
https://www.stackage.org/haddock/lts-7.14/base-4.9.0.0/System-IO.html#v:localeEncoding
Under the hood, this appears to be using iconv:
https://www.stackage.org/haddock/lts-7.14/base-4.9.0.0/src/GHC-IO-Encoding-Iconv.html
If I were to try to bisect this, I'd look for some change in how glibc or iconv is being handled that has occurred in the past couple months. Or maybe even some direct change to ghc.
I think I figured out what is going on here.
Here's the explanation from the manual:
https://nixos.org/nixpkgs/manual/#locales
To allow simultaneous use of packages linked against different versions of glibc with different locale archive formats Nixpkgs patches glibc to rely on LOCALE_ARCHIVE environment variable.
On non-NixOS distributions this variable is obviously not set. This can cause regressions in language support or even crashes in some Nixpkgs-provided programs. The simplest way to mitigate this problem is exporting the LOCALE_ARCHIVE variable pointing to ${glibcLocales}/lib/locale/locale-archive. The drawback (and the reason this is not the default) is the relatively large (a hundred MiB) size of the full set of locales. It is possible to build a custom set of locales by overriding parameters allLocales and locales of the package.
My guess as to what is happening is as follows:
On Ubuntu, with older versions of nixpkgs, there was no locale archive provided by default, so GHC (really, iconv) falls back to the system locale archive in /usr/lib/locale/locale-archive. The system locale archive has support for many different locales by default. With newer versions of nixpkgs, there is a locale archive provided by default, so GHC (really, iconv) uses it. However, it is very small and only has support for the C.UTF-8 locale.
On NixOS, with older versions of nixpkgs, there is a locale archive hardcoded somewhere with a bunch of locales provided by default. With new versions of nixpkgs, NixOS explicitly sets the LOCALE_ARCHIVE env var pointing to somewhere with a bunch of locales available.
(I figured this out by running locale under strace, so it is possible it is not quite correct.)
So @kindaro, the solution to your problem is to do one of the following things:
- Set LOCALE_ARCHIVE to point to either ${glibcLocales}/lib/locale/locale-archive or your system locale archive at /usr/lib/locale/locale-archive (if you want to live dangerously).
- Set LC_ALL=C.UTF-8 before you run GHC.
@matthewbauer Is this explanation about right?
@cdepillabout Awesome research, thank you. Setting LOCALE_ARCHIVE works, and it is a better solution than resetting LC_ALL because it does not affect other installations of ghc, such as stack's.
I'm going to go ahead and close this, since it seems to be "working as intended", and I listed some workarounds in https://github.com/NixOS/nixpkgs/issues/64603#issuecomment-551419489.
FYI: I think one can make the program locale-independent by calling setLocaleEncoding utf8. (Compare https://hackage.haskell.org/package/base-4.19.0.0/docs/GHC-IO-Encoding.html#v:setLocaleEncoding)
Issue description
I installed ghc via Nix and also via Stack. With the ghc binaries provided by Stack, I can easily do Unicode IO. But the Nix ghc errors out and pretends to know nothing about Unicode.
Steps to reproduce
Technical details
"x86_64-linux"
Linux 5.1.16-arch1-1-ARCH, Arch Linux, noversion
yes
yes
nix-env (Nix) 2.2.2
"nixpkgs-19.09pre184803.d567c486ca5"
/home/kindaro/.nix-defexpr/channels/nixpkgs