Open PedroRegisPOAR opened 3 years ago
To explain, the default glibc package doesn't contain locale data – that's separate and --pure isn't supposed to see OS data. Source: https://github.com/NixOS/nixpkgs/issues/32848#issuecomment-352996633
Adding glibcLocales to the shell indeed fixes the issue. Though this raises the question as to whether that dependency needs to be explicitly added to any package that depends on glibc and does some kind of text processing. Locales are a non-optional part of the C standard and while it’s great to be able to drop the heavyweight dependency where you know it’s irrelevant, it should not be absent in the default context. From:
LC_ALL=en_GB.utf8 date '+%c'
LC_ALL=en_US.utf8 date '+%c'
nix \
run \
nixpkgs#python39 \
-- \
-c \
'
import locale
locale.setlocale(locale.LC_ALL, "pt_BR.utf8")
print(locale.currency(12345.67, grouping=True, symbol=True))
'
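Since the default glibc ships without locale data, the snippet above can fail with locale.Error when the archive is missing. A minimal sketch of probing for a locale before relying on it; the helper name `locale_available` is made up for illustration and is not part of the standard library:

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if the C library accepts the locale `name`."""
    saved = locale.setlocale(locale.LC_ALL)  # query only, changes nothing
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # always restore the old setting


if __name__ == "__main__":
    for candidate in ("C", "pt_BR.utf8", "en_US.utf8"):
        print(candidate, locale_available(candidate))
```

On a glibc system without glibcLocales (or a suitable LOCALE_ARCHIVE), the two UTF-8 candidates would presumably report False while "C" stays available.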
#
# nix flake metadata nixpkgs --json | jq -r .url
nix \
store \
ls \
--store https://cache.nixos.org/ \
--long \
--recursive \
"$(nix eval --raw github:NixOS/nixpkgs/9eb60f25aff0d2218c848dd4574a0ab5e296cabe#glibcLocales)"
#
# nix flake metadata nixpkgs --json | jq -r .url
nix \
store \
ls \
--store https://cache.nixos.org/ \
--long \
--recursive \
"$(nix eval --raw github:NixOS/nixpkgs/9eb60f25aff0d2218c848dd4574a0ab5e296cabe#glibc)"
#
# nix flake metadata nixpkgs --json | jq -r .url
nix \
store \
ls \
--store https://cache.nixos.org/ \
--long \
--recursive \
"$(nix eval --raw github:NixOS/nixpkgs/9eb60f25aff0d2218c848dd4574a0ab5e296cabe#locale)"
#
# nix flake metadata nixpkgs --json | jq -r .url
nix \
store \
ls \
--store https://cache.nixos.org/ \
--long \
--recursive \
"$(nix eval --raw github:NixOS/nixpkgs/9eb60f25aff0d2218c848dd4574a0ab5e296cabe#glibc.bin)"
# nix flake metadata github:NixOS/nixpkgs/release-22.05 --json
command -v jq >/dev/null || nix profile install github:NixOS/nixpkgs/4aceab3cadf9fef6f70b9f6a9df964218650db0a#jq \
&& nix \
build \
--impure \
--expr \
'(
with builtins.getFlake "nixpkgs";
with legacyPackages.${builtins.currentSystem};
(
glibcLocales.override {
allLocales = false;
locales = [
"en_GB.UTF-8/UTF-8"
"ru_RU.UTF-8/UTF-8"
"en_US.UTF-8/UTF-8"
"pt_BR.UTF-8/UTF-8"
"ja_JP.UTF-8/UTF-8"
"en_IE.UTF-8/UTF-8"
];
}
)
)'
Refs.:
LOCALE_ARCHIVE="$PWD/result/lib/locale/locale-archive" \
LC_ALL=pt_BR.UTF-8 \
nix \
run \
nixpkgs#python39 \
-- \
-c \
'
import locale
locale.setlocale(locale.LC_ALL, "pt_BR.utf8")
print(locale.currency(12345.67, grouping=True, symbol=True))
locale.setlocale(locale.LC_ALL, "en_US.utf8")
print(locale.currency(12345.67, grouping=True, symbol=True))
locale.setlocale(locale.LC_ALL, "ru_RU.utf8")
print(locale.currency(12345.67, grouping=True, symbol=True))
locale.setlocale(locale.LC_ALL, "ja_JP.utf8")
print(locale.currency(12345.67, grouping=True, symbol=True))
locale.setlocale(locale.LC_ALL, "en_IE.utf8")
print(locale.currency(12345.67, grouping=True, symbol=True))
'
{ cat <<'WRAP' > foo.c
#include <stdio.h>
#include <locale.h>
int main()
{
char *locale = setlocale(LC_ALL, "");
printf("\n locale =%s\n", locale);
printf("test\n \u263a\u263b Hello from C\n");
return 0;
}
WRAP
} && gcc foo.c \
&& ./a.out
rm -f a.out foo.c
Refs.:
TODO:
nix run nixpkgs#gcc -- -xc -E -v /dev/null
+
printf '#include <locale.h>\nLC_COLLATE\n' | gcc -E -x c - | tail -n 1
Refs.:
nix run nixpkgs#python39 -- -c "assert '\N{snake}' == '🐍'"
TODO:
printf %b\\n \\u04{51,{3,4}{{0..9},{a..f}}}|sort|sed 's/./\u&&/'|tr -d \\n
Refs.:
nix \
shell \
--ignore-environment \
--impure \
--expr \
'
(
let
nixpkgs = (builtins.getFlake "github:NixOS/nixpkgs/0938d73bb143f4ae037143572f11f4338c7b2d1c");
pkgs = import nixpkgs { };
in
with pkgs; [
cowsay
]
)
' \
--command cowsay "Hello"
nix \
shell \
--ignore-environment \
--impure \
--expr \
'
(
let
nixpkgs = (builtins.getFlake "github:NixOS/nixpkgs/0938d73bb143f4ae037143572f11f4338c7b2d1c");
pkgs = import nixpkgs { };
in
with pkgs; [
(
glibcLocales.override {
allLocales = false;
locales = [
"en_US.UTF-8/UTF-8"
"pt_BR.UTF-8/UTF-8"
];
}
)
cowsay
]
)
' \
--command cowsay "Hello"
Refs.:
export LOCALE_ARCHIVE="$PWD/result/lib/locale/locale-archive"
LC_ALL=pt_BR.UTF-8 date '+%c'
LC_ALL=en_US.UTF-8 date '+%c'
LC_ALL=ru_RU.UTF-8 date '+%c'
LC_ALL=ja_JP.UTF-8 date '+%c'
LC_ALL=en_IE.UTF-8 date '+%c'
export LOCALE_ARCHIVE="$PWD/result/lib/locale/locale-archive"
LC_ALL=en_GB.UTF-8 nix run nixpkgs#uutils-coreutils -- date '+%c'
LC_ALL=en_US.UTF-8 nix run nixpkgs#uutils-coreutils -- date '+%c'
export LOCALE_ARCHIVE="$PWD/result/lib/locale/locale-archive"
LC_ALL=en_GB.UTF-8 nix run nixpkgs#busybox -- date '+%c'
LC_ALL=en_US.UTF-8 nix run nixpkgs#busybox -- date '+%c'
nix \
run \
nixpkgs#python39 -- \
-c \
"
v=32
while v:print('Ёё'*(v==26),end='%c%c'%(1072-v,1104-v));v-=1
"
export LC_ALL=en_US.utf8
nix run nixpkgs#python39 -- -c '
import locale
defaultlocale = locale.getdefaultlocale()
locale.setlocale(locale.LC_ALL, defaultlocale[0] + "." + defaultlocale[1])
print(locale.currency(12345.67, grouping=True, symbol=True))
'
export LC_ALL=pt_BR.utf8
nix run nixpkgs#python39 -- -c '
import locale
defaultlocale = locale.getdefaultlocale()
locale.setlocale(locale.LC_ALL, defaultlocale[0] + "." + defaultlocale[1])
print(locale.currency(12345.67, grouping=True, symbol=True))
'
Refs.:
perl -MEncode=decode -E 'while(<>){ chomp; say length decode("UTF-8", $_) }' <<<'文字化け'
Refs.:
TODO:
Faker("cellphone_number", locale="pt-BR")
TODO:
systemd-escape -u 'Hall\xc3\xb6chen\x2c\x20Meister'
TODO:
At 31:50 you talk about environment variables. However, there are some mistakes worth correcting for future viewers. First, although the environment variables are stored in the process's memory, they are stored as zero-terminated strings, not as one big string separated by newline characters. They are also not stored on the heap, nor is there a global variable in the data section pointing to them. The environment is actually stored entirely on the stack and is part of the initial process stack that is set up before the program starts running. The first value on the stack is the argument count, followed by an array of the addresses of the different arguments, then address 0 marking the end of the argument array. Right after that there is a second array of addresses, each pointing to a zero-terminated string; these are the environment variables, and this array is also terminated by address 0. There is actually a third array of auxiliary vectors, but after that there is an unspecified number of bytes before the information block starts. It is generally inside this block that the command-line arguments and environment variables are stored, as in the actual string values. You can confirm this by dumping the stack of pretty much any program; you typically find all the environment variables at the very end (highest memory address). If you are on Linux you can do this by first reading the '/proc/<PID>/maps' file for any process; just replace <PID> with that process's PID. This file contains the ranges of memory mapped to the process and what they are mapped to. Near the bottom you'll see one line with the range mapped to [stack]. Take note of the start address and calculate how big it is in bytes. Then run 'sudo xxd -s <start> -l <length> /proc/<PID>/mem', for example 'sudo xxd -s 0x7fff182bd000 -l 0x22000 /proc/14950/mem'. The environment variables should then be printed together with their hex values and addresses.
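The "calculate how big it is" step can be sketched in Python. The sample line below is hypothetical, built so that it matches the start address and length used in the xxd example; the function name `stack_range` is likewise only illustrative:

```python
from typing import Tuple


def stack_range(maps_line: str) -> Tuple[int, int]:
    """Return (start_address, length_in_bytes) for one line of a maps file."""
    address_field = maps_line.split()[0]  # e.g. "7fff182bd000-7fff182df000"
    start_hex, end_hex = address_field.split("-")
    start, end = int(start_hex, 16), int(end_hex, 16)
    return start, end - start


# Hypothetical [stack] line, not copied from a real process.
sample = "7fff182bd000-7fff182df000 rw-p 00000000 00:00 0    [stack]"
start, length = stack_range(sample)
print(f"-s {start:#x} -l {length:#x}")  # arguments to pass to xxd
```

The two printed values are what goes after -s and -l in the xxd call.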
To illustrate this further I've written a small C program that prints all the environment variables using the argv array pointer. As you can see the environment variable pointers are stored pretty much right after argv.
#include <stdio.h>
int main(int argc, char **argv)
{
/* argv[argc] is the NULL terminator, so the environment pointers start at argv[argc + 1] */
for (int i = argc + 1; argv[i] != NULL; i++)
{
printf("%s\n", argv[i]);
}
return 0;
}
You can of course make it less stupid by using the full version of main which includes a pointer to the first element in the environment pointer array.
#include <stdio.h>
int main(int argc, char **argv, char **envp)
{
for (int i = 0; envp[i] != NULL; i++)
{
printf("%s\n", envp[i]);
}
return 0;
}
This is all defined as part of the ABI (application binary interface) for both the x86 and x86_64 architectures, so 32- and 64-bit desktop computers.
tl;dr: The environment is not a single long string separated by new-line characters. The environment variables and the pointers to them are both stored on the stack or just before it. https://www.youtube.com/watch?v=xHu7qI1gDPA&lc=UgwvbQ7HZFUEZ2EGQ7V4AaABAg
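To make the "zero-terminated strings, not newline-separated" point concrete, here is a small sketch that splits a NUL-separated environment block. The function name `parse_environ_block` and the sample bytes are made up for illustration; on Linux the same format can be read from /proc/self/environ:

```python
import os
from typing import Dict


def parse_environ_block(raw: bytes) -> Dict[str, str]:
    """Split a NUL-separated block of NAME=value entries into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # the trailing NUL leaves an empty final chunk
        name, _, value = entry.partition(b"=")
        env[os.fsdecode(name)] = os.fsdecode(value)
    return env


# A value may contain a newline, so newline cannot be the separator.
sample = b"LANG=en_US.UTF-8\0NOTE=two\nlines\0"
print(sorted(parse_environ_block(sample)))

# Linux only: the kernel exposes the initial environment of the process here.
try:
    with open("/proc/self/environ", "rb") as fh:
        print(len(parse_environ_block(fh.read())), "variables found")
except OSError:
    pass  # no /proc on this system, or it is not readable
```

Note that /proc/self/environ reflects the environment as it was when the process started, not later changes made with setenv or putenv.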
Maybe related: https://serverfault.com/a/792136
Yeah, here I am trying to summarize all this mess.
FC_LANG is used to specify the default language as the weak binding in the query. if this isn't set, the default language will be determined from current locale. https://www.freedesktop.org/software/fontconfig/fontconfig-user.html
export FONTCONFIG_PATH=/etc/fonts
Refs.:
TODOs:
man fonts-conf
ls -al $(nix build --no-link --print-build-logs --print-out-paths github:NixOS/nixpkgs/0938d73bb143f4ae037143572f11f4338c7b2d1c#xorg.fontalias)/share/fonts/X11
echo '\u2603'
LC_CTYPE=C echo '\u2603'
Refs.:
Errors out:
zsh: character not in range
Also, UTF-8 is not a valid POSIX locale. It may work on some systems, but Arch Linux might not like it. en_US.UTF-8 is valid. Try putting that at the beginning of the line, and using LC_ALL instead of LC_CTYPE. https://github.com/ohmyzsh/ohmyzsh/issues/4065#issuecomment-129913471
We ended up just giving up on trying to fix broken locales, thanks for your contribution and your patience. https://github.com/ohmyzsh/ohmyzsh/pull/4696#issuecomment-537132617
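As a rough illustration of why a bare UTF-8 is rejected while en_US.UTF-8 is accepted, the sketch below checks a name against the common language[_TERRITORY][.codeset][@modifier] shape. The regular expression is an assumption based on that convention, not something taken from the linked threads, and it only checks the shape of the name, not whether the locale is installed:

```python
import re

# Assumed shape: language[_TERRITORY][.codeset][@modifier], plus "C" and "POSIX".
LOCALE_NAME = re.compile(
    r"^(?:C|POSIX|[a-z]{2,3}(?:_[A-Z]{2})?)"  # language and optional territory
    r"(?:\.[A-Za-z0-9-]+)?"                   # optional codeset, e.g. .UTF-8
    r"(?:@[A-Za-z0-9]+)?$"                    # optional modifier, e.g. @latin
)


def looks_like_locale_name(name: str) -> bool:
    """Heuristic shape check only; says nothing about installed locale data."""
    return LOCALE_NAME.match(name) is not None


for name in ("UTF-8", "en_US.UTF-8", "pt_BR.utf8", "C"):
    print(name, looks_like_locale_name(name))
```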
Main ones:
Tables:
Fonts:
More python focused:
Even Julia:
Of course LaTeX:
Linux:
fc-cache -fv
echo "\ue0b0 \ue0a0 \u2b80 \u00b1 \u27a6 \u2718 \u26a1 \u2699"
echo '##1##'
echo '\ue0b0 \ue0a0 \u2b80 \u00b1 \u27a6 \u2718 \u26a1 \u2699'
echo '##2##'
echo -e '\ue0b0 \ue0a0 \u2b80 \u00b1 \u27a6 \u2718 \u26a1 \u2699'
echo '##3##'
echo -e "\ue0b0 \ue0a0 \u2b80 \u00b1 \u27a6 \u2718 \u26a1 \u2699"
fc-list : family
fc-match -s emoji
localectl status
Outputs:
System Locale: LANG=en_US.UTF-8
LC_MONETARY=pt_BR.UTF-8
VC Keymap: us
X11 Layout: br
X11 Model: pc104
X11 Variant: abnt2
X11 Options: terminate:ctrl_alt_bksp
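With LANG and LC_MONETARY set to different values as above, which one applies depends on the category. A minimal sketch of the usual POSIX lookup order (LC_ALL, then the specific LC_* variable, then LANG, else "C"), assuming these values are actually exported into the session environment; the function name `effective_locale` is made up for illustration:

```python
from typing import Mapping


def effective_locale(env: Mapping[str, str], category: str) -> str:
    """Resolve one locale category from environment-style variables."""
    for key in ("LC_ALL", category, "LANG"):
        value = env.get(key, "")
        if value:  # unset and empty are treated the same
            return value
    return "C"


env = {"LANG": "en_US.UTF-8", "LC_MONETARY": "pt_BR.UTF-8"}
for category in ("LC_MONETARY", "LC_TIME"):
    print(category, "->", effective_locale(env, category))
```

With these values currency formatting would follow pt_BR while dates follow en_US, and setting LC_ALL would override both.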
TODO: https://ostechnix.com/install-nerd-fonts-to-add-glyphs-in-your-code-on-linux/ https://ostechnix.com/find-installed-fonts-commandline-linux/ https://github.com/ryanoasis/nerd-fonts/issues/485#issuecomment-1417572779 https://github.com/ryanoasis/nerd-fonts/issues/485#issuecomment-1417328572
https://github.com/NixOS/nixpkgs/issues/86601#issuecomment-686243898
echo -e "\U1f3f4\Ue0067\Ue0062\Ue0077\Ue006c\Ue0073\Ue007f"
echo '#####'
echo -e "\U1f9df\U200d\U2640\Ufe0f"
EMOJIS=(🥯 🦆 🦉 🥓 🦄 🦀 🖕 🍣 🍤 🍥 🍡 🥃 🥞 🤯 🤪 🤬 🤮 🤫 🤭 🧐 🐕 🦖 👾 🐉 🐓 🐋 🐌 🐢)
echo "${EMOJIS[@]}"
UNICORN='\U1F984'; THUMBS_UP='\U1F44D'; echo -e "Riding an ${UNICORN} (${THUMBS_UP})"
Refs.:
It should output 🍁:
echo -e '\xF0\x9F\x8D\x81'
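The four bytes in that command are the UTF-8 encoding of a single code point. A short Python check of that relationship (only ASCII is printed, so it does not depend on the terminal font or locale):

```python
leaf_bytes = bytes([0xF0, 0x9F, 0x8D, 0x81])
leaf = leaf_bytes.decode("utf-8")

print(f"U+{ord(leaf):04X}")           # U+1F341
print(leaf.encode("utf-8").hex(" "))  # f0 9f 8d 81
print(len(leaf), len(leaf_bytes))     # 1 character, 4 bytes
```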
Refs.:
toon=$'\U1F479'
print -r ${(l:${(m)#toon}:: :)}$'XYZ\n'$toon' ^-- must point to Y'
Refs.:
echo "a\uf240 abc"
Refs.:
text="Éé"; echo ${#text}
LC_CTYPE=C text="Éé"; echo ${#text}
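The two lengths presumably differ because a UTF-8 locale makes the shell count characters, while the C locale makes it count bytes. The same distinction sketched in Python, using escapes so the result does not depend on how the source file is encoded:

```python
import unicodedata

text = "\u00c9\u00e9"  # "Éé" as two precomposed code points

print(len(text))                  # 2 characters
print(len(text.encode("utf-8")))  # 4 bytes

# The decomposed (NFD) form uses a base letter plus a combining accent each.
decomposed = unicodedata.normalize("NFD", text)
print(len(decomposed))                  # 4 code points
print(len(decomposed.encode("utf-8")))  # 6 bytes
```

So a count can legitimately be 2, 4, or 6 depending on whether characters, code points, or bytes are counted, and on the normalization form.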
Refs.:
echo \
'\U1F479' \
'\xF0\x9F\x8D\x81' \
'\U1f9df\U200d' \
'\U1F984' \
'\U1F44D' \
'\U1F9DA' \
'\U1F426' \
'\U1F99C' \
'\U1F996' \
'\U1F420' \
'\U1F41E' \
'\U1F340' \
'\U1F308' \
'\U1F965' \
'\U1F37F' \
'\U1F991' \
'\U1F37A' \
'\U1F692' \
'\U1F6F3' \
'\U26A1' \
'\U1F4A7' \
'\U1F537'
Outputs: 👹 🍁 🧟 🦄 👍 🧚 🐦 🦜 🦖 🐠 🐞 🍀 🌈 🥥 🍿 🦑 🍺 🚒 🛳 ⚡ 💧 🔷
Some flags show:
echo \
'\U1F3F4\UE0067\UE0062\UE0065\UE006E\UE0067\UE007F' \
'\U1F3F4\UE0067\UE0062\UE0073\UE0063\UE0074\UE007F' \
'\U1f3f4\Ue0067\Ue0062\Ue0077\Ue006c\Ue0073\Ue007f'
Outputs: 🏴 🏴 🏴
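These three flags are emoji tag sequences: a black flag (U+1F3F4), then one tag character per ASCII letter of the region code, then a cancel tag (U+E007F). A small sketch that rebuilds the sequences used above; the function name `subdivision_flag` is made up for illustration:

```python
BLACK_FLAG = 0x1F3F4
TAG_BASE = 0xE0000    # tag characters mirror ASCII at this offset
CANCEL_TAG = 0xE007F


def subdivision_flag(code: str) -> str:
    """Build a flag tag sequence from a region code such as "gbeng"."""
    tags = "".join(chr(TAG_BASE + ord(ch)) for ch in code.lower())
    return chr(BLACK_FLAG) + tags + chr(CANCEL_TAG)


for code in ("gbeng", "gbsct", "gbwls"):
    points = " ".join(f"U+{ord(ch):X}" for ch in subdivision_flag(code))
    print(code, points)
```

Whether a given sequence is drawn as a flag or as a plain black flag depends on the font and terminal.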
Some flags are broken:
echo \
'\U1F3F3\UFE0F\U200D\U1F308' \
'\U1F1E7\U1F1F7'
Outputs:
But when copied directly from the terminal into the browser here, they render correctly: 🏳️🌈 🇧🇷
The Japan flag, for example:
echo -e '\U1f1ef\U1f1f5' | hexdump -C
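Country flags like this one are pairs of regional indicator symbols, one per letter of the two-letter country code, starting at U+1F1E6 for "A". A sketch that derives the pair and its UTF-8 bytes; the function name `country_flag` is made up for illustration:

```python
REGIONAL_INDICATOR_A = 0x1F1E6


def country_flag(code: str) -> str:
    """Map a two-letter country code to its regional indicator pair."""
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError("expected a two-letter ASCII country code")
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A")) for ch in code.upper()
    )


jp = country_flag("JP")
print(" ".join(f"U+{ord(ch):X}" for ch in jp))  # U+1F1EF U+1F1F5
print(jp.encode("utf-8").hex(" "))              # f0 9f 87 af f0 9f 87 b5
```

The hex string is what hexdump should show for the echo above, followed by the trailing newline byte 0a.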
Refs.:
TODO: test it
env \
FONTCONFIG_FILE=$PWD/etc-fonts/fonts.conf \
FC_DEBUG=1024 \
pango-view --text="Příliš 😂" --font='"Noto Color Emoji" 20'
Refs.:
xterm -fa 'Dank Mono' -fs 11
Refs.:
TODO: impressive awk-fu https://unix.stackexchange.com/a/526681
TODO: it is python code: https://stackoverflow.com/a/37362046
TODO: curl and the '\U0001F514' https://stackoverflow.com/a/55863437
TODO: test Fira Code + JetBrains Mono
Abstract
There are 4 packages in nixpkgs involved, at least.
Some notes about locale/locale-archive
Old, but really great: https://github.com/NixOS/nix/issues/599#issuecomment-153885553
TODO: take a look at this, lots of troubleshooting commands: https://github.com/NixOS/nix/issues/599
What is the locale-archive
Difference between locale-archive and Machine Object files in /usr/share/locale//LC_MESSAGES/ directory?
TODO: add real updated values and sha256sum
glibc
From: https://gurkan.in/wiki/nix.html#override-example-optional-args
TODO: convert this to a flake
nix-shell cannot change locale warning
Some troubleshooting commands:
strace -e file locale
ls /usr/share/i18n/charmaps/
gzip -dk UTF-8.gz
From: Locale issue after upgrading to Ubuntu 16.10 from a clean installation of Ubuntu 16.04
Saving all this for now:
TODO: https://unix.stackexchange.com/questions/187402/nix-package-manager-perl-warning-setting-locale-failed/243189#243189
Maybe useful: https://github.com/davidtwco/veritas/blob/6f2c676a76ef2885c9102aeaea874c361dbcaf61/home/profiles/common.nix#L197-L198
TODO: document where it came from, the Python PEPs about it
https://click.palletsprojects.com/en/5.x/python3/#python-3-surrogate-handling