easybuilders / easybuild-framework

EasyBuild is a software installation framework in Python that allows you to install software in a structured and robust way.
https://easybuild.io
GNU General Public License v2.0

Define a standard set of locale related env. vars; LANG etc #631

Open fgeorgatos opened 11 years ago

fgeorgatos commented 11 years ago

I think the choice is really between LANG= (empty) and LANG=C, and similarly for the LC_* variables; some investigation may be needed to find out exactly what works best.

This got triggered by a discussion on whether TopHat > Tophat or TopHat < Tophat.
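To make the ordering question concrete, here is a minimal sketch of how the two names compare under the C locale, where collation is by byte value. It uses only Python's standard `locale` module; the variable names are illustrative, and results under other locales (such as en_US.UTF-8) may differ and are not shown.

```python
import locale

# Select the C locale for collation only; "C" is always available.
locale.setlocale(locale.LC_COLLATE, "C")

names = ["Tophat", "TopHat"]

# strxfrm transforms each string so that plain comparison follows the
# current LC_COLLATE rules; under "C" that is byte-value order.
c_sorted = sorted(names, key=locale.strxfrm)

# 'H' (0x48) sorts before 'h' (0x68), so TopHat < Tophat here.
print(c_sorted)  # ['TopHat', 'Tophat']
```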

boegel commented 11 years ago

It seems like LC_ALL is the only one that really matters:

{{{ LC_ALL This variable determines the values for all locale categories. The value of the LC_ALL environment variable has precedence over any of the other environment variables starting with LC_ (LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME) and the LANG environment variable. }}}

(from http://pubs.opengroup.org/onlinepubs/007908799/xbd/envvar.html)

So, setting

LC_ALL=en_US.UTF-8

seems to make sense?
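The precedence described in the quoted text can be sketched as a small lookup. This is only an illustration: `effective_locale` and `LOCALE_CATEGORIES` are hypothetical names, not part of EasyBuild, and the final fallback to "C" is an assumption (the real default when nothing is set is implementation-defined).

```python
# Locale categories named in the quoted specification text.
LOCALE_CATEGORIES = (
    "LC_COLLATE", "LC_CTYPE", "LC_MESSAGES",
    "LC_MONETARY", "LC_NUMERIC", "LC_TIME",
)


def effective_locale(env, category):
    """Return the locale name that would apply to `category` given `env`."""
    if category not in LOCALE_CATEGORIES:
        raise ValueError("unknown locale category: %s" % category)
    # LC_ALL wins, then the category-specific variable, then LANG.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # unset or empty variables are skipped
            return value
    # Assumption: fall back to the C/POSIX locale when nothing is set.
    return "C"


env = {"LC_ALL": "C", "LC_COLLATE": "en_US.UTF-8", "LANG": "en_US.UTF-8"}
print(effective_locale(env, "LC_COLLATE"))  # C
```

With LC_ALL set, the other variables have no effect, which is why a single setting would be enough if a fixed locale is wanted.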

fgeorgatos commented 11 years ago

I admit having no fixed opinion on this, since every few months I keep discovering new bits in this area. So far, I am a bit more inclined towards LC_ALL=C / LANG=C, due to past experience like: http://www.mail-archive.com/lfs-book@linuxfromscratch.org/msg07861.html http://www.mail-archive.com/lfs-book@linuxfromscratch.org/msg08189.html

btw. "C" and "POSIX" are supposed to be equivalent; regular expressions should always behave predictably under that regime.

fgeorgatos commented 11 years ago

OK, especially the second pointer explains to me why I keep using LANG=C in my own shell scripts: ;-) http://superuser.com/questions/334800/lang-c-is-in-a-number-of-the-etc-init-d-scripts-what-does-lang-c-do-and-why http://stackoverflow.com/questions/4493175/bash-sort-unusual-order-problem-with-spaces http://computing.fnal.gov/unix-users/tips/Lang_Tips.html http://www.redhat.com/archives/rhl-beta-list/2004-May/msg02347.html

i.e. en_US.UTF-8 involves some kind of character translation, from what the pointers above imply...

rjeschmi commented 7 years ago

I think setting C for all subcommand calls is probably necessary, given how much command output can change based on language. I wonder a bit whether errors like Segmentation Fault will be missed by the regexes that check for command error output...
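A minimal sketch of what forcing the C locale for subcommand calls could look like, using only the standard library. `c_locale_env` and `run_in_c_locale` are hypothetical helper names, not EasyBuild functions, and the sketch assumes the child process honours LANG and the LC_* variables.

```python
import os
import subprocess
import sys


def c_locale_env(base_env=None):
    """Return a copy of the environment with all locale settings forced to C."""
    env = dict(os.environ if base_env is None else base_env)
    # Drop every LC_* variable so that no stale per-category setting survives.
    for name in [key for key in env if key.startswith("LC_")]:
        del env[name]
    env["LANG"] = "C"
    env["LC_ALL"] = "C"
    return env


def run_in_c_locale(cmd):
    """Run `cmd` (a list of arguments) with the C locale and capture its output."""
    return subprocess.run(cmd, env=c_locale_env(), capture_output=True, text=True)


# Usage: ask a child Python process which locale settings it sees.
result = run_in_c_locale(
    [sys.executable, "-c", "import os; print(os.environ['LC_ALL'], os.environ['LANG'])"]
)
print(result.stdout.strip())  # C C
```

Building a fresh environment per call, rather than modifying `os.environ`, keeps the user's own locale untouched for anything EasyBuild itself prints.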

fgeorgatos commented 6 years ago

ping... anyone still on this thread? if so, leaving a drop for posterity...

fotis$ unset `env|grep ^LC_|cut -d= -f1|xargs`; export LANG=C; export LC_ALL=C
fotis$ env|egrep '^(LANG=|LC_)'
LC_ALL=C
LANG=C