Closed michaelrsweet closed 19 years ago
CUPS.org User: mfabian
(Recycling most of a comment by Markus Kuhn from Bug #41006 on http://bugzilla.novell.com):
LC_MESSAGES is not the variable which determines the charmap of the current locale. Instead it is determined from the effective value of LC_CTYPE (I wrote "effective" because LC_CTYPE may be overridden by LC_ALL, or it may be unset, in which case it inherits the value from LANG).
See the Open Group's Single Unix Specification, which has since 2001 been identical to the IEEE/ISO POSIX standard, available freely on
http://www.opengroup.org/onlinepubs/007904975/
under Base Definitions/Environment Variables you can read:
LC_CTYPE This environment variable determines the interpretation of sequences of bytes of text data as characters (for example, single as opposed to multi-byte characters), the classification of characters (for example, alpha, digit, graph), and the behavior of character classes.
Further down, the same page says that this environment variable (like all LC_* variables) inherits a default value from LANG and can be overridden with LC_ALL. Therefore, to read LC_CTYPE correctly, you need to use something like
if (((s = getenv("LC_ALL")) && *s) || ((s = getenv("LC_CTYPE")) && *s) || ((s = getenv("LANG")) && *s)) { printf("LC_CTYPE = %s\n", s); }
The "locale" command line tool does that for example.
The proper way to find out the encoding used is to call the function nl_langinfo(CODESET), which is also what the command-line "locale charmap" does, because the name of the character set in use is actually defined in the locale definition file that is identified by the LANG or LC_* variables.
Until about two years ago, FreeBSD was the last widely used Unix variant that still lacked nl_langinfo(), therefore people had to use workarounds that tried to guess the encoding name from the LC_CTYPE locale name, which is problematic. Two such workaround hacks are linked on
http://www.cl.cam.ac.uk/~mgk25/unicode.html#activate
Fortunately, in 2003 this practice is no longer needed, because nl_langinfo() is now a proper universally implemented POSIX API call.
CUPS.org User: mkuhn
In a nutshell, the portable way for any application on a POSIX system today to determine the character set selected by the locale is:
#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

int main(void)
{
  if (!setlocale(LC_CTYPE, "")) {
    fprintf(stderr, "Can't set the specified locale! "
            "Check LANG, LC_CTYPE, LC_ALL.\n");
    return 1;
  }
  puts(nl_langinfo(CODESET));
  return 0;
}
This is formally guaranteed by the POSIX spec to work on any system where
_POSIX_VERSION >= 200112L
but it will work in practice almost anywhere else, too.
Unfortunately, the output syntax of nl_langinfo(CODESET) is not standardized properly. In practice, UTF-8 is always signalled as "UTF-8", but ISO 8859-15 can come as "ISO8859-15", "ISO_8859-15", "ISO-8859-15", etc.
Therefore, it is a good idea to normalize the output of nl_langinfo(CODESET), and the simple public-domain function
http://www.cl.cam.ac.uk/~mgk25/ucs/norm_charmap.c
can be used to do exactly that.
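To illustrate the kind of normalization meant here, the following is a minimal sketch. It is not the norm_charmap.c function linked above; the name normalize_codeset and its behavior are assumptions made for illustration, and it only handles the UTF-8 and ISO 8859 spellings mentioned in this comment. Any other name is passed through unchanged.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not norm_charmap.c): map common spellings of
 * UTF-8 and ISO 8859-x to one canonical form. Unrecognized names are
 * copied to out unchanged. */
static void normalize_codeset(const char *name, char *out, size_t outsize)
{
    char key[32];
    size_t n = 0;
    const char *p;

    /* Build a comparison key: uppercase, with '-' and '_' removed. */
    for (p = name; *p && n < sizeof(key) - 1; p++)
        if (*p != '-' && *p != '_')
            key[n++] = (char)toupper((unsigned char)*p);
    key[n] = '\0';

    if (strcmp(key, "UTF8") == 0) {
        snprintf(out, outsize, "UTF-8");
    } else if (strncmp(key, "ISO8859", 7) == 0 && key[7] != '\0' &&
               strspn(key + 7, "0123456789") == strlen(key + 7)) {
        /* "ISO8859" followed only by the part number, e.g. "15" */
        snprintf(out, outsize, "ISO-8859-%s", key + 7);
    } else {
        snprintf(out, outsize, "%s", name);
    }
}
```

A caller would pass the string returned by nl_langinfo(CODESET) as name and compare the normalized result against the canonical names it supports.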
CUPS.org User: mkuhn
Should there be concern about pre-2001 POSIX systems that do not implement nl_langinfo(CODESET), then a public-domain workaround emulator for it, which guesses the character set based on the locale name from the environment variables, is available on:
http://www.cl.cam.ac.uk/~mgk25/ucs/langinfo.c
That routine was widely used before FreeBSD finally added nl_langinfo(CODESET) support with version 4.6 in mid 2002 (the last widely-used POSIX system that was still missing it). I doubt it is still necessary today.
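For illustration only, the simplest form of such a guess is to take the codeset part of the locale name. The sketch below is not the langinfo.c emulator linked above; the function name codeset_from_locale_name is hypothetical, and it assumes the usual locale-name syntax language[_territory][.codeset][@modifier]. It reports failure when the name carries no codeset, which is exactly the case where this kind of guessing becomes problematic.

```c
#include <string.h>

/* Hypothetical fallback: copy the codeset part of a locale name of the
 * form language[_territory][.codeset][@modifier] into out.
 * Returns 1 if a non-empty codeset was found and fits, 0 otherwise. */
static int codeset_from_locale_name(const char *locale, char *out,
                                    size_t outsize)
{
    const char *dot = strchr(locale, '.');
    size_t len;

    if (dot == NULL || outsize == 0)
        return 0;                 /* no codeset in the name */
    dot++;                        /* skip the '.' itself */
    len = strcspn(dot, "@");      /* stop before any @modifier */
    if (len == 0 || len >= outsize)
        return 0;
    memcpy(out, dot, len);
    out[len] = '\0';
    return 1;
}
```

The locale name passed in would be the effective LC_CTYPE value, that is, the first non-empty one of LC_ALL, LC_CTYPE and LANG.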
CUPS.org User: mike
CUPS already uses nl_langinfo(CODESET) when it is available. See the cups/language.c source file.
The current code tests for both nl_langinfo() and a definition of the CODESET constant; if either is missing, the code falls back on environment variables.
Any fix for this will be delayed until 1.2; however, if you can look at the current cups/language.c source file and see why it is not working on your OS of choice, we'll be happy to make the necessary changes.
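The compile-time test described above can be sketched roughly as follows. This is not the code from cups/language.c; the helper get_charset and its via_langinfo parameter are hypothetical, and HAVE_LANGINFO_H is assumed to be supplied by a configure-generated header or on the compiler command line.

```c
#include <locale.h>
#include <stddef.h>

/* HAVE_LANGINFO_H is assumed to come from a configure-generated header or
 * from -DHAVE_LANGINFO_H. If it is never defined, <langinfo.h> is not
 * included and CODESET stays undefined. */
#ifdef HAVE_LANGINFO_H
#  include <langinfo.h>
#endif

/* Hypothetical helper: returns the locale's codeset name, or NULL when the
 * nl_langinfo() path was not compiled in and the caller has to fall back
 * on the environment variables. *via_langinfo reports which path was built. */
static const char *get_charset(int *via_langinfo)
{
#ifdef CODESET
    *via_langinfo = 1;
    /* If setlocale() fails, the locale is left unchanged ("C" at program
     * start) and nl_langinfo() reports the codeset of that locale. */
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
#else
    *via_langinfo = 0;
    return NULL;
#endif
}
```

On a system that provides langinfo.h, building this once with and once without -DHAVE_LANGINFO_H shows how a missing configuration macro selects the fallback path without any compiler diagnostic.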
CUPS.org User: mfabian
nl_langinfo(CODESET) is not used because
is missing in cups-1.1.23/cups/language.c.
Without that,
will of course not include langinfo.h and then CODESET will be undefined.
That's not the only bug though, even with that fix it still doesn't seem to work right.
CUPS.org User: mfabian
etc. didn't work because
#include <locale.h>
was missing.
CUPS.org User: mfabian
I attached a patch "locale.patch" which hopefully fixes the problem.
CUPS.org User: mike
OK, first, we'd need a patch against CUPS 1.2. 1.1.x is closed for all but security bugs.
As for
Please look at the current 1.2 sources; here is a direct link:
http://svn.easysw.com/public/cups/trunk/cups/language.c
The current code seems to work the "right" way using nl_langinfo() when available...
CUPS.org User: mike
This STR has not been updated by the submitter for two or more weeks and has been closed as required by the CUPS Configuration Management Plan. If the issue still requires resolution, please re-submit a new STR.
"locale.patch":
diff -ru cups-1.1.23.orig/cups/language.c cups-1.1.23/cups/language.c
--- cups-1.1.23.orig/cups/language.c 2005-01-03 20:29:45.000000000 +0100
+++ cups-1.1.23/cups/language.c 2005-06-15 17:56:09.000000000 +0200
@@ -40,9 +40,11 @@
+#include
+#include
@@ -114,6 +116,116 @@
};
+#ifndef HAVE_LANGINFO_H
+/*
-# ifdef LC_MESSAGES
ptr = setlocale(LC_MESSAGES, "");
if (ptr) {
@@ -309,7 +400,6 @@
charset[0] = '\0';
-#ifdef CODESET
/*
this value as the character set...
@@ -330,7 +420,6 @@
DEBUG_printf(("cupsLangGet: charset set to \"%s\" via nl_langinfo(CODESET)...\n", charset));
}
-#endif /* CODESET */
/*
/*
* Force a POSIX locale for an invalid language name...
*/
@@ -410,7 +486,6 @@
{
strcpy(langname, "C");
country[0] = '\0';
diff -ru cups-1.1.23.orig/scheduler/type.c cups-1.1.23/scheduler/type.c
--- cups-1.1.23.orig/scheduler/type.c 2005-01-03 20:29:59.000000000 +0100
+++ cups-1.1.23/scheduler/type.c 2005-06-15 14:31:05.000000000 +0200
@@ -942,6 +942,8 @@
case MIME_MAGIC_LOCALE :
result = (strcmp(rules->value.localev, setlocale(LC_ALL, "")) == 0);
+#elif defined(__GLIBC__) && defined(LC_CTYPE)
result = (strcmp(rules->value.localev, setlocale(LC_MESSAGES, "")) == 0);
Version: 1.1.23
CUPS.org User: mfabian
To show that CUPS behaves as stated in the summary of the bug, I use the following locale setting for testing:
Now, when setting LC_CTYPE to a UTF-8 locale and printing a UTF-8 encoded test file:
it is not printed correctly.
But when LC_MESSAGES is set instead of LC_CTYPE:
the file is printed correctly.
The test file looks like this:
I'll attach it as well.