beyondgrep / ack2

**ack 2 is no longer being maintained. ack 3 is the latest version.**
https://github.com/beyondgrep/ack3/

Non-english error messages corrupted under perl-5.19.2+ #367

Closed vsespb closed 10 years ago

vsespb commented 11 years ago
bash-4.1$ ./ack-standalone 1 Z
ack-standalone: Z: Permission denied

bash-4.1$ LANG=ru_RU LANGUAGE=ru_RU:ru LC_ALL=ru_RU.utf8  ./ack-standalone 1 Z
Wide character in warn at ./ack-standalone line 2544.
ack-standalone: Z: Отказано в доступе

bash-4.1$ LANG=fr_FR LANGUAGE=fr_FR:ru LC_ALL=fr_FR.utf8  ./ack-standalone 1 Z
ack-standalone: Z: Permission non accord�e

bash-4.1$ ./ack-standalone  --version
ack 2.09_01 (git commit 3e3de19)
Running under Perl 5.19.2 at /home/vse/perl5/perlbrew/perls/perl-5.19.2/bin/perl5.19.2

Copyright 2005-2013 Andy Lester.

This program is free software.  You may modify or distribute it
under the terms of the Artistic License v2.0.

(Russian error message printed with warning, French is corrupted)

This is due to a breaking change introduced in perl: https://rt.perl.org/rt3/Ticket/Display.html?id=119499

Note: I am not 100% sure whether this change will make it into a stable version. Note: ack1 is affected too.

petdance commented 11 years ago

I'm not sure I understand. If this is a Perl problem, what can we do about it?

vsespb commented 11 years ago

a) It's not clear whether it will be fixed by the next stable version. b) Even if it is fixed, it can still break ack, depending on the fix. c) Some programmers tend to work around problems even in dev versions of perl.

It's entirely up to you. If you think it's not an issue, np, just close it.

petdance commented 11 years ago

Do you have suggestions as to what we might do?

vsespb commented 11 years ago

I assume your workflow is to never set STDOUT/STDERR encoding.

So something like,

use Encode qw(encode);
my $err = $!;
$err = encode("UTF-8", "$err") if utf8::is_utf8($err);
print $err;

would make your code compatible with both old perls and 5.19.2/3 (not sure about future versions). (Yes, I know that is_utf8 is considered dangerous.)
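
For readers who want to try the idea, here is a minimal, self-contained sketch of the workaround suggested above. The helper name `error_bytes` and the demo path are hypothetical and not part of ack. The sketch assumes, as the comment does, that STDERR has no encoding layer, so only byte strings should be written to it.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of ack): return the error text as a byte
# string, whether or not perl handed it over with the UTF-8 flag set.
sub error_bytes {
    my ($err) = @_;
    $err = "$err";                  # force the string value (e.g. of a copy of $!)
    return utf8::is_utf8($err)
        ? encode('UTF-8', $err)     # character string -> UTF-8 encoded bytes
        : $err;                     # already a byte string; leave it untouched
}

# Demo: a failed open sets $!, and the message goes out as bytes.
my $path = '/nonexistent/path/used/only/for/this/demo';
unless (open(my $fh, '<', $path)) {
    warn "demo: $path: ", error_bytes($!), "\n";
}
```

Because the helper always returns bytes, `warn` never sees a character above 255 and should not emit "Wide character" on an unlayered STDERR.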

hoelzro commented 11 years ago

The problem with setting the encoding to UTF-8 by default is that we're assuming that the user's locale is UTF-8. If it's ISO-8859-1, KOI-8, etc, we still have the same problem. Also, I'm not sure if utf8::is_utf8 will return true in this case, considering the error message is just a character buffer that Perl got from the C library, and I don't know what assumptions it can make about its encoding.

vsespb commented 11 years ago

@hoelzro

The thing is that under perl 5.19.2 and 5.19.3, $! has the UTF-8 flag set only if the locale is UTF-8.

( this is discussed in https://rt.perl.org/rt3/Ticket/Display.html?id=119499 )

so code like

Encode::_utf8_off($err) if  utf8::is_utf8($err)

will work too.

If the locale is not UTF-8, perl 5.19.x behaves like any older version. And in older versions everything works fine in all locales, because you don't set binmode ":encoding" on STDOUT/STDERR and just output byte strings to byte-oriented filehandles.
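
A rough sketch of the `Encode::_utf8_off` variant, for comparison. The helper name `strip_utf8_flag` is hypothetical. The sketch relies on the assumption stated above that a flagged $! only ever appears under a UTF-8 locale, so the bytes perl holds internally are already the bytes the terminal expects.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of ack): drop the UTF-8 flag so the value
# is treated as the raw bytes that perl already holds internally.
sub strip_utf8_flag {
    my ($err) = @_;
    $err = "$err";                                    # work on a plain string copy
    Encode::_utf8_off($err) if utf8::is_utf8($err);   # modifies $err in place
    return $err;
}

# Demo with a flagged string standing in for a localized error message.
my $message = "\x{41e}\x{442}\x{43a}\x{430}\x{437}";
my $bytes   = strip_utf8_flag($message);
printf "characters: %d, bytes: %d\n", length($message), length($bytes);
```

For ordinary text this should give the same bytes as `encode("UTF-8", ...)`. The difference is that `_utf8_off` does no conversion or validation at all; it only flips the flag, and the Encode documentation marks it as an internal function.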

hoelzro commented 11 years ago

@vsespb Ah, interesting. That approach might just work!

petdance commented 11 years ago

I don't want to do anything with this at this point. 5.19.2 is a development version of Perl, and I'm hoping that this bug is just a blip in this version.

vsespb commented 11 years ago

ok, np

azawawi commented 11 years ago

So how about closing this one since it is not an ack bug?

vsespb commented 10 years ago

So now, in perl-5.21.2+, the text of $! is always in English outside the scope of "use locale":

http://search.cpan.org/~abigail/perl-5.21.2/pod/perl5211delta.pod#%22$!%22_text_is_now_in_English_outside_%22use_locale%22_scope