Perl-Critic / PPI


non-ascii tokens are not recognized #168

Closed karenetheridge closed 8 years ago

karenetheridge commented 9 years ago

When attempting to parse a .pm that contains:

our $π = atan2(1,1) * 4;

I get:

Fatal error... regex failed to match in 'π = atan2(1,1) * 4;
' when expected at /Volumes/amaretto/Users/ether/.perlbrew/libs/21.10@std/lib/perl5/PPI/Token/Word.pm line 240.

The regex for an acceptable token needs to be updated; Unicode in symbols has been supported in core since at least Perl 5.16.

(related: https://github.com/adamkennedy/PPI/issues/22, https://github.com/adamkennedy/PPI/issues/26)
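To illustrate why a word-matching regex can fail on the first character of such an identifier, here is a minimal core-Perl sketch. It is not PPI's actual regex (the pattern in PPI/Token/Word.pm is not shown in this thread); `\A\w+` is a stand-in chosen for illustration, and the variable names are hypothetical. It only shows that the same pattern behaves differently on raw UTF-8 bytes and on a decoded character string.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 encoding of U+03C0 (π) is the two bytes 0xCF 0x80.
# This is what a parser sees if the file is read without decoding.
my $bytes = "\xCF\x80 = atan2(1,1) * 4;";

# The same text as a decoded character string.
my $chars = decode('UTF-8', $bytes);

# Stand-in pattern (an assumption, not PPI's real one).
my $word = qr/\A\w+/;

# On the undecoded bytes, default regex rules treat 0xCF as a non-word byte.
my $bytes_match = $bytes =~ $word ? 1 : 0;

# On the decoded string, π is a Unicode letter, so \w matches it.
my $chars_match = $chars =~ $word ? 1 : 0;

print "raw bytes match:      $bytes_match\n";
print "decoded string match: $chars_match\n";
```

If the failure reported above has a similar cause, decoding the source before tokenizing is one direction to investigate; that is a guess, not something confirmed in this issue.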

drforr commented 9 years ago

This is also a major bug for Perl::ToPerl6.

choroba commented 8 years ago

The same error is triggered by unquoted UTF-8 hash keys:

use utf8;
my %h = ( é => 'eacute' );

karenetheridge commented 8 years ago

Redundant with #22.

karenetheridge commented 8 years ago

The easiest thing to do would be to add an argument to new to pass the encoding parameter in, e.g.:

my $ppi = PPI::Document->new("Foo/Bar.pm", encoding => ':encoding(UTF-8)');

This should work. @rjbs and I are poking through things to confirm that everything works properly if decoded/wide characters are passed in, as in PPI::Document->new(\$decoded_string_containing_wide_chars).
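A rough sketch of the second approach, decoding the file yourself and handing PPI a reference to the decoded string, might look like the following. It uses only the existing scalar-reference form of PPI::Document->new; the `encoding` argument above is a proposal and is not used here. The temporary file and variable names are made up for the example, and whether the parse succeeds on wide characters is exactly what is still being confirmed, so the call is wrapped in eval and skipped if PPI is not installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small UTF-8 encoded module on disk (hypothetical sample input).
# "\x{3C0}" is π, written as an escape so this script itself stays ASCII.
my ($out, $path) = tempfile(SUFFIX => '.pm', UNLINK => 1);
binmode($out, ':encoding(UTF-8)');
print {$out} "use utf8;\nour \$\x{3C0} = atan2(1,1) * 4;\n";
close $out or die "close failed: $!";

# Read it back through a decoding layer so $source holds characters, not bytes.
open my $in, '<:encoding(UTF-8)', $path or die "open failed: $!";
my $source = do { local $/; <$in> };
close $in;

# Pass the decoded string by reference, if PPI is available.
if (eval { require PPI; 1 }) {
    my $doc = eval { PPI::Document->new(\$source) };
    print defined $doc ? "parsed\n" : "parse failed\n";
}
else {
    print "PPI not installed; decoding step only\n";
}
```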