fglock / Perlito

"Perlito" Perl programming language compiler
http://fglock.github.io/Perlito/

Perlito Java: regexps generated at runtime need to be double-escaped #41

Closed potyl closed 8 years ago

potyl commented 8 years ago

This is a case where Perl and the Java code generated by Perlito behave differently. If a Perl program builds a regexp at runtime (as a string) and then uses it to match a pattern, the match fails under Java when the regexp contains escapes. In the example below the difference shows up in the replacement part of the substitution: Perl needs \\s there to produce \s, but the generated Java code needs \\\\s.

Example:

#!/usr/bin/env perl
use strict;
use warnings;

my $input = 'a b c';
my $regexp_string = qq{a<SPACE>b\\sc};

my $regexp_clean = $regexp_string;
$regexp_clean =~ s/<SPACE>/\\s/;

print "Trying '$input' =~ /$regexp_clean/\n";
$input =~ /$regexp_clean/ or die "Failed regexp clean: '$input' =~ /$regexp_clean/";
print "Ok '$input' =~ /$regexp_clean/\n";

Perl output:

Trying 'a b c' =~ /a\sb\sc/
Ok 'a b c' =~ /a\sb\sc/

Java output:

Trying 'a b c' =~ /asb\sc/
Exception in thread "main" PlDieException: Failed regexp clean: 'a b c' =~ /asb\sc/
    at PlCORE.die(Main.java:114)
    at Main.main(Main.java:3514)

If we modify the original program so that the replacement in the substitution is \\\\s instead of \\s:

#!/usr/bin/env perl
use strict;
use warnings;

my $input = 'a b c';
my $regexp_string = qq{a<SPACE>b\\sc};

my $regexp_clean = $regexp_string;
$regexp_clean =~ s/<SPACE>/\\\\s/;

print "Trying '$input' =~ /$regexp_clean/\n";
$input =~ /$regexp_clean/ or die "Failed regexp clean: '$input' =~ /$regexp_clean/";
print "Ok '$input' =~ /$regexp_clean/\n";

Then we get a failure in Perl:

Trying 'a b c' =~ /a\\sb\sc/
Failed regexp clean: 'a b c' =~ /a\\sb\sc/ at ./sample.pl line 12.

But a success in Java:

Trying 'a b c' =~ /a\sb\sc/
Ok 'a b c' =~ /a\sb\sc/
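
The Java output above is consistent with how Java's own regex replacement strings work: in String.replaceFirst and Matcher.appendReplacement, a backslash in the replacement escapes the next character, so a replacement made of the two characters \s comes out as a plain s. Whether this is what Perlito's generated code actually does is an assumption and is not confirmed in this issue. The following minimal sketch only shows the plain-Java behaviour; the class and method names (ReplacementEscapes, rawReplace, quotedReplace) are hypothetical and not part of Perlito.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementEscapes {
    // Replace the first <SPACE> marker, passing the replacement string as-is.
    // Java treats '\' in the replacement as an escape for the next character.
    static String rawReplace(String template, String replacement) {
        return template.replaceFirst("<SPACE>", replacement);
    }

    // Same substitution, but the replacement is quoted first so that
    // backslashes and '$' are inserted literally.
    static String quotedReplace(String template, String replacement) {
        return template.replaceFirst("<SPACE>", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String input = "a b c";
        String template = "a<SPACE>b\\sc"; // the characters: a<SPACE>b\sc
        String replacement = "\\s";        // the characters: \s

        String raw = rawReplace(template, replacement);       // backslash is consumed
        String quoted = quotedReplace(template, replacement); // backslash is kept

        System.out.println(raw + " matches: " + Pattern.compile(raw).matcher(input).find());
        System.out.println(quoted + " matches: " + Pattern.compile(quoted).matcher(input).find());
    }
}
```

With the raw replacement the resulting pattern is asb\sc, which does not match 'a b c'; with the quoted replacement it is a\sb\sc, which does. If the assumption holds, quoting the replacement on the Java side (for example with Matcher.quoteReplacement) would be one possible way to make the first program behave the same in both backends.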