DEVSENSE / Phalanger

PHP 5.4 compiler for .NET/Mono frameworks. Predecessor to the opensource PeachPie project (www.peachpie.io).
http://v4.php-compiler.net/
Apache License 2.0

Bug in PerlRegExpConverter #38

Open broudy3 opened 10 years ago

broudy3 commented 10 years ago

I have a problem with the following regular expression: \G(((int(eger)?|bool(ean)?|float|double|real|string|binary|array|object))\s*)

This regular expression gives me a different result in Phalanger. I assume the problem is in converting the pattern to .NET. The pattern is converted to: \G(?((int(?eger)?|bool(?ean)?|float|double|real|string|binary|array|object))\s*)

I think the group name is missing after \G(?(

The same problem occurs with the regular expression /((x)y)/. When I match it against 'xy', I get the wrong results:

preg_match('/((x)y)/', 'xy', $matches, null);
$matches[1] == 'x' (should be 'xy')
$matches[2] == 'xy' (should be 'x')
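For reference, the expected numbering can be checked in any engine that counts capturing groups by the position of their opening parenthesis. The sketch below uses Python's `re` module purely as a neutral reference. It is not part of Phalanger, and it only illustrates which values `$matches[1]` and `$matches[2]` should hold. One possible explanation for the swapped indexes, not confirmed in this thread, is that .NET numbers unnamed groups before named ones, so a pattern in which only some groups receive generated names can end up reordered.

```python
import re

# Capturing groups are numbered by the position of their opening
# parenthesis, left to right: the outer group is 1, the inner group is 2.
match = re.match(r"((x)y)", "xy")

outer = match.group(1)  # corresponds to $matches[1] in the PHP call
inner = match.group(2)  # corresponds to $matches[2] in the PHP call

print(outer, inner)
```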

proff commented 10 years ago

not tested well yet...

diff -r cb4f50629489 Phalanger/ClassLibrary/RegExpPerl.cs
--- a/Phalanger/ClassLibrary/RegExpPerl.cs  Thu Sep 11 15:06:26 2014 +0400
+++ b/Phalanger/ClassLibrary/RegExpPerl.cs  Mon Sep 15 23:22:18 2014 +0400
@@ -2265,8 +2265,7 @@
                                             result.Append('>');
                                             continue;
                                         }
-                                        else
-                                        if (i + 2 < perlExpr.Length && perlExpr[i + 2] == ':')
+                                        if (i + 2 < perlExpr.Length && (perlExpr[i + 2] == ':' || perlExpr[i + 2] == '!' || perlExpr[i + 2] == '='))
                                         {
                                             // Pseudo-group, don't count.
                                             --group_number;
@@ -2284,6 +2283,27 @@
                            case 1:
                                 if (ch == '?')
                                     inner_state = 2;
+                                else if (ch == '(')
+                                {
+                                    ++group_number;
+                                    if (i + 1 < perlExpr.Length)
+                                    {
+                                        if (perlExpr[i + 1] != '?')
+                                        {
+                                            ++i;
+                                            result.Append("(?<");
+                                            result.Append(AnonymousGroupPrefix);
+                                            result.Append(group_number);
+                                            result.Append('>');
+                                            continue;
+                                        }
+                                        if (i + 2 < perlExpr.Length && (perlExpr[i + 2] == ':' || perlExpr[i + 2] == '!' || perlExpr[i + 2] == '='))
+                                        {
+                                            // Pseudo-group, don't count.
+                                            --group_number;
+                                        }
+                                    }
+                                }
                                 else if (ch != '(')// stay in inner_state == 1, because this can happen: ((?<blah>...))
                                     inner_state = 0;
                                 break;

broudy3 commented 10 years ago

Sorry, my mistake: I didn't notice that GitHub changed the first regular expression. The correct one is: \G(\((int(eger)?|bool(ean)?|float|double|real|string|binary|array|object)\)\s*)

It is converted to: \G(?<an0ny_1>\((int(?<an0ny_2>eger)?|bool(?<an0ny_3>ean)?|float|double|real|string|binary|array|object)\)\s*)

The group name is missing after \G(?<an0ny_1>\(( and before int, so it should look like this: \G(?<an0ny_1>\((?<an0ny_2>int(?<an0ny_3>eger)?|bool(?<an0ny_4>ean)?|float|double|real|string|binary|array|object)\)\s*)

Am I right? Please try to fix this case as well, thank you.
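The naming rule described in this comment can be sketched as follows. This is a minimal Python illustration, not the actual PerlRegExpConverter code: the function name `name_capturing_groups` is hypothetical, and the `an0ny_` prefix is assumed from the converted patterns quoted above. The sketch skips escaped characters such as `\(` and leaves every `(?...)` construct untouched. It deliberately ignores character classes and already named groups, which a real converter would have to handle.

```python
ANONYMOUS_GROUP_PREFIX = "an0ny_"  # assumed from the converted patterns above


def name_capturing_groups(pattern: str) -> str:
    """Name every plain capturing group, numbered in opening-paren order."""
    out = []
    group_number = 0
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # An escaped character (for example "\(") is copied verbatim
            # and never counted as a group.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if ch == "(" and pattern[i + 1:i + 2] != "?":
            # Plain capturing group: count it and emit a generated name.
            group_number += 1
            out.append(f"(?<{ANONYMOUS_GROUP_PREFIX}{group_number}>")
        else:
            # Everything else, including "(?:", "(?=" and "(?!", passes through.
            out.append(ch)
        i += 1
    return "".join(out)


corrected = r"\G(\((int(eger)?|bool(ean)?|float|double|real|string|binary|array|object)\)\s*)"
print(name_capturing_groups(corrected))
```

Applied to the corrected pattern, the sketch names four groups in the order their parentheses open, and the literal `\(` between the first and second group is not counted.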