Closed p5pRT closed 20 years ago
This is a bug report for perl from flan@desktop.com, generated with the help of perlbug 1.28 running under perl v5.6.0.
-----------------------------------------------------------------
Here's what we're trying to do:
'Yo Momma <a@aa.aa>' =~ m{
    ([-\w]+\ ){2}   # First and last name, space-separated
    \<              # Email delimiter
    [-\w.%_]+       # Email username
    \@              # Email separator
    ([-\w]+\.)+     # Email domain
    \w\w+           # Email TLD
    \>              # Email delimiter
}x;
This should match, but it doesn't:
perlsh> show scalar 'Yo Momma <a@aa.aa>' =~ m{ (([-\w]+\ ){2} \< [-\w.%_]+ \@ ([-\w]+\.)+ \w\w+ \>) }x
@data = ( "" );
However, adding a mystery [^\S\s]* fixes the problem:
perlsh> show scalar 'Yo Momma <a@aa.aa>' =~ m{ (([-\w]+\ ){2} ([^\S\s]*) \< [-\w.%_]+ \@ ([-\w]+\.)+ \w\w+ \>) }x
@data = ( 1 );
Note that the [^\S\s] can be anything (5, p, q, ., etc.). Adding one or more characters anywhere in the email address in the target string (the left-hand side of =~) also fixes it. Removing the first and last name subexpression fixes it. Changing the {2} to a + fixes it. Making it {2,} fixes it. Making it {2,3}, {1,2}, etc. also fixes it.
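The variants listed above can be collected into a small Perl script for checking a given perl. This is a sketch, not part of the original report: the quantifier loop and the %matched hash are illustrative additions. The report says the plain {2} form failed under 5.6.0 while the other forms matched. On a perl that includes the fix discussed later in this thread, every variant is expected to match.

```perl
use strict;
use warnings;

my $str = 'Yo Momma <a@aa.aa>';

# Quantifiers on the name group that the report tried. Each one is
# interpolated into the same /x pattern; 1 means the pattern matched.
my %matched;
for my $q ('{2}', '+', '{2,}', '{2,3}', '{1,2}') {
    my $re = qr{ (([-\w]+\ )$q \< [-\w.%_]+ \@ ([-\w]+\.)+ \w\w+ \>) }x;
    $matched{$q} = ($str =~ $re) ? 1 : 0;
}

# The "mystery" workaround: an always-empty [^\S\s]* before the delimiter.
$matched{'extra [^\S\s]*'} =
    ($str =~ qr{ (([-\w]+\ ){2} ([^\S\s]*) \< [-\w.%_]+ \@ ([-\w]+\.)+ \w\w+ \>) }x)
    ? 1 : 0;

printf "%-16s %s\n", $_, $matched{$_} ? 'match' : 'no match' for sort keys %matched;
```

Running this under an affected perl and a fixed one side by side shows which forms trigger the problem.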
Hope this sheds some light.
Ian
flan@desktop.com
Ian Flanigan wrote:
Here's what we're trying to do:
I will refrain from saying the usual, "you can't validate email addresses that way". ;-)
use re 'debug';
'Yo Momma <a@aa.aa>' =~ m{
    ([-\w]+\ ){2}   # First and last name, space-separated
    \<              # Email delimiter
    [-\w.%_]+       # Email username
    \@              # Email separator
    ([-\w]+\.)+     # Email domain
    \w\w+           # Email TLD
    \>              # Email delimiter
}x;
Compiling REx `
    ([-\w]+\ ){2}   # First and last name, space-separated
    \<              # Email delimiter
    [-\w.%_]+       # Email username
    \@              # Email separator
    ([-\w]+\.)+     # Email domain
    \w\w+           # Email TLD
    \>              # Email delimiter
'
size 63 first at 6
   1: CURLYX {2,2}(21)
   3: OPEN1(5)
   5: PLUS(16)
   6: ANYOF[\-0-9A-Z_a-z](0)
  15: END(0)
floating ` <' at 1..2147483647 (checking floating) stclass `ANYOF[\-0-9A-Z_a-z]' plus minlen 12
Guessing start of match, REx ` ([-\w]+\ ){2} # First and last name, space-separated \<...' against `Yo Momma <a@aa.aa>'...
Did not find floating substr ` <'...
Match rejected by optimizer
Freeing REx: ` ([-\w]+\ ){2} # First and last name, space-separated \<...'
I think this bug has been reported before in different forms. It is unfixed AFAIK. But where are all the regexp nodes in the debugging output?
Simplifying:
use re 'debug'; 'Yo' =~ m/[\w]o/;
Compiling REx `[\w]o'
size 13 first at 1
1: ANYOF[0-9A-Z_a-z](11)
10: END(0) <------ Huh?
anchored `o' at 1 (checking anchored) stclass `ANYOF[0-9A-Z_a-z]' minlen 2
Guessing start of match, REx `[\w]o' against `Yo'...
Found anchored substr `o' at offset 1...
Does not contradict STCLASS...
Guessed: match at offset 0
Matching REx `[\w]o' against `Yo'
Setting an EVAL scope, savestack=3
0 \<> \
and the equivalent:
use re 'debug'; 'Yo' =~ m/[0-9A-Z_a-z]o/;
Compiling REx `[0-9A-Z_a-z]o'
size 12 first at 1
1: ANYOF[0-9A-Z_a-z](10)
10: EXACT \
Any of the predefined character classes (\d, \S, etc.) used inside a bracketed character class has the same effect on the debugging output.
:This should match, but it doesn't [...]
The patch below fixes this problem as well.
Hugo
--- forwarded message
"MC" <mc@backwoords.org> wrote:
:The code below matches when run on version 5.005_03 but fails when run on
:version 5.06. If you remove the 's' indicated on line 8 it will match
:successfully in both versions.
:
:I can conceive no explanation for this, and several others from the
:comp.lang.perl.misc threads 'Mystery Regex' and 'Mystery Regex [long]' agree that
:this is indeed a bug.
A couple of shorter testcases:
perl -wle 'print "ok" if "a,b,c" =~ /^(?:.,){2}c/'
perl -wle 'print "ok" if "a,b,c" =~ /^(?:[^,]*,){2}c/'
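The two reduced cases can be turned into a tiny regression check. The hand-unrolled patterns (^.,.,c and ^[^,]*,[^,]*,c) are an addition here, included only for comparison because they describe the same strings without a {2} quantifier. They are not taken from the thread or from the patch's own tests. On a perl with the fix, all four lines should print "ok".

```perl
use strict;
use warnings;

# Reduced test cases from the report, plus hand-unrolled equivalents
# (added for comparison) that avoid the {2} quantifier entirely.
my @cases = (
    [ '^(?:.,){2}c',     qr/^(?:.,){2}c/     ],
    [ '^(?:[^,]*,){2}c', qr/^(?:[^,]*,){2}c/ ],
    [ '^.,.,c',          qr/^.,.,c/          ],
    [ '^[^,]*,[^,]*,c',  qr/^[^,]*,[^,]*,c/  ],
);

my $target = "a,b,c";
my %result;
for my $case (@cases) {
    my ($label, $re) = @$case;
    $result{$label} = $target =~ $re ? 'ok' : 'not ok';
    print "$result{$label} - $label\n";
}
```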
The regexp engine decided that these regexps were complicated enough to look for fixed substrings (which may in itself be a bug, not sure), and spotted that ',c' was the longest fixed substring; however, it failed to take into account the {2} multiplier when determining where in the target string it should expect to find that substring, so it searched starting from offset 1 rather than offset 3.
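The offset arithmetic described above can be spelled out as a short calculation. The helper min_substr_offset is hypothetical and is not a function in perl's regex engine. It multiplies the minimum length of the repeated group by the repeat count, then backs up by the number of characters of the fixed substring that come from the group's last iteration. Dropping the multiplier gives the offset 1 described above. For the original email pattern it also gives the minimum of 1 reported on the "floating" line of the debug output earlier in the thread. This is a model of the described mistake, not of the engine's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perl's regex engine): smallest offset at
# which a fixed substring can start when it follows a group repeated $count
# times, each iteration at least $group_minlen chars long, and the
# substring's first $overlap chars come from the group's last iteration.
sub min_substr_offset {
    my ($group_minlen, $count, $overlap) = @_;
    return $group_minlen * $count - $overlap;
}

# /^(?:.,){2}c/: each (?:.,) is exactly 2 chars; ',c' starts at the final ','.
my $fixed = min_substr_offset(2, 2, 1);    # {2} taken into account
my $buggy = min_substr_offset(2, 1, 1);    # {2} ignored, as described
print "',c': expected offset $fixed, buggy offset $buggy\n";

# Cross-check against the real target string.
print "',c' actually starts at ", index("a,b,c", ",c"), "\n";

# Original pattern: ([-\w]+\ ){2} needs at least 2 chars per iteration, and
# the floating substring ' <' starts at the group's trailing space.
my $email_fixed = min_substr_offset(2, 2, 1);
my $email_buggy = min_substr_offset(2, 1, 1);
print "' <': expected min offset $email_fixed, buggy min offset $email_buggy\n";
```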
Attached patch passes all tests here\, including the new ones.
Hugo
Migrated from rt.perl.org#3421 (status was 'resolved')
Searchable as RT3421$