regexec.c stack overflow

p5pRT commented 8 years ago

Migrated from rt.perl.org#129903 (status was 'open')

Searchable as RT129903$

p5pRT commented 8 years ago

From @geeknik

I'm not convinced that this is an actual bug\, but #p5p was silent when I asked about it. Affects Perl back to 5.20.2 including v5.25.6 (v5.25.5-114-g87af8d5). Valgrind fails pretty quickly but gdb just goes on forever.

perl -e '/(?{m}(0)}\,s\/\/\/})//0'

==6615==ERROR: AddressSanitizer: stack-overflow on address 0x7ffc78f88ca8 (pc 0x0000004c0d4c bp 0x7ffc78f89500 sp 0x7ffc78f88cb0 T0) #0 0x4c0d4b in calloc (/root/perl/perl+0x4c0d4b) #1 0x7f9301 in Perl_safesyscalloc /root/perl/util.c:442:18 #2 0xb45aed in Perl_regexec_flags /root/perl/regexec.c:3128:9 #3 0x8cf9c5 in Perl_pp_subst /root/perl/pp_hot.c:2981:10 #4 0x7f4483 in Perl_runops_debug /root/perl/dump.c:2246:23 #5 0xb984b6 in S_regmatch /root/perl/regexec.c:6888:3 #6 0xb7337c in S_regtry /root/perl/regexec.c:3622:14 ... ... #247 0xb4b82b in Perl_regexec_flags /root/perl/regexec.c:3489:7 #248 0x8cf9c5 in Perl_pp_subst /root/perl/pp_hot.c:2981:10 #249 0x7f4483 in Perl_runops_debug /root/perl/dump.c:2246:23 #250 0xb984b6 in S_regmatch /root/perl/regexec.c:6888:3 #251 0xb7337c in S_regtry /root/perl/regexec.c:3622:14

SUMMARY: AddressSanitizer: stack-overflow ??:0 calloc ==6615==ABORTING

==19424== Stack overflow in thread 1: can't grow stack to 0xffe801f90 ==19424== ==19424== Process terminating with default action of signal 11 (SIGSEGV) ==19424== Access not within mapped region at address 0xFFE801F90 ==19424== at 0x5B4137: S_regtry (regexec.c:3578) ==19424== If you believe this happened as a result of a stack ==19424== overflow in your program's main thread (unlikely but ==19424== possible)\, you can try to increase the size of the ==19424== main thread stack using the --main-stacksize= flag. ==19424== The main thread stack size used in this run was 8388608. ==19424== Stack overflow in thread 1: can't grow stack to 0xffe801f88 ==19424== ==19424== Process terminating with default action of signal 11 (SIGSEGV) ==19424== Access not within mapped region at address 0xFFE801F88 ==19424== at 0x4A236C0: _vgnU_freeres (vg_preloaded.c:58) ==19424== If you believe this happened as a result of a stack ==19424== overflow in your program's main thread (unlikely but ==19424== possible)\, you can try to increase the size of the ==19424== main thread stack using the --main-stacksize= flag. ==19424== The main thread stack size used in this run was 8388608. Segmentation fault

p5pRT commented 8 years ago

From @cpansprout

On Mon Oct 17 10:26:19 2016\, brian.carpenter@gmail.com wrote:

I'm not convinced that this is an actual bug\,

I think it is.

but #p5p was silent when I asked about it. Affects Perl back to 5.20.2 including v5.25.6 (v5.25.5-114-g87af8d5). Valgrind fails pretty quickly but gdb just goes on forever.

perl -e '/(?{m}(0)}\,s\/\/\/})//0'

That is nonsensical code. $ perl5.18.3 -e '/(?{m}(0)}\,s\/\/\/})//0' Segmentation fault: 11 $ perl5.14.4 -e '/(?{m}(0)}\,s\/\/\/})//0' Sequence (?{...}) not terminated or not {}-balanced in regex; marked by \<-- HERE in m/(?{ \<-- HERE m}(0)}\,s///})/ at -e line 1.

I do not have 5.16 handy. The output from 5.14 is what I would expect.

==6615==ERROR: AddressSanitizer: stack-overflow on address 0x7ffc78f88ca8 (pc 0x0000004c0d4c bp 0x7ffc78f89500 sp 0x7ffc78f88cb0 T0) #0 0x4c0d4b in calloc (/root/perl/perl+0x4c0d4b) #1 0x7f9301 in Perl_safesyscalloc /root/perl/util.c:442:18 #2 0xb45aed in Perl_regexec_flags /root/perl/regexec.c:3128:9 #3 0x8cf9c5 in Perl_pp_subst /root/perl/pp_hot.c:2981:10 #4 0x7f4483 in Perl_runops_debug /root/perl/dump.c:2246:23 #5 0xb984b6 in S_regmatch /root/perl/regexec.c:6888:3 #6 0xb7337c in S_regtry /root/perl/regexec.c:3622:14

It should not even be getting that far. It should fail at compile time.

--

Father Chrysostomos

p5pRT commented 8 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 8 years ago

From @dcollinsn

$ perl5.14.0 -e '/(?{m}(0)}\,s\/\/\/})//0' Sequence (?{...}) not terminated or not {}-balanced in regex; marked by \<-- HERE in m/(?{ \<-- HERE m}(0)}\,s///})/ at -e line 1. $ perl5.18.0 -e '/(?{m}(0)}\,s\/\/\/})//0' Segmentation fault $ perl5.16.0 -e '/(?{m}(0)}\,s\/\/\/})//0' Sequence (?{...}) not terminated or not {}-balanced in regex; marked by \<-- HERE in m/(?{ \<-- HERE m}(0)}\,s///})/ at -e line 1. $ perl5.17.0 -e '/(?{m}(0)}\,s\/\/\/})//0' Sequence (?{...}) not terminated or not {}-balanced in regex; marked by \<-- HERE in m/(?{ \<-- HERE m}(0)}\,s///})/ at -e line 1. $ perl5.17.5 -e '/(?{m}(0)}\,s\/\/\/})//0' Segmentation fault $ perl5.17.3 -e '/(?{m}(0)}\,s\/\/\/})//0' Segmentation fault $ perl5.17.1 -e '/(?{m}(0)}\,s\/\/\/})//0' Segmentation fault $ perl Porting/bisect.pl --start=v5.17.0 --end=v5.17.1 --crash -- ./perl -Ilib -e '/(?{m}(0)}\,s\/\/\/})//0' 68e2671bec1b01022978d5d5eb6eee8742396e13 is the first bad commit commit 68e2671bec1b01022978d5d5eb6eee8742396e13 Author: David Mitchell \davem@iabyn\.com Date: Thu Aug 25 11:41:49 2011 +0100

Mostly complete fix for literal /(?{..})/ blocks

Change the way that code blocks in patterns are parsed and executed\, especially as regards lexical and scoping behaviour.

(Note that this fix only applies to literal code blocks appearing within patterns: run-time patterns\, and literals within qr//\, are still done the old broken way for now).

This change means that for literal /(?{..})/ and /(??{..})/:

* the code block is now fully parsed in the same pass as the surrounding code\, which means that the compiler no longer just does a simplistic count of balancing {} to find the limits of the code block; i.e. stuff like /(?{ $x = "{" })/ now works (in the same way that subscripts in double quoted strings always have: "$a{'{'}" )

* Error and warning messages will now appear to emanate from the main body rather than an re_eval; e.g. the output from

#!/usr/bin/perl /(?{ warn "boo" })/

has changed from

boo at (re_eval 1) line 1.

to

boo at /tmp/p line 2.

* scope and closures now behave as you might expect; for example

for my $x (qw(a b c)) { "" =~ /(?{ print $x })/ }

now prints "abc" rather than ""

* with recursion\, it now finds the lexical within the appropriate depth of pad: this code now prints "012" rather than "000":

sub recurse { my ($n) = @_; return if $n > 2; "" =~ /^(?{print $n})/; recurse($n+1); } recurse(0);

* an earlier fix that stopped 'my' declarations within code blocks causing crashes\, required the accumulating of two SAVECOMPPADs on the stack for each iteration of the code block; this is no longer needed;

* UNITCHECK blocks within literal code blocks are now run as part of the main body of code (run-time code blocks still trigger an immediate call to the UNITCHECK block though)

This is all achieved by building upon the efforts of the commits which led up to this; those altered the parser to parse literal code blocks directly\, but up until now those code blocks were discarded by Perl_pmruntime and the block re-compiled using the original re_eval mechanism. As of this commit\, for the non-qr and non-runtime variants\, those code blocks are no longer thrown away. Instead:

* the LISTOP generated by the parser\, which contains all the code blocks plus OP_CONSTs that collectively make up the literal pattern\, is now stored in a new field in PMOPs\, called op_code_list. For example in /A(?{BLOCK})C/\, the listop stored in op_code_list looks like

LIST PUSHMARK CONST['A'] NULL/special (aka a DO block) BLOCK CONST['(?{BLOCK})'] CONST['B']

* each of the code blocks has its last op set to null and is individually run through the peephole optimiser\, so each one becomes a little self-contained block of code\, rather than a list of blocks that run into each other;

* then in re_op_compile()\, we concatenate the list of CONSTs to produce a string to be compiled\, but at the same time we note any DO blocks and note the start and end positions of the corresponding CONST['(?{BLOCK})'];

* (if the current regex engine isn't the built-in perl one\, then we just throw away the code blocks and pass the concatenated string to the engine)

* then during regex compilation\, whenever we encounter a '(?{'\, we see if it matches the index of one of the pre-compiled blocks\, and if so\, we store a pointer to that block in an 'l' data slot\, and use the end index to skip over the text of the code body. Conversely\, if the index doesn't match\, then we know that it's a run-time pattern and (for now)\, compile it in the old way.

* During execution\, when an EVAL op is encountered\, if data->what is 'l'\, then we just use the pad that was in effect when the pattern was called; i.e. we use the current pad slot of the currently executing CV that the pattern is embedded within.

:100644 100644 75ee7e7bae9366e089980b8a7d346198645cdfe9 8f2fa766c6cc393d413d59b55ad6f896ec5c7010 M dump.c :100644 100644 5cc98874ae03eae36d4207071ed587d51e80ebed 4d82c7cc5797c4a97f2251f4f1487aa090efda99 M op.c :100644 100644 6aa16f5725db2ae0a6a01b0d5b34b6446e8b5ebf f267da2c9c752b765151eca4c7009230c97b82ed M op.h :040000 040000 3454e32e87872db5fef604d816c7ca37dc487166 f7b4543ded1cbaf299521b31150d1c38e5b11509 M pod :100644 100644 394502b314a65eea36bca2f52e31f50048f0060d b45122c1a79df44e6094c1001fc7b3456543b740 M regcomp.c :100644 100644 0fdb0058b8b5fce8763f7d0da05b187d460838b6 a9da0c97e12a8038ef0bc49870737fe62a7c01e6 M regcomp.h :100644 100644 bb845a79216c447b2fa7ec08cee46602b05d3461 f384c4d32c11fe1917e7e8fbcd998b60dc1355c3 M regexec.c :040000 040000 a2c61b299bd88c2308af219684388ae3deea371d 4ae03f25b1ebcee1518e0dc6ccc7ae23b1433f61 M t bisect run success That took 772 seconds.

-- Dan Collins

p5pRT commented 8 years ago

From @demerphq

On 18 October 2016 at 03:17\, Father Chrysostomos via RT \perlbug\-followup@perl\.org wrote:

On Mon Oct 17 10:26:19 2016\, brian.carpenter@gmail.com wrote:

I'm not convinced that this is an actual bug\,

I think it is.

but #p5p was silent when I asked about it. Affects Perl back to 5.20.2 including v5.25.6 (v5.25.5-114-g87af8d5). Valgrind fails pretty quickly but gdb just goes on forever.

perl -e '/0'

That is nonsensical code.

Do you mean just the '/0' or the full pattern?

With some editing I read the original code as being the same as

(m/(?{ m!(0)!\, s!!! })/) / 0

Which is strange but IMO valid code.

$ perl5.18.3 -e '/(?{m}(0)}\,s\/\/\/})//0' Segmentation fault: 11 $ perl5.14.4 -e '/(?{m}(0)}\,s\/\/\/})//0' Sequence (?{...}) not terminated or not {}-balanced in regex; marked by \<-- HERE in m/(?{ \<-- HERE m}(0)}\,s///})/ at -e line 1.

I do not have 5.16 handy. The output from 5.14 is what I would expect.

I am not sure why. Is it because you expect the first } to close the (?{ \, while I dont? The inside of the (?{ ... }) is code\, and should follow nesting rules\, so m}(0)} should be parsed and extracted as a balanced construct\, just as it would if it were q}foo}\, and then the s/// gets parsed out\, to find the trailing }) which finish the (?{}) construct.

I would say the 5.14.4 behavior is a misparse.

If this is legal:

perl -le'print q}foo}' foo

then I would expect this:

m}(0)}

to be legal too\, and i would expect it to be legal inside of a (?{ ... }) construct in a pattern.

==6615==ERROR: AddressSanitizer: stack-overflow on address 0x7ffc78f88ca8 (pc 0x0000004c0d4c bp 0x7ffc78f89500 sp 0x7ffc78f88cb0 T0) #0 0x4c0d4b in calloc (/root/perl/perl+0x4c0d4b) #1 0x7f9301 in Perl_safesyscalloc /root/perl/util.c:442:18 #2 0xb45aed in Perl_regexec_flags /root/perl/regexec.c:3128:9 #3 0x8cf9c5 in Perl_pp_subst /root/perl/pp_hot.c:2981:10 #4 0x7f4483 in Perl_runops_debug /root/perl/dump.c:2246:23 #5 0xb984b6 in S_regmatch /root/perl/regexec.c:6888:3 #6 0xb7337c in S_regtry /root/perl/regexec.c:3622:14

It should not even be getting that far. It should fail at compile time.

Looking forward to hearing more on why.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 8 years ago

From @demerphq

On 18 October 2016 at 10:00\, demerphq \demerphq@gmail\.com wrote:

On 18 October 2016 at 03:17\, Father Chrysostomos via RT \perlbug\-followup@perl\.org wrote:

On Mon Oct 17 10:26:19 2016\, brian.carpenter@gmail.com wrote:

I'm not convinced that this is an actual bug\,

I think it is.

but #p5p was silent when I asked about it. Affects Perl back to 5.20.2 including v5.25.6 (v5.25.5-114-g87af8d5). Valgrind fails pretty quickly but gdb just goes on forever.

perl -e '/0'

That is nonsensical code.

Do you mean just the '/0' or the full pattern?

With some editing I read the original code as being the same as

(m/(?{ m!(0)!\, s!!! })/) / 0

Which is strange but IMO valid code.

$ perl5.18.3 -e '/(?{m}(0)}\,s\/\/\/})//0' Segmentation fault: 11 $ perl5.14.4 -e '/(?{m}(0)}\,s\/\/\/})//0' Sequence (?{...}) not terminated or not {}-balanced in regex; marked by \<-- HERE in m/(?{ \<-- HERE m}(0)}\,s///})/ at -e line 1.

I do not have 5.16 handy. The output from 5.14 is what I would expect.

I am not sure why. Is it because you expect the first } to close the (?{ \, while I dont? The inside of the (?{ ... }) is code\, and should follow nesting rules\, so m}(0)} should be parsed and extracted as a balanced construct\, just as it would if it were q}foo}\, and then the s/// gets parsed out\, to find the trailing }) which finish the (?{}) construct.

I would say the 5.14.4 behavior is a misparse.

If this is legal:

perl -le'print q}foo}' foo

then I would expect this:

m}(0)}

to be legal too\, and i would expect it to be legal inside of a (?{ ... }) construct in a pattern.

==6615==ERROR: AddressSanitizer: stack-overflow on address 0x7ffc78f88ca8 (pc 0x0000004c0d4c bp 0x7ffc78f89500 sp 0x7ffc78f88cb0 T0) #0 0x4c0d4b in calloc (/root/perl/perl+0x4c0d4b) #1 0x7f9301 in Perl_safesyscalloc /root/perl/util.c:442:18 #2 0xb45aed in Perl_regexec_flags /root/perl/regexec.c:3128:9 #3 0x8cf9c5 in Perl_pp_subst /root/perl/pp_hot.c:2981:10 #4 0x7f4483 in Perl_runops_debug /root/perl/dump.c:2246:23 #5 0xb984b6 in S_regmatch /root/perl/regexec.c:6888:3 #6 0xb7337c in S_regtry /root/perl/regexec.c:3622:14

It should not even be getting that far. It should fail at compile time.

Looking forward to hearing more on why.

I dont think the simplified version of the code should infinite loop on the stack.

For this code (because $_ is the empty string) we should perform an outer match once\, then a match using qr/(0)/ twice\, once via the "empty pattern special case" in the s///\, both of which should fail\, and then the outer match should stop. We should not loop. Apparently the problem is related to the empty-pattern special case behavior. If I change the pattern from

m/(?{ m!(0)!\, s!!! })/

to

m/(?{ m!(0)!\, s!(0)!! })/

which should be equivalent valgrind is happy and we get what I expect\, a division by zero error:

$ ./perl -le'(m/(?{ m!(0)!\, s!(0)!! })/) / 0' Illegal division by zero at -e line 1.

$ ./perl -le'(m/(?{ m!(0)!\, s!!! })/) / 0' Segmentation fault

And here is the valgrind output:

$ valgrind ./perl -le'(m/(?{ m!(0)!\, s!!! })/) / 0' ==25240== Memcheck\, a memory error detector ==25240== Copyright (C) 2002-2013\, and GNU GPL'd\, by Julian Seward et al. ==25240== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info ==25240== Command: ./perl -le(m/(?{\ m!(0)!\,\ s!!!\ })/)\ /\ 0 ==25240== ==25240== Stack overflow in thread 1: can't grow stack to 0xffe801f88 ==25240== ==25240== Process terminating with default action of signal 11 (SIGSEGV) ==25240== Access not within mapped region at address 0xFFE801F88 ==25240== at 0x5D01B0: Perl_sv_2pv_flags (sv.c:2939) ==25240== If you believe this happened as a result of a stack ==25240== overflow in your program's main thread (unlikely but ==25240== possible)\, you can try to increase the size of the ==25240== main thread stack using the --main-stacksize= flag. ==25240== The main thread stack size used in this run was 8388608. ==25240== Stack overflow in thread 1: can't grow stack to 0xffe801f68 ==25240== ==25240== Process terminating with default action of signal 11 (SIGSEGV) ==25240== Access not within mapped region at address 0xFFE801F68 ==25240== at 0x4A256B0: _vgnU_freeres (in /usr/lib/valgrind/vgpreload_core-amd64-linux.so) ==25240== If you believe this happened as a result of a stack ==25240== overflow in your program's main thread (unlikely but ==25240== possible)\, you can try to increase the size of the ==25240== main thread stack using the --main-stacksize= flag. ==25240== The main thread stack size used in this run was 8388608. ==25240== ==25240== HEAP SUMMARY: ==25240== in use at exit: 7\,449\,285 bytes in 11\,675 blocks ==25240== total heap usage: 11\,825 allocs\, 150 frees\, 8\,150\,685 bytes allocated ==25240== ==25240== LEAK SUMMARY: ==25240== definitely lost: 0 bytes in 0 blocks ==25240== indirectly lost: 0 bytes in 0 blocks ==25240== possibly lost: 0 bytes in 0 blocks ==25240== still reachable: 7\,449\,285 bytes in 11\,675 blocks ==25240== suppressed: 0 bytes in 0 blocks ==25240== Rerun with --leak-check=full to see details of leaked memory ==25240== ==25240== For counts of detected and suppressed errors\, rerun with: -v ==25240== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) Segmentation fault

This is definitely a bug.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 8 years ago

From @demerphq

On 18 October 2016 at 10:17\, demerphq \demerphq@gmail\.com wrote:

... Apparently the problem is related to the empty-pattern special case behavior. [...]

This is definitely a bug.

And yet another example of the empty pattern special case causing problems. Sigh\, so much trouble for so little value.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 8 years ago

From @demerphq

On 18 October 2016 at 10:21\, demerphq \demerphq@gmail\.com wrote:

On 18 October 2016 at 10:17\, demerphq \demerphq@gmail\.com wrote:

... Apparently the problem is related to the empty-pattern special case behavior. [...]

This is definitely a bug.

And yet another example of the empty pattern special case causing problems. Sigh\, so much trouble for so little value.

Actually this case can be boiled down to:

./perl -le'/(?{ s!!! })/'

which causes an infinite loop due to PL_curpm being set at the start of the match\, which the s!!! then sees as the "last successful match". IMO it shouldn't as the outer match hasn't actually finished yet\, but because of when and how we set up PL_curpm at match start we have this problem inside of (?{ }) constructs.

I think we can fix it by checking PL_curpm against PL_reg_curpm when we do the empty pattern check; running tests now.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 8 years ago

From @hvds

On Tue Oct 18 02:30:01 2016\, demerphq wrote:

I think we can fix it by checking PL_curpm against PL_reg_curpm when we do the empty pattern check; running tests now.

And do what? Even if they're the same\, it still might also legitimately be the last successful match\, no?

It seems to me we can't offer the promised behaviour unless there's a variable that really is set only at the point we complete a successful match.

Hugo

p5pRT commented 8 years ago

From @demerphq

On 18 October 2016 at 12:03\, Hugo van der Sanden via RT \perlbug\-followup@perl\.org wrote:

On Tue Oct 18 02:30:01 2016\, demerphq wrote:

I think we can fix it by checking PL_curpm against PL_reg_curpm when we do the empty pattern check; running tests now.

And do what? Even if they're the same\, it still might also legitimately be the last successful match\, no?

No I don't think so\, and all the tests pass.

As far as I can tell PL_reg_curpm is a special opcode which we use when we are setting up PL_curpm before executing a (?{...}) or (??{ ... }) code block so that things like $1 and stuff like that point at the current pattern. Once the code is executed PL_curpm is unwound to no longer point at PL_reg_curpm. Essentially the problem comes down to PL_curpm being the place where things like $1 and friends are derived\, as well as how we find the pattern for the last successful match\, but inside of a (?{ ... }) we want $1 to point at the *current* pattern\, not the *last* successfully matched one. This conflict currently causes use of the empty pattern in a code block to point at the pattern that contains the code\, and trigger infinite recursion. So to fix this I think we would have to split PL_curpm in two\, and have one var for $1 and friends representing "the last successful matched pattern or when in a regex code construct the currently executing pattern"\, and another for "last successful matched pattern" which would be use by the empty pattern magic.

On the other hand we can simply forbid the use of the empty pattern inside of a regex code block and have a much simpler patch. :-)

Yves

p5pRT commented 8 years ago

From @cpansprout

On Tue Oct 18 01:01:21 2016\, demerphq wrote:

On 18 October 2016 at 03:17\, Father Chrysostomos via RT \perlbug\-followup@perl\.org wrote:

On Mon Oct 17 10:26:19 2016\, brian.carpenter@gmail.com wrote:

I'm not convinced that this is an actual bug\,

I think it is.

but #p5p was silent when I asked about it. Affects Perl back to 5.20.2 including v5.25.6 (v5.25.5-114-g87af8d5). Valgrind fails pretty quickly but gdb just goes on forever.

perl -e '/0'

That is nonsensical code.

Do you mean just the '/0' or the full pattern?

I do not know what happened\, but the line that I quoted was:

perl -e '/(?{m}(0)}\,s\/\/\/})//0'

not:

perl -e '/0'

With some editing I read the original code as being the same as

(m/(?{ m!(0)!\, s!!! })/) / 0

I see now that I misread the code. the } for the m}} delimiter was what threw me off.

--

Father Chrysostomos

p5pRT commented 8 years ago

From @cpansprout

On Tue Oct 18 04:08:30 2016\, demerphq wrote:

On 18 October 2016 at 12:03\, Hugo van der Sanden via RT \perlbug\-followup@perl\.org wrote:

On Tue Oct 18 02:30:01 2016\, demerphq wrote:

I think we can fix it by checking PL_curpm against PL_reg_curpm when we do the empty pattern check; running tests now.

And do what? Even if they're the same\, it still might also legitimately be the last successful match\, no?

No I don't think so\, and all the tests pass.

As far as I can tell PL_reg_curpm is a special opcode which we use when we are setting up PL_curpm before executing a (?{...}) or (??{ ... }) code block so that things like $1 and stuff like that point at the current pattern. Once the code is executed PL_curpm is unwound to no longer point at PL_reg_curpm. Essentially the problem comes down to PL_curpm being the place where things like $1 and friends are derived\, as well as how we find the pattern for the last successful match\, but inside of a (?{ ... }) we want $1 to point at the *current* pattern\, not the *last* successfully matched one. This conflict currently causes use of the empty pattern in a code block to point at the pattern that contains the code\, and trigger infinite recursion. So to fix this I think we would have to split PL_curpm in two\, and have one var for $1 and friends representing "the last successful matched pattern or when in a regex code construct the currently executing pattern"\, and another for "last successful matched pattern" which would be use by the empty pattern magic.

On the other hand we can simply forbid the use of the empty pattern inside of a regex code block and have a much simpler patch. :-)

PL_curpm does far too many things. This is not the only bug that results.

--

Father Chrysostomos

p5pRT commented 8 years ago

From @demerphq

On 18 Oct 2016 23:46\, "Father Chrysostomos via RT" \< perlbug-followup@perl.org> wrote:

On Tue Oct 18 04:08:30 2016\, demerphq wrote:

On 18 October 2016 at 12:03\, Hugo van der Sanden via RT \perlbug\-followup@perl\.org wrote:

On Tue Oct 18 02:30:01 2016\, demerphq wrote:

I think we can fix it by checking PL_curpm against PL_reg_curpm when we do the empty pattern check; running tests now.

And do what? Even if they're the same\, it still might also legitimately be the last successful match\, no?

No I don't think so\, and all the tests pass.

As far as I can tell PL_reg_curpm is a special opcode which we use when we are setting up PL_curpm before executing a (?{...}) or (??{ ... }) code block so that things like $1 and stuff like that point at the current pattern. Once the code is executed PL_curpm is unwound to no longer point at PL_reg_curpm. Essentially the problem comes down to PL_curpm being the place where things like $1 and friends are derived\, as well as how we find the pattern for the last successful match\, but inside of a (?{ ... }) we want $1 to point at the *current* pattern\, not the *last* successfully matched one. This conflict currently causes use of the empty pattern in a code block to point at the pattern that contains the code\, and trigger infinite recursion. So to fix this I think we would have to split PL_curpm in two\, and have one var for $1 and friends representing "the last successful matched pattern or when in a regex code construct the currently executing pattern"\, and another for "last successful matched pattern" which would be use by the empty pattern magic.

On the other hand we can simply forbid the use of the empty pattern inside of a regex code block and have a much simpler patch. :-)

PL_curpm does far too many things. This is not the only bug that results.

Can you expand on that thought a bit please?

Yves

p5pRT commented 8 years ago

From @demerphq

On 19 October 2016 at 09:22\, demerphq \demerphq@gmail\.com wrote:

On 18 Oct 2016 23:46\, "Father Chrysostomos via RT" \perlbug\-followup@perl\.org wrote:

On Tue Oct 18 04:08:30 2016\, demerphq wrote:

On 18 October 2016 at 12:03\, Hugo van der Sanden via RT \perlbug\-followup@perl\.org wrote:

On Tue Oct 18 02:30:01 2016\, demerphq wrote:

I think we can fix it by checking PL_curpm against PL_reg_curpm when we do the empty pattern check; running tests now.

And do what? Even if they're the same\, it still might also legitimately be the last successful match\, no?

No I don't think so\, and all the tests pass.

As far as I can tell PL_reg_curpm is a special opcode which we use when we are setting up PL_curpm before executing a (?{...}) or (??{ ... }) code block so that things like $1 and stuff like that point at the current pattern. Once the code is executed PL_curpm is unwound to no longer point at PL_reg_curpm. Essentially the problem comes down to PL_curpm being the place where things like $1 and friends are derived\, as well as how we find the pattern for the last successful match\, but inside of a (?{ ... }) we want $1 to point at the *current* pattern\, not the *last* successfully matched one. This conflict currently causes use of the empty pattern in a code block to point at the pattern that contains the code\, and trigger infinite recursion. So to fix this I think we would have to split PL_curpm in two\, and have one var for $1 and friends representing "the last successful matched pattern or when in a regex code construct the currently executing pattern"\, and another for "last successful matched pattern" which would be use by the empty pattern magic.

On the other hand we can simply forbid the use of the empty pattern inside of a regex code block and have a much simpler patch. :-)

PL_curpm does far too many things. This is not the only bug that results.

Can you expand on that thought a bit please?

BTW\, I pushed a patch to forbid use of the empty pattern inside of regex code blocks.

Once I understand what else you worry about with PL_curpm I will do a patch to make it work properly.

But I figure a run time catchable fatal exception is better than a C stack overflow until we can make it work properly.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 8 years ago

From @cpansprout

On Wed Oct 19 00:23:19 2016\, demerphq wrote:

On 18 Oct 2016 23:46\, "Father Chrysostomos via RT" \< perlbug-followup@perl.org> wrote:

PL_curpm does far too many things. This is not the only bug that results.

Can you expand on that thought a bit please?

Having PL_curpm point to the actual pattern in the op tree results in different recursion levels sharing the same set of match variables:

$u = "\,echle etn sJ"; $t = "\nrka rPrhoatu"; $_ = $u.$t; sub foo { s/(.)//s or return; bar(); print chop $$1 } sub bar { s/(.)//s or return; foo(); print chop $$1 } foo

(I thought I had reported this before\, but I cannot find the ticket now.)

Using a qr// works.

Similarly\, having m|(?{ // })| try to use the *current* pattern instead of the last pattern to match successfully—solely due to an implementation detail—seems wrong.

PL_curpm is serving three purposes: the last match op\, the last set of matched buffers ($1 etc.)\, and the innermost pattern enclosing the currently executing code.

--

Father Chrysostomos

p5pRT commented 8 years ago

From @demerphq

On 19 October 2016 at 22:51\, Father Chrysostomos via RT \perlbug\-followup@perl\.org wrote:

On Wed Oct 19 00:23:19 2016\, demerphq wrote:

On 18 Oct 2016 23:46\, "Father Chrysostomos via RT" \< perlbug-followup@perl.org> wrote:

PL_curpm does far too many things. This is not the only bug that results.

Can you expand on that thought a bit please?

Having PL_curpm point to the actual pattern in the op tree results in different recursion levels sharing the same set of match variables:

$u = "\,echle etn sJ"; $t = "\nrka rPrhoatu"; $_ = $u.$t; sub foo { s/(.)//s or return; bar(); print chop $$1 } sub bar { s/(.)//s or return; foo(); print chop $$1 } foo

(I thought I had reported this before\, but I cannot find the ticket now.)

Wow. It took me a while to work that through. :-)

I see what you mean. That is going to be a lot harder to fix than the other issues with PL_curpm IMO.

Using a qr// works.

Similarly\, having m|(?{ // })| try to use the *current* pattern instead of the last pattern to match successfully—solely due to an implementation detail—seems wrong.

Yes I agree.

PL_curpm is serving three purposes: the last match op\, the last set of matched buffers ($1 etc.)\, and the innermost pattern enclosing the currently executing code.

Yes\, I see what you mean. I think I might have put it differently\, but i get your point now for sure.

Yves

p5pRT commented 8 years ago

From @demerphq

On 19 October 2016 at 23:29\, demerphq \demerphq@gmail\.com wrote:

On 19 October 2016 at 22:51\, Father Chrysostomos via RT

Similarly\, having m|(?{ // })| try to use the *current* pattern instead of the last pattern to match successfully—solely due to an implementation detail—seems wrong.

Yes I agree.

So following up for this I put together a quick hack to make it possible to use the empty pattern in regexen\, but honestly I dont see how it is very useful\, it is all too easy to create an infinite loop (which luckily seems to be easy to detect).

./perl -le'$_="aa"; /a/; /(?{ s!!x! })/; print; //; print' xa Infinite recursion via empty pattern at -e line 1.

I personally think this is no less ugly than the previous patch\, but if you prefer it I can push it.

FWIW\, this seems to be yet another good reason to get rid of the empty pattern.

Yves

p5pRT commented 8 years ago

From @demerphq

On 20 October 2016 at 09:37\, demerphq \demerphq@gmail\.com wrote:

On 19 October 2016 at 23:29\, demerphq \demerphq@gmail\.com wrote:

On 19 October 2016 at 22:51\, Father Chrysostomos via RT

Similarly\, having m|(?{ // })| try to use the *current* pattern instead of the last pattern to match successfully—solely due to an implementation detail—seems wrong.

Yes I agree.

So following up for this I put together a quick hack to make it possible to use the empty pattern in regexen\, but honestly I dont see how it is very useful\, it is all too easy to create an infinite loop (which luckily seems to be easy to detect).

./perl -le'$_="aa"; /a/; /(?{ s!!x! })/; print; //; print' xa Infinite recursion via empty pattern at -e line 1.

I personally think this is no less ugly than the previous patch\, but if you prefer it I can push it.

FWIW\, this seems to be yet another good reason to get rid of the empty pattern.

Oh\, here is another reason:

./perl -le'$_="ab"; my $p=qr/(?{ s!!x! })/; /$p/; print; /a/; /$p/; print; /b/; /$p/; print; //;' xab xxb xxx Infinite recursion via empty pattern at -e line 1.

Crazy.

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 8 years ago

From @cpansprout

On Thu Oct 20 00:37:45 2016\, demerphq wrote:

On 19 October 2016 at 23:29\, demerphq \demerphq@gmail\.com wrote: So following up for this I put together a quick hack to make it possible to use the empty pattern in regexen\, but honestly I dont see how it is very useful\, it is all too easy to create an infinite loop (which luckily seems to be easy to detect).

./perl -le'$_="aa"; /a/; /(?{ s!!x! })/; print; //; print' xa Infinite recursion via empty pattern at -e line 1.

I personally think this is no less ugly than the previous patch\, but if you prefer it I can push it.

I would prefer it\, but I do not feel strongly about it.

FWIW\, this seems to be yet another good reason to get rid of the empty pattern.

I am still opposed to that\, unfortunately. :-)

--

Father Chrysostomos

p5pRT commented 8 years ago

From @cpansprout

On Thu Oct 20 00:42:10 2016\, demerphq wrote:

On 20 October 2016 at 09:37\, demerphq \demerphq@gmail\.com wrote:

On 19 October 2016 at 23:29\, demerphq \demerphq@gmail\.com wrote:

On 19 October 2016 at 22:51\, Father Chrysostomos via RT

Similarly\, having m|(?{ // })| try to use the *current* pattern instead of the last pattern to match successfully—solely due to an implementation detail—seems wrong.

Yes I agree.

So following up for this I put together a quick hack to make it possible to use the empty pattern in regexen\, but honestly I dont see how it is very useful\, it is all too easy to create an infinite loop (which luckily seems to be easy to detect).

./perl -le'$_="aa"; /a/; /(?{ s!!x! })/; print; //; print' xa Infinite recursion via empty pattern at -e line 1.

I personally think this is no less ugly than the previous patch\, but if you prefer it I can push it.

FWIW\, this seems to be yet another good reason to get rid of the empty pattern.

Oh\, here is another reason:

./perl -le'$_="ab"; my $p=qr/(?{ s!!x! })/; /$p/; print; /a/; /$p/; print; /b/; /$p/; print; //;' xab xxb xxx Infinite recursion via empty pattern at -e line 1.

Crazy.

I would agree that the *Perl* code there is crazy and the author of it gets what he deserves.

--

Father Chrysostomos

p5pRT commented 7 years ago

From @demerphq

On 20 October 2016 at 22:12\, Father Chrysostomos via RT \perlbug\-followup@perl\.org wrote:

On Thu Oct 20 00:37:45 2016\, demerphq wrote:

On 19 October 2016 at 23:29\, demerphq \demerphq@gmail\.com wrote: So following up for this I put together a quick hack to make it possible to use the empty pattern in regexen\, but honestly I dont see how it is very useful\, it is all too easy to create an infinite loop (which luckily seems to be easy to detect).

./perl -le'$_="aa"; /a/; /(?{ s!!x! })/; print; //; print' xa Infinite recursion via empty pattern at -e line 1.

I personally think this is no less ugly than the previous patch\, but if you prefer it I can push it.

I would prefer it\, but I do not feel strongly about it.

Well I just push a patch to meet your preference.

FWIW\, this seems to be yet another good reason to get rid of the empty pattern.

I am still opposed to that\, unfortunately. :-)

We can debate that one (again) in another thread. :-)

Yves

hvds commented 4 years ago

Looks like the final fix for this was 5585e758 (20 Oct 2016), closing.

Perl / perl5