Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

SV *Perl_cv_const_sv_or_av(const CV *const): Assertion `((svtype)((cv)->sv_flags & 0xff)) == SVt_PVCV || ((svtype)((cv)->sv_flags & 0xff)) == SVt_PVFM' failed (op.c:7926) #15548

Open p5pRT opened 8 years ago

p5pRT commented 8 years ago

Migrated from rt.perl.org#129068 (status was 'open')

Searchable as RT129068

p5pRT commented 8 years ago

From @geeknik

v5.25.4-5-g92d73bf

./perl -e 'my __PACKAGE__(&p0000;0;p0000'

perl: op.c:7926: Perl_cv_const_sv_or_av: Assertion `((svtype)((cv)->sv_flags & 0xff)) == SVt_PVCV || ((svtype)((cv)->sv_flags & 0xff)) == SVt_PVFM' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff6cf2067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff6cf2067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff6cf3448 in __GI_abort () at abort.c:89
#2  0x00007ffff6ceb266 in __assert_fail_base (fmt=0x7ffff6e24238 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x6023e8 "((svtype)((cv)->sv_flags & 0xff)) == SVt_PVCV || ((svtype)((cv)->sv_flags & 0xff)) == SVt_PVFM",
    file=file@entry=0x6c0ae2 "op.c", line=line@entry=7926,
    function=function@entry=0x6157b0 <__PRETTY_FUNCTION__.18188> "Perl_cv_const_sv_or_av") at assert.c:92
#3  0x00007ffff6ceb312 in __GI___assert_fail (
    assertion=assertion@entry=0x6023e8 "((svtype)((cv)->sv_flags & 0xff)) == SVt_PVCV || ((svtype)((cv)->sv_flags & 0xff)) == SVt_PVFM",
    file=file@entry=0x6c0ae2 "op.c", line=line@entry=7926,
    function=function@entry=0x6157b0 <__PRETTY_FUNCTION__.18188> "Perl_cv_const_sv_or_av") at assert.c:101
#4  0x0000000000426f1d in Perl_cv_const_sv_or_av (cv=\) at op.c:7926
#5  0x0000000000476323 in Perl_yylex () at toke.c:7154
#6  0x000000000048a223 in Perl_yyparse (gramtype=103) at perly.c:334
#7  0x0000000000450cc8 in S_parse_body (env=env@entry=0x0, xsinit=xsinit@entry=0x421940 <xs_init>) at perl.c:2373
#8  0x000000000045285d in perl_parse (my_perl=\, xsinit=xsinit@entry=0x421940 <xs_init>, argc=3, argv=0x7fffffffe6a8,
    env=env@entry=0x0) at perl.c:1689
#9  0x00000000004217b0 in main (argc=3, argv=0x7fffffffe6a8, env=0x7fffffffe6c8) at perlmain.c:121

p5pRT commented 7 years ago

From zefram@fysh.org

Brian Carpenter wrote:

./perl -e 'my __PACKAGE__(&p0000;0;p0000'

Reduces to 'my main(&z;0;z'. This failure mode raises wider issues that need to be decided in order to determine how to fix it. It's all about how my(...) lists are parsed.

Where a single item is being lexicalised, as in "my $x", the item is syntactically required to be a scalar, array, or hash. Thus "my &z" is rejected early on. But where a parenthesised list of items is being lexicalised, the syntax permits the parens to contain any expression whatsoever. The restriction on what can be lexicalised is instead implemented by walking the optree of the completed list, checking that it semantically only contains acceptable items. Hence this difference in diagnostics:

$ ./perl -le 'my &z'
syntax error at -e line 1, near "my &z"
Execution of -e aborted due to compilation errors.
$ ./perl -le 'my(&z)'
Can't declare subroutine entry in "my" at -e line 1, at EOF
Execution of -e aborted due to compilation errors.

Actually it's a little more complicated because things like "my $$p" are syntactically valid but subject to the same semantic check. I haven't managed to do anything really interesting with that, so I won't consider it further.

If the list is well behaved, the difference in the mode of checking doesn't matter. But it's possible for a carefully crafted list to sneak contraband past the semantic check, where the syntactic check would have caught it. The danger in skipping the check is that things that were syntactically mentioned in the list are declared as lexicals even if they are not valid for this purpose. They get added to the pad optimistically, by the lexer, while in a my(...) list, with the lexer relying on the parser's checks to reject invalid stuff before any code can be affected by the declaration. In particular, "&z" is good enough to get this provisional declaration behaviour, even though it's never valid in this kind of "my" expression.

In the case with which this ticket is concerned (and using my minimised form of it), the partial list "(&z" is enough to get a pad entry named "&z". Omitting the closing paren succeeds in skipping the semantic check, because the list is never syntactically complete. The later instance of "z" is then looked up in the pad, which is a problem because it's not a fully-formed lexical sub. The lexer wants to see whether it's a constant sub, but it picks up the type-constraining stash instead of an actual CV, leading to the assertion failure. This failure mode is rather tricky, because skipping the semantic check came at the cost of creating a syntax error, so the details are tied up in the parser's error recovery.

A more enlightening way to skip the semantic check is to use a conditional expression with a constant condition. The false branch of the conditional doesn't appear in the optree, and so doesn't get checked for acceptability, but the lexer saw it. Putting aside the lexical-sub case for a minute, consider what mischief we can get up to with just scalars. What should this program output:

  sub foo {
      my (1 ? $x : $y);
      $y++;
      print $y;
  }
  $y = 5;
  foo; foo;

If the lexical declaration were merely "my ($x)" then all instances of $y would refer to the package variable, and we'd get "67". If it were "my ($x, $y)" then we'd get "11", incrementing $y from undef afresh on each call. What we actually get is "12". The "my" has the effect of declaring $y as a lexical, so the later uses of $y in the sub refer to the lexical. But with $y having been elided from the optree of the "my" expression, it doesn't get reset for each call. The lexical declaration acts as "my $x; my $y if 0", but doesn't get the deprecation warning that "if 0" provokes.

The deparser doesn't handle this situation. It goes by the optree, and so the "my (1 ? $x : $y)" is emitted as "my $x", but the later references in the sub to the lexical $y are still emitted as "$y". Compiling the resulting code loses the lexicalness of $y. The same problem happens with "my $y if 0", except that in that case the deparser emits "'???'", at least showing that something was optimised out.

Getting back to "&z", we can use the conditional to evoke the same failure mode without a syntax error:

$ perl -le 'my main (1 ? $x : &z); z'
perl: op.c:8062: Perl_cv_const_sv_or_av: Assertion `((svtype)((cv)->sv_flags & 0xff)) == SVt_PVCV || ((svtype)((cv)->sv_flags & 0xff)) == SVt_PVFM' failed.
Abort

We can also get a related failure by omitting the type restriction:

$ perl -le 'my (1 ? $x : &z); z'
perl: pp.c:183: Perl_pp_clonecv: Assertion `protocv' failed.
Abort

There are even failures without any subsequent reference to the broken lexical sub:

$ perl -le 'my (1 ? $x : &z);'
perl: pp.c:183: Perl_pp_clonecv: Assertion `protocv' failed.
Abort
$ perl -le 'my main (1 ? $x : &z);'
Segmentation fault

Note that the latter SEGVs even in a debugging build, rather than asserting.

We need to decide how to treat these conditionals in "my" lists. Determine it for the (relatively) easy cases, and that will point to how to handle the tricky case with which this ticket started.

-zefram

p5pRT commented 7 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 7 years ago

From @iabyn

On Sun, Jan 29, 2017 at 02:27:31AM +0000, Zefram wrote:

Where a single item is being lexicalised, as in "my $x", the item is syntactically required to be a scalar, array, or hash. Thus "my &z" is rejected early on. But where a parenthesised list of items is being lexicalised, the syntax permits the parens to contain any expression whatsoever. The restriction on what can be lexicalised is instead implemented by walking the optree of the completed list, checking that it semantically only contains acceptable items. Hence this difference in diagnostics:

$ ./perl -le 'my &z'
syntax error at -e line 1, near "my &z"
Execution of -e aborted due to compilation errors.
$ ./perl -le 'my(&z)'
Can't declare subroutine entry in "my" at -e line 1, at EOF
Execution of -e aborted due to compilation errors.

Is there any reason why a my() list can't be enforced (in a strict manner) by the lexer / grammar itself rather than post-hoc by optree inspection?

-- 
"Foul and greedy Dwarf - you have eaten the last candle."
    -- "Hordes of the Things", BBC Radio.

p5pRT commented 7 years ago

From zefram@fysh.org

Dave Mitchell wrote:

Is there any reason why a my() list can't be enforced (in a strict manner) by the lexer / grammar itself rather than post-hoc by optree inspection?

If we were building this from scratch then grammatical enforcement would clearly be a good idea, with only the minor downside that we need to duplicate the productions for list syntax. Having already got into the present state, though, we can't just casually change it. It's still a good direction to go in, but we'd need a deprecation cycle, in case anyone is using syntactically strange stuff to legitimate effect. I can't imagine that there's any usage in that category that we'd actually want to preserve.

A deprecation to narrow a grammatical production like this is a tricky thing to arrange. perl is not able to retry parsing of arbitrary code. I fear that we would need to duplicate a lot of the expression productions, putting deprecation warnings on all the productions other than the small group of approved items. It would be way more hassle than the qw-as-list deprecation of a few years ago.

A grammatical restriction on the list content would sort out these issues with conditionals and subroutines. There might remain issues with being able to sneak stuff into single "my" items, although I wasn't able to find such a problem. If one turns up, we could do a similar deprecation to narrow the syntax of what's permitted in each "my" item.

-zefram

p5pRT commented 7 years ago

From @cpansprout

On Tue, 28 Mar 2017 16:17:43 -0700, zefram@fysh.org wrote:

Dave Mitchell wrote:

Is there any reason why a my() list can't be enforced (in a strict manner) by the lexer / grammar itself rather than post-hoc by optree inspection?

If we were building this from scratch then grammatical enforcement would clearly be a good idea, with only the minor downside that we need to duplicate the productions for list syntax. Having already got into the present state, though, we can't just casually change it. It's still a good direction to go in, but we'd need a deprecation cycle, in case anyone is using syntactically strange stuff to legitimate effect. I can't imagine that there's any usage in that category that we'd actually want to preserve.

One that I can think of is unary plus. It’s not likely to occur often, but it is often used for disambiguation and could conceivably end up in a my() list after refactoring (and somebody forgot to remove the +, but the code worked anyway).

Yes, this is only theoretical. I don’t know whether it is worth preserving that. "my +$x" is an error.

--

Father Chrysostomos

p5pRT commented 7 years ago

From zefram@fysh.org

Father Chrysostomos via RT wrote:

One that I can think of is unary plus.

Ah, indeed, that could appear in a legitimate program. It's not a usage to preserve, there being no need for such disambiguation in this context. (If we did want to preserve it, for consistency a similar unary plus ought to be permitted on parameters declared in subroutine signatures.)

A related usage is nested parens. These are legal, don't break anything, and could arise by accident from refactoring. But, like unary plus, they also don't achieve anything in the "my" context, and it's not worth the complexity of preserving their permissibility.

-zefram

p5pRT commented 7 years ago

From @iabyn

On Wed, Mar 29, 2017 at 03:27:41AM +0100, Zefram wrote:

Father Chrysostomos via RT wrote:

One that I can think of is unary plus.

Ah, indeed, that could appear in a legitimate program. It's not a usage to preserve, there being no need for such disambiguation in this context. (If we did want to preserve it, for consistency a similar unary plus ought to be permitted on parameters declared in subroutine signatures.)

A related usage is nested parens. These are legal, don't break anything, and could arise by accident from refactoring. But, like unary plus, they also don't achieve anything in the "my" context, and it's not worth the complexity of preserving their permissibility.

Is there any reason we couldn't just add a check to Perl_localize, Perl_my_attrs (and maybe a few other places), which gives a deprecation warning if o isn't of one or two simple forms (like list/pushmark/(pad[sah]v x n))?

-- 
This is a great day for France!
    -- Nixon at Charles De Gaulle's funeral

p5pRT commented 7 years ago

From zefram@fysh.org

Dave Mitchell wrote:

Is there any reason we couldn't just add a check to Perl_localize, Perl_my_attrs (and maybe a few other places), which gives a deprecation warning if o isn't of one or two simple forms (like list/pushmark/(pad[sah]v x n))?

That's effectively what we've already got. It's an error rather than a warning, and the check is in S_my_kid(), which is called from Perl_my_attrs(). This ticket is concerned with items that have problematic effects when lexed in a "my" list but duck this semantic check (by causing a parse error or by not leaving any evidence in the optree).

-zefram

demerphq commented 2 years ago

The examples zefram provided still segfault or assert in 5.37.4 and 5.34.1.