Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.9k stars 539 forks source link

peephole optimiser could prune more dead code #10481

Closed p5pRT closed 4 years ago

p5pRT commented 14 years ago

Migrated from rt.perl.org#76438 (status was 'open')

Searchable as RT76438$

p5pRT commented 14 years ago

From @abigail

On Tue\, Jul 13\, 2010 at 06​:55​:33PM +0100\, Ben Morrow wrote​:

Quoth nick@​ccl4.org (Nicholas Clark)​:

On Tue\, Jul 13\, 2010 at 01​:11​:11PM +0000\, ?var Arnfj?r? Bjarmason wrote​:

Of course we can't liberally change things that are documented to be undefined as liberally as a C compiler would\, becuase there's only one perl(1) but multiple cc(1)'s.

But whatever we call it\, that's the key problem. There is only one implementation\, and as that implementation strives hard to internally avoid C undefined behaviour\, its output will be deterministic\, in some fashion. Hence people come to rely on the current behaviour of the implementation\, documented or not.

Quite. And

print $i\+\+\, $i\+\+;

has DWIM forever (probably since perl 1). I'm not saying we *cannot* change it\, just that any change needs to be either only within the scope of a lexical pragma or to go through a full deprecation cycle with mandatory warnings before it changes.

Actually\, I added the explicite statement about undefinedness in the documentation\, because not all expressions containing multiple $i ++ did DWIM for everyone\, leading to lots of (pointless) discussions on what the 'right' value of complicated statements where supposed to be.

  $i = $j = 1;   print $i ++\, $i ++; # Prints 12.   print ++ $j\, ++ $j; # Prints 33. Both are really DWIM?

Abigail

p5pRT commented 14 years ago

From @abigail

On Tue\, Jul 20\, 2010 at 02​:41​:34PM -0700\, Jan Dubois wrote​:

On Tue\, 20 Jul 2010\, Nicholas Clark wrote​:

I read $a . $a as equivalent to $x . $y\, where it happens that $x and $y alias the same value. $a was *written* twice by the programmer\, so as there are two references to it\, it gets accessed *exactly* twice.

In that case I think you'll find plenty of places where it is accessed more often than you expect. E.g. ++$a might access $a once if it is just SV_pIOK\, or twice if it is just SV_pNOK\, because sv_inc() will first try to see if it can't convert the NV to an IV\, triggering an additional FETCH call​:

flags = SvFLAGS\(sv\);
if \(\(flags & \(SVp\_NOK|SVp\_IOK\)\) == SVp\_NOK\) \{
/\* It's \(privately or publicly\) a float\, but not tested as an
   integer\, so test it to see\. \*/
\(void\) SvIV\(sv\);
flags = SvFLAGS\(sv\);
\}

It is easy to guarantee that each tied variable is fetched at least once for each time it is mentioned in the source code\, but it is extremely hard to guarantee that it isn't called more often​: Any innocent looking SvIV()\, SvNV() or SvPV() call anywhere in the core may trigger an additional call to FETCH a tied variable.

I prefer to view this as an inefficiency\, not as a bug\, because I think FETCH should be side-effect free.

If FETCH is to be side-effect free\, many of the interesting cases for using ties disappear.

Abigail

p5pRT commented 14 years ago

From @rgarcia

On 29 July 2010 15​:42\, Abigail \abigail@​abigail\.be wrote​:

I prefer to view this as an inefficiency\, not as a bug\, because I think FETCH should be side-effect free.

If FETCH is to be side-effect free\, many of the interesting cases for using ties disappear.

At this point\, someone usually mentions monads.

p5pRT commented 14 years ago

From @ap

* Eric Brine \ikegami@​adaelis\.com [2010-07-12 06​:55]​:

On Sun\, Jul 11\, 2010 at 10​:22 PM\, David Golden \xdaveg@​gmail\.com wrote​:

I'm suggesting that we disclaim any implicit guarantee that the compiler won't optimize away expressions that have side effects when evaluated.

Without that guarantee\,

my $x = f() or DEBUG && warn(...); return $x;

would be buggy. Dunno if that matters

Why?

As far as I can tell\, the compiler would statically determine that it can optimise the `DEBUG && warn` part down to `!1` when `DEBUG` is false. This seems correct to me.

It would also statically determine that an `or !1` clause in that statement does nothing\, and so fold it away altogether.

That would leave the code looking like

  my $x = f();   return $x;

when `DEBUG` is false.

Which seems 100% on the mark to me.

Am I missing something in your objection?

Regards\, -- Aristotle Pagaltzis // \<http​://plasmasturm.org/>

p5pRT commented 14 years ago

From @ikegami

Indeed. I made a booboo when I wrote that.

p5pRT commented 11 years ago

From @nthykier

Hi\,

Attached is a prototype branch that enables some branch elimination while retaining possible side-effects OPs.

It can kill branches in if-else cases. Examples​:

  $ ./perl -Ilib -MO=Deparse \   -e 'if ($a && "Pie" eq "Good") {print "True\n" }'   -e ' else { print "False\n" }'   unless ($a and !1) {   print "False\n";   }

  $ ./perl -Ilib -MO=Deparse \   -e 'if ($a || "Pie" ne "Good") {print "True\n" }'   -e ' else { print "False\n" }'   if ($a or 1) {   print "True\n";   }

The patch adds two op_private flags for OP_AND and OP_OR\, which determines if the expression will always evaluate to either TRUE or FALSE. If newCONDOP detects its condition is one of these two OPs and they have one of these flags set\, it will kill the dead branch.

However\, the newLOGOP does not use this flag itself to elimite dead branches. Accordingly\, the original test case is still​:

  $ ./perl -Ilib -MO=Deparse -e 'if ($a && "Pie" ne "Good") {print}'   if ($a and 1) {   print $_;   }

The only problem here is that folding the OP_AND/OP_OR expressions should probably be deferred to peep optimisation. Otherwise\, we might fold​:

  $a && "Pie" ne "Good" => $a

Before newCONP has a change to see it (and thus lose the ability to eliminate the dead branch).

~Niels

p5pRT commented 11 years ago

From @nthykier

op-prune-cond-branch.diff ```diff diff --git a/op.c b/op.c index d5323a0..a5675d3 100644 --- a/op.c +++ b/op.c @@ -5976,6 +5976,39 @@ S_new_logop(pTHX_ I32 type, I32 flags, OP** firstp, OP** otherp) first->op_next = (OP*)logop; first->op_sibling = other; + if (type == OP_AND || type == OP_OR) + { + /* Look for stuff like: a() and (b() or 1) + + The truth value of this expression is entirely decidable, + but we cannot eliminate the expression (a() and b() might + have side effects). + */ + U8 rhs_value = 0; + cstop = search_const(other); + /* Check if the RHS is known to be FALSE */ + if (cstop) + { + if (SvTRUE(((SVOP*)cstop)->op_sv)) + rhs_value = OPpLOGOP_CONST_TRUE; + else + rhs_value = OPpLOGOP_CONST_FALSE; + } + if ((other->op_type == OP_AND || other->op_type == OP_OR) + && (other->op_private & OPpLOGOP_CONST_MASK)) + { + rhs_value = other->op_private & OPpLOGOP_CONST_MASK; + } + + if (rhs_value) + { + if (type == OP_AND && rhs_value == OPpLOGOP_CONST_FALSE) + logop->op_private |= rhs_value; + if (type == OP_OR && rhs_value == OPpLOGOP_CONST_TRUE) + logop->op_private |= rhs_value; + } + } + CHECKOP(type,logop); o = newUNOP(prepend_not ? OP_NOT : OP_NULL, 0, (OP*)logop); @@ -6043,6 +6076,47 @@ Perl_newCONDOP(pTHX_ I32 flags, OP *first, OP *trueop, OP *falseop) live->op_private |= OPpCONST_FOLDED; return live; } + o = first; + while (o && o->op_type == OP_NULL && (o->op_flags & OPf_KIDS)) + o = cUNOPo->op_first; + + if (o && (o->op_type == OP_AND || o->op_type == OP_OR) + && (o->op_private & OPpLOGOP_CONST_MASK)) { + const bool left = (o->op_private & OPpLOGOP_CONST_MASK) + == OPpLOGOP_CONST_TRUE; + OP *live = left ? trueop : falseop; + OP *const dead = left ? falseop : trueop; + /* Rewrite: + + if (a() and ) {} else { b() } => + (a() and ) or { b() } + + if (a() or ) { b() } else {} => + (a() or ) and { b() } + + (NB: and are not limited to simple OP_CONST) + + */ + OPCODE combine_op = o->op_type == OP_AND ? OP_OR : OP_AND; + + if (PL_madskills) { + /* This is all dead code when PERL_MAD is not defined. */ + live = newUNOP(OP_NULL, 0, live); + op_getmad(first, live, 'C'); + op_getmad(dead, live, left ? 'e' : 't'); + } else { + op_free(dead); + } + if (live->op_type == OP_LEAVE) + live = newUNOP(OP_NULL, OPf_SPECIAL, live); + else if (live->op_type == OP_MATCH || live->op_type == OP_SUBST + || live->op_type == OP_TRANS || live->op_type == OP_TRANSR) + /* Mark the op as being unbindable with =~ */ + live->op_flags |= OPf_SPECIAL; + else if (live->op_type == OP_CONST) + live->op_private |= OPpCONST_FOLDED; + return newLOGOP(combine_op, 0, first, live); + } NewOp(1101, logop, 1, LOGOP); logop->op_type = OP_COND_EXPR; logop->op_ppaddr = PL_ppaddr[OP_COND_EXPR]; diff --git a/op.h b/op.h index 5d1a771..b055965 100644 --- a/op.h +++ b/op.h @@ -176,6 +176,12 @@ Deprecated. Use C instead. #define OPpASSIGN_BACKWARDS 64 /* Left & right switched. */ #define OPpASSIGN_CV_TO_GV 128 /* Possible optimisation for constants. */ + +/* Private for OP_AND and OP_OR */ +#define OPpLOGOP_CONST_TRUE 2 /* Result is always true, but op has side-effect*/ +#define OPpLOGOP_CONST_FALSE 4 /* Result is always false, but op has side-effect*/ +#define OPpLOGOP_CONST_MASK (2|4) /* Mask is always true, but op has side-effect*/ + /* Private for OP_MATCH and OP_SUBST{,CONT} */ #define OPpRUNTIME 64 /* Pattern coming in on the stack */ ```
toddr commented 4 years ago

Are we planning on applying this? If not should it be closed?

iabyn commented 4 years ago

On Tue, Feb 04, 2020 at 02:20:43PM -0800, Todd Rinaldo wrote:

Are we planning on applying this? If not should it be closed?

I doubt that it would apply cleanly any more.

But in any case, it just removes a bit of dead code from the optree; it doesn't optimise runtime execution. Since also the whole area of boolean and logical op optimisation is a bit complex, we'd run a real risk of inadvertently pessimising something too.

So I don't think the gain is worth the effort to revive and scrutinise this patch.

-- "Do not dabble in paradox, Edward, it puts you in danger of fortuitous wit." -- Lady Croom, "Arcadia"