Some closures leak memory in v5.40

mauke commented 2 months ago

Description Starting in v5.40, some closures leak memory despite there being no reference cycles.

Steps to Reproduce

#!/usr/bin/env perl
use v5.38;
use Scalar::Util qw(weaken);

my $wref;
{
    my $x;
    my $subject = sub {
        $x = $_[0];

        my $y;
        return sub { $y };
    };
    my $subscriber = {};
    weaken($wref = $subscriber);
    $subscriber->{foo} = $subject->($subscriber);
}
!defined $wref or die "Leak";

$ perl bleak.pl 
Leak at bleak.pl line 18.
$

This used to work in v5.38. It is broken in v5.40 and blead.

Expected behavior

$ perl bleak.pl
$

mauke commented 2 months ago

Bisect says:

386907f061c1812ecaa5f3c88d9f729828408097 is the first bad commit

commit 386907f061c1812ecaa5f3c88d9f729828408097
Author: Tony Cook <tony@develop-help.com>
Date:   Thu Sep 14 10:37:42 2023 +1000

    Revert "[perl #89544] Non-eval closures don’t need CvOUTSIDE"

    But they do need CvOUTSIDE() for eval-in-package-DB's scope
    magic to work correctly.

    This also incidentally fixes a TODO Deparse, since the CvOUTSIDE
    is now available

    Fixes #19370 aka rt89544

    This breaks #11286, but I don't think that's fixable without breaking
    eval-in-DB's special scope behaviour.

    This reverts (most of) commit a0d2bbd5c47035a4f7369e4fddd46b502764d86e.

 cv.h                    | 6 +-----
 dump.c                  | 1 -
 ext/Devel-Peek/t/Peek.t | 8 ++++----
 lib/B/Deparse.t         | 1 -
 pad.c                   | 4 +---
 t/op/closure.t          | 3 ++-
 t/op/eval.t             | 1 -
 7 files changed, 8 insertions(+), 16 deletions(-)

jkeenan commented 2 months ago

Bisect says:

386907f is the first bad commit

Confirmed. It went into blead on Sep 25.

commit 386907f061c1812ecaa5f3c88d9f729828408097
Author:     Tony Cook <tony@develop-help.com>
AuthorDate: Thu Sep 14 10:37:42 2023 +1000
Commit:     Tony Cook <tony@develop-help.com>
CommitDate: Mon Sep 25 10:24:07 2023 +1000

@tonycoz can you take a look? Thanks.

haarg commented 2 months ago

The script shown here detects a leak on perls before a0d2bbd5c47035a4f7369e4fddd46b502764d86e (5.16).

jkeenan commented 2 months ago

If we can fix, it should be backported to perl-5.40.1.

tonycoz commented 2 months ago

Yes, my change reverted this to pre-5.18 behaviour, since a0d2bbd broke eval in package DB, which is a documented feature.

I asked for guidance on the list and received no feedback that could resolve the issue.

The only feedback I received on the PR #21489 was to apply the reversion and consider mitigations, of which I didn't see any

Note that if your closure contains a string eval you'll still see the reference loop.

mauke commented 2 months ago

I've been thinking about this. I now think your example:

use strict;
package DB { sub myeval { eval shift or die } }
my $y = 1; # something true so eval in myeval above is true
my $has_outer = sub { $y; my $x; sub { DB::myeval '$y'; eval "";  ++$x; } }->();
$has_outer->(); # has an outside pointer and can resolve $y
my $no_outer  = sub { $y; my $x; sub { DB::myeval '$y';           ++$x; } }->();
$no_outer->(); # dies since we can't see $y

... should die in either case, no matter if there is an eval there or not. The inner sub shouldn't be able to resolve $y at runtime.

Rationale: A closure should only capture variables whose names appear lexically in the sub's body. Capturing more is not just inefficient (using more memory than expected), but can introduce leaks due to hidden reference cycles. Runtime code from eval should see the surrounding lexical scope, but if a lexical variable from an outer context is not directly referenced in the statically visible part of the code, I'd argue that it is not really part of the inner (captured) scope.

It is nice if attempts to access such a variable at runtime work anyway or trigger a Variable "$foo" is not available warning, as with this example from perldiag:

use strict;
use warnings;

sub f {
    my $x;
    sub { eval '$x' }
}
f()->();
# Variable "$x" is not available at (eval 1) line 1.

But I'm totally fine with getting an undeclared variable error from either static (lexically present in the inner sub) or dynamic (via a call to a sub in package DB) eval.

Leont commented 2 months ago

What if we add a $^P flag for this?

tonycoz commented 2 months ago

What if we add a $^P flag for this?

Then your code behaves differently in terms of references in the debugger, which makes the debugger not fit for purpose.

tonycoz commented 1 month ago

But I'm totally fine with getting an undeclared variable error from either static (lexically present in the inner sub) or dynamic (via a call to a sub in package DB) eval.

The same mechanism is used to search for lexical subs, so outer lexical subs would also fail to be found in an eval.

haarg commented 1 month ago

Given the choice between the 5.38 and 5.40 behaviors, I think I would prefer to revert to 5.38. If a closure doesn't capture a lexical, it can't expect it to be available. And this includes lexical subs. Implicitly capturing everything, leading to reference loops and lifetime issues, for the sake of the debugger seems like the wrong approach. And given how long the behavior existed before there were any complaints, it seems like it has a smaller impact.

If there was an option to use weak references, as implied by https://github.com/Perl/perl5/pull/21489#issuecomment-1725443594, that would probably be preferable to either of the behaviors that have been implemented so far.

Perl / perl5

Some closures leak memory in v5.40 #22547