Open p5pRT opened 9 years ago
After v5.23.1-199-ga5f4850\, building net-snmp (not Net::SNMP) with the Perl bits enabled (I guess it is --with-perl-modules --enable-embedded-perl) breaks with:
ERROR from evaluation of /wrkdirs/usr/ports/net-mgmt/net-snmp/work/net-snmp-5.7.3/perl/OID/Makefile.PL: panic: attempt to copy freed scalar 8020a16d8 to 8030eced0 at /usr/local/lib/perl5/5.23/Carp.pm line 228.
I've git bisect'ed it\, and v5.23.1-198-g0ba9d88 works fine.
The full build log for net-snmp is available there:
https://pkg.absolight.fr/data/10amd64-aragorn-perl5-devel/20150826T144538Z/logs/net-snmp-5.7.3_8.log
Attaching the log file\, in case it helps.
The RT System itself - Status changed from 'new' to 'open'
On Wed\, Aug 26\, 2015 at 06:14:09AM -0700\, via RT wrote:
After v5.23.1-199-ga5f4850\, building net-snmp (not Net::SNMP) with the Perl bits enabled (I guess it is --with-perl-modules --enable-embedded-perl) breaks with:
ERROR from evaluation of /wrkdirs/usr/ports/net-mgmt/net-snmp/work/net-snmp-5.7.3/perl/OID/Makefile.PL: panic: attempt to copy freed scalar 8020a16d8 to 8030eced0 at /usr/local/lib/perl5/5.23/Carp.pm line 228.
I've git bisect'ed it\, and v5.23.1-198-g0ba9d88 works fine.
Well this was a bit of weird one.
TL;DR: the bisect is red herring: the problem has always been there. It's a combination of poorly written Makefile.PLs in net-snmp interacting with a (more or less unfixable) design flaw in perl's internals: the stack not being ref counted. The easiest (and probably only) fix is to modify the Makefile.PLs in net-snmp.
In net-snmp the is a subdir called perl/\, which has a Makefile.PL\, and under that there are several subdirs also with Makefile.PLs\, such as perl/ASN/Makefile.PL.
Part of the build process does the equivalent of
$ cd perl/ $ perl Makefile.PL
That Makefile.PL is responsible (via lib/ExtUtils::MakeMaker) for recursively doing a chdir('ASN') and running the Makefile.PL for each subdir. However\, MakeMaker doesn't fork out and execute "perl Makefile.PL" in each subdir; instead it just does chdir('ASN'); require './Makefile.PL' within the same perl process. So all Makefile.PLs share the same namespace and are executed in a single run of the perl process.
Both perl/Makefile.PL and perl/ASN/Makefile.PL have code like the following near the top:
%MakeParams = InitMakeParams(); WriteMakefile(%MakeParams);
sub InitMakeParams { ... }
Note also that %MakeParams is a global (package) var rather than being lexical. So in perl/Makefile.PL\, %::MakeParams is assigned some values\, then WriteMakefile() is called with the contents of %MakeParams as args. At some point within the call to WriteMakefile()\, it does chdir('ASN'); require './Makefile.PL' which causes the existing contents of %::MakeParams to be freed and replaced with new elements. Since the perl arg stack (and @_) isn't ref-counted\, this leaves the @_ of the top-level call to WriteMakefile() with freed (or reassigned) elements.
You can see something similar in the following:
%h = qw(a 1 b 2); f(%h);
sub f { %h = qw(c 3); # uh oh :-( use Carp; Carp::confess(); }
which outputs:
at /home/davem/tmp/p line 10. main::f("a"\, ""\, "b"\, 3) called at /home/davem/tmp/p line 5
Note that the second arg has changed from "1" to an empty string\, due to it being prematurely freed\, and the 2 has changed to a 3.
Then there is the fact that MakeMaker has this bit of code:
if ( Carp::longmess("") =~ /runsubdirpl/s ){ carp("WARNING: Please rerun 'perl Makefile.PL' to regenerate your Makefiles\n"); }
By calling Carp::longmess()\, it creates a full backtrack\, including the function args for each level. Since some of these args have been freed or reassigned\, longmess() can sometimes crash when trying to print the value of one of those args.
The only change the bisected perl commit made is that it sometimes alters the number of temporary copies made by the list assign operator; this then affected whether particular freed SVs got reallocated\, and what they got reallocated as\, altering whether longmess() happened to crash.
The quick fix for net-snmp is to change each Makefile.PL so that either %MakeParams is a lexical var\, or that each file is run in a different namespace (e.g. 'package "ASN"') so that they don't clash with each other. Note that this would also avoid the subs like InitMakeParams() being redefined as each Makefile.PL is require'd.
-- Dave's first rule of Opera: If something needs saying\, say it: don't warble it.
On Tue\, Sep 01\, 2015 at 02:52:32PM +0100\, Dave Mitchell wrote:
interacting with a (more or less unfixable) design flaw in perl's internals: the stack not being ref counted.
Although thinking further\, I wonder whether\, for the stack not refcounted issue\, we should consider making @_ ref counted? This wouldn't solve all the issues\, but would at least protect function args from premature freeing.
It would mean an extra overhead on sub entry and exit of bumping and then unbumping the refcnt of each arg\, which is probably still a relatively small part of the function call overhead.
Another possible thing to do as well / instead would be to make the setting of @DB::args in pp_caller() smarter. At the moment it just retrieves the @_ of the appropriate call frame and memcopies the AvARRAY contents from the @_ array to the @DB::args array. Arguably it could scan the contents instead\, and replace each illegal entry (such as SvFREED\, or SvTYPE==PVAV/HV) with PL_sv_undef say. Still not good\, but would avoid panics latter in places like Carp and the debugger.
-- The Enterprise is captured by a vastly superior alien intelligence which does not put them on trial. -- Things That Never Happen in "Star Trek" #10
* Dave Mitchell \davem@​iabyn\.com [2015-09-01T09:52:32]
Well this was a bit of weird one.
Thanks for the detailed explanation\, which was useful and interesting!
-- rjbs
* Dave Mitchell \davem@​iabyn\.com [2015-09-01T11:11:43]
On Tue\, Sep 01\, 2015 at 02:52:32PM +0100\, Dave Mitchell wrote:
interacting with a (more or less unfixable) design flaw in perl's internals: the stack not being ref counted.
Although thinking further\, I wonder whether\, for the stack not refcounted issue\, we should consider making @_ ref counted? This wouldn't solve all the issues\, but would at least protect function args from premature freeing.
Both the options that you gave sound interesting and helpful\, but I worry about the actual performance impact of refcounting @_. Sub call is already a big expense that causes some performance-hungry users to go to extremes. You said it would be a "relatively small part\," but I wonder what the real numbers would look like.
-- rjbs
On Tue\, Sep 01\, 2015 at 09:14:22PM -0400\, Ricardo Signes wrote:
Both the options that you gave sound interesting and helpful\, but I worry about the actual performance impact of refcounting @_. Sub call is already a big expense that causes some performance-hungry users to go to extremes. You said it would be a "relatively small part\," but I wonder what the real numbers would look like.
I'll try this at some point and report back with real numbers.
-- Fire extinguisher (n) a device for holding open fire doors.
Le Mar 01 Sep 2015 06:53:02\, davem a écrit :
Well this was a bit of weird one.
TL;DR: the bisect is red herring: the problem has always been there. It's a combination of poorly written Makefile.PLs in net-snmp interacting with a (more or less unfixable) design flaw in perl's internals: the stack not being ref counted. The easiest (and probably only) fix is to modify the Makefile.PLs in net-snmp.
Thanks for the long explanation\, it was enlightening.
I've continued patching net-snmp to have it build with 5.23 (first fix was because of v5.23.1-46-g16d89be)
https://svnweb.freebsd.org/ports/head/net-mgmt/net-snmp/files/patch-perl5.23?view=markup
and it works again.
Now\, when doing Perl stuff\, whatever pure-Perl code I have\, I do not expect it\, ever\, to tell me that Perl is panic'ing because of some operation it's doing under the hood\, whatever stupidity I write :-) I could expect a syntax error\, or an error about doing something dangerous\, but not a panic :-)
2015-09-01 17:11 GMT+02:00 Dave Mitchell \davem@​iabyn\.com:
On Tue\, Sep 01\, 2015 at 02:52:32PM +0100\, Dave Mitchell wrote:
interacting with a (more or less unfixable) design flaw in perl's internals: the stack not being ref counted.
Although thinking further\, I wonder whether\, for the stack not refcounted issue\, we should consider making @_ ref counted? This wouldn't solve all the issues\, but would at least protect function args from premature freeing.
It would help to use proper technical wording. This already confused me once. The elements on the @_ stack *are* refcounted but weak (refcnt = 0).
You cannot unweak those elements without rewriting most XS modules out there\, which rely on mortalizing the @_ return values. That's a fundamental XS ABI design decision in perl5. You can change that in a perl7 update.
It would mean an extra overhead on sub entry and exit of bumping and then unbumping the refcnt of each arg\, which is probably still a relatively small part of the function call overhead.
No. I suspect it would lead to 10% overhead. Leaving it out won me 100%. So the @_ part is a big part of entersub\, not small. Leaving out the XS branch overhead won me only 12%.
Another possible thing to do as well / instead would be to make the setting of @DB::args in pp_caller() smarter. At the moment it just retrieves the @_ of the appropriate call frame and memcopies the AvARRAY contents from the @_ array to the @DB::args array. Arguably it could scan the contents instead\, and replace each illegal entry (such as SvFREED\, or SvTYPE==PVAV/HV) with PL_sv_undef say. Still not good\, but would avoid panics latter in places like Carp and the debugger.
-- Reini
On Mon\, Sep 07\, 2015 at 12:57:27AM +0200\, Reini Urban wrote:
2015-09-01 17:11 GMT+02:00 Dave Mitchell \davem@​iabyn\.com:
Although thinking further\, I wonder whether\, for the stack not refcounted issue\, we should consider making @_ ref counted? This wouldn't solve all the issues\, but would at least protect function args from premature freeing.
It would help to use proper technical wording. This already confused me once. The elements on the @_ stack *are* refcounted but weak (refcnt = 0).
Well\, since "weak reference" in perl already has a specific extra meaning (backref magic to make weak refs safe) it could be argued that calling them weak would also be confusing. Also\, referring to the stack as not ref counted has been pretty standard terminology on p5p for years. A quick grep -i ''stack.*not.*ref.*count' on my archived p5p mailbox shows 33 hits. The meta RT ticket for this issue is even called '[META] stack not reference counted issues'.
You cannot unweak those elements without rewriting most XS modules out there\, which rely on mortalizing the @_ return values. That's a fundamental XS ABI design decision in perl5. You can change that in a perl7 update.
I don't understand what you're referring to here. XS calls don't use @_. And having the elements of @_ ref counted is is a perfectly normal thing to happen - it's supported by the reify mechanism and can happen as easily as doing 'unshift @_\,...'. Perhaps you thought I was suggesting making the arg stack contribute to the ref count of each element? If so\, that's not what I was suggesting. I was merely suggesting the logical equivalent of of pp_entersub reifying @_ each time.
It would mean an extra overhead on sub entry and exit of bumping and then unbumping the refcnt of each arg\, which is probably still a relatively small part of the function call overhead.
No. I suspect it would lead to 10% overhead. Leaving it out won me 100%. So the @_ part is a big part of entersub\, not small.
I think you are conflating two issues: the current overhead of manipulating @_\, which includes many things such as updating *_\, checking if the array has been reified and if so abandoning it\, setting the argarray etc fields in the context structure. Then there's the extra overhead I'm suggesting\, which involves bumping the ref count of each arg on entry\, and unbumping on exit. The overhead of one (fixed per sub) has no correlation with the other (proportional to the number of args).
But as I said earlier\, at some point I will benchmark it and convert idle speculation into meaningful data. I haven't done this yet because I'm still in the process of completely overhauling the way that PUSHSUB and POPSUB etc handle @_\, the savestack etc\, which currently reduces the time taken to call an empty sub by a third.
-- "You're so sadly neglected\, and often ignored. A poor second to Belgium\, When going abroad." -- Monty Python\, "Finland"
On Sep 7\, 2015\, at 12:26 PM\, Dave Mitchell \davem@​iabyn\.com wrote:
On Mon\, Sep 07\, 2015 at 12:57:27AM +0200\, Reini Urban wrote:
2015-09-01 17:11 GMT+02:00 Dave Mitchell \davem@​iabyn\.com:
Although thinking further\, I wonder whether\, for the stack not refcounted issue\, we should consider making @_ ref counted? This wouldn't solve all the issues\, but would at least protect function args from premature freeing.
It would help to use proper technical wording. This already confused me once. The elements on the @_ stack *are* refcounted but weak (refcnt = 0).
Well\, since "weak reference" in perl already has a specific extra meaning (backref magic to make weak refs safe) it could be argued that calling them weak would also be confusing.
Yes\, probably. But weak is still weak.
Also\, referring to the stack as not ref counted has been pretty standard terminology on p5p for years. A quick grep -i ''stack.*not.*ref.*count' on my archived p5p mailbox shows 33 hits. The meta RT ticket for this issue is even called '[META] stack not reference counted issuesâ.
I know. Thatâs why I got confused all over. Itâs still wrong terminology\, and even if itâs established inside p5p circles to call the âstack" "not refcounted"\, and "weak refs" are not really weak but "backref magic" it would be better to stop with this wrong terminology in discussion. The code is fine.
Coming from the real world into the isolated perl5 circle is always a problem when people confuse terminology and come up with new names for established terminology. ruby is worse though.
You cannot unweak those elements without rewriting most XS modules out there\, which rely on mortalizing the @_ return values. That's a fundamental XS ABI design decision in perl5. You can change that in a perl7 update.
I don't understand what you're referring to here. XS calls don't use @_.
I meant SP. We have 3 stacks and a copy of it. Sorry\, now I am guilty of confusing terminology.
And having the elements of @_ ref counted is is a perfectly normal thing to happen - it's supported by the reify mechanism and can happen as easily as doing 'unshift @_\,...'. Perhaps you thought I was suggesting making the arg stack contribute to the ref count of each element? If so\, that's not what I was suggesting. I was merely suggesting the logical equivalent of of pp_entersub reifying @_ each time.
Ok. Thatâs trivial.
It would mean an extra overhead on sub entry and exit of bumping and then unbumping the refcnt of each arg\, which is probably still a relatively small part of the function call overhead.
No. I suspect it would lead to 10% overhead. Leaving it out won me 100%. So the @_ part is a big part of entersub\, not small.
I think you are conflating two issues: the current overhead of manipulating @_\, which includes many things such as updating *_\, checking if the array has been reified and if so abandoning it\, setting the argarray etc fields in the context structure. Then there's the extra overhead I'm suggesting\, which involves bumping the ref count of each arg on entry\, and unbumping on exit.
The overhead of one (fixed per sub) has no correlation with the other (proportional to the number of args).
But as I said earlier\, at some point I will benchmark it and convert idle speculation into meaningful data. I haven't done this yet because I'm still in the process of completely overhauling the way that PUSHSUB and POPSUB etc handle @_\, the savestack etc\, which currently reduces the time taken to call an empty sub by a third.
I merely suggested that you will finally stop using @_\, not making it even more slower. @_ is for the old days\, using SP and MARK directly is the way to go forward in entersub. You can study and benchmark my code\, which is based on yours. https://github.com/perl11/cperl/commits/feature/gh7-signatures3
Once you have done that and won 100%\, you can come back and steal 10% to unweaken the args.
BTW: You did the same in the regex engine where a fundamental problem was solved by merely taping over it\, making it slower twice: recursion => iteration\, s///e. perl is a duct tape language\, but the implementation should not be. And more important\, it should not get worse.
On Thu\, Sep 17\, 2015 at 03:54:02PM +0200\, Reini Urban wrote:
Also\, referring to the stack as not ref counted has been pretty standard terminology on p5p for years. A quick grep -i ''stack.*not.*ref.*count' on my archived p5p mailbox shows 33 hits. The meta RT ticket for this issue is even called '[META] stack not reference counted issuesâ.
I know. Thatâs why I got confused all over. Itâs still wrong terminology\, and even if itâs established inside p5p circles to call the âstack" "not refcounted"\, and "weak refs" are not really weak but "backref magic" it would be better to stop with this wrong terminology in discussion. The code is fine.
Coming from the real world into the isolated perl5 circle is always a problem when people confuse terminology and come up with new names for established terminology. ruby is worse though.
Well\, if I was being pedantically correct\, I would have to refer to the "perl stack contains weak rather than strong references to its elements" issue\, or to the "perl stack not contributing to the reference count of its elements" issue\, but just calling it the "perl stack not ref counted" issue seems like a convenient short-cut\, which I plan on continuing to use. In much the same way that we refer to "The Unicode Bug" rather than to "the semantics of codepoints in the range 128-255 varying depending on whether the string is utf8 encoded or not bug".
And having the elements of @_ ref counted is is a perfectly normal thing to happen - it's supported by the reify mechanism and can happen as easily as doing 'unshift @_\,...'. Perhaps you thought I was suggesting making the arg stack contribute to the ref count of each element? If so\, that's not what I was suggesting. I was merely suggesting the logical equivalent of of pp_entersub reifying @_ each time.
Ok. Thatâs trivial.
Yes\, which is rather why I suggested it.
I think you are conflating two issues: the current overhead of manipulating @_\, which includes many things such as updating *_\, checking if the array has been reified and if so abandoning it\, setting the argarray etc fields in the context structure. Then there's the extra overhead I'm suggesting\, which involves bumping the ref count of each arg on entry\, and unbumping on exit.
The overhead of one (fixed per sub) has no correlation with the other (proportional to the number of args).
But as I said earlier\, at some point I will benchmark it and convert idle speculation into meaningful data. I haven't done this yet because I'm still in the process of completely overhauling the way that PUSHSUB and POPSUB etc handle @_\, the savestack etc\, which currently reduces the time taken to call an empty sub by a third.
I merely suggested that you will finally stop using @_\, not making it even more slower. @_ is for the old days\, using SP and MARK directly is the way to go forward in entersub. You can study and benchmark my code\, which is based on yours. https://github.com/perl11/cperl/commits/feature/gh7-signatures3
Once you have done that and won 100%\, you can come back and steal 10% to unweaken the args.
Well yes\, it's always been my intention to remove the overhead of setting up @_ in the presence of a signature\, but its a change in semantics which needs discussion and consensus first. You of course have the luxury of being able to skip negotiation when working on your own private fork of perl. And I still need to worry about the 99.9% of perl code which for the next 5-10 years won't be using signatures.
Also\, you keep claiming 10%\, whereas I keep pointing out that we don't know what the figure is until we've tried it.
BTW: You did the same in the regex engine where a fundamental problem was solved by merely taping over it\, making it slower twice: recursion => iteration\, s///e. perl is a duct tape language\, but the implementation should not be. And more important\, it should not get worse.
Since this is completely unrelated to the subject at hand\, please start a new thread if you really insist on engaging in another pointless round of davem/p5p bashing.
-- My get-up-and-go just got up and went.
Migrated from rt.perl.org#125907 (status was 'open')
Searchable as RT125907$