Closed by brummett 6 years ago
See also
https://rt.perl.org/Ticket/Display.html?id=133271
"Blead Breaks CPAN: BRUMMETT/Devel-hdb-0.23.tar.gz"
I haven't been able to diagnose this fully (and thus decide whether perl or something else needs fixing) due to the complex way in which perl debugging infrastructure is launched and interacted with under Devel::hdb.
If you can reduce the bug to a simple, self-contained example that I can reproduce and run under gdb, I'll have a further look at it from the perl end of things.
What I've found so far is:
1) Building perl with clang and address sanitizer gives a consistent failure of t/04-get-all-breakpoints.t under bleadperl. The perl process running under -d:hdb is crashing. The crash is happening in the set magic for a %dbline element (Perl_magic_setdbline()), where pointers to ops are stored as the IV value in the magic's object SV; in this particular case, the op pointer is a pointer to an op that has already been freed.
The set magic is triggered by this line in Devel/Chitin/Actionable.pm:
sub _insert {
...
my $bp_info = $dbline{$self->line} ||= {};
where $self->line is 3 and $self->file is t/TestNothing.pm. In this case it's assigning {} to $dbline{3}.
The op it's trying to update was created during the compilation of a require'd file, and the execution of the require has already finished and its associated op tree freed.
Whether perl has freed the optree too soon, or whether hdb/Chitin is trying to modify part of %dbline that it shouldn't be, I don't know.
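For readers unfamiliar with the mechanism, here is a minimal sketch of the kind of store being described. It assumes only what perldebguts documents: under the debugger, perl keeps a hash %{"_<$filename"} of breakpoints and actions per loaded file, keyed by line number, and debugger code typically aliases %dbline to it. The helper names dbline_hash_for and record_breakpoint are hypothetical and are not Devel::Chitin's API.

```perl
use strict;
use warnings;

# Hypothetical helper: return a reference to the per-file breakpoint hash
# that perl maintains as %{"_<$filename"} when running under the debugger.
sub dbline_hash_for {
    my ($file) = @_;
    no strict 'refs';                 # symbolic reference to a main:: hash
    return \%{"main::_<$file"};
}

# Hypothetical helper: fetch or create the breakpoint record for $file:$line.
# Under -d, the store into the hash element is what fires the set magic
# (Perl_magic_setdbline); outside the debugger it is an ordinary hash store.
sub record_breakpoint {
    my ($file, $line) = @_;
    my $dbline = dbline_hash_for($file);
    return $dbline->{$line} ||= {};
}

my $bp_info = record_breakpoint('t/TestNothing.pm', 3);
$bp_info->{note} = 'example only';
```

The `||=` only stores when the slot is empty, so the first breakpoint on a given line is the one that writes to the element.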
I haven't been able to reproduce it. It seems like it's only happening when I submit a distribution and it runs through cpantesters. I don't think I've ever seen it happen when the tests run on TravisCI or locally on any of the machines I have access to. I've made some attempts to get more info out of failed tests by adding more print statements for debugging without much luck.
There are several similar failures I've cataloged in the Devel::Chitin repo: https://github.com/brummett/Devel-Chitin/issues?q=is%3Aissue+is%3Aopen+label%3Arandom
The fact that it happens so randomly, not on any particular Perl version, Devel::hdb/Chitin version or architecture makes it seem a lot like a race condition somewhere internal to perl.
You've given me somewhere concrete to look. I'll try to make a concise test case for this problem.
I've made a much smaller test case that seems to fail in the same way. Put these files into a directory. quick-test.pl:
use ModuleToLoad;
print "2\n";
foo();
sub foo {
print "6\n";
}
print "Done!\n";
ModuleToLoad.pm:
package ModuleToLoad;
print "ModuleToLoad 3\n";
sub a_sub {
print "ModuleToLoad 6\n";
print "ModuleToLoad 7\n";
}
1;
and Devel/Dbg.pm:
package Dbg;
use strict;
use warnings;
use base qw(Devel::Chitin);
Dbg->attach();
sub init {
print STDERR "Setting bp for ModuleToLoad line 3\n";
my($module_filename) = grep { m/ModuleToLoad.pm/ } Dbg->loaded_files();
Devel::Chitin::Breakpoint->new( file => $module_filename, line => 3);
}
sub notify_stopped {
Dbg->continue;
}
1;
And then run quick-test.pl in a tight loop:
while [ $? -eq 0 ]; do perl -I/path/to/Devel-Chitin/lib -I. -d:Dbg quick-test.pl; done
It'll eventually segfault at some random time after the breakpoint at ModuleToLoad.pm:3 is set. Sometimes immediately after setting the BP, sometimes during global destruction, sometimes somewhere in between. It never seems to crash before the breakpoint is set, or if I change the breakpoint to a line within ModuleToLoad::a_sub(). So, it looks like it has something to do with setting a breakpoint on a line in the top-level scope of a module loaded at compile-time.
The point of the test was just to set a breakpoint in something other than the main file, so I can change it to break in the subroutine instead of the top-level scope. I might be able to change Devel::Chitin to refuse to set a BP in the top-level scope of a module and avoid this problem.
If you're not able to track down where the bad pointer originates, and since this top-scope code has already run and can't ever run again, maybe an acceptable workaround would be for @{"_<$filename"} to indicate that these lines aren't breakable.
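For context, this is roughly how a debugger can ask perl which lines are breakable today. It is a sketch based on the perldebguts description that elements of @{"_<$filename"} compare equal to zero in numeric context only when the line is not breakable. The function name breakable_lines is hypothetical.

```perl
use strict;
use warnings;

# Return the line numbers perl currently reports as breakable in $file.
# Only meaningful under the debugger (perl -d or -d:Module); otherwise the
# array is empty and so is the result.
sub breakable_lines {
    my ($file) = @_;
    no strict 'refs';                 # symbolic reference to a main:: array
    no warnings 'numeric';            # non-breakable lines are plain source text
    my $lines = \@{"main::_<$file"};
    return grep { defined $lines->[$_] && $lines->[$_] != 0 } 1 .. $#{$lines};
}

my @lines = breakable_lines('ModuleToLoad.pm');
print "breakable: @lines\n";
```

With the workaround proposed above, the top-level lines of an already-executed module would simply drop out of this list.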
On Tue, Jun 19, 2018 at 02:40:45PM +0000, Anthony Brummett wrote:
> I've made a much smaller test case that seems to fail in the same way. Put these files into a directory.
Thanks for the reduced test case, that helped me pinpoint the issue. Using valgrind, I can now reliably reproduce the problem. Indeed, I can reproduce it using perl's standard debugger, so it's not an issue with your modules.
$ cat Foo.pm
package Foo;
$x = 1;
1;
$ cat foo
use Foo;
$x = 2;
$ valgrind ./perl -Ilib -I. -d foo
....
DB<1> b Foo.pm:2
==19008== Invalid read of size 1
==19008== at 0x5AECF3: Perl_magic_setdbline (mg.c:2140)
....
The general issue is in trying to set a breakpoint on a line in the main body (as opposed to within a subroutine) of a require'd Foo.pm file. After Foo.pm has been compiled and executed, all its main body ops are thrown away (require is just a glorified eval, after all). However, %dbline still contains IVs which have values which are 'secret', non-refcounted pointers back to those ops (which have since been freed). So any attempt to set a breakpoint tries to set a flag on a dbstate op which has either been freed or reallocated as something else (and so crashes ensue).
From your perspective, you can avoid random crashes in your test suite by not setting a breakpoint on a main-body line of a Foo.pm file which has already been executed.
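One way a debugger could apply that advice is to refuse breakpoints on lines that fall outside every known subroutine. The sketch below assumes a hash shaped like %DB::sub, which perldebguts documents as mapping sub names to "filename:startline-endline". The function name line_is_inside_sub and the sample line range are hypothetical, and this is a heuristic rather than a fix for the underlying perl problem.

```perl
use strict;
use warnings;

# $subs is a hash ref shaped like %DB::sub: sub name => "file:start-end".
# Returns 1 if $line of $file lies inside some subroutine body, else 0.
sub line_is_inside_sub {
    my ($subs, $file, $line) = @_;
    for my $location (values %$subs) {
        # Greedy (.*) so that file names containing ':' still parse.
        my ($sub_file, $start, $end) = $location =~ /\A(.*):(\d+)-(\d+)\z/s
            or next;
        return 1 if $sub_file eq $file && $line >= $start && $line <= $end;
    }
    return 0;
}

# Hypothetical data; under the debugger one would pass \%DB::sub instead.
my %subs = ('ModuleToLoad::a_sub' => 'ModuleToLoad.pm:5-8');
for my $line (3, 6) {
    my $verdict = line_is_inside_sub(\%subs, 'ModuleToLoad.pm', $line)
        ? 'inside a sub' : 'top-level, refuse breakpoint';
    print "ModuleToLoad.pm:$line is $verdict\n";
}
```

A sub's op tree stays alive as long as the sub does, which is why lines inside subs are not affected by the freed-op problem described above.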
It looks like a more general fix will need to be in perl itself, although I haven't got a clue how.
I'll add what I've written here to the open perl ticket too.
-- Indomitable in retreat, invincible in advance, insufferable in victory -- Churchill on Montgomery
I've merged the fix for the test. I'll make a test release to make sure it's fixed before I resolve this issue.
Trial release looks good. Thanks for the help with tracking this down.
CPAN testers randomly fail t/04-get-all-breakpoints.t:
The call to $client->get_breakpoints() on line 42 is what's causing the exception. Maybe the child process is dead for some reason?