t/harness.t creates untracked core dump on FreeBSD et al. but not on Linux

jkeenan commented 1 year ago

This ticket continues the discussion begun in the Perl issue tracker at https://github.com/Perl/perl5/issues/21455. I'm moving the discussion here because:

1. My analysis indicates that the problem lies in Test-Harness's own test suite and is merely reproduced when Test-Harness is synched into core; and
2. Test-Harness's documentation on CPAN indicates that bug tickets should be filed on RT, but that queue consists of 62 old reports and I suspect filing here in the Perl-Toolchain-Gang organization will get more attention. (If you want me to file on RT instead, just let me know.)

The discussion so far

Briefly, to summarize the discussion in https://github.com/Perl/perl5/issues/21455: for the last month, whenever I configure, build and test Perl on FreeBSD (and other OSes) and then call git status at the end of the test suite, the status output includes:

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    cpan/Test-Harness/perl.core

This core dump does not cause any test failures and appears to be compiler-independent, but is nonetheless real:

-rw-------  1 jkeenan  jkeenan  11415552 Sep  5 21:12 cpan-Test-Harness.freebsd.clang14.perl.core

I do not observe this core dump on Linux: neither by ls nor by git status.

I haven't definitively bisected this problem to see when it entered the Perl 5 core distribution, but I strongly suspect that it occurred when I synched Test-Harness-3.46 into core on August 13.

commit 6e0b8bd69336ecc95a6ca480c4b2ab58c0080bc2
Author:     Leon Timmermans <fawaka@gmail.com>
AuthorDate: Sat Aug 12 17:05:48 2023 -0400
Commit:     James E Keenan <jkeenan@cpan.org>
CommitDate: Sun Aug 13 06:56:43 2023 -0400

    cpan/Test-Harness - Update to version 3.46

I performed that synching on Linux. All tests passed. git status was clean, so I had no reason to repeat that process on any other OS. If I weren't regularly building and testing on FreeBSD-13, I wouldn't have stumbled across this problem. My expectation is that on any (non-Windows, at least) OS to which I have access, after I run perl's make test_harness, all tests should PASS and git status should show no untracked files.

Research in the Test-Harness repository on GitHub

I brought my own fork of Test-Harness on GitHub up to date with PTG's repository, then created a branch to isolate the code generating the segfault so that I could compare Linux and FreeBSD output more precisely. That 'segfault-analysis-20230908' branch is here.

I acknowledge that I have rarely peered into Test-Harness's code; hence, there may be weaknesses in the diagnostic code I added in my branch. All my changes are confined to t/harness.t:

$ git diff -w master..segfault-analysis-20230908 -- t/ |cat
diff --git a/t/harness.t b/t/harness.t
index b84fa27..e81f23f 100644
--- a/t/harness.t
+++ b/t/harness.t
@@ -6,6 +6,7 @@ BEGIN {

 use strict;
 use warnings;
+use Data::Dumper;

 use Test::More;
 use IO::c55Capture;
@@ -539,12 +540,39 @@ for my $test_args ( get_arg_sets() ) {
         skip "ASAN doesn't passthrough SEGV", 1
           if "$Config{cc} $Config{ccflags} $Config{optimize}" =~ /-fsanitize\b/;

+print STDERR "XXX:\n";
+warn "XXX1: perl.core file exists" if (-e './perl.core');
+
         @output = ();
-        _runtests( $harness_failures, "$sample_tests/segfault" );
+
+        # The following line creates './perl.core' on FreeBSD:
+        #
+        # _runtests( $harness_failures, "$sample_tests/segfault" );
+        #
+        # Our objective is to isolate the code generating the segfault so that
+        # we can understand why it (seemingly) creates a core dump on FreeBSD
+        # et al. but not on Linux.
+
+print STDERR "FFF1: \$harness_failures is a ", ref $harness_failures, "\n";
+print STDERR "FFF2: ", scalar keys (%$harness_failures), " elements in \$harness_failures\n";
+
+        # First, run only the TAP::Harness objects seen so far:
+        _runtests( $harness_failures );
+warn "XXX2: perl.core file exists" if (-e './perl.core');
+
+        # Next, run the segfault test:
+        # First argument to _runtests must be a TAP::Harness object (I think)
+
+        my $this_harness = $HARNESS->new;
+        _runtests( $this_harness, "$sample_tests/segfault" );
+warn "XXX3: perl.core file exists" if (-e './perl.core');
+
+print STDERR Dumper(\@output);

         my $out_str = join q<>, @output;

         like( $out_str, qr<SEGV>, 'SIGSEGV is parsed out' );
+print STDERR "YYY:\n";
     }

     #XXXX

All the changes are debugging code except splitting _runtests( $harness_failures, "$sample_tests/segfault" ); into two separate invocations of _runtests(), for the second of which I had to create a new TAP::Harness object.

Results

If I build Test::Harness on each of Linux and FreeBSD and exercise the file generating the segfault on FreeBSD, I get these comparative results:

Ubuntu Linux 22.04 LTS at release-3.45_01-12-gffc5f49

$ perl Makefile.PL && make
...
Manifying 17 pod documents

$ rm perl.core; prove -b t/harness.t; ls -ltr . | tail
rm: cannot remove 'perl.core': No such file or directory
t/harness.t .. 1/133 XXX:
FFF1: $harness_failures is a TAP::Harness
FFF2: 12 elements in $harness_failures
$VAR1 = [
          'Files=0, Tests=0,  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)',
          'Result: NOTESTS',
          't/sample-tests/segfault ..',
          'No subtests run',
          'Test Summary Report',
          '-------------------',
          't/sample-tests/segfault (Wstat: 139 (Signal: SEGV, dumped core) Tests: 0 Failed: 0)',
          'Non-zero wait status: 139',
          'Parse errors: No plan found in TAP output',
          'Files=1, Tests=0,  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)',
          'Result: FAIL'
        ];
YYY:
t/harness.t .. ok       
All tests successful.
Files=1, Tests=133,  0 wallclock secs ( 0.01 usr  0.00 sys +  0.07 cusr  0.03 csys =  0.11 CPU)
Result: PASS
-rw-rw-r-- 1 jkeenan jkeenan  2244 Aug 12 16:45 Makefile.PL
-rw-rw-r-- 1 jkeenan jkeenan  6346 Aug 12 16:45 HACKING.pod
drwxrwxr-x 2 jkeenan jkeenan  4096 Aug 12 16:45 bin
-rw-rw-r-- 1 jkeenan jkeenan 44232 Sep  8 17:17 Changes
-rw-rw-r-- 1 jkeenan jkeenan   774 Sep  9 13:10 MYMETA.yml
-rw-rw-r-- 1 jkeenan jkeenan  1295 Sep  9 13:10 MYMETA.json
-rw-r--r-- 1 jkeenan jkeenan 45309 Sep  9 13:10 Makefile
drwxrwxr-x 8 jkeenan jkeenan  4096 Sep  9 13:10 blib
-rw-rw-r-- 1 jkeenan jkeenan     0 Sep  9 13:10 pm_to_blib
drwxrwxr-x 9 jkeenan jkeenan  4096 Sep  9 13:10 t

FreeBSD-13 at release-3.45_01-12-gffc5f49

$ perl Makefile.PL && make
...
Manifying 50 pod documents

$ rm perl.core; prove -b t/harness.t; ls -ltr . | tail
rm: perl.core: No such file or directory
t/harness.t .. 1/133 XXX:
FFF1: $harness_failures is a TAP::Harness
FFF2: 12 elements in $harness_failures
XXX3: perl.core file exists at t/harness.t line 568.
$VAR1 = [
          'Files=0, Tests=0,  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)',
          'Result: NOTESTS',
          't/sample-tests/segfault ..',
          'No subtests run',
          'Test Summary Report',
          '-------------------',
          't/sample-tests/segfault (Wstat: 139 (Signal: SEGV, dumped core) Tests: 0 Failed: 0)',
          'Non-zero wait status: 139',
          'Parse errors: No plan found in TAP output',
          'Files=1, Tests=0,  0 wallclock secs ( 0.00 usr  0.01 sys +  0.00 cusr  0.01 csys =  0.02 CPU)',
          'Result: FAIL'
        ];
YYY:
t/harness.t .. ok       
All tests successful.
Files=1, Tests=133,  0 wallclock secs ( 0.04 usr  0.01 sys +  0.20 cusr  0.05 csys =  0.30 CPU)
Result: PASS
drwxr-xr-x  3 jkeenan  jkeenan         3 Sep  8 21:21 reference
drwxr-xr-x  2 jkeenan  jkeenan        26 Sep  8 21:21 smoke
drwxr-xr-x  4 jkeenan  jkeenan         4 Sep  8 21:21 xt
-rw-r--r--  1 jkeenan  jkeenan     44090 Sep  9 17:05 Makefile
-rw-r--r--  1 jkeenan  jkeenan      1295 Sep  9 17:05 MYMETA.json
-rw-r--r--  1 jkeenan  jkeenan       774 Sep  9 17:05 MYMETA.yml
drwxr-xr-x  8 jkeenan  jkeenan         8 Sep  9 17:05 blib
-rw-r--r--  1 jkeenan  jkeenan         0 Sep  9 17:05 pm_to_blib
-rw-------  1 jkeenan  jkeenan  11337728 Sep  9 17:05 perl.core
drwxr-xr-x  9 jkeenan  jkeenan        56 Sep  9 17:05 t

Note that only on FreeBSD do I get this statement in the output from my branch:

XXX3: perl.core file exists at t/harness.t line 568.

... and that a core dump is created only on FreeBSD, as shown from this line in ls:

-rw-------  1 jkeenan  jkeenan  11337728 Sep  9 17:05 perl.core

Inferences and Next Questions

The test code which is generating the segfault for testing purposes appears to be this simple program, t/sample-tests/segfault:
```
$ cat t/sample-tests/segfault
#!/usr/bin/perl

print "1..1\n";
print "ok 1\n";
kill 'SEGV', $$;
```

1. Why does this create a segfault file (`./perl.core`) on FreeBSD but not on Linux?

2. On FreeBSD, Test-Harness's own `make clean` (which appears to be a straightforward derivative from ExtUtils::MakeMaker) and `git clean -dfx` will both remove the core dump file, after which `git status` will be clean.  So what is annoying is that, on one OS but not the other, calling `git status` before running either of those cleanup commands will report an "untracked file."  That annoyance would affect anyone doing maintenance work on Test-Harness, but nobody else.  But this will be *really* annoying to anyone running Perl's own test suite on FreeBSD (or any other OS so affected).

3. Should we consider modifying `t/harness.t` so that if it generates a core dump, the test file tidies up after itself and removes `perl.core` once it's no longer needed?
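
As a rough illustration of the tidy-up idea in item 3, here is a minimal shell sketch. It is not a proposed patch to `t/harness.t`. The helper name `run_and_tidy` is hypothetical, and the file name `perl.core` is simply the one FreeBSD produced in the output above; other systems may name core files differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of Test-Harness): run a command, then
# remove a core file that appeared in the current directory during the run.
run_and_tidy() {
    # Assumption: the core file is called perl.core, as seen on FreeBSD above.
    core_file="perl.core"

    # Remember whether a core file already existed, so that a dump
    # predating this run is never deleted.
    if [ -e "$core_file" ]; then
        had_core=1
    else
        had_core=0
    fi

    # Run the wrapped command and keep its exit status.
    status=0
    "$@" || status=$?

    # Remove only a core file that is new since the command started.
    if [ "$had_core" -eq 0 ] && [ -e "$core_file" ]; then
        rm -f "$core_file"
    fi

    return "$status"
}
```

For example, `run_and_tidy prove -b t/harness.t` would run the test file and then delete a `perl.core` that appeared during the run, while leaving alone one that was already there.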

Thank you very much.
Jim Keenan

Leont commented 1 year ago

> Why does this create a segfault file (./perl.core) on FreeBSD but not on Linux?

Probably because your FreeBSD has been set up to leave behind coredumps but your linux hasn't?

jkeenan commented 1 year ago

> > Why does this create a segfault file (./perl.core) on FreeBSD but not on Linux?

> Probably because your FreeBSD has been set up to leave behind coredumps but your linux hasn't?

Thanks for getting back on this. I'm merely a user of this FreeBSD system, so I myself would not have done anything to "set up" or not "set up" to leave core dumps behind. In addition, I noted in https://github.com/Perl/perl5/issues/21455 that I've also observed the untracked perl.core file in an OpenBSD VM that happens to sit on this FreeBSD server. That VM I simply downloaded from hashicorp, so I didn't consciously do anything to specify its core dump behavior there either.

So that suggests two questions:

On FreeBSD (and, presumably, similar OSes), how would I fiddle with the setting to leave core dumps behind?
Is there some setting on Linux (here Ubuntu 22.04 LTS) that by default does not leave core dumps behind? (The evidence from running t/harness.t in my branch is that on Linux either no perl.core is created or, if one is, it is immediately whisked away.)

jkeenan commented 1 year ago

It appears that whether a perl.core core dump file is created (and, hence, reported on by git status) is indeed an OS-specific setting. From a Stack Exchange response:

The core dump is written in the current directory of the 
process at the time of the crash.

Of course core dumps need to be enabled, by default those 
are usually disabled. Check the output of ulimit -c, 
if that's 0 then no core file will be written. 
Run ulimit -c unlimited to enable core dumps; 
this is a per-process setting which is inherited by 
processes started by that process.

Linux: From man bash:

ulimit [-HS] [-bcdefiklmnpqrstuvxPRT [limit]]
    Provides control over the resources available 
    to the shell and to processes started
    by  it, on systems that allow such control.
    ...
    -c     The maximum size of core files created

FreeBSD: From man sh:

ulimit [-HSabcdfklmnopstuvw] [limit]
    Set or display resource limits (see getrlimit(2)).  
    If limit is specified, the named resource will 
    be set; otherwise the current
    resource value will be displayed.
...
    -c coredumpsize
    The maximal size of core dump files, in 
    512-byte blocks.  Setting coredumpsize to 0 
    prevents core dump files from being created.

ulimit Settings to which I have access:

# Ubuntu Linux 22.04 LTS
$ uname -mrs; ulimit -c
Linux 6.2.0-32-generic x86_64
0

# Debian Bullseye
$ uname -mrs; ulimit -c
Linux 5.10.0-18-amd64 x86_64
0

# FreeBSD-13
$ uname -mrs; ulimit -c
FreeBSD 13.2-RELEASE-p1 amd64
unlimited

# OpenBSD-6.9
$ uname -mrs; ulimit -c
OpenBSD 6.9 amd64
unlimited

I myself have never fiddled with these system settings, so I assume that they are defaults for their respective OSes.
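
To make the meaning of those values concrete, here is a small sketch that reads the soft limit and says what it implies. The function name `describe_core_limit` is hypothetical. It assumes a shell whose `ulimit` accepts `-c`, as bash and FreeBSD's sh do according to the man page excerpts above; the block size behind a numeric value differs between shells. On Linux the kernel's `core_pattern` setting can additionally redirect where a dump goes, which this sketch ignores.

```shell
#!/bin/sh
# Hypothetical helper: explain a core dump size limit.
# With no argument it inspects the current shell's soft limit;
# an explicit value can be passed instead, for illustration.
describe_core_limit() {
    limit=${1:-$(ulimit -c)}
    case $limit in
        0)
            echo "core dumps disabled"
            ;;
        unlimited)
            echo "core dumps enabled, no size limit"
            ;;
        *)
            # The block size depends on the shell (512 bytes in FreeBSD sh).
            echo "core dumps enabled, limited to $limit blocks"
            ;;
    esac
}
```

Calling `describe_core_limit` with no argument on the four systems listed above should report dumps as disabled on the two Linux machines and enabled on FreeBSD and OpenBSD.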

So my hunch is that before running a test which executes a segfault, we should check the system's value for ulimit -c and, if that value is not 0, set it to 0 for that process.

(I currently don't know how to do that, so if anyone else can jump in here with a patch for t/harness.t, please do so.)

Leont commented 1 year ago

> So my hunch is that before running a test which executes a segfault, we should check the system's value for ulimit -c and, if that value is not 0, set it to 0 for that process.

We can't. Or at least not without bringing in an XS module like BSD::Resource.

jkeenan commented 1 year ago

Would anything like this code in Perl's dist/Storable/stacksize be helpful?

# the ; here is to ensure system() passes this to the shell
elsif (system("ulimit -c 0 ;") == 0) {
    # try to prevent core dumps
    $prefix = "ulimit -c 0 ; ";
}
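
The prefix in that snippet works because `ulimit -c 0` affects only the shell that runs it and the processes that shell starts. A minimal sketch of the same idea as a standalone shell function follows; the name `without_core_dumps` is hypothetical, and how (or whether) this could be wired into `t/harness.t` is not addressed here.

```shell
#!/bin/sh
# Hypothetical helper: run a command with core dumps disabled,
# without changing the limit of the calling shell.
without_core_dumps() {
    # The parentheses start a subshell, so the lowered limit
    # is discarded as soon as the command finishes.
    (
        ulimit -c 0 && "$@"
    )
}
```

For example, `without_core_dumps perl t/sample-tests/segfault` would still end with a SEGV signal, but under a core size limit of 0 no core file should be written.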

Leont commented 1 year ago

> Would anything like this code in Perl's dist/Storable/stacksize be helpful?

No, not really. Not without some serious rearchitecting that wouldn't be proportional to the size of the issue.

jkeenan commented 1 year ago

> > Would anything like this code in Perl's dist/Storable/stacksize be helpful?

> No, not really. Not without some serious rearchitecting that wouldn't be proportional to the size of the issue.

Well, my last thought on this is ... Could we simply skip the segfault test if ulimit was not 0 on a given machine?
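
A sketch of what such a skip could look like, written in shell rather than in the Perl of `t/harness.t`, so it only illustrates the decision and is not a patch. The names `run_unless_core_dumps` and `CORE_LIMIT` are hypothetical; `CORE_LIMIT` exists only so the behaviour can be tried without changing the real limit.

```shell
#!/bin/sh
# Hypothetical wrapper: run a test command only when core dumps are
# disabled; otherwise report the test as skipped in TAP format.
# Usage: run_unless_core_dumps TEST_NUMBER COMMAND [ARGS...]
run_unless_core_dumps() {
    test_number=$1
    shift

    # Use the override if set, else the shell's soft core size limit.
    limit=${CORE_LIMIT:-$(ulimit -c)}

    if [ "$limit" != "0" ]; then
        # A TAP "ok" line carrying a SKIP directive and the reason.
        echo "ok $test_number # SKIP core dumps enabled (ulimit -c is $limit)"
        return 0
    fi

    "$@"
}
```

On the FreeBSD and OpenBSD machines listed earlier, where `ulimit -c` reports `unlimited`, `run_unless_core_dumps 1 perl t/sample-tests/segfault` would print the skip line instead of running the program.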

jkeenan commented 1 year ago

> > > Would anything like this code in Perl's dist/Storable/stacksize be helpful?

> > No, not really. Not without some serious rearchitecting that wouldn't be proportional to the size of the issue.

> Well, my last thought on this is ... Could we simply skip the segfault test if ulimit was not 0 on a given machine?

Or, if nothing else will work, we should place a comment indicating that a perl.core file will be left over if your machine is not a self-flushing toilet. :-)

jkeenan commented 1 year ago

This was presumably fixed in the repository by https://github.com/Perl-Toolchain-Gang/Test-Harness/commit/161ff77b3ec184b57ddafb3d4d19111016b260c0. However, it will need to be incorporated into a new CPAN release as well.

t/harness.t creates untracked core dump on FreeBSD et al. but not on Linux #121