Test-More / Test2-Harness

Alternative to Test::Harness

yath-runner retry.t did not respond to SIGTERM, sending SIGKILL... #207

Open rkleemann opened 3 years ago

rkleemann commented 3 years ago

When I try to install Test2::Harness in a Docker container, the install fails intermittently. Sometimes it completes successfully (in about 8 minutes), and sometimes it fails after half an hour of trying. I recently ran the install as cpanm --verbose Test2::Harness, and it appears to be stuck in a loop: for more than an hour it has been repeating this message:

(INTERNAL) 2837 yath-runner /root/.cpanm/work/1607030892.1/Test2-Harness-1.000042/t/integration/retry.t did not respond to SIGTERM, sending SIGKILL to 3096...

I haven't tried with a minimal Dockerfile, but I'm using centos:latest as the base, installing gcc, git, and perl-App-cpanminus via yum, and then installing a bunch of modules via cpanm, of which one is Test2::Harness.

For the record, the test before this, failure_cases.t, completes successfully.
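For readers unfamiliar with the message above, the pattern it describes (send SIGTERM, wait, then fall back to SIGKILL) can be sketched roughly as follows. This is only an illustration in Python under stated assumptions, not Test2::Harness's actual code: the child script, the function names, and the grace period are all hypothetical, and the sketch is POSIX-only.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a stuck test: a child that ignores SIGTERM.
CHILD_CODE = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def terminate_with_escalation(proc, grace_seconds=1.0):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns the name of the signal that ended the wait.
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL cannot be caught or ignored
        proc.wait()
        return "SIGKILL"


def demo():
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE], stdout=subprocess.PIPE
    )
    # Block until the child has installed its SIGTERM handler.
    proc.stdout.readline()
    how = terminate_with_escalation(proc, grace_seconds=0.5)
    proc.stdout.close()
    return how, proc.returncode


if __name__ == "__main__":
    print(demo())
```

In this sketch the SIGKILL always ends the child. The report above differs in that the message keeps repeating, which suggests the harness keeps finding something still to kill, and this simple model does not cover that case.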

rkleemann commented 3 years ago

To experiment with some variables, I changed the FROM line in the Dockerfile from centos:latest to perl, and Test2::Harness 1.000042 completed its tests and installed. Since this is a perfectly reasonable workaround, I think the bug can be closed, or it can be left open to investigate the issue further.

exodist commented 3 years ago

I will leave the bug open for a while to see if anyone else has issues. If nothing else gets reported I may just close it. I am refactoring the code that would be responsible for killing stalled processes, so this will probably be fixed by that anyway.

charsbar commented 1 year ago

I (and most probably rjbs) encountered this issue while testing PAUSE Web with yath-runner under GitHub Actions.

See also https://github.com/andk/pause/pull/426/files#diff-190e6442506b7204e263090f96dce1bd37272aa75d50dad4eeda4f5eca86eaa9R18-R22

Excerpt from a log ( https://github.com/charsbar/pause/actions/runs/4894552162/jobs/8738930578 )

( TIMEOUT)  job 12    Sometimes tests will fork and then return. On supported systems Test2::Harness
( TIMEOUT)  job 12    will start all tests with their own process group and will wait for the entire
( TIMEOUT)  job 12    group to exit before considering the test done. In these cases Test2::Harness
( TIMEOUT)  job 12    will poll for output from the process group at a configurable interval, if no
( TIMEOUT)  job 12    output is produced between intervals the process group will be forcefully
( TIMEOUT)  job 12    killed. See the '--post-exit-timeout' option to configure the interval.
< TIMEOUT>  job 12    A timeout (post-exit) has occured (after ?? seconds), job was forcefully killed
(INTERNAL)     14083 yath-runner /__w/pause/pause/t/pause_2017/action/add_user.t did not respond to SIGTERM, sending SIGKILL to 15093...

SIGTERM comes from Test::mysqld ( https://metacpan.org/dist/Test-mysqld/source/lib/Test/mysqld.pm#L150-166 )

Hope this helps a bit.
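The post-exit behaviour quoted in the log excerpt (a test forks and returns, the harness waits for the whole process group, then kills it) can be approximated with the sketch below. It is a simplification under stated assumptions: it waits for a fixed deadline instead of polling for output between intervals, all names are hypothetical, and it is not how Test2::Harness itself is implemented. It is POSIX-only.

```python
import os
import signal
import subprocess
import sys
import time

# Hypothetical test script: forks a lingering grandchild, then exits at once.
CHILD_CODE = (
    "import os, time\n"
    "if os.fork() == 0:\n"
    "    time.sleep(60)\n"
    "else:\n"
    "    os._exit(0)\n"
)


def group_alive(pgid):
    """Return True if any process in the process group still exists."""
    try:
        os.killpg(pgid, 0)  # signal 0 only checks for existence
        return True
    except ProcessLookupError:
        return False


def run_with_post_exit_timeout(post_exit_timeout=1.0, poll=0.1):
    # start_new_session=True makes the child a session and group leader,
    # so its pid is also the process group id.
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE], start_new_session=True
    )
    pgid = proc.pid
    proc.wait()  # the main test process exits almost immediately

    # Give the rest of the group a bounded amount of time to finish.
    deadline = time.monotonic() + post_exit_timeout
    while group_alive(pgid) and time.monotonic() < deadline:
        time.sleep(poll)

    if group_alive(pgid):
        os.killpg(pgid, signal.SIGKILL)  # forcefully kill the whole group
        return "killed"
    return "clean"


if __name__ == "__main__":
    print(run_with_post_exit_timeout())
```

A daemon started by a test, such as the mysqld instance that Test::mysqld manages, could sit in the role of the lingering grandchild here, although whether it stays in the test's process group depends on how it is launched.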