Test-More / Test2-Harness

Alternative to Test::Harness
Other
23 stars 26 forks source link

yath hangs at end of tests prior to summary. #253

Open toddr opened 2 years ago

toddr commented 2 years ago

I just upgraded from 1.000038 to 1.000124 and now I'm seeing an issue where the 8 minute run will hang right at the end after it has run all tests. When you look for sub processes, you can see

root     19666 93.2  0.3 208436 51352 pts/1    R+   21:30  10:52                  \_ yath-collector
root     19667  7.0  0.6 256580 99224 pts/1    S+   21:30   0:49                  \_ yath-auditor

yath-collector's CPU time is going up if you monitor it.

toddr commented 2 years ago

Bisects to:

5c7a06731390dd02e655f7c7c1bfd58ea5f4d8d2 is the first bad commit
commit 5c7a06731390dd02e655f7c7c1bfd58ea5f4d8d2
Author: Chad Granum <exodist7@gmail.com>
Date:   Mon Nov 15 12:41:25 2021 -0800

    First pass at a "try again" model for too many files errors

 Changes                               |  2 +
 lib/Test2/Harness/Collector.pm        |  6 ++-
 lib/Test2/Harness/Collector/JobDir.pm | 82 +++++++++++++++++++++++++++++------
 3 files changed, 76 insertions(+), 14 deletions(-)

https://github.com/Test-More/Test2-Harness/commit/5c7a06731390dd02e655f7c7c1bfd58ea5f4d8d2

toddr commented 2 years ago

Reverting this on master is complicated. I'm trying to figure out what that would look like so I can test if this would fix the problem.

toddr commented 2 years ago

I've tried to minimize the arguments that are causing this and it boils down to calling yath like this.

yath test  --durations build-tools/yath-durations.json  -j3 [list of 2 tests]

I have to call it with --durations and I have to call it with -j3 minimum to make it hang.

toddr commented 2 years ago

If I give both tests in question # HARNESS-NO-STREAM, the problem goes away.

toddr commented 2 years ago

The part of the commit that seems to have caused the problem is

diff --git a/lib/Test2/Harness/Collector.pm b/lib/Test2/Harness/Collector.pm
index eb411880..f7dec725 100644
--- a/lib/Test2/Harness/Collector.pm
+++ b/lib/Test2/Harness/Collector.pm
@@ -73,11 +73,7 @@ sub process {

             $count += $e_count;
             next if $e_count;
-            my $done = $jdir->done;
-            unless ($done) {
-                $count++;
-                next;
-            }
+            my $done = $jdir->done or next;

             delete $jobs->{$job_try};
             unless ($self->settings->debug->keep_dirs) {

Without a deeper understanding, I'm not sure about reverting this on master

toddr commented 2 years ago

Given collector seems to be the thing taking up CPU time, it this certainly seems to be the offender. However when reversing this patch on master, it does not fix the problem :\

exodist commented 2 years ago

Do you have a reproduction case that you can share? Or is it too complex?

As for that line, my guess is your bisect found a different commit that leads to tests not completing. I know there have a been a few commits in history that break it. Your patch probably does not work for master because you found an already fixed case.

toddr commented 2 years ago

As best we could tell, the issue in the offending tests is they were trying to open ports to find an open one. We stopped doing that.