Open toddr opened 2 years ago
Bisects to:
5c7a06731390dd02e655f7c7c1bfd58ea5f4d8d2 is the first bad commit
commit 5c7a06731390dd02e655f7c7c1bfd58ea5f4d8d2
Author: Chad Granum <exodist7@gmail.com>
Date: Mon Nov 15 12:41:25 2021 -0800
First pass at a "try again" model for too many files errors
Changes | 2 +
lib/Test2/Harness/Collector.pm | 6 ++-
lib/Test2/Harness/Collector/JobDir.pm | 82 +++++++++++++++++++++++++++++------
3 files changed, 76 insertions(+), 14 deletions(-)
https://github.com/Test-More/Test2-Harness/commit/5c7a06731390dd02e655f7c7c1bfd58ea5f4d8d2
Reverting this on master is complicated. I'm trying to figure out what that would look like so I can test if this would fix the problem.
I've tried to minimize the arguments that are causing this and it boils down to calling yath like this.
yath test --durations build-tools/yath-durations.json -j3 [list of 2 tests]
I have to call it with --durations
and I have to call it with -j3
minimum to make it hang.
If I give both tests in question # HARNESS-NO-STREAM, the problem goes away.
The part of the commit that seems to have caused the problem is
diff --git a/lib/Test2/Harness/Collector.pm b/lib/Test2/Harness/Collector.pm
index eb411880..f7dec725 100644
--- a/lib/Test2/Harness/Collector.pm
+++ b/lib/Test2/Harness/Collector.pm
@@ -73,11 +73,7 @@ sub process {
$count += $e_count;
next if $e_count;
- my $done = $jdir->done;
- unless ($done) {
- $count++;
- next;
- }
+ my $done = $jdir->done or next;
delete $jobs->{$job_try};
unless ($self->settings->debug->keep_dirs) {
Without a deeper understanding, I'm not sure about reverting this on master
Given collector seems to be the thing taking up CPU time, it this certainly seems to be the offender. However when reversing this patch on master, it does not fix the problem :\
Do you have a reproduction case that you can share? Or is it too complex?
As for that line, my guess is your bisect found a different commit that leads to tests not completing. I know there have a been a few commits in history that break it. Your patch probably does not work for master because you found an already fixed case.
As best we could tell, the issue in the offending tests is they were trying to open ports to find an open one. We stopped doing that.
I just upgraded from 1.000038 to 1.000124 and now I'm seeing an issue where the 8 minute run will hang right at the end after it has run all tests. When you look for sub processes, you can see
yath-collector
's CPU time is going up if you monitor it.