Test-More / Test2-Harness

Alternative to Test::Harness
Other
23 stars 26 forks source link

Avoid opening too many jobs in the collector. Only parse so many at o… #203

Closed toddr closed 2 years ago

toddr commented 3 years ago

…ne time.

Most unix systems limit the number of file handles any one process can have open at a time to 1024. When the collector gets behind, it may try to open too many files for completed jobs at once. Limit this with a constant. They'll be picked up once the ones polled are processed.

toddr commented 3 years ago

This corrects the problems identified on #201 with too many file handles being open at once by the collector.

exodist commented 3 years ago

I know ZipRecruiter used a -j56 when I was there on a super beefy machine. I do think we ran into this limit and simply bumped the max open filehandles limit.

ok, I am fine with a default, since it is a number of jobs we can stick with 200. But I want it configurable.

Also, I wonder if we can catch this exception and have it handle it gracefully... or if nothing else catch it, report that the user should probably lower this config options value, then die. That does not need to be part of this PR though if it is a lot of work.

toddr commented 3 years ago

Concerning the -j56, I think some recent jitter in releases has somehow slowed down the collector. I was even wondering if we were accidentally not using Cpanel::JSON::XS. What we do know is that updating to the latest T2 has caused this to happen where it was not previously.

toddr commented 3 years ago

We're totally ok with an option and documenting it. Updating the PR now.

exodist commented 3 years ago

ok, yeah, add the option and docs, I will merge this, and then without a rush we can look into speeding up the collector (or fixing whatever slowed it down)

toddr commented 3 years ago

@exodist: To add an option, am I going to have to create lib/App/Yath/Options/Collector.pm ?

exodist commented 3 years ago

Hmm, yes, that may be necessary, any command that starts a collector (test and run currently) would need to include those options, so yeah they should be their own category/module. But that's the kind of flexibility this system was made to allow, so should not be hard to do. If you get stuck let me know, but I think everything is likely to get passed to all the right places already, so should not be a difficult change. Let me know if you get stuck.

exodist commented 3 years ago

@toddr @atoomic I was going to explain how I needed things to change, but I had the time so I just added a commit instead. Now I need you guys to verify my changes still fix your problem. The changes you made would have fixed your problem, but the option was not actually usable and the collector would have been broken in subtle ways. If any of my changes need explanations please let me know.

If this does fix your issues I can merge and release.

toddr commented 3 years ago

I can't look till the morning but I am almost certain that 1000 will break it. processing each job requires multiple handles so 1000x2 is > 1024. It's really not important that the loop doesn't suck everything up each time since it'll get them on the next go around.

exodist commented 3 years ago

Yeah, please do wait until you have time to fully read it, the 1000 you commented on is a different setting that has no effect on number of open handles, it is just a second option I chose to expose and the value is unchanged. I kept the 300 default that you had in your PR commit.

toddr commented 3 years ago

Thanks for your help on this

toddr commented 3 years ago

I've pushed a commit trying to straighten out the messaging. I'm going to try to test it now.

toddr commented 3 years ago

We're testing this right now on our systems. Surprisingly even with -j$nproc, the collector runs behind pretty much the whole run until it gets to the end where single threaded tests run.

exodist commented 2 years ago

This has been merged