Closed: MattWindsor91 closed this issue 4 years ago.
The tester opens a lot of files, and possibly doesn't close them, leading to runs constantly failing with `too many open files`.

This is still an issue, despite a few attempts to address it. Some mostly circumstantial observations:

The `ulimit` of the machine we're testing on is 1024, so it seems like a slow trickle of file loss.
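(For what it's worth, one way to watch for such a trickle would be to sample the process's open-descriptor count over a run. This is a minimal sketch, assuming a Linux `/proc` filesystem and a Go process; it is not code from the tester itself.)

```go
// fdwatch: periodically print how many file descriptors this process holds.
// A steady climb over a run suggests a leak; a flat line followed by a spike
// suggests bursty over-opening instead.
package main

import (
	"fmt"
	"log"
	"os"
	"time"
)

// openFDs counts the entries in /proc/self/fd, one per open descriptor.
func openFDs() (int, error) {
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, err
	}
	// The directory read itself holds a descriptor while listing, so drop one.
	return len(entries) - 1, nil
}

func main() {
	// Sample every 10 seconds for the lifetime of the process.
	for range time.Tick(10 * time.Second) {
		n, err := openFDs()
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println("open fds:", n)
	}
}
```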
I've done a lot of rounds of trying to find file leaks, but none seem to be showing up, meaning this is possibly just a case of, well, opening too many files at once.

Some mitigations in the interim could include:

- making sure we don't parallelise over large corpora directly (instead, using a worker pool; see the sketch below);
- trying to push worker pools more pervasively (deeply nested parallelisations within parallelisations might be blowing up combinatorially); and
- fixing the harness overspecialisation issue (which I'll file an issue for next).
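As a rough sketch of the worker-pool shape meant above: a fixed number of workers drain a job channel, so the number of simultaneously open files scales with the pool size rather than the corpus size. Names such as `process` and the pool size are illustrative, not the tester's actual API.

```go
// workerpool: process a corpus with a bounded number of concurrent workers.
package main

import (
	"fmt"
	"sync"
)

// process stands in for whatever per-subject work opens files.
func process(subject string) error {
	fmt.Println("processing", subject)
	return nil
}

// runPool fans corpus entries out to nworkers goroutines and waits for them.
func runPool(corpus []string, nworkers int) {
	jobs := make(chan string)
	var wg sync.WaitGroup

	for i := 0; i < nworkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for s := range jobs {
				if err := process(s); err != nil {
					fmt.Println("error:", err)
				}
			}
		}()
	}

	for _, s := range corpus {
		jobs <- s
	}
	close(jobs)
	wg.Wait()
}

func main() {
	// Only two subjects are ever in flight, however large the corpus is.
	runPool([]string{"subject-1", "subject-2", "subject-3"}, 2)
}
```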
After running the tester for several weeks on end, there have been no file-exhaustion crashes. I'm satisfied that this is no longer an issue, and that the fix was indeed related to buggy SSH code. Closing.
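(The actual SSH fix isn't shown in this thread. For illustration only, the usual descriptor leak in Go code that drives SSH is a session that is never closed on error paths; this is a hypothetical sketch assuming `golang.org/x/crypto/ssh`, not the tester's real code.)

```go
// Package remote: hypothetical helper illustrating the kind of fix described
// above; not the tester's actual SSH code.
package remote

import "golang.org/x/crypto/ssh"

// RunRemote runs one command on an existing SSH client, making sure the
// session (and the descriptors behind it) is released on every path.
func RunRemote(client *ssh.Client, cmd string) ([]byte, error) {
	sess, err := client.NewSession()
	if err != nil {
		return nil, err
	}
	// Without this defer, every early return or failed command leaks the
	// session: exactly the kind of slow descriptor trickle described above.
	defer sess.Close()

	return sess.CombinedOutput(cmd)
}
```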