Short Description
This PR fixes an issue where the worker doesn't close down pooled child processes after uncaught exceptions, resulting in the worker refusing to work.
Related issue
Fixes #664
Implementation Details
What's basically happening is:
The engine correctly catches uncaught exceptions (and, I think, process.exit calls) when they occur
It also correctly sends error events out (the errors themselves may be rubbish, but I do think they are being sent)
BUT the engine doesn't properly resolve the promise for that run, so it still thinks the workflow is executing
That means the child process pool doesn't allocate new resources for future work, and so chokes up
The engine should eventually time out the runs, but by that point Lightning thinks they're dead and isn't listening to events. I think the backlog will clear, but only very slowly.
Anyway, as a result of the fix, the error is handled gracefully, the pool re-allocates the worker thread, and everyone is happy.
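To illustrate the failure mode described above, here is a minimal sketch of a pool whose run promise always settles, so the slot is released even when the child reports an uncaught error. This is not the engine's actual code: TinyPool, FakeChild and the "done"/"error" event names are hypothetical, and an EventEmitter stands in for a real child process.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for a pooled child process.
// It reports its outcome by emitting "done" or "error".
class FakeChild extends EventEmitter {}

// Hypothetical fixed-size pool, for illustration only.
class TinyPool {
  private free: FakeChild[];

  constructor(size: number) {
    this.free = Array.from({ length: size }, () => new FakeChild());
  }

  // Number of children currently available for new work.
  get available(): number {
    return this.free.length;
  }

  // Runs a task on a pooled child. The returned promise settles on
  // success AND on error, and the slot is released in both cases.
  run(task: (child: FakeChild) => void): Promise<string> {
    const child = this.free.pop();
    if (!child) return Promise.reject(new Error("pool exhausted"));

    return new Promise<string>((resolve, reject) => {
      child.once("done", (result: string) => {
        child.removeAllListeners();
        this.free.push(child); // healthy child goes back into the pool
        resolve(result);
      });
      child.once("error", (err: Error) => {
        child.removeAllListeners();
        // Assumption: a child that hit an uncaught exception is discarded
        // and replaced rather than reused.
        this.free.push(new FakeChild());
        reject(err);
      });
      task(child);
    });
  }
}
```

If the "error" handler only forwarded the event and never called reject or returned a child to the pool, a pool of size 1 would stay exhausted after the first failed run, which is roughly the choking behaviour this PR addresses.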
QA Notes
I've added two integration tests that reproduce very similar errors. Both fail on main.
Checklist before requesting a review
[x] I have performed a self-review of my code
[x] I have added unit tests
[x] Changesets have been added (if there are production code changes)