Short Description
If the run:complete event happens to throw an error, the worker will not close out a run properly, i.e. it will not free up capacity and claim another run.
This is causing the worker to "jam" silently.
I do not know WHY run:complete is timing out on us right now (see these logs), and we'll have to work that out. But at least the worker shouldn't die silently any more.
I've also added a little more logging around this. Most of the time it's noise, but it may help us diagnose cases where the worker isn't releasing threads from the pool.
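For reviewers, here is a minimal sketch of the failure mode and the general shape of a fix. It is not the code in this PR: `completeRun`, `RunSlot`, `SendEvent` and the `release` callback are hypothetical names used only for illustration. The idea is that releasing capacity sits in a `finally` block, so a throw or timeout from run:complete gets logged but can no longer stop the worker from freeing the slot.

```typescript
// Hypothetical names throughout: SendEvent, RunSlot and completeRun are
// illustrative and are not taken from the worker's actual code.
type SendEvent = (name: string, payload: unknown) => Promise<void>;

interface RunSlot {
  runId: string;
  // Frees capacity so the worker can go on to claim another run.
  release: () => void;
}

async function completeRun(
  slot: RunSlot,
  sendEvent: SendEvent,
  log: (msg: string) => void = console.error,
): Promise<boolean> {
  try {
    await sendEvent("run:complete", { runId: slot.runId });
    return true;
  } catch (err) {
    // Log the failure rather than letting it jam the worker silently.
    const reason = err instanceof Error ? err.message : String(err);
    log(`run:complete failed for run ${slot.runId}: ${reason}`);
    return false;
  } finally {
    // Runs whether the event succeeded, threw, or timed out.
    slot.release();
  }
}
```

With this shape, a rejected run:complete resolves to `false`, produces one log line, and still calls `release` exactly once.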
Related issue
None raised; see Slack.