OpenFn / kit

The bits & pieces that make OpenFn work. (diagrammer, cli, compiler, runtime, runtime manager, logger, etc.)

Worker: ensure workers are released after an error on run:complete #685

Closed josephjclark closed 1 month ago

josephjclark commented 1 month ago

Short Description

If sending the run:complete event happens to throw an error, the worker will not close out the Run properly, i.e. it will not free up capacity and claim another run.

This is causing the worker to "jam" silently.
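
For illustration, here is a minimal sketch of the general pattern, not the actual change in this PR. Every name in it (`Pool`, `completeRun`, `send`, `log`) is hypothetical and does not come from the worker codebase. It assumes the run:complete event is sent through an async function that may reject or time out, and it uses `try`/`finally` so that capacity is released on every path.

```typescript
// Hypothetical sketch: none of these names are taken from the OpenFn worker.

// Assumed shape of the function that sends an event to the server.
type Send = (event: string, payload: unknown) => Promise<void>;

// A toy capacity tracker standing in for the worker's pool of run slots.
class Pool {
  private active = 0;
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get free(): number {
    return this.capacity - this.active;
  }

  claim(): void {
    if (this.free <= 0) throw new Error('no capacity');
    this.active += 1;
  }

  release(): void {
    this.active = Math.max(0, this.active - 1);
  }
}

// Sends run:complete and resolves to true on success, false on failure.
// The slot is released in `finally`, so a throw or timeout cannot jam the worker.
async function completeRun(
  pool: Pool,
  runId: string,
  send: Send,
  log: (msg: string) => void = console.error
): Promise<boolean> {
  try {
    await send('run:complete', { runId });
    return true;
  } catch (err) {
    // Log the failure rather than dying silently
    log(`run:complete failed for ${runId}: ${String(err)}`);
    return false;
  } finally {
    // Runs on both the success and the error path
    pool.release();
  }
}
```

With this shape, a `send` that rejects still leaves the pool with a free slot, so the worker can go on to claim another run.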

I do not know WHY run:complete is timing out on us right now (see these logs), and we'll have to work that out. But at least the worker shouldn't die silently any more.

I've also added a little bit more logging around this stuff. Most of the time it's junk, but it may help us diagnose cases where the worker isn't releasing threads from the pool.

Related issue

None raised, see Slack.

josephjclark commented 1 month ago

@taylordowns2000 This is ready to go and the image is built. Can we make sure we run the end-to-end tests on it before going live?