cloudfoundry / capi-release

Bosh Release for Cloud Controller and friends
Apache License 2.0

Clear job locks on generic workers after grace period is exceeded #477

Closed johha closed 1 month ago

johha commented 1 month ago

Description

Introduces a configurable grace period after which the generic worker processes are killed. This lets workers finish their current job during, for example, an update, instead of being killed after 15 seconds (the bpm default). Once the worker processes have stopped or been killed, their pending locks are cleared so that other workers can pick up the pending jobs. Previously, the locks were cleared only after the job timeout had been exceeded (default: 4 hours). This relies on the assumption that jobs processed on the generic workers are idempotent, since a job interrupted by a kill may be run again by another worker.
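To illustrate the sequence described above, here is a minimal, self-contained sketch in Python. It is not the code from this PR: the function names `stop_with_grace` and `clear_locks`, the in-memory job list, and the `locked_by`/`locked_at` fields are assumptions made for the example. It only shows the general idea of asking a worker to stop, waiting for a grace period, force-killing it if needed, and then releasing its locks.

```python
import subprocess


def stop_with_grace(proc, grace_seconds):
    """Ask a worker process to stop, then kill it once the grace period is over.

    Returns True if the process exited within the grace period,
    False if it had to be killed. (Hypothetical helper, POSIX semantics.)
    """
    proc.terminate()  # SIGTERM: lets the worker finish its current job
    try:
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: grace period exceeded
        proc.wait()  # reap the process so it does not linger as a zombie
        return False


def clear_locks(jobs, worker_name):
    """Release every lock held by `worker_name`; return how many were cleared.

    `jobs` is a list of dicts standing in for a job table. The field names
    `locked_by` and `locked_at` are assumptions for this sketch.
    """
    cleared = 0
    for job in jobs:
        if job["locked_by"] == worker_name:
            job["locked_by"] = None
            job["locked_at"] = None
            cleared += 1
    return cleared


def drain_worker(proc, worker_name, jobs, grace_seconds):
    """Stop one worker, then clear its locks whether or not it exited cleanly."""
    graceful = stop_with_grace(proc, grace_seconds)
    cleared = clear_locks(jobs, worker_name)
    return graceful, cleared
```

In this sketch the locks are cleared only after the process is known to be gone, so a released job cannot be processed by the old worker and a new one at the same time. A job that was interrupted by the kill would simply be picked up again, which is why the idempotency assumption matters.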

Manual Tests

Tested the following draining scenarios locally and on a bbl env:
