Description
Introduces a configurable grace period after which the generic worker processes will be killed.
This allows workers to finish their current job during, e.g., an update instead of being killed after 15 seconds (bpm default).
After the worker processes are stopped or killed, their pending locks are cleared, which allows other workers to pick up the pending jobs. Previously, these locks were only cleared once the job timeout had been exceeded (default 4 hours).
This is based on the assumption that jobs processed on the generic workers are idempotent.
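The drain behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the drain script from this PR: the helper names process_alive? and drain_worker, the polling interval, and the return symbols are assumptions made for the example. It sends SIGTERM, polls until the grace period has elapsed, and only then sends SIGKILL.

```ruby
# Hypothetical helper: true while a process with this pid still exists.
def process_alive?(pid)
  Process.kill(0, pid) # signal 0 only checks for existence
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Hypothetical sketch of the drain logic for a single worker process.
# Returns :exited if the worker stopped within the grace period,
# :killed if it had to be killed after the grace period was exceeded.
def drain_worker(pid, grace_period_seconds, poll_interval: 0.1)
  Process.kill('TERM', pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace_period_seconds

  while process_alive?(pid)
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      Process.kill('KILL', pid)
      return :killed
    end
    sleep poll_interval
  end

  :exited
rescue Errno::ESRCH
  # The worker disappeared before a signal could be delivered.
  :exited
end
```

In this sketch the value passed as grace_period_seconds would come from the configured worker_grace_period_seconds. Clearing the pending locks would happen after drain_worker returns, whichever of the two outcomes occurred.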
Manual Tests
Tested the following draining scenarios locally and on a bbl env:
Draining script waits for the configured worker_grace_period_seconds if jobs are still being processed
Draining script kills worker processes after the configured grace period is exceeded
Draining script sends SIGTERM to worker processes which exit normally if no jobs are being processed
clear_pending_locks rake task is called as part of the drain script and clears the locked_by and locked_at columns for all pending jobs assigned to the current worker
-> Repeated the previous draining tests with number_of_worker_threads configured
No locked jobs are left when decreasing the number of worker processes
No locked jobs are left when switching from worker processes to worker threads (including combinations like 2 workers with 4 threads each)
No locked jobs are left when switching from worker threads to worker processes
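As an illustration of what the lock clearing checked in these scenarios amounts to, here is a minimal in-memory sketch. It is not the rake task from this PR: the Job struct, the exact-match comparison on the worker name, and the worker names are assumptions made for the example, and the real task operates on the jobs table rather than on Ruby objects.

```ruby
# Hypothetical stand-in for a row in the jobs table.
Job = Struct.new(:id, :locked_by, :locked_at, keyword_init: true)

# Clears locked_by and locked_at on every job locked by the given worker
# and returns the number of jobs that were unlocked.
def clear_pending_locks(jobs, worker_name)
  cleared = 0
  jobs.each do |job|
    next unless job.locked_by == worker_name

    job.locked_by = nil
    job.locked_at = nil
    cleared += 1
  end
  cleared
end

# Example usage with made-up worker names.
jobs = [
  Job.new(id: 1, locked_by: 'worker-a', locked_at: Time.now),
  Job.new(id: 2, locked_by: 'worker-b', locked_at: Time.now),
  Job.new(id: 3, locked_by: nil, locked_at: nil)
]
clear_pending_locks(jobs, 'worker-a')
```

Jobs locked by other workers are left untouched, so only the worker being drained gives up its jobs. Because those jobs may then be picked up and run again, this relies on the idempotency assumption stated in the description.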
Other
Links to any other associated PRs:
[x] I have viewed, signed, and submitted the Contributor License Agreement
[x] I have made this pull request to the develop branch
[x] I have run CF Acceptance Tests on bbl