FWR implements a hard shutdown timeout of 25 seconds, exactly like Sidekiq, but FWG never implemented it because contexts were not part of the original API design. A year or two later we integrated Contexts but only to provide access to helper Values.
Today FWG will pause forever, waiting for all jobs to stop. Most jobs execute quickly and finish within 30 seconds but longer jobs can delay shutdown, leading to platforms like Heroku KILLing FWG without warning, orphaning those long jobs for the reservation timeout (default: 30 minutes).
The proper and canonical shutdown process should:
1. receive an external signal
2. flag that shutdown is starting
3. have internal goroutines exit any process loop/select
4. run any registered Shutdown callbacks
5. pause up to ShutdownTimeout for existing jobs to finish
6. cancel the job context so lingering jobs quickly die with an error
7. FAIL any jobs still lingering from (5) so they can be re-executed soon
8. close the connection pool and exit
Ideally this will guarantee that FWG can gracefully stop the vast majority of jobs and FAIL any which linger past 25 seconds so they can be re-executed quickly after process restart.
Please note: for FWG to FAIL a job, it must return an error. If your job responds to a cancelled Context by simply returning, that is considered a successful job execution.
This is the fix for contribsys/faktory#468