mgorven opened this issue 10 months ago
@rafidka any insight to share?
@auvipy, I would need to do some debugging on this. I will try to allocate some time for it, but I cannot promise anything this week.
@mgorven, it would help if you could provide an easy step-by-step reproduction.
I don't have an easy repro of actually losing events; I'll work on that. What I've done is add debug prints in `CurlClient._setup_request()` and `CurlClient._process()` so you can see when a request is started and finished, and I can see that a request is started but never finished during shutdown: https://gist.github.com/mgorven/f671b3b9384b2814d86f7e99451a2936

Roughly: start `celery worker`, have a `celery call` command ready to dispatch a task, and run `celery call` as the worker shuts down. You'd expect the worker to immediately process the dispatched task, but instead it only does so after the visibility timeout expires.
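For reference, a minimal sketch of that kind of instrumentation, assuming the pycurl client lives at `kombu.asynchronous.http.curl.CurlClient` (the class named in this thread); the wrapped method names come from the comment above and their signatures are passed through untouched:

```python
# Hypothetical debug shim: wrap the two CurlClient methods mentioned above so
# every request start/finish is logged, making a dropped in-flight request
# visible in the worker output during shutdown.
import logging

from kombu.asynchronous.http.curl import CurlClient  # assumed module path

log = logging.getLogger("sqs-debug")

_orig_setup_request = CurlClient._setup_request
_orig_process = CurlClient._process


def _setup_request(self, *args, **kwargs):
    log.warning("curl request started")
    return _orig_setup_request(self, *args, **kwargs)


def _process(self, *args, **kwargs):
    log.warning("curl request finished")
    return _orig_process(self, *args, **kwargs)


CurlClient._setup_request = _setup_request
CurlClient._process = _process
```

With something like this imported at worker startup, a "started" line with no matching "finished" line during shutdown corresponds to the dropped in-flight request described above.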
Potentially fixed with https://github.com/celery/celery/pull/9213 (although it’s in Celery, not Kombu) @mgorven @rafidka
Celery v5.5.0b3 released.
When Celery is stopped gracefully, the in-flight request to SQS is not completed. If SQS returns messages for this request it considers them delivered, but Celery doesn't do anything with them (even if `task_reject_on_worker_lost` is set). SQS will redeliver the messages when the visibility timeout expires, but this isn't great for a graceful shutdown.

The SQS transport uses `CurlClient` to make requests, which relies on `celery.worker.loops.asynloop` to do IO. The event loop is the first thing to stop when `SIGTERM`/`SIGINT` is received, and so in-flight requests are simply dropped.
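To make the visibility-timeout behaviour concrete, here is a small standalone boto3 sketch (the queue URL and the 5-second timeout are placeholders; this is independent of Celery's internals). Once `ReceiveMessage` has returned a message, SQS treats it as in flight and will not hand it to any other consumer until the visibility timeout expires, even if the receiver never processes it:

```python
# Toy illustration of SQS redelivery semantics: a received-but-never-deleted
# message only reappears after its visibility timeout expires.
import time

import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # placeholder
sqs = boto3.client("sqs")

# First receive: the message counts as delivered and its visibility timer starts.
resp = sqs.receive_message(
    QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, VisibilityTimeout=5
)
print("first receive:", len(resp.get("Messages", [])))

# We never delete it, mimicking a worker that dropped the in-flight request.
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1)
print("immediate re-receive:", len(resp.get("Messages", [])))  # typically 0

time.sleep(6)  # wait out the visibility timeout
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1)
print("after visibility timeout:", len(resp.get("Messages", [])))  # redelivered
```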