alexpearce / home

a blag.
https://alexpearce.me
MIT License

High-availability real-time Celery monitoring #58

Status: open. utterances-bot opened this issue 1 year ago.

utterances-bot commented 1 year ago

High-availability real-time Celery monitoring - Alex Pearce

A guide on creating multiple Celery event receivers for highly available real-time worker and task monitoring.

https://alexpearce.me/2022/07/high-availability-celery-monitoring/

behm commented 1 year ago

I have a Celery application that runs in one Docker container and a Celery worker that runs in a separate Docker container. My event receiver is written much like yours above, but while it is listening for events I get a "socket.timeout: timed out" error.

Are there any ports I need to expose? Is there something I am missing?

alexpearce commented 1 year ago

Is this message emitted by the app.events.Receiver? Hmm. Which broker are you using, and are the application and the worker both able to connect to it?
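One way to check broker connectivity from inside each container is a plain TCP probe of the broker's host and port. Below is a minimal sketch that assumes the broker URL is the same one you pass to Celery. The helper names `broker_address` and `broker_reachable` are hypothetical, and 5672 (AMQP) and 6379 (Redis) are the standard default ports. As far as I know, the application, the worker, and the event receiver all talk only to the broker and not to each other. So the port that has to be reachable from every container is the broker's.

```python
import socket
from urllib.parse import urlsplit

# Standard default ports for common broker URL schemes (RabbitMQ via AMQP, Redis).
DEFAULT_PORTS = {"amqp": 5672, "pyamqp": 5672, "redis": 6379}


def broker_address(broker_url: str) -> tuple:
    """Extract (host, port) from a broker URL such as 'amqp://guest@rabbitmq:5672//'."""
    parts = urlsplit(broker_url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    if parts.hostname is None or port is None:
        raise ValueError(f"cannot determine host/port from {broker_url!r}")
    return parts.hostname, port


def broker_reachable(broker_url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the broker can be opened within `timeout` seconds."""
    host, port = broker_address(broker_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.timeout, connection refused, and DNS failures are all OSError subclasses.
        return False
```

For example, suppose the broker runs as a Compose service named `rabbitmq` (an assumed name). Calling `broker_reachable("amqp://guest:guest@rabbitmq:5672//")` in each container returns False wherever that hostname doesn't resolve or the port isn't reachable. That would point at Docker networking rather than at the receiver code.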

behm commented 1 year ago

Alex, I figured out my problem: it was something dumb on my part in the way I was sharing code between the containers. I do have another question, though.

We have seen cases where workers are killed unexpectedly (mostly by running out of memory) and we lose a task. I am trying to add an "audit check" to our system that looks through our database of "jobs" to see whether the incomplete tasks in each job are accounted for in Celery.

Once you receive an event in the EventReceiver, does it get stored anywhere? I basically want to ask Celery whether it knows anything about a specific task_id and, if not, "re-queue" the task. I have tried AsyncResult, but I always get a status of PENDING no matter what state the task is in.
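As far as I know, the Receiver does not persist anything itself. Each handler is called with the event dict, and keeping that information is up to you. Celery's own `app.events.State` keeps an in-memory view. Also, `AsyncResult` reports PENDING for any task id the result backend has no record of, which includes the case where no result backend is configured. A task only shows STARTED if `task_track_started` is enabled. Below is a minimal sketch of an audit that relies on events instead. `TaskRegistry`, `TaskSeen`, `audit`, and `run_receiver` are hypothetical names, and the 60-second staleness threshold is an assumption. Workers must send task events (`-E` / `worker_send_task_events`), and `task-sent` only appears with `task_send_sent_event`.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional

# Assumption: these mean the task is finished and should not be re-queued.
DONE_EVENTS = {"task-succeeded", "task-revoked"}
# Assumption: these are worth flagging; a killed pool child may surface as task-failed.
FLAG_EVENTS = {"task-failed", "task-rejected"}
# Events emitted by the worker that has the task in hand.
ON_WORKER_EVENTS = {"task-received", "task-started"}


@dataclass
class TaskSeen:
    last_event: str
    hostname: Optional[str]
    timestamp: float


@dataclass
class TaskRegistry:
    """Hypothetical in-memory record of events seen by one receiver."""
    tasks: Dict[str, TaskSeen] = field(default_factory=dict)
    worker_last_seen: Dict[str, float] = field(default_factory=dict)

    def on_event(self, event: dict) -> None:
        """Handler to register as handlers={'*': registry.on_event}."""
        etype = event.get("type", "")
        ts = event.get("timestamp", time.time())
        host = event.get("hostname")
        if etype.startswith("worker-") and host:
            if etype == "worker-offline":
                self.worker_last_seen.pop(host, None)  # treat as gone
            else:
                self.worker_last_seen[host] = ts  # online/heartbeat
        elif etype.startswith("task-") and "uuid" in event:
            prev = self.tasks.get(event["uuid"])
            # Ignore events older than what we already hold (out-of-order delivery).
            if prev is None or ts >= prev.timestamp:
                self.tasks[event["uuid"]] = TaskSeen(etype, host, ts)

    def audit(self, job_task_ids: Iterable[str], now: Optional[float] = None,
              stale_after: float = 60.0) -> List[str]:
        """Return task ids that look lost and may need re-queueing."""
        now = time.time() if now is None else now
        suspects = []
        for task_id in job_task_ids:
            seen = self.tasks.get(task_id)
            if seen is None or seen.last_event in FLAG_EVENTS:
                suspects.append(task_id)  # never seen, or ended badly
            elif seen.last_event in ON_WORKER_EVENTS:
                last_beat = self.worker_last_seen.get(seen.hostname)
                if last_beat is None or now - last_beat > stale_after:
                    suspects.append(task_id)  # its worker has gone quiet
            # DONE_EVENTS, task-sent and task-retried count as accounted for.
        return suspects


def run_receiver(app, registry: TaskRegistry) -> None:
    """Block forever, feeding every event from the broker into the registry."""
    with app.connection() as connection:
        recv = app.events.Receiver(connection, handlers={"*": registry.on_event})
        recv.capture(limit=None, timeout=None, wakeup=True)
```

This keeps everything in memory, so a receiver that has just (re)started knows nothing about earlier tasks. Treat "never seen" as a reason to look more closely rather than as proof that a task was lost.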

Thanks, Brian