WesleyAC opened this issue 1 year ago.
My current suspicion is that this happens when the broker Redis server restarts (it's unclear why; the Docker logs are unhelpful) and takes long enough to come back up that Celery stops retrying. I've set `broker_connection_max_retries` to `0` (retry forever) on bookwyrm.social to see if this fixes the issue, although I still want to dig into why Redis is restarting (according to `docker ps`, it's the Docker container itself that is being restarted).
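For reference, a minimal sketch of that setting in a standalone Celery config module. The module name `celeryconfig.py` is hypothetical; if the settings are instead loaded from Django settings with `namespace="CELERY"`, the name becomes `CELERY_BROKER_CONNECTION_MAX_RETRIES`.

```python
# celeryconfig.py (hypothetical module name), loaded with
# app.config_from_object("celeryconfig")

# Keep retrying the broker connection after it drops (Celery's default).
broker_connection_retry = True

# 0 (or None) means "retry forever" instead of giving up after the
# default of 100 attempts.
broker_connection_max_retries = 0
```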
Setting `broker_connection_max_retries` doesn't seem to fix this, but I don't understand why.
Oh, and it turns out Redis was getting killed by the OOM killer, which makes sense.
I'm now experimenting with setting `broker_connection_max_retries` much lower, on the theory that this will cause Celery to die and be restarted when Redis goes down, instead of trying and failing to connect forever.
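To get a feel for how long a given retry limit keeps the worker waiting before it gives up, here is a rough back-of-the-envelope sketch. It assumes a linear backoff where the i-th sleep is `min(start + i * step, cap)` with `start=2`, `step=2`, `cap=30` seconds; these values are assumed to match kombu's `Connection.ensure_connection` defaults and should be checked against the installed version. The helper `total_retry_wait` is hypothetical.

```python
import math
from typing import Optional


def total_retry_wait(max_retries: Optional[int], start: float = 2.0,
                     step: float = 2.0, cap: float = 30.0) -> float:
    """Approximate total seconds slept between reconnect attempts.

    Assumes a linear backoff (i-th sleep = min(start + i * step, cap)).
    A max_retries of 0 or None is treated as "retry forever", as in Celery.
    """
    if not max_retries:
        return math.inf
    return sum(min(start + i * step, cap) for i in range(max_retries))


# Celery's default of 100 retries: 2790.0 seconds (about 46 minutes).
print(total_retry_wait(100))
# A much lower limit such as 5: 2 + 4 + 6 + 8 + 10 = 30.0 seconds.
print(total_retry_wait(5))
```

Under these assumptions, a Redis restart slower than roughly 46 minutes would exhaust the default limit, while a small limit makes the worker give up within seconds so that a supervisor can restart it.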
There's an interesting theory in https://github.com/celery/celery/issues/4556 that this may be due to firewall configuration.
Another reason Redis sometimes dies is running out of disk space, mostly due to the bug with CSV exports (#2157).
Celery recently hung. The worker was alive, but no tasks were active, and it kept looping the following error:
I have no idea what caused this.
Running `./bw-dev restart_celery` was sufficient to fix it, but I'd like to debug the actual problem. Could it be a problem with Redis, or with the connection pool running out of connection slots?
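One quick check on the server side is to compare Redis's connected clients with its `maxclients` limit. Below is a minimal sketch using redis-py; the helper name `client_headroom` and the URL are hypothetical (a real deployment may need a password in the URL). Note that this only covers the server limit: client-side pools such as Celery's `broker_pool_limit` are capped separately.

```python
from typing import Mapping


def client_headroom(info_clients: Mapping[str, object],
                    config: Mapping[str, str]) -> int:
    """Return how many more client connections the server would accept.

    info_clients is the dict from INFO clients; config is the dict from
    CONFIG GET maxclients.
    """
    connected = int(info_clients["connected_clients"])
    max_clients = int(config["maxclients"])
    return max_clients - connected


def main(url: str = "redis://localhost:6379/0") -> None:
    # Imported lazily so the helper above works without redis-py installed.
    import redis

    r = redis.Redis.from_url(url, decode_responses=True)
    headroom = client_headroom(r.info("clients"), r.config_get("maxclients"))
    print(f"connections remaining before maxclients: {headroom}")


if __name__ == "__main__":
    main()
```

If the headroom is large while the worker still hangs, the bottleneck is more likely on the client side than in Redis itself.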