bioworkflows opened this issue 1 year ago
We've experienced this issue a few times now and it does eventually eat up all of the available connections and take down the database. Any updates on a possible root cause?
Hey @bioworkflows or @rjcampion3, any workaround that y'all came up with? I just upgraded a project and am having connections get slurped up by this.
@curtisim0 We used the following settings in Django:

```python
import socket

CELERY_BROKER_TRANSPORT_OPTIONS = {
    "socket_keepalive": True,
    "socket_keepalive_options": {
        socket.TCP_KEEPIDLE: 60,
        socket.TCP_KEEPCNT: 5,
        socket.TCP_KEEPINTVL: 10,
    },
}
```
And we added `--without-mingle` to the celery worker command. That seemed to do the trick. I think `--without-mingle` was the main thing that resolved it, but the other settings are probably good to have anyway.
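For what it's worth, kombu's Redis transport passes `socket_keepalive_options` through to `setsockopt` on the broker connection, so dead connections get detected and reaped instead of lingering. A minimal sketch of what those options mean at the socket level (Linux option names; the raw socket here is purely for illustration, not how kombu wires it up):

```python
import socket

# The same options as in CELERY_BROKER_TRANSPORT_OPTIONS above:
keepalive_options = {
    socket.TCP_KEEPIDLE: 60,   # seconds of idle time before the first keepalive probe
    socket.TCP_KEEPCNT: 5,     # unanswered probes before the kernel drops the connection
    socket.TCP_KEEPINTVL: 10,  # seconds between successive probes
}

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)  # enable keepalives
for opt, val in keepalive_options.items():
    s.setsockopt(socket.IPPROTO_TCP, opt, val)

# Read one value back to confirm the kernel accepted it
print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))
s.close()
```

With these numbers, a broker connection that stops responding is torn down after roughly 60 + 5 × 10 = 110 seconds instead of being held open indefinitely.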
Having the same issue (eventually takes down the database).
@rjcampion3 Can you explain why `--without-mingle` would solve this issue? My understanding is that mingle is only relevant during worker startup, so it's unclear to me how it relates to this problem.
@sehmaschine I don't know if our workaround really has much to do with the celery table locking up. We had an issue where celery stopped responding to tasks and had to be restarted, after which it pulled all the waiting tasks from Redis. That turned out to be a kombu issue, plus updating the settings above.
We've run into the table corruption issue a couple of times, and the only fix we found was to create a new database. Not much of a workaround, but a simple solution when needed.
@rjcampion3 Creating a new database sounds like a horror scenario. Our DB is > 100 GB, and that's not a straightforward process. I guess I'll check whether we need django-celery-beat in the first place (I only used it because of the ephemeral filesystem on DigitalOcean, but maybe there's a workaround for that, e.g. using Spaces).
Summary:
The database backend of my website crashed today. Upon inspection, I found all connection slots filled with database queries that had been running for more than 8 hours.
My periodic-task table had hundreds of insertion/deletion operations today, but it holds fewer than 50 entries at peak, so I'm wondering what is going on here. Is this some sort of deadlock that prevented these queries from completing?
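If it helps with diagnosis, and assuming the backend is PostgreSQL (an assumption; the report doesn't say), the `pg_stat_activity` view shows which backends have been stuck for hours and whether they're blocked on a lock. A small sketch that builds such a query (the helper name is made up for illustration; run the SQL via `psql` or a Django cursor):

```python
def long_running_query_sql(max_hours: int = 8) -> str:
    """Return SQL listing backends whose current query has run longer than max_hours.

    Hypothetical helper: the query targets PostgreSQL's pg_stat_activity view.
    wait_event_type = 'Lock' indicates the backend is waiting on a lock,
    which would point toward the deadlock/lock-pileup theory.
    """
    return f"""
        SELECT pid,
               state,
               wait_event_type,
               now() - query_start AS runtime,
               query
        FROM pg_stat_activity
        WHERE state <> 'idle'
          AND now() - query_start > interval '{max_hours} hours'
        ORDER BY runtime DESC;
    """


print(long_running_query_sql())
```

A stuck backend can then be terminated with `SELECT pg_terminate_backend(pid);` to free its slot without recreating the database.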
Exact steps to reproduce the issue:
I am not sure how to reproduce this.
Detailed information