Closed btbonval closed 9 years ago
Staging and Production both use the same remote CloudAMQP URL. Production does not seem to have any issues. Staging does.
They run against a Tiger CloudAMQP instance.
Might have to do with concurrent connections; could try tweaking `BROKER_POOL_LIMIT`, which is not set for either Staging or Prod.
https://devcenter.heroku.com/articles/cloudamqp#celery
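A minimal sketch of what setting the pool limit might look like in a Heroku-style Django settings module. This is hypothetical: `CLOUDAMQP_URL` is the env var the CloudAMQP add-on sets, but the limit value of 8 is an assumption, not the project's actual configuration.

```python
import os

# CloudAMQP URL supplied by the Heroku add-on; localhost fallback for dev.
BROKER_URL = os.environ.get('CLOUDAMQP_URL', 'amqp://guest:guest@localhost//')

# Assumption: keep the pool well under the Tiger plan's 12-connection cap
# so staging and production can share the same instance.
BROKER_POOL_LIMIT = 8
```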
Tiger should support up to 12 concurrent connections, though. It seems unlikely that prod is using all 12 connections and beta can't even get in for a moment. https://www.cloudamqp.com/plans.html
I do not have account access to Finals Club's CloudAMQP to check any further.
Possibly a permissions problem. http://stackoverflow.com/a/26629093/1867779
Possibly a domain name change problem. If I understand it correctly, CloudAMQP might have changed the connection information, and production has no issue only because its environment hasn't been reset since. I don't know why that would be the case, though. http://community.spiceworks.com/topic/683143-zenoss-4-2-5-jobs-never-move-beyond-pending
It looks like we've hit max connections. According to CloudAMQP:

> Open connections: 13 of 12
>
> When you've reached the maximum concurrent connections, further connections will be prohibited. You can connect again when you're under the limit. Unfortunately, when you've reached the maximum concurrent connections, you can't access the management interface either.
So I cannot access the management console, but the problem seems quite clear.
I'll have to try to limit the number of connections from prod to 8 or 10.
CloudAMQP actually let me connect to the management interface anyway. This is the interface supplied: https://www.rabbitmq.com/management.html
13 connections, 14 channels, 0 exchanges, and 0 queues. From the graphs, it looks like no messages have been passed for a while.
I think that our plan used to allow more connections (like 15 or 18). I'm also thinking we've held 13 connections since before they changed it to 12.
It looks like you can click on a connection's name for more info and then close it in that interface. Going to close up some connections and see if more connections are created automatically by the production server.
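Closing connections can also be scripted against the RabbitMQ management HTTP API (`GET /api/connections`, `DELETE /api/connections/{name}`), which CloudAMQP exposes through the same management plugin linked above. A sketch using only the standard library; the host and credentials are placeholders to be taken from your `CLOUDAMQP_URL`:

```python
import base64
import json
import urllib.parse
import urllib.request

# Placeholders: substitute the host and credentials from your CLOUDAMQP_URL.
API_BASE = 'https://example.cloudamqp.com/api'
USER, PASSWORD = 'user', 'secret'

def api_request(path, method='GET'):
    """Build a basic-auth request against the RabbitMQ management API."""
    req = urllib.request.Request(API_BASE + path, method=method)
    token = base64.b64encode('{}:{}'.format(USER, PASSWORD).encode()).decode()
    req.add_header('Authorization', 'Basic ' + token)
    return req

# List open connections, then close one to free a slot (network calls
# commented out since this is just a sketch):
# conns = json.load(urllib.request.urlopen(api_request('/connections')))
# name = urllib.parse.quote(conns[0]['name'])
# urllib.request.urlopen(api_request('/connections/' + name, method='DELETE'))
```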
Beta's queue suddenly showed up on the management console, so it seems able to act now. 9000 messages are sitting on the queue waiting to be processed (wow!).
I closed a few more connections so beta can get in there and parse that stuff more quickly.
This is the documentation for Celery 3.0.25. Looks like we've locked python-celery to 3.0.21, so that's close. http://docs.celeryproject.org/en/3.0/configuration.html
Looks like we're configuring Celery here: https://github.com/FinalsClub/karmaworld/blob/master/karmaworld/settings/common.py#L323-L331
and here with some values overridden: https://github.com/FinalsClub/karmaworld/blob/master/karmaworld/settings/prod.py#L58-L97
Strange that we had 13 connections and beta could not connect at all: "The pool is enabled by default since version 2.5, with a default limit of ten connections." Furthermore, we set `BROKER_POOL_LIMIT` in the configuration to just 1 connection. There's no reason we should have hit 13 connections!
I notice mongo and redis celery backends have specific connection limits, but the amqp backend does not.
Someone else having this problem, here's an SO with no answer: http://stackoverflow.com/questions/23249850/celery-cloudamqp-creates-new-connection-for-each-task
Someone mentioned `BROKER_POOL_LIMIT` not working for them with CloudAMQP, but setting it to 0 was alright. The question specifically asked about Redis, though.
http://stackoverflow.com/a/23563018/1867779
So it looks like we might want to set `BROKER_POOL_LIMIT=0`, letting connections be made and dropped as necessary. Watching the stats suggests the connections are transient anyway.
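A sketch of the proposed change. Per the Celery 3.0 configuration docs, a pool limit of 0 (or `None`) disables the connection pool, so connections are established and closed for every use:

```python
# Disable the broker connection pool entirely: each publish opens and
# closes its own connection instead of holding pooled ones open.
BROKER_POOL_LIMIT = 0

# Trade-off: more connection churn per task, but no long-lived idle
# connections counting against CloudAMQP's concurrent-connection cap.
```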
I'm running celery. I see connections being created from beta's celery now. Some are refused because the max limit has been reached, but connections are definitely happening. Watching CloudAMQP, I see between 11 and 13 connections (up from the 9 left by production). I don't see the queue size decreasing much from the 9000-some.
I'm seeing literally thousands of weirdly named queues being created, and I don't know why: 2500 uniquely named queues and counting. Before, there were 4: 2 with weird names and 2 with human-readable names.
Ah but the score board just updated, adding a new school. That would seem to indicate the scoreboard update celery task is running, for #377. It would also seem to indicate celery is working well enough to close this ticket.
This issue remains closed, but as I investigate some AMQP stuff, I'll add notes to this ticket for reference.
There are now only 12 queues running.
3 have names of the form `UUID.celery.pidbox`. It looks like each such queue is bound to the fanout exchange that each worker creates for remote control (the pidbox), or something along those lines. Rarely, they switch from idle to running.
http://hustoknow.blogspot.com/2011/11/how-celeryctl-command-works-in-celery.html
2 are named `karmanotes-[beta|prod]`, as expected. Occasionally they switch from idle to running.
The others are named with long hexadecimal strings. These appear to always be idle.
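One guess (an assumption, not confirmed above) is that the hex-named idle queues are per-task result queues created by the amqp result backend. If so, these Celery 3.0 settings bound how long they linger, with hypothetical values:

```python
# Assumption: the idle hex-named queues come from the amqp result backend.
# Expire unconsumed result queues after an hour instead of the default day.
CELERY_TASK_RESULT_EXPIRES = 3600

# Tasks whose return values are never read can skip result queues entirely;
# setting this globally is a big hammer, per-task ignore_result is gentler.
CELERY_IGNORE_RESULT = True
```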
The staging system is getting "connection reset by peer" on `tweet_note` and `update-scoreboard`. I thought Twitter was causing problems, but `update-scoreboard` does not reach out to any external servers besides the database. For both `update-scoreboard` and `tweet_note`, the problem occurs in Kombu's connection code, which is called from AMQP's connection code. This would seem to indicate that the AMQP cloud connection isn't working.
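The knobs that govern how Celery/Kombu react to a dropped AMQP connection are documented in the Celery 3.0 configuration reference. A sketch with hypothetical values (the heartbeat setting is an assumption about what might surface dead links sooner; it requires a transport that supports heartbeats):

```python
# Retry establishing the broker connection after "connection reset by peer".
BROKER_CONNECTION_RETRY = True

# Give up after this many attempts (default is 100).
BROKER_CONNECTION_MAX_RETRIES = 100

# Seconds to wait for a connection attempt before timing out.
BROKER_CONNECTION_TIMEOUT = 4

# Assumption: AMQP heartbeats could detect a silently dropped connection
# sooner than waiting for the next publish to fail.
BROKER_HEARTBEAT = 30
```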