celery / kombu

Messaging library for Python.
http://kombu.readthedocs.org/
BSD 3-Clause "New" or "Revised" License

High latency in redis backend when scaling up/down #1047

Open ermalguni opened 5 years ago

ermalguni commented 5 years ago

Hi,

We have celery workers running on EC2 instances. Those instances scale up/down based on CPU load. Under normal load the max latency is <= 100ms. When we put more load on and scaling kicks in, the max latency goes up to >= 1000ms. Our current guess is that as celery workers are added or removed, kombu somehow has a hard time coordinating between them and delivering tasks.

$ celery -A celery-run.celery report

software -> celery:4.3.0 (rhubarb) kombu:4.5.0 py:3.5.2
            billiard:3.6.0.0 redis:3.2.1
platform -> system:Linux arch:64bit, ELF
            kernel version:4.4.0-1077-aws imp:CPython
loader   -> celery.loaders.app.AppLoader
settings -> transport:redis results:disabled

PERMANENT_SESSION_LIFETIME: datetime.timedelta(31)
SESSION_COOKIE_HTTPONLY: True
SESSION_COOKIE_PATH: None
SESSION_REFRESH_EACH_REQUEST: True
TESTING: False
JSON_SORT_KEYS: 'XXXX'
LOGGER_HANDLER_POLICY: 'always'
PROPAGATE_EXCEPTIONS: None
PREFERRED_URL_SCHEME: 'http'
celery_task_protocol: 1
task_ignore_result: True
task_serializer: 'json'
APPLICATION_ROOT: None
USE_X_SENDFILE: False
JSON_AS_ASCII: True
broker_url: 'redis://redis.internal:6379/13'
SESSION_COOKIE_NAME: 'session'
task_protocol: 1
task_queues:
    (<unbound Queue online_queue -> <unbound Exchange online_exchange(direct)> -> interval.tasks.online>,
 <unbound Queue map_queue -> <unbound Exchange map_exchange(direct)> -> interval.tasks.map>,
 <unbound Queue default_queue -> <unbound Exchange default_exchange(direct)> -> normal.tasks.default>,
 <unbound Queue log_queue -> <unbound Exchange log_exchange(topic)> -> normal.misc.log>)
SESSION_COOKIE_SECURE: False
task_soft_time_limit: 60
MAX_CONTENT_LENGTH: None
SESSION_COOKIE_DOMAIN: None
PRESERVE_CONTEXT_ON_EXCEPTION: None
EXPLAIN_TEMPLATE_LOADING: False
imports: ['app.utils.mqtt']
LOGGER_NAME: 'app'
DEBUG: False
JSONIFY_PRETTYPRINT_REGULAR: True
TRAP_HTTP_EXCEPTIONS: False
TRAP_BAD_REQUEST_ERRORS: False
SERVER_NAME: None
JSONIFY_MIMETYPE: 'application/json'
SECRET_KEY: 'XXXX'
TEMPLATES_AUTO_RELOAD: None
task_acks_late: True
SEND_FILE_MAX_AGE_DEFAULT: datetime.timedelta(0, 43200)
accept_content: ['json']
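
For reference, the task_queues shown above would typically come from declarations along these lines (a sketch reconstructed from the dump; the actual module and variable names in the application are not shown in this report):

# Sketch of queue declarations matching the task_queues dump above (illustrative only).
from kombu import Exchange, Queue

task_queues = (
    Queue('online_queue',  Exchange('online_exchange', type='direct'),  routing_key='interval.tasks.online'),
    Queue('map_queue',     Exchange('map_exchange', type='direct'),     routing_key='interval.tasks.map'),
    Queue('default_queue', Exchange('default_exchange', type='direct'), routing_key='normal.tasks.default'),
    Queue('log_queue',     Exchange('log_exchange', type='topic'),      routing_key='normal.misc.log'),
)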

We are running our celery workers in supervisor:

[program:service-maps]
command=/home/ubuntu/.virtualenvs/service/bin/celery -A celery-run.celery worker -c 16   --without-gossip --without-mingle  --prefetch-multiplier=12 --queues=default_queue,map_queue -n worker_map%(process_num)02d-%%i@%%n --loglevel=info
process_name=%(program_name)s_%(process_num)02d
numprocs=4
user=ubuntu
directory=/home/ubuntu/service
autostart=true
autorestart=unexpected
redirect_stderr=true
stdout_logfile=/logs/service/celery.log

[program:service-online]
command=/home/ubuntu/.virtualenvs/service/bin/celery -A celery-run.celery worker -c 16   --without-gossip --without-mingle --prefetch-multiplier=12 --queues=default_queue,online_queue -n worker_online%(process_num)02d-%%i@%%n --loglevel=info
process_name=%(program_name)s_%(process_num)02d
numprocs=4
user=ubuntu
directory=/home/ubuntu/service
autostart=true
autorestart=unexpected
redirect_stderr=true
stdout_logfile=/logs/service/celery.log

matteius commented 4 years ago

Howdy @ermalguni -- I see your bug report is for celery 4.3.0. Can you try the latest release candidate for celery, pip install celery==4.4.0rc3, and pin your kombu version to kombu==4.6.3?

Use kombu 4.6.3 for now because 4.6.4 introduced a redis bug I am trying to sort out over here: https://github.com/celery/kombu/pull/1089

Also, please consider whether you have picked an appropriate value for prefetch-multiplier; the current docs say this: If you have many tasks with a long duration you want the multiplier value to be one: meaning it’ll only reserve one task per worker process at a time.

However – If you have many short-running tasks, and throughput/round trip latency is important to you, this number should be large. The worker is able to process more tasks per second if the messages have already been prefetched and are available in memory. You may have to experiment to find the best value that works for you. Values like 50 or 150 might make sense in these circumstances. Say 64, or 128. https://docs.celeryproject.org/en/latest/userguide/optimizing.html#prefetch-limits
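
As a concrete illustration (a minimal sketch against the new-style lowercase settings, not your actual config), the multiplier can be set via worker_prefetch_multiplier:

# Illustrative sketch only -- pick one depending on task duration.
# Long-running tasks: reserve a single task per worker process at a time.
worker_prefetch_multiplier = 1
# Many short tasks where throughput matters: prefetch a larger batch,
# e.g. 64 or 128, so messages are already in memory when a process frees up.
# worker_prefetch_multiplier = 64

The same value can also be passed on the worker command line, as in your supervisor commands above (--prefetch-multiplier=12).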

If you can run some more tests on those package versions, I would like to hear whether this is still an issue or not.