Closed ShaheedHaque closed 3 years ago
A patch such as this is sufficient to contain the problem:
$ diff /usr/local/lib/python3.7/dist-packages/celery/backends/consul.py hacked_consul.py
90,93c90,94
<         session_id = self.client.session.create(name=session_name,
<                                                 behavior='delete',
<                                                 ttl=self.expires)
<         logger.debug('Created Consul session %s', session_id)
---
>         try:
>             session_id = self.client.session.create(name=session_name,
>                                                     behavior='delete',
>                                                     ttl=self.expires)
>             logger.debug('Created Consul session %s', session_id)
95,98c96,101
<         logger.debug('Writing key %s to Consul', key)
<         return self.client.kv.put(key=key,
<                                   value=value,
<                                   acquire=session_id)
---
>             logger.debug('Writing key %s to Consul', key)
>             return self.client.kv.put(key=key,
>                                       value=value,
>                                       acquire=session_id)
>         except TypeError as e:
>             logger.exception('cannot save result for {}={}: {}'.format(session_name, value, e))
Of course, this is a terrible Consul-specific hack and not a general fix for all backends.
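For illustration only, the same "log and carry on" idea can be sketched in a backend-agnostic way. This is a minimal sketch, not Celery code: `log_store_errors` and `FlakyBackend` are hypothetical names, and the broad `except Exception` is an assumption about what one might want to catch.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def log_store_errors(store):
    """Wrap a result-store callable so failures are logged, not raised."""
    @functools.wraps(store)
    def wrapper(*args, **kwargs):
        try:
            return store(*args, **kwargs)
        except Exception:  # deliberately broad; an assumption of this sketch
            logger.exception('cannot save result via %s', store.__name__)
            return None
    return wrapper


class FlakyBackend:
    """Hypothetical stand-in for a result backend; not a Celery class."""

    @log_store_errors
    def set(self, key, value):
        raise TypeError('simulated failure')


# The failure is logged and None is returned instead of propagating.
result = FlakyBackend().set('task-id', 'value')
```

Note that swallowing the exception this way means the task result is silently lost, so it contains the problem rather than fixing it.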
thanks for the detailed report
I had the same issue (on stable v4.3.0) when using the django-db and redis backends, and was able to resolve it by upgrading to latest master (or 4.4.0rc4).
Here is a simple reproducible example (on an older version): https://gist.github.com/asfaltboy/81dfde85551b5a9029f8d1b962e5422d
The key settings that cause this issue are "task_acks_late=true" and "worker_prefetch_multiplier=1". With a different prefetch multiplier, the worker consumes that number of messages (e.g. 4 for the default value of 4) and, after all of these fail to be stored, reaches the same "starved" state.
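As a sketch, the two settings could be expressed in a Celery configuration module like the one below. The module name `celeryconfig.py` is an assumption (the gist may configure the app differently); such a module would typically be loaded with `app.config_from_object('celeryconfig')`.

```python
# celeryconfig.py (hypothetical module name)

# Acknowledge messages only after the task has run, not on receipt.
task_acks_late = True

# Reserve a single message per worker process at a time.
worker_prefetch_multiplier = 1
```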
Reproduction steps are as follows:
celery -A my_app -c 1 -l debug worker
celery -A my_app call my_app.fail --args='[5]'
brew services stop redis
[2019-11-28 12:17:24,453: ERROR/MainProcess] Pool callback raised exception: ConnectionError('Error 61 connecting to localhost:6379. Connection refused.')
brew services start redis
celery -A my_app call my_app.fail --args='[0.1]'
Note: this may be a bug in billiard, I don't know enough about the internals, but I'll try to bisect and update here as I travel back in time:
[2019-11-28 13:38:57,152: CRITICAL/MainProcess] Task my_app.fail[07905d9f-99a6-41f0-87c5-14f676e8de28] INTERNAL ERROR: ConnectionError('Error 61 connecting to localhost:6379. Connection refused.',)
@ShaheedHaque can you please try installing version 4.4.0rc4 ?
@auvipy - I think we can remove the 4.5 milestone, if Shaheed confirms it's fixed in 4.4
A repro would be tricky here since my pursuit of the various failure modes I had thought might be involved (in Consul itself, or in the requests/urllib library used to talk to it) all ended inconclusively. I do have a PR queued up to make python-consul threadsafe by replacing requests/urllib with urllib3, but that has not been merged either. See https://github.com/cablehead/python-consul/pull/258.
I can only confirm that on 4.3.0, I am no longer seeing the underlying failure (i.e. outside Celery) that was the original trigger of this issue for me, so I have no way to tell if 4.4.0rc4 fixes it.
I suspect this has much in common with #4363. I'm NOT marking this as a duplicate however because Consul is a SyncBackend whereas #4363 relates to Redis which I believe is an AsyncBackend. I'll leave it to the devs to consider if this is, in fact, a dupe.
I would only note that any exception that occurs while saving a task result, in 4.3 and prior, always causes a "worker starvation" state (where the worker will not consume further tasks), at least with task_acks_late enabled.
But I've shown this to be true with multiple backends, and that was fixed in 4.4. I am aware that various backends may still have various issues given the right circumstances, but at least these won't "starve the worker forever". We could probably create an integration test case to ensure we don't regress to this issue again (though I don't know which change in 4.4 fixed this; it might even be in billiard!)
celery==4.4.0rc5 is on pypi
@asfaltboy thanks for the helpful summary
Just a note that the recent updates to #5605 mean that the root cause of the issue that triggers this for me is known.
We moved to python-consul2 in master; can you check whether that fixed it for you? https://github.com/celery/celery/commit/ae463025c12d78c2b96a885aa4385ff33811c17a
The move to python-consul2 does not actually fix anything; I believe the rationale for moving was simply that it seemed to be alive, whereas python-consul upstream seems inactive.
That said, I run with the ugly/hacky workaround I documented in #5605, so I don't see that problem or this problem any longer.
Backends were made thread-safe in a recent PR. Did that not help either? Should we document the workaround you provide?
I cannot find the PR you refer to right now, but I do remember looking at it, and IIRC, it does not address the problem of #5605. Actually, from my point of view, rather than document the workaround, I wonder if - in the absence of a better solution - my workaround should be committed on the grounds that correctness trumps any performance concerns?
I retested against the current release and have not been able to see this issue. Closing. #5605 is still open of course.
Checklist
Mandatory Debugging Information
I have included the output of celery -A proj report in the issue. (If you are not able to do this, then at least specify the Celery version affected.)
I have verified that the issue exists against the master branch of Celery.
I have included the contents of pip freeze in the issue.
Optional Debugging Information
Related Issues and Possible Duplicates
Related Issues
#5605 describes the issue as seen when using the Consul backend, though I believe the issue described here applies to all backends.
Possible Duplicates
Environment & Settings
Celery version:
celery report
Output:``` $ celery -A paiyroll report software -> celery:4.3.0 (rhubarb) kombu:4.5.0 py:3.7.3 billiard:3.6.0.0 py-amqp:2.4.2 platform -> system:Linux arch:64bit, ELF kernel version:5.0.0-20-generic imp:CPython loader -> celery.loaders.app.AppLoader settings -> transport:amqp results:consul://localhost:8500/ ABSOLUTE_URL_OVERRIDES: { } ADMINS: [] ALLOWED_HOSTS: ['*'] APPEND_SLASH: True AUTHENTICATION_BACKENDS: ['django.contrib.auth.backends.ModelBackend'] AUTH_PASSWORD_VALIDATORS: '********' AUTH_USER_MODEL: 'paiyroll.User' BASE_DIR: '/main/srhaque/Innovatieltd/source' CACHES: { 'default': {'BACKEND': 'django.core.cache.backends.locmem.LocMemCache'}} CACHE_MIDDLEWARE_ALIAS: 'default' CACHE_MIDDLEWARE_KEY_PREFIX: '********' CACHE_MIDDLEWARE_SECONDS: 600 CELERY_BROKER_HEARTBEAT: None CELERY_BROKER_URL: 'amqp://guest:********@localhost:5672//' CELERY_RESULT_BACKEND: 'consul://localhost:8500/' CSRF_COOKIE_AGE: 31449600 CSRF_COOKIE_DOMAIN: None CSRF_COOKIE_HTTPONLY: False CSRF_COOKIE_NAME: 'csrftoken' CSRF_COOKIE_PATH: '/' CSRF_COOKIE_SAMESITE: 'Lax' CSRF_COOKIE_SECURE: False CSRF_FAILURE_VIEW: 'django.views.csrf.csrf_failure' CSRF_HEADER_NAME: 'HTTP_X_CSRFTOKEN' CSRF_TRUSTED_ORIGINS: [] CSRF_USE_SESSIONS: False DATABASES: { 'default': { 'ATOMIC_REQUESTS': False, 'AUTOCOMMIT': True, 'CONN_MAX_AGE': 0, 'ENGINE': 'django.db.backends.postgresql', 'HOST': 'localhost', 'NAME': 'foo', 'OPTIONS': {}, 'PASSWORD': '********', 'PORT': '5432', 'TEST': { 'CHARSET': None, 'COLLATION': None, 'MIRROR': None, 'NAME': None}, 'TIME_ZONE': None, 'USER': 'dbcoreuser'}, 'fdw': { 'ATOMIC_REQUESTS': False, 'AUTOCOMMIT': True, 'CONN_MAX_AGE': 0, 'ENGINE': 'django.db.backends.postgresql', 'HOST': 'localhost', 'NAME': 'foo', 'OPTIONS': {}, 'PASSWORD': '********', 'PORT': '5432', 'TEST': { 'CHARSET': None, 'COLLATION': None, 'MIRROR': None, 'NAME': None}, 'TIME_ZONE': None, 'USER': 'dbcoreuser'}} DATABASE_ROUTERS: '********' DATA_UPLOAD_MAX_MEMORY_SIZE: 2621440 DATA_UPLOAD_MAX_NUMBER_FIELDS: 1000 
DATETIME_FORMAT: 'N j, Y, P' DATETIME_INPUT_FORMATS: ['%Y-%m-%d %H:%M:%S', '%Y-%m-%d %H:%M:%S.%f', '%Y-%m-%d %H:%M', '%Y-%m-%d', '%m/%d/%Y %H:%M:%S', '%m/%d/%Y %H:%M:%S.%f', '%m/%d/%Y %H:%M', '%m/%d/%Y', '%m/%d/%y %H:%M:%S', '%m/%d/%y %H:%M:%S.%f', '%m/%d/%y %H:%M', '%m/%d/%y'] DATE_FORMAT: 'N j, Y' DATE_INPUT_FORMATS: ['%Y-%m-%d', '%m/%d/%Y', '%m/%d/%y', '%b %d %Y', '%b %d, %Y', '%d %b %Y', '%d %b, %Y', '%B %d %Y', '%B %d, %Y', '%d %B %Y', '%d %B, %Y'] DEBUG: True DEBUG_PROPAGATE_EXCEPTIONS: False DECIMAL_SEPARATOR: '.' DEFAULT_CHARSET: 'utf-8' DEFAULT_CONTENT_TYPE: 'text/html' DEFAULT_DB: 'default' DEFAULT_EXCEPTION_REPORTER_FILTER: 'django.views.debug.SafeExceptionReporterFilter' DEFAULT_FILE_STORAGE: 'django.core.files.storage.FileSystemStorage' DEFAULT_FROM_EMAIL: 'webmaster@localhost' DEFAULT_INDEX_TABLESPACE: '' DEFAULT_TABLESPACE: '' DISALLOWED_USER_AGENTS: [] EMAIL_BACKEND: 'django.core.mail.backends.filebased.EmailBackend' EMAIL_FILE_PATH: '/tmp/email_messages' EMAIL_HOST: 'smtp.gmail.com' EMAIL_HOST_PASSWORD: '********' EMAIL_HOST_USER: 'paiyroll.com@gmail.com' EMAIL_PORT: 587 EMAIL_SSL_CERTFILE: '' EMAIL_SSL_KEYFILE: '********' EMAIL_SUBJECT_PREFIX: '[Django] ' EMAIL_TIMEOUT: None EMAIL_USE_LOCALTIME: False EMAIL_USE_SSL: False EMAIL_USE_TLS: True FDW_DB: 'fdw' FILE_CHARSET: 'utf-8' FILE_UPLOAD_DIRECTORY_PERMISSIONS: None FILE_UPLOAD_HANDLERS: ['django.core.files.uploadhandler.MemoryFileUploadHandler', 'django.core.files.uploadhandler.TemporaryFileUploadHandler'] FILE_UPLOAD_MAX_MEMORY_SIZE: 2621440 FILE_UPLOAD_PERMISSIONS: None FILE_UPLOAD_TEMP_DIR: None FIRST_DAY_OF_WEEK: 0 FIXTURE_DIRS: [] FORCE_SCRIPT_NAME: None FORMAT_MODULE_PATH: None FORM_RENDERER: 'django.forms.renderers.DjangoTemplates' IGNORABLE_404_URLS: [] INSTALLED_APPS: ['paiyroll.apps.PaiyrollConfig', 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', 'django.contrib.postgres', 
'django_jinja', 'bootstrapform_jinja', 'import_export', 'phonenumber_field', 'polymorphic', 'viewflow'] INTERNAL_IPS: [] LANGUAGES: [('af', 'Afrikaans'), ('ar', 'Arabic'), ('ast', 'Asturian'), ('az', 'Azerbaijani'), ('bg', 'Bulgarian'), ('be', 'Belarusian'), ('bn', 'Bengali'), ('br', 'Breton'), ('bs', 'Bosnian'), ('ca', 'Catalan'), ('cs', 'Czech'), ('cy', 'Welsh'), ('da', 'Danish'), ('de', 'German'), ('dsb', 'Lower Sorbian'), ('el', 'Greek'), ('en', 'English'), ('en-au', 'Australian English'), ('en-gb', 'British English'), ('eo', 'Esperanto'), ('es', 'Spanish'), ('es-ar', 'Argentinian Spanish'), ('es-co', 'Colombian Spanish'), ('es-mx', 'Mexican Spanish'), ('es-ni', 'Nicaraguan Spanish'), ('es-ve', 'Venezuelan Spanish'), ('et', 'Estonian'), ('eu', 'Basque'), ('fa', 'Persian'), ('fi', 'Finnish'), ('fr', 'French'), ('fy', 'Frisian'), ('ga', 'Irish'), ('gd', 'Scottish Gaelic'), ('gl', 'Galician'), ('he', 'Hebrew'), ('hi', 'Hindi'), ('hr', 'Croatian'), ('hsb', 'Upper Sorbian'), ('hu', 'Hungarian'), ('hy', 'Armenian'), ('ia', 'Interlingua'), ('id', 'Indonesian'), ('io', 'Ido'), ('is', 'Icelandic'), ('it', 'Italian'), ('ja', 'Japanese'), ('ka', 'Georgian'), ('kab', 'Kabyle'), ('kk', 'Kazakh'), ('km', 'Khmer'), ('kn', 'Kannada'), ('ko', 'Korean'), ('lb', 'Luxembourgish'), ('lt', 'Lithuanian'), ('lv', 'Latvian'), ('mk', 'Macedonian'), ('ml', 'Malayalam'), ('mn', 'Mongolian'), ('mr', 'Marathi'), ('my', 'Burmese'), ('nb', 'Norwegian Bokmål'), ('ne', 'Nepali'), ('nl', 'Dutch'), ('nn', 'Norwegian Nynorsk'), ('os', 'Ossetic'), ('pa', 'Punjabi'), ('pl', 'Polish'), ('pt', 'Portuguese'), ('pt-br', 'Brazilian Portuguese'), ('ro', 'Romanian'), ('ru', 'Russian'), ('sk', 'Slovak'), ('sl', 'Slovenian'), ('sq', 'Albanian'), ('sr', 'Serbian'), ('sr-latn', 'Serbian Latin'), ('sv', 'Swedish'), ('sw', 'Swahili'), ('ta', 'Tamil'), ('te', 'Telugu'), ('th', 'Thai'), ('tr', 'Turkish'), ('tt', 'Tatar'), ('udm', 'Udmurt'), ('uk', 'Ukrainian'), ('ur', 'Urdu'), ('vi', 'Vietnamese'), ('zh-hans', 
'Simplified Chinese'), ('zh-hant', 'Traditional Chinese')] LANGUAGES_BIDI: ['he', 'ar', 'fa', 'ur'] LANGUAGE_CODE: 'en-us' LANGUAGE_COOKIE_AGE: None LANGUAGE_COOKIE_DOMAIN: None LANGUAGE_COOKIE_NAME: 'django_language' LANGUAGE_COOKIE_PATH: '/' LOCALE_PATHS: [] LOGGING: { 'disable_existing_loggers': False, 'formatters': { 'standard': { 'format': '%(asctime)s [%(levelname)s] ' '%(name)s: %(message)s'}}, 'handlers': { 'file': { 'backupCount': 10, 'class': 'logging.handlers.RotatingFileHandler', 'filename': '/main/srhaque/.local/share/paiyroll/django.log', 'formatter': 'standard', 'level': 'DEBUG', 'maxBytes': 15728640}}, 'loggers': {'': {'handlers': ['file'], 'level': 'INFO', 'propagate': True}}, 'version': 1} LOGGING_CONFIG: 'logging.config.dictConfig' LOGGING_DIR: '/main/srhaque/.local/share/paiyroll' LOGIN_REDIRECT_URL: '/accounts/profile/' LOGIN_URL: '/accounts/login/' LOGOUT_REDIRECT_URL: None MANAGERS: [] MEDIA_ROOT: '' MEDIA_URL: '' MESSAGE_STORAGE: 'django.contrib.messages.storage.fallback.FallbackStorage' MIDDLEWARE: ['django.middleware.security.SecurityMiddleware', 'django.contrib.sessions.middleware.SessionMiddleware', 'project.middleware.SessionExpiryMiddleware', 'django.middleware.common.CommonMiddleware', 'django.middleware.csrf.CsrfViewMiddleware', 'django.contrib.auth.middleware.AuthenticationMiddleware', 'django.contrib.messages.middleware.MessageMiddleware', 'django.middleware.clickjacking.XFrameOptionsMiddleware'] MIGRATION_MODULES: { } MONTH_DAY_FORMAT: 'F j' NUMBER_GROUPING: 0 PAIYROLL_CLIENT_OUTPUT_DIR: '/tmp' PAIYROLL_DEBUG_ADMIN_ACCESS_ALL_WORKFLOWS: True PAIYROLL_DEBUG_NOTIFICATIONS: True PAIYROLL_DEBUG_PAY_WALK: 0 PAIYROLL_DEBUG_PROMPTS: 259 PAIYROLL_DEBUG_SCHEDULING_DATE: '2018-01-01' PAIYROLL_DEBUG_SCHEDULING_FAST: 10 PAIYROLL_GB_RTI_ALL_TRANSACT: None PAIYROLL_GB_RTI_OTHERS_TRANSACT_FROM: '2019-04-06' PAIYROLL_GB_RTI_TEST_COMPANIES: ('HMRC', 'HMRC RTI Recognition') PAIYROLL_GB_RTI_TEST_COMPANIES_ACTUAL_T: '2019-03-31' 
PAIYROLL_INTERNAL_NETWORK: IPv4Network('192.168.1.0/24') PAIYROLL_NOTIFICATIONS: { 'email': { 'backend': 'django.core.mail.backends.filebased.EmailBackend', 'file_path': '/tmp/email_messages', 'host': 'smtp.gmail.com', 'password': '********', 'port': 587, 'ssl_certfile': '', 'ssl_keyfile': '********', 'timeout': None, 'use_ssl': False, 'use_tls': True, 'username': 'paiyroll.com@gmail.com'}, 'sms': { 'account_sid': 'ACbabf740b7a3d23010cbe381a8a184fe0', 'from': '+15005550006', 'password': '********'}} PAIYROLL_REPORT_SPEC_DIR: '/main/srhaque/Innovatieltd/source/paiyroll/report/JasperReports' PAIYROLL_REPORT_SPEC_SPREADSHEET: '/main/srhaque/Innovatieltd/source/paiyroll/report/Spreadsheets' PAIYROLL_SITE_NAME: 'login.paiyroll.com' PASSWORD_HASHERS: '********' PASSWORD_RESET_TIMEOUT_DAYS: '********' PREPEND_WWW: False ROOT_URLCONF: 'project.urls' SECRET_KEY: '********' SECURE_BROWSER_XSS_FILTER: False SECURE_CONTENT_TYPE_NOSNIFF: False SECURE_HSTS_INCLUDE_SUBDOMAINS: False SECURE_HSTS_PRELOAD: False SECURE_HSTS_SECONDS: 0 SECURE_PROXY_SSL_HEADER: None SECURE_REDIRECT_EXEMPT: [] SECURE_SSL_HOST: None SECURE_SSL_REDIRECT: False SERVER_EMAIL: 'root@localhost' SESSION_CACHE_ALIAS: 'default' SESSION_COOKIE_AGE: 5400 SESSION_COOKIE_DOMAIN: None SESSION_COOKIE_HTTPONLY: True SESSION_COOKIE_NAME: 'sessionid' SESSION_COOKIE_PATH: '/' SESSION_COOKIE_SAMESITE: 'Lax' SESSION_COOKIE_SECURE: False SESSION_ENGINE: 'django.contrib.sessions.backends.db' SESSION_EXPIRE_AT_BROWSER_CLOSE: False SESSION_FILE_PATH: None SESSION_SAVE_EVERY_REQUEST: False SESSION_SERIALIZER: 'django.contrib.sessions.serializers.JSONSerializer' SETTINGS_MODULE: 'project.settings' SHORT_DATETIME_FORMAT: 'm/d/Y P' SHORT_DATE_FORMAT: 'm/d/Y' SIGNING_BACKEND: 'django.core.signing.TimestampSigner' SILENCED_SYSTEM_CHECKS: [] STATICFILES_DIRS: [] STATICFILES_FINDERS: ['django.contrib.staticfiles.finders.FileSystemFinder', 'django.contrib.staticfiles.finders.AppDirectoriesFinder'] STATICFILES_STORAGE: 
'django.contrib.staticfiles.storage.StaticFilesStorage' STATIC_ROOT: '../staticroot' STATIC_URL: '/static/' TEMPLATES: [{'APP_DIRS': True, 'BACKEND': 'django_jinja.backend.Jinja2', 'DIRS': ['/main/srhaque/Innovatieltd/source/templates'], 'OPTIONS': {'app_dirname': 'templates', 'auto_reload': True, 'autoescape': True, 'bytecode_cache': {'backend': 'django_jinja.cache.BytecodeCache', 'enabled': False, 'name': 'default'}, 'constants': {}, 'context_processors': ['django.contrib.auth.context_processors.auth', 'django.contrib.messages.context_processors.messages'], 'extensions': ['jinja2.ext.do', 'jinja2.ext.loopcontrols', 'jinja2.ext.with_', 'jinja2.ext.i18n', 'jinja2.ext.autoescape', 'django_jinja.builtins.extensions.CsrfExtension', 'django_jinja.builtins.extensions.CacheExtension', 'django_jinja.builtins.extensions.TimezoneExtension', 'django_jinja.builtins.extensions.UrlsExtension', 'django_jinja.builtins.extensions.StaticFilesExtension', 'django_jinja.builtins.extensions.DjangoFiltersExtension', 'jinja_extensions.DebugExtension'], 'filters': {}, 'globals': {}, 'match_extension': ('.html', '.jinja'), 'match_regex': '^(?!admin/).*', 'newstyle_gettext': True, 'tests': {}, 'translation_engine': 'django.utils.translation', 'undefined': None}}, {'APP_DIRS': True, 'BACKEND': 'django.template.backends.django.DjangoTemplates', 'DIRS': ['/main/srhaque/Innovatieltd/source/templates'], 'OPTIONS': {'context_processors': ['django.template.context_processors.debug', 'django.template.context_processors.request', 'django.contrib.auth.context_processors.auth', 'django.contrib.messages.context_processors.messages']}}] TEST_NON_SERIALIZED_APPS: [] TEST_RUNNER: 'django.test.runner.DiscoverRunner' THOUSAND_SEPARATOR: ',' TIME_FORMAT: 'P' TIME_INPUT_FORMATS: ['%H:%M:%S', '%H:%M:%S.%f', '%H:%M'] TIME_ZONE: 'UTC' USE_I18N: True USE_L10N: True USE_THOUSAND_SEPARATOR: False USE_TZ: True USE_X_FORWARDED_HOST: False USE_X_FORWARDED_PORT: False WSGI_APPLICATION: 'project.wsgi.application' 
X_FRAME_OPTIONS: 'SAMEORIGIN' YEAR_MONTH_FORMAT: 'F Y' is_overridden:>
```
Steps to Reproduce
Required Dependencies
Python Packages
pip freeze
Output:```, the rest of pip freeze is: $ pip3 freeze aiohttp==3.5.4 alabaster==0.7.12 alembic==1.0.10 amqp==2.4.2 appdirs==1.4.3 apt-xapian-index==0.47 asn1crypto==0.24.0 async-timeout==3.0.1 atomicwrites==1.3.0 attrs==19.1.0 Automat==0.7.0 aws-shell==0.2.1 awscli==1.16.196 Babel==2.6.0 backcall==0.1.0 backports.csv==1.0.7 bcrypt==3.1.6 beautifulsoup4==4.7.1 billiard==3.6.0.0 blinker==1.4 boto3==1.9.137 botocore==1.12.186 cached-property==1.5.1 calendra==3.0 celery==4.3.0 certifi==2018.8.24 cffi==1.12.3 chardet==3.0.4 chromedriver==2.24.1 Click==7.0 colorama==0.3.9 command-not-found==0.3 configobj==5.0.6 constantly==15.1.0 coverage==4.5.3 cryptography==2.3 cssselect==1.0.3 cupshelpers==1.0 datadiff==2.0.0 decorator==4.4.0 deepdiff==4.0.6 defer==1.0.6 defusedxml==0.6.0 diff-match-patch==20181111 distro-info===0.21ubuntu2 Django==2.2 django-excel==0.0.10 django-extra-views==0.12.0 django-filter==2.1.0 django-import-export==1.2.0 django-jinja==2.4.1 django-jinja-bootstrap-form==4.2.3 django-model-utils==3.1.2 django-phonenumber-field==2.3.1 django-polymorphic==2.0.3 django-viewflow==1.5.3 django-webtest==1.9.4 dnspython==1.16.0 docutils==0.14 entrypoints==0.3 ephem==3.7.6.0 et-xmlfile==1.0.1 filelock==3.0.12 flake8==3.7.8 Flask==1.0.2 Flask-BabelEx==0.9.3 Flask-Gravatar==0.5.0 Flask-HTMLmin==1.5.0 Flask-Login==0.4.1 Flask-Mail==0.9.1 Flask-Migrate==2.4.0 Flask-Paranoid==0.2.0 Flask-Principal==0.4.0 Flask-Security==3.0.0 Flask-SQLAlchemy==2.3.2 Flask-WTF==0.14.2 gunicorn==19.9.0 html5lib==1.0.1 htmlmin==0.1.12 httplib2==0.11.3 hyperlink==19.0.0 idna==2.6 imagesize==1.1.0 importlib-metadata==0.18 incremental==17.5.0 ipython==7.6.1 ipython-genutils==0.2.0 isodate==0.6.0 itsdangerous==1.1.0 jdcal==1.4.1 jedi==0.13.3 Jinja2==2.10.1 jmespath==0.9.4 jpy==0.10.0.dev1 jsonpickle==1.1 keyring==17.1.1 keyrings.alt==3.1.1 kombu==4.5.0 language-selector==0.1 ldap3==2.4.1 lml==0.0.9 lunardate==0.2.0 lxml==4.3.3 Mako==1.0.9 MarkupSafe==1.1.1 mccabe==0.6.1 more-itertools==7.0.0 
multicorn===-VERSION- multidict==4.5.2 netifaces==0.10.4 networkx==2.3 numpy==1.16.4 odfpy==1.4.0 olefile==0.46 openpyxl==2.5.14 ordered-set==3.1.1 packaging==19.0 paiyroll==0.1 paiyroll-fdw==0.1 pandas==0.24.2 paramiko==2.4.2 parso==0.4.0 passlib==1.7.1 pexpect==4.6.0 pgadmin4==4.5 phonenumbers==8.10.10 pickleshare==0.7.5 pika==1.0.1 Pillow==5.4.1 pluggy==0.12.0 prompt-toolkit==1.0.16 psutil==5.5.1 psycopg2==2.8.3 psycopg2-binary==2.8.2 py==1.8.0 pyasn1==0.4.2 pycairo==1.16.2 pyCalverter==1.6.1 pycodestyle==2.5.0 pycparser==2.19 pycrypto==2.6.1 pycups==1.9.73 pyecharts-jupyter-installer==0.0.3 pyexcel==0.5.13 pyexcel-handsontable==0.0.2 pyexcel-io==0.5.11 pyexcel-ods==0.5.6 pyexcel-webio==0.1.4 pyexcel-xls==0.5.8 pyexcel-xlsx==0.5.7 pyflakes==2.1.1 Pygments==2.3.1 PyGObject==3.32.0 PyHamcrest==1.9.0 PyJWT==1.7.1 pyluach==1.0.1 pymacaroons==0.13.0 PyNaCl==1.3.0 pyparsing==2.4.0 PyQt5==5.12.1 pyquery==1.4.0 pyrabbit==1.1.0 PySocks==1.6.8 pytest==4.4.1 pytest-cov==2.7.1 pytest-flake8==1.0.4 python-apt==1.8.4 python-consul==1.1.0.dev0 python-dateutil==2.8.0 python-debian==0.1.34 python-editor==1.0.4 pytz==2018.9 PyYAML==3.13 reportlab==3.5.18 requests==2.22.0 requests-toolbelt==0.9.1 requests-unixsocket==0.1.5 rsa==3.4.2 s3transfer==0.2.0 SecretStorage==2.3.1 selenium==3.141.0 simplejson==3.16.0 six==1.12.0 snowballstemmer==1.9.0 soupsieve==1.8 speaklater==1.3 Sphinx==2.1.2 sphinxcontrib-applehelp==1.0.1 sphinxcontrib-devhelp==1.0.1 sphinxcontrib-htmlhelp==1.0.2 sphinxcontrib-jsmath==1.0.1 sphinxcontrib-qthelp==1.0.2 sphinxcontrib-serializinghtml==1.1.3 SQLAlchemy==1.3.3 sqlparse==0.2.4 ssh-import-id==5.7 sshtunnel==0.1.4 systemd-python==234 tablib==0.13.0 tblib==1.3.2 terminaltables==3.1.0 texttable==1.6.1 toml==0.10.0 tornado==6.0.3 tox==3.13.2 traitlets==4.3.2 twilio==6.26.2 Twisted==19.2.1 ubuntu-advantage-tools==19.2 ubuntu-drivers-common==0.0.0 ufw==0.36 uk-postcode-utils==1.0 unattended-upgrades==0.1 urllib3==1.25.3 vine==1.3.0 virtualenv==16.6.1 
waitress==1.3.0 wcwidth==0.1.7 webencodings==0.5.1 WebOb==1.8.5 WebTest==2.0.33 Werkzeug==0.15.2 WTForms==2.2.1 xdg==4.0.0 xkit==0.0.0 xlrd==1.2.0 xlwt==1.3.0 xmltodict==0.12.0 yarl==1.3.0 zeep==3.4.0 zipp==0.5.2 zope.interface==4.6.0 ```
Other Dependencies
N/A
Minimally Reproducible Test Case
```python
#
# Any exception thrown in the set() method of a result backend should show the issue.
#
raise KeyError("simulated failure")
```
Expected Behavior
A failure in any result backend (for example, as described in #5605), should be reported by the worker and then Celery ought to be able to continue to service further requests. The recovery mechanism is of course free to kill and replace the failing worker as needed.
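The expected behaviour can be illustrated with a toy model. This is only a sketch using the standard library, not a description of Celery's internals: `FailingBackend` and `run_worker` are hypothetical names, and the loop stands in for a worker consuming tasks one at a time.

```python
import logging
import queue

logger = logging.getLogger(__name__)


class FailingBackend:
    """Hypothetical backend whose first store attempt fails."""

    def __init__(self):
        self.calls = 0
        self.stored = {}

    def set(self, key, value):
        self.calls += 1
        if self.calls == 1:
            raise KeyError('simulated failure')
        self.stored[key] = value


def run_worker(tasks, backend):
    """Drain the queue; report store failures and keep consuming."""
    failed = []
    while True:
        try:
            task_id, func = tasks.get_nowait()
        except queue.Empty:
            return failed
        result = func()
        try:
            backend.set(task_id, result)
        except Exception:
            # Report the failure, then move on to the next task.
            logger.exception('could not store result for %s', task_id)
            failed.append(task_id)


tasks = queue.Queue()
tasks.put(('t1', lambda: 1))
tasks.put(('t2', lambda: 2))
backend = FailingBackend()
failed = run_worker(tasks, backend)
```

Here the store failure for the first task is reported, and the second task is still consumed and its result stored, which is the behaviour the worker is expected to show.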
Actual Behavior
In my case, I am using the Consul backend, though I believe this issue affects all backends. The Consul backend uses the python-consul package which can throw an exception while performing network operations. In Celery, this is reported like this:
This is fine. However, it seems that the affected worker then gets "stuck", in that it remains permanently visible in the output of "celery inspect active":
This happens to each worker as it hits the error, until Celery is unable to progress any work. On macOS, though usually not on Linux, the workers cannot then be stopped using "celery control shutdown" and must be killed by hand.