celery / celery

Distributed Task Queue (development branch)
https://docs.celeryq.dev

Catastrophic failure by worker starvation on error in result backend #5642

Closed ShaheedHaque closed 3 years ago

ShaheedHaque commented 5 years ago

Environment & Settings

Celery version:

celery report Output:

``` $ celery -A paiyroll report software -> celery:4.3.0 (rhubarb) kombu:4.5.0 py:3.7.3 billiard:3.6.0.0 py-amqp:2.4.2 platform -> system:Linux arch:64bit, ELF kernel version:5.0.0-20-generic imp:CPython loader -> celery.loaders.app.AppLoader settings -> transport:amqp results:consul://localhost:8500/ ABSOLUTE_URL_OVERRIDES: { } ADMINS: [] ALLOWED_HOSTS: ['*'] APPEND_SLASH: True AUTHENTICATION_BACKENDS: ['django.contrib.auth.backends.ModelBackend'] AUTH_PASSWORD_VALIDATORS: '********' AUTH_USER_MODEL: 'paiyroll.User' BASE_DIR: '/main/srhaque/Innovatieltd/source' CACHES: { 'default': {'BACKEND': 'django.core.cache.backends.locmem.LocMemCache'}} CACHE_MIDDLEWARE_ALIAS: 'default' CACHE_MIDDLEWARE_KEY_PREFIX: '********' CACHE_MIDDLEWARE_SECONDS: 600 CELERY_BROKER_HEARTBEAT: None CELERY_BROKER_URL: 'amqp://guest:********@localhost:5672//' CELERY_RESULT_BACKEND: 'consul://localhost:8500/' CSRF_COOKIE_AGE: 31449600 CSRF_COOKIE_DOMAIN: None CSRF_COOKIE_HTTPONLY: False CSRF_COOKIE_NAME: 'csrftoken' CSRF_COOKIE_PATH: '/' CSRF_COOKIE_SAMESITE: 'Lax' CSRF_COOKIE_SECURE: False CSRF_FAILURE_VIEW: 'django.views.csrf.csrf_failure' CSRF_HEADER_NAME: 'HTTP_X_CSRFTOKEN' CSRF_TRUSTED_ORIGINS: [] CSRF_USE_SESSIONS: False DATABASES: { 'default': { 'ATOMIC_REQUESTS': False, 'AUTOCOMMIT': True, 'CONN_MAX_AGE': 0, 'ENGINE': 'django.db.backends.postgresql', 'HOST': 'localhost', 'NAME': 'foo', 'OPTIONS': {}, 'PASSWORD': '********', 'PORT': '5432', 'TEST': { 'CHARSET': None, 'COLLATION': None, 'MIRROR': None, 'NAME': None}, 'TIME_ZONE': None, 'USER': 'dbcoreuser'}, 'fdw': { 'ATOMIC_REQUESTS': False, 'AUTOCOMMIT': True, 'CONN_MAX_AGE': 0, 'ENGINE': 'django.db.backends.postgresql', 'HOST': 'localhost', 'NAME': 'foo', 'OPTIONS': {}, 'PASSWORD': '********', 'PORT': '5432', 'TEST': { 'CHARSET': None, 'COLLATION': None, 'MIRROR': None, 'NAME': None}, 'TIME_ZONE': None, 'USER': 'dbcoreuser'}} DATABASE_ROUTERS: '********' DATA_UPLOAD_MAX_MEMORY_SIZE: 2621440 DATA_UPLOAD_MAX_NUMBER_FIELDS: 1000 
DATETIME_FORMAT: 'N j, Y, P' DATETIME_INPUT_FORMATS: ['%Y-%m-%d %H:%M:%S', '%Y-%m-%d %H:%M:%S.%f', '%Y-%m-%d %H:%M', '%Y-%m-%d', '%m/%d/%Y %H:%M:%S', '%m/%d/%Y %H:%M:%S.%f', '%m/%d/%Y %H:%M', '%m/%d/%Y', '%m/%d/%y %H:%M:%S', '%m/%d/%y %H:%M:%S.%f', '%m/%d/%y %H:%M', '%m/%d/%y'] DATE_FORMAT: 'N j, Y' DATE_INPUT_FORMATS: ['%Y-%m-%d', '%m/%d/%Y', '%m/%d/%y', '%b %d %Y', '%b %d, %Y', '%d %b %Y', '%d %b, %Y', '%B %d %Y', '%B %d, %Y', '%d %B %Y', '%d %B, %Y'] DEBUG: True DEBUG_PROPAGATE_EXCEPTIONS: False DECIMAL_SEPARATOR: '.' DEFAULT_CHARSET: 'utf-8' DEFAULT_CONTENT_TYPE: 'text/html' DEFAULT_DB: 'default' DEFAULT_EXCEPTION_REPORTER_FILTER: 'django.views.debug.SafeExceptionReporterFilter' DEFAULT_FILE_STORAGE: 'django.core.files.storage.FileSystemStorage' DEFAULT_FROM_EMAIL: 'webmaster@localhost' DEFAULT_INDEX_TABLESPACE: '' DEFAULT_TABLESPACE: '' DISALLOWED_USER_AGENTS: [] EMAIL_BACKEND: 'django.core.mail.backends.filebased.EmailBackend' EMAIL_FILE_PATH: '/tmp/email_messages' EMAIL_HOST: 'smtp.gmail.com' EMAIL_HOST_PASSWORD: '********' EMAIL_HOST_USER: 'paiyroll.com@gmail.com' EMAIL_PORT: 587 EMAIL_SSL_CERTFILE: '' EMAIL_SSL_KEYFILE: '********' EMAIL_SUBJECT_PREFIX: '[Django] ' EMAIL_TIMEOUT: None EMAIL_USE_LOCALTIME: False EMAIL_USE_SSL: False EMAIL_USE_TLS: True FDW_DB: 'fdw' FILE_CHARSET: 'utf-8' FILE_UPLOAD_DIRECTORY_PERMISSIONS: None FILE_UPLOAD_HANDLERS: ['django.core.files.uploadhandler.MemoryFileUploadHandler', 'django.core.files.uploadhandler.TemporaryFileUploadHandler'] FILE_UPLOAD_MAX_MEMORY_SIZE: 2621440 FILE_UPLOAD_PERMISSIONS: None FILE_UPLOAD_TEMP_DIR: None FIRST_DAY_OF_WEEK: 0 FIXTURE_DIRS: [] FORCE_SCRIPT_NAME: None FORMAT_MODULE_PATH: None FORM_RENDERER: 'django.forms.renderers.DjangoTemplates' IGNORABLE_404_URLS: [] INSTALLED_APPS: ['paiyroll.apps.PaiyrollConfig', 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', 'django.contrib.postgres', 
'django_jinja', 'bootstrapform_jinja', 'import_export', 'phonenumber_field', 'polymorphic', 'viewflow'] INTERNAL_IPS: [] LANGUAGES: [('af', 'Afrikaans'), ('ar', 'Arabic'), ('ast', 'Asturian'), ('az', 'Azerbaijani'), ('bg', 'Bulgarian'), ('be', 'Belarusian'), ('bn', 'Bengali'), ('br', 'Breton'), ('bs', 'Bosnian'), ('ca', 'Catalan'), ('cs', 'Czech'), ('cy', 'Welsh'), ('da', 'Danish'), ('de', 'German'), ('dsb', 'Lower Sorbian'), ('el', 'Greek'), ('en', 'English'), ('en-au', 'Australian English'), ('en-gb', 'British English'), ('eo', 'Esperanto'), ('es', 'Spanish'), ('es-ar', 'Argentinian Spanish'), ('es-co', 'Colombian Spanish'), ('es-mx', 'Mexican Spanish'), ('es-ni', 'Nicaraguan Spanish'), ('es-ve', 'Venezuelan Spanish'), ('et', 'Estonian'), ('eu', 'Basque'), ('fa', 'Persian'), ('fi', 'Finnish'), ('fr', 'French'), ('fy', 'Frisian'), ('ga', 'Irish'), ('gd', 'Scottish Gaelic'), ('gl', 'Galician'), ('he', 'Hebrew'), ('hi', 'Hindi'), ('hr', 'Croatian'), ('hsb', 'Upper Sorbian'), ('hu', 'Hungarian'), ('hy', 'Armenian'), ('ia', 'Interlingua'), ('id', 'Indonesian'), ('io', 'Ido'), ('is', 'Icelandic'), ('it', 'Italian'), ('ja', 'Japanese'), ('ka', 'Georgian'), ('kab', 'Kabyle'), ('kk', 'Kazakh'), ('km', 'Khmer'), ('kn', 'Kannada'), ('ko', 'Korean'), ('lb', 'Luxembourgish'), ('lt', 'Lithuanian'), ('lv', 'Latvian'), ('mk', 'Macedonian'), ('ml', 'Malayalam'), ('mn', 'Mongolian'), ('mr', 'Marathi'), ('my', 'Burmese'), ('nb', 'Norwegian Bokmål'), ('ne', 'Nepali'), ('nl', 'Dutch'), ('nn', 'Norwegian Nynorsk'), ('os', 'Ossetic'), ('pa', 'Punjabi'), ('pl', 'Polish'), ('pt', 'Portuguese'), ('pt-br', 'Brazilian Portuguese'), ('ro', 'Romanian'), ('ru', 'Russian'), ('sk', 'Slovak'), ('sl', 'Slovenian'), ('sq', 'Albanian'), ('sr', 'Serbian'), ('sr-latn', 'Serbian Latin'), ('sv', 'Swedish'), ('sw', 'Swahili'), ('ta', 'Tamil'), ('te', 'Telugu'), ('th', 'Thai'), ('tr', 'Turkish'), ('tt', 'Tatar'), ('udm', 'Udmurt'), ('uk', 'Ukrainian'), ('ur', 'Urdu'), ('vi', 'Vietnamese'), ('zh-hans', 
'Simplified Chinese'), ('zh-hant', 'Traditional Chinese')] LANGUAGES_BIDI: ['he', 'ar', 'fa', 'ur'] LANGUAGE_CODE: 'en-us' LANGUAGE_COOKIE_AGE: None LANGUAGE_COOKIE_DOMAIN: None LANGUAGE_COOKIE_NAME: 'django_language' LANGUAGE_COOKIE_PATH: '/' LOCALE_PATHS: [] LOGGING: { 'disable_existing_loggers': False, 'formatters': { 'standard': { 'format': '%(asctime)s [%(levelname)s] ' '%(name)s: %(message)s'}}, 'handlers': { 'file': { 'backupCount': 10, 'class': 'logging.handlers.RotatingFileHandler', 'filename': '/main/srhaque/.local/share/paiyroll/django.log', 'formatter': 'standard', 'level': 'DEBUG', 'maxBytes': 15728640}}, 'loggers': {'': {'handlers': ['file'], 'level': 'INFO', 'propagate': True}}, 'version': 1} LOGGING_CONFIG: 'logging.config.dictConfig' LOGGING_DIR: '/main/srhaque/.local/share/paiyroll' LOGIN_REDIRECT_URL: '/accounts/profile/' LOGIN_URL: '/accounts/login/' LOGOUT_REDIRECT_URL: None MANAGERS: [] MEDIA_ROOT: '' MEDIA_URL: '' MESSAGE_STORAGE: 'django.contrib.messages.storage.fallback.FallbackStorage' MIDDLEWARE: ['django.middleware.security.SecurityMiddleware', 'django.contrib.sessions.middleware.SessionMiddleware', 'project.middleware.SessionExpiryMiddleware', 'django.middleware.common.CommonMiddleware', 'django.middleware.csrf.CsrfViewMiddleware', 'django.contrib.auth.middleware.AuthenticationMiddleware', 'django.contrib.messages.middleware.MessageMiddleware', 'django.middleware.clickjacking.XFrameOptionsMiddleware'] MIGRATION_MODULES: { } MONTH_DAY_FORMAT: 'F j' NUMBER_GROUPING: 0 PAIYROLL_CLIENT_OUTPUT_DIR: '/tmp' PAIYROLL_DEBUG_ADMIN_ACCESS_ALL_WORKFLOWS: True PAIYROLL_DEBUG_NOTIFICATIONS: True PAIYROLL_DEBUG_PAY_WALK: 0 PAIYROLL_DEBUG_PROMPTS: 259 PAIYROLL_DEBUG_SCHEDULING_DATE: '2018-01-01' PAIYROLL_DEBUG_SCHEDULING_FAST: 10 PAIYROLL_GB_RTI_ALL_TRANSACT: None PAIYROLL_GB_RTI_OTHERS_TRANSACT_FROM: '2019-04-06' PAIYROLL_GB_RTI_TEST_COMPANIES: ('HMRC', 'HMRC RTI Recognition') PAIYROLL_GB_RTI_TEST_COMPANIES_ACTUAL_T: '2019-03-31' 
PAIYROLL_INTERNAL_NETWORK: IPv4Network('192.168.1.0/24') PAIYROLL_NOTIFICATIONS: { 'email': { 'backend': 'django.core.mail.backends.filebased.EmailBackend', 'file_path': '/tmp/email_messages', 'host': 'smtp.gmail.com', 'password': '********', 'port': 587, 'ssl_certfile': '', 'ssl_keyfile': '********', 'timeout': None, 'use_ssl': False, 'use_tls': True, 'username': 'paiyroll.com@gmail.com'}, 'sms': { 'account_sid': 'ACbabf740b7a3d23010cbe381a8a184fe0', 'from': '+15005550006', 'password': '********'}} PAIYROLL_REPORT_SPEC_DIR: '/main/srhaque/Innovatieltd/source/paiyroll/report/JasperReports' PAIYROLL_REPORT_SPEC_SPREADSHEET: '/main/srhaque/Innovatieltd/source/paiyroll/report/Spreadsheets' PAIYROLL_SITE_NAME: 'login.paiyroll.com' PASSWORD_HASHERS: '********' PASSWORD_RESET_TIMEOUT_DAYS: '********' PREPEND_WWW: False ROOT_URLCONF: 'project.urls' SECRET_KEY: '********' SECURE_BROWSER_XSS_FILTER: False SECURE_CONTENT_TYPE_NOSNIFF: False SECURE_HSTS_INCLUDE_SUBDOMAINS: False SECURE_HSTS_PRELOAD: False SECURE_HSTS_SECONDS: 0 SECURE_PROXY_SSL_HEADER: None SECURE_REDIRECT_EXEMPT: [] SECURE_SSL_HOST: None SECURE_SSL_REDIRECT: False SERVER_EMAIL: 'root@localhost' SESSION_CACHE_ALIAS: 'default' SESSION_COOKIE_AGE: 5400 SESSION_COOKIE_DOMAIN: None SESSION_COOKIE_HTTPONLY: True SESSION_COOKIE_NAME: 'sessionid' SESSION_COOKIE_PATH: '/' SESSION_COOKIE_SAMESITE: 'Lax' SESSION_COOKIE_SECURE: False SESSION_ENGINE: 'django.contrib.sessions.backends.db' SESSION_EXPIRE_AT_BROWSER_CLOSE: False SESSION_FILE_PATH: None SESSION_SAVE_EVERY_REQUEST: False SESSION_SERIALIZER: 'django.contrib.sessions.serializers.JSONSerializer' SETTINGS_MODULE: 'project.settings' SHORT_DATETIME_FORMAT: 'm/d/Y P' SHORT_DATE_FORMAT: 'm/d/Y' SIGNING_BACKEND: 'django.core.signing.TimestampSigner' SILENCED_SYSTEM_CHECKS: [] STATICFILES_DIRS: [] STATICFILES_FINDERS: ['django.contrib.staticfiles.finders.FileSystemFinder', 'django.contrib.staticfiles.finders.AppDirectoriesFinder'] STATICFILES_STORAGE: 
'django.contrib.staticfiles.storage.StaticFilesStorage' STATIC_ROOT: '../staticroot' STATIC_URL: '/static/' TEMPLATES: [{'APP_DIRS': True, 'BACKEND': 'django_jinja.backend.Jinja2', 'DIRS': ['/main/srhaque/Innovatieltd/source/templates'], 'OPTIONS': {'app_dirname': 'templates', 'auto_reload': True, 'autoescape': True, 'bytecode_cache': {'backend': 'django_jinja.cache.BytecodeCache', 'enabled': False, 'name': 'default'}, 'constants': {}, 'context_processors': ['django.contrib.auth.context_processors.auth', 'django.contrib.messages.context_processors.messages'], 'extensions': ['jinja2.ext.do', 'jinja2.ext.loopcontrols', 'jinja2.ext.with_', 'jinja2.ext.i18n', 'jinja2.ext.autoescape', 'django_jinja.builtins.extensions.CsrfExtension', 'django_jinja.builtins.extensions.CacheExtension', 'django_jinja.builtins.extensions.TimezoneExtension', 'django_jinja.builtins.extensions.UrlsExtension', 'django_jinja.builtins.extensions.StaticFilesExtension', 'django_jinja.builtins.extensions.DjangoFiltersExtension', 'jinja_extensions.DebugExtension'], 'filters': {}, 'globals': {}, 'match_extension': ('.html', '.jinja'), 'match_regex': '^(?!admin/).*', 'newstyle_gettext': True, 'tests': {}, 'translation_engine': 'django.utils.translation', 'undefined': None}}, {'APP_DIRS': True, 'BACKEND': 'django.template.backends.django.DjangoTemplates', 'DIRS': ['/main/srhaque/Innovatieltd/source/templates'], 'OPTIONS': {'context_processors': ['django.template.context_processors.debug', 'django.template.context_processors.request', 'django.contrib.auth.context_processors.auth', 'django.contrib.messages.context_processors.messages']}}] TEST_NON_SERIALIZED_APPS: [] TEST_RUNNER: 'django.test.runner.DiscoverRunner' THOUSAND_SEPARATOR: ',' TIME_FORMAT: 'P' TIME_INPUT_FORMATS: ['%H:%M:%S', '%H:%M:%S.%f', '%H:%M'] TIME_ZONE: 'UTC' USE_I18N: True USE_L10N: True USE_THOUSAND_SEPARATOR: False USE_TZ: True USE_X_FORWARDED_HOST: False USE_X_FORWARDED_PORT: False WSGI_APPLICATION: 'project.wsgi.application' 
X_FRAME_OPTIONS: 'SAMEORIGIN' YEAR_MONTH_FORMAT: 'F Y' is_overridden: > ```

Steps to Reproduce

Required Dependencies

Python Packages

pip freeze Output:

```, the rest of pip freeze is: $ pip3 freeze aiohttp==3.5.4 alabaster==0.7.12 alembic==1.0.10 amqp==2.4.2 appdirs==1.4.3 apt-xapian-index==0.47 asn1crypto==0.24.0 async-timeout==3.0.1 atomicwrites==1.3.0 attrs==19.1.0 Automat==0.7.0 aws-shell==0.2.1 awscli==1.16.196 Babel==2.6.0 backcall==0.1.0 backports.csv==1.0.7 bcrypt==3.1.6 beautifulsoup4==4.7.1 billiard==3.6.0.0 blinker==1.4 boto3==1.9.137 botocore==1.12.186 cached-property==1.5.1 calendra==3.0 celery==4.3.0 certifi==2018.8.24 cffi==1.12.3 chardet==3.0.4 chromedriver==2.24.1 Click==7.0 colorama==0.3.9 command-not-found==0.3 configobj==5.0.6 constantly==15.1.0 coverage==4.5.3 cryptography==2.3 cssselect==1.0.3 cupshelpers==1.0 datadiff==2.0.0 decorator==4.4.0 deepdiff==4.0.6 defer==1.0.6 defusedxml==0.6.0 diff-match-patch==20181111 distro-info===0.21ubuntu2 Django==2.2 django-excel==0.0.10 django-extra-views==0.12.0 django-filter==2.1.0 django-import-export==1.2.0 django-jinja==2.4.1 django-jinja-bootstrap-form==4.2.3 django-model-utils==3.1.2 django-phonenumber-field==2.3.1 django-polymorphic==2.0.3 django-viewflow==1.5.3 django-webtest==1.9.4 dnspython==1.16.0 docutils==0.14 entrypoints==0.3 ephem==3.7.6.0 et-xmlfile==1.0.1 filelock==3.0.12 flake8==3.7.8 Flask==1.0.2 Flask-BabelEx==0.9.3 Flask-Gravatar==0.5.0 Flask-HTMLmin==1.5.0 Flask-Login==0.4.1 Flask-Mail==0.9.1 Flask-Migrate==2.4.0 Flask-Paranoid==0.2.0 Flask-Principal==0.4.0 Flask-Security==3.0.0 Flask-SQLAlchemy==2.3.2 Flask-WTF==0.14.2 gunicorn==19.9.0 html5lib==1.0.1 htmlmin==0.1.12 httplib2==0.11.3 hyperlink==19.0.0 idna==2.6 imagesize==1.1.0 importlib-metadata==0.18 incremental==17.5.0 ipython==7.6.1 ipython-genutils==0.2.0 isodate==0.6.0 itsdangerous==1.1.0 jdcal==1.4.1 jedi==0.13.3 Jinja2==2.10.1 jmespath==0.9.4 jpy==0.10.0.dev1 jsonpickle==1.1 keyring==17.1.1 keyrings.alt==3.1.1 kombu==4.5.0 language-selector==0.1 ldap3==2.4.1 lml==0.0.9 lunardate==0.2.0 lxml==4.3.3 Mako==1.0.9 MarkupSafe==1.1.1 mccabe==0.6.1 more-itertools==7.0.0 
multicorn===-VERSION- multidict==4.5.2 netifaces==0.10.4 networkx==2.3 numpy==1.16.4 odfpy==1.4.0 olefile==0.46 openpyxl==2.5.14 ordered-set==3.1.1 packaging==19.0 paiyroll==0.1 paiyroll-fdw==0.1 pandas==0.24.2 paramiko==2.4.2 parso==0.4.0 passlib==1.7.1 pexpect==4.6.0 pgadmin4==4.5 phonenumbers==8.10.10 pickleshare==0.7.5 pika==1.0.1 Pillow==5.4.1 pluggy==0.12.0 prompt-toolkit==1.0.16 psutil==5.5.1 psycopg2==2.8.3 psycopg2-binary==2.8.2 py==1.8.0 pyasn1==0.4.2 pycairo==1.16.2 pyCalverter==1.6.1 pycodestyle==2.5.0 pycparser==2.19 pycrypto==2.6.1 pycups==1.9.73 pyecharts-jupyter-installer==0.0.3 pyexcel==0.5.13 pyexcel-handsontable==0.0.2 pyexcel-io==0.5.11 pyexcel-ods==0.5.6 pyexcel-webio==0.1.4 pyexcel-xls==0.5.8 pyexcel-xlsx==0.5.7 pyflakes==2.1.1 Pygments==2.3.1 PyGObject==3.32.0 PyHamcrest==1.9.0 PyJWT==1.7.1 pyluach==1.0.1 pymacaroons==0.13.0 PyNaCl==1.3.0 pyparsing==2.4.0 PyQt5==5.12.1 pyquery==1.4.0 pyrabbit==1.1.0 PySocks==1.6.8 pytest==4.4.1 pytest-cov==2.7.1 pytest-flake8==1.0.4 python-apt==1.8.4 python-consul==1.1.0.dev0 python-dateutil==2.8.0 python-debian==0.1.34 python-editor==1.0.4 pytz==2018.9 PyYAML==3.13 reportlab==3.5.18 requests==2.22.0 requests-toolbelt==0.9.1 requests-unixsocket==0.1.5 rsa==3.4.2 s3transfer==0.2.0 SecretStorage==2.3.1 selenium==3.141.0 simplejson==3.16.0 six==1.12.0 snowballstemmer==1.9.0 soupsieve==1.8 speaklater==1.3 Sphinx==2.1.2 sphinxcontrib-applehelp==1.0.1 sphinxcontrib-devhelp==1.0.1 sphinxcontrib-htmlhelp==1.0.2 sphinxcontrib-jsmath==1.0.1 sphinxcontrib-qthelp==1.0.2 sphinxcontrib-serializinghtml==1.1.3 SQLAlchemy==1.3.3 sqlparse==0.2.4 ssh-import-id==5.7 sshtunnel==0.1.4 systemd-python==234 tablib==0.13.0 tblib==1.3.2 terminaltables==3.1.0 texttable==1.6.1 toml==0.10.0 tornado==6.0.3 tox==3.13.2 traitlets==4.3.2 twilio==6.26.2 Twisted==19.2.1 ubuntu-advantage-tools==19.2 ubuntu-drivers-common==0.0.0 ufw==0.36 uk-postcode-utils==1.0 unattended-upgrades==0.1 urllib3==1.25.3 vine==1.3.0 virtualenv==16.6.1 
waitress==1.3.0 wcwidth==0.1.7 webencodings==0.5.1 WebOb==1.8.5 WebTest==2.0.33 Werkzeug==0.15.2 WTForms==2.2.1 xdg==4.0.0 xkit==0.0.0 xlrd==1.2.0 xlwt==1.3.0 xmltodict==0.12.0 yarl==1.3.0 zeep==3.4.0 zipp==0.5.2 zope.interface==4.6.0 ```

Other Dependencies

N/A

Minimally Reproducible Test Case

```python
#
# Any exception thrown in the set() method of a result backend should show the issue.
#
raise KeyError("simulated failure")
```

Expected Behavior

A failure in any result backend (for example, as described in #5605) should be reported by the worker, and Celery ought then to be able to continue servicing further requests. The recovery mechanism is of course free to kill and replace the failing worker as needed.

Actual Behavior

In my case, I am using the Consul backend, though I believe this issue affects all backends. The Consul backend uses the python-consul package, which can throw an exception while performing network operations. In Celery, this is reported like this:

[2019-07-12 08:48:51,191: WARNING/ForkPoolWorker-80] /usr/local/lib/python3.7/dist-packages/celery/app/trace.py:568: RuntimeWarning: Exception raised outside body: TypeError("'bool' object is not subscriptable"):
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/celery/app/trace.py", line 449, in trace_task
    uuid, retval, task_request, publish_result,
  File "/usr/local/lib/python3.7/dist-packages/celery/backends/base.py", line 149, in mark_as_done
    self.store_result(task_id, result, state, request=request)
  File "/usr/local/lib/python3.7/dist-packages/celery/backends/base.py", line 342, in store_result
    request=request, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/celery/backends/base.py", line 714, in _store_result
    self.set(self.get_key_for_task(task_id), self.encode(meta))
  File "/usr/local/lib/python3.7/dist-packages/celery/backends/consul.py", line 92, in set
    ttl=self.expires)
  File "/usr/local/lib/python3.7/dist-packages/consul/base.py", line 1781, in create
    data=data)
  File "/usr/local/lib/python3.7/dist-packages/consul/std.py", line 33, in put
    self.session.request('PUT', uri, body=data, headers=JSON_HEADER)))
  File "/usr/local/lib/python3.7/dist-packages/consul/base.py", line 234, in cb
    data = data['ID']
TypeError: 'bool' object is not subscriptable

  exc, exc_info.traceback)))

This is fine. However, it seems that the affected worker then gets "stuck", in that it remains permanently visible in the output of "celery inspect active":

$ celery inspect active
-> celery@freenas: OK
    * {'id': '2996120d-87d2-4c9b-8774-b7a3b64e1332', 'name': 'paiyroll.tasks.function_run', 'args': "['paiyroll.report_run', 'report_preflight_validate_async', 1921]", 'kwargs': '{}', 'type': 'paiyroll.tasks.function_run', 'hostname': 'celery@freenas', 'time_start': 1562921376.9528346, 'acknowledged': True, 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': False}, 'worker_pid': 7111}
.
.
.

This happens to each worker as it hits the error, until Celery is unable to progress any work. On macOS, though usually not on Linux, the workers then cannot be stopped using "celery control shutdown" and must be killed by hand.
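For anyone wanting to spot this state automatically, here is a minimal sketch that flags tasks which have been "active" for suspiciously long. It assumes a list of task dicts shaped like the "celery inspect active" output above, and that `time_start` is a Unix timestamp (as it appears to be there). The function name `find_stuck_tasks` and the threshold are hypothetical, not part of Celery.

```python
import time


def find_stuck_tasks(active_tasks, max_runtime_s, now=None):
    """Return the ids of tasks that have been active longer than max_runtime_s.

    active_tasks: list of dicts with at least 'id' and 'time_start' keys,
    shaped like the per-worker entries shown by "celery inspect active".
    now: current time as a Unix timestamp; defaults to time.time().
    """
    if now is None:
        now = time.time()
    return [
        task["id"]
        for task in active_tasks
        if now - task["time_start"] > max_runtime_s
    ]


# Example with made-up timestamps: only the first task exceeds 60 seconds.
sample = [
    {"id": "task-a", "time_start": 100.0},
    {"id": "task-b", "time_start": 990.0},
]
stuck = find_stuck_tasks(sample, max_runtime_s=60, now=1000.0)
```

A long-running task is not necessarily a stuck one, so the threshold would need to be well above the longest legitimate task duration.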

ShaheedHaque commented 5 years ago

A patch such as this is sufficient to contain the problem:

$ diff /usr/local/lib/python3.7/dist-packages/celery/backends/consul.py hacked_consul.py 
90,93c90,94
<         session_id = self.client.session.create(name=session_name,
<                                                 behavior='delete',
<                                                 ttl=self.expires)
<         logger.debug('Created Consul session %s', session_id)
---
>         try:
>             session_id = self.client.session.create(name=session_name,
>                                                     behavior='delete',
>                                                     ttl=self.expires)
>             logger.debug('Created Consul session %s', session_id)
95,98c96,101
<         logger.debug('Writing key %s to Consul', key)
<         return self.client.kv.put(key=key,
<                                   value=value,
<                                   acquire=session_id)
---
>             logger.debug('Writing key %s to Consul', key)
>             return self.client.kv.put(key=key,
>                                       value=value,
>                                       acquire=session_id)
>         except TypeError as e:
>             logger.exception('cannot save result for {}={}: {}'.format(session_name, value, e))

Of course, this is a terrible Consul-specific hack and not a general fix for all backends.
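For comparison, the same containment idea could be expressed without reference to Consul. This is only a sketch under assumptions: it presumes a backend object exposing `set(key, value)` as seen in the traceback, and `FlakyBackend`, `DictBackend` and `contained_set` are hypothetical names for illustration, not Celery APIs.

```python
import logging

logger = logging.getLogger(__name__)


class FlakyBackend:
    """Hypothetical stand-in for a backend whose set() always fails."""

    def set(self, key, value):
        raise TypeError("'bool' object is not subscriptable")


class DictBackend:
    """Hypothetical stand-in for a backend that works, storing in a dict."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value


def contained_set(backend, key, value):
    """Call backend.set(), logging any failure instead of propagating it.

    Returns True if the value was stored, False if the backend raised.
    """
    try:
        backend.set(key, value)
    except Exception:
        # Deliberately broad: the aim is that no backend error escapes
        # into the caller's bookkeeping.
        logger.exception("cannot save result for key %r", key)
        return False
    return True


good = DictBackend()
stored_ok = contained_set(good, "task-1", "result")
stored_bad = contained_set(FlakyBackend(), "task-2", "result")
```

Swallowing the exception means the task result is lost, so this contains the starvation rather than fixing the underlying backend failure.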

auvipy commented 5 years ago

thanks for the detailed report

asfaltboy commented 4 years ago

I had the same issue (on stable v4.3.0) when using the django-db and redis backends, and was able to resolve it by upgrading to latest master (or 4.4.0rc4).


Here is a simple reproducible example (on the older version): https://gist.github.com/asfaltboy/81dfde85551b5a9029f8d1b962e5422d

The key settings that cause this issue are "task_acks_late=true" and "worker_prefetch_multiplier=1". With a different prefetch multiplier, the worker consumes that number of messages (e.g. 4 for the default value of 4) and, after all of these fail to be stored, reaches the same "starved" state.
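As a configuration fragment, those two settings might look like the following. The module name `celeryconfig.py` is an assumption for illustration (the linked gist may organise things differently); such a module is typically loaded with `app.config_from_object("celeryconfig")`.

```python
# celeryconfig.py (hypothetical module name)

# Acknowledge each message only after the task has finished,
# rather than when the worker receives it.
task_acks_late = True

# Let each worker process reserve only one message at a time.
worker_prefetch_multiplier = 1
```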

Reproduction steps are as follows:

  1. Start worker: celery -A my_app -c 1 -l debug worker
  2. Queue a failing task w/ 5s sleep: celery -A my_app call my_app.fail --args='[5]'
  3. Wait for worker to receive the task and start the sleep
  4. (before 5s reached) Stop the result backend (e.g stop redis: brew services stop redis)
  5. Wait for MainProcess to reach failure. The fail goes through 3 retry connect loops, ending with:

    [2019-11-28 12:17:24,453: ERROR/MainProcess] Pool callback raised exception: ConnectionError('Error 61 connecting to localhost:6379. Connection refused.')

  6. Start the result backend (e.g: brew services start redis)
  7. Queue another task: celery -A my_app call my_app.fail --args='[0.1]'
  8. Result: task isn't being picked up by the worker.
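The effect of the prefetch setting on when the worker stalls can be illustrated with a toy model. This is a sketch of the behaviour described above, not a claim about how Celery or billiard implement acknowledgement; `simulate` and its parameters are hypothetical. The assumption is that a message occupies a prefetch slot until its result is stored, and that a failed store never frees the slot.

```python
def simulate(prefetch_limit, store_outcomes):
    """Count how many messages a toy late-acking worker picks up.

    prefetch_limit: maximum number of unacknowledged messages allowed.
    store_outcomes: one bool per queued message, True if storing that
    message's result succeeds (and the message is then acknowledged).
    """
    unacked = 0
    picked_up = 0
    for store_ok in store_outcomes:
        if unacked >= prefetch_limit:
            break  # no free slot: the worker is "starved"
        unacked += 1
        picked_up += 1
        if store_ok:
            unacked -= 1  # result stored, message acknowledged, slot freed
    return picked_up


# Backend down for the first message only, prefetch limit of 1:
# the second and third messages are never picked up.
picked_with_limit_1 = simulate(1, [False, True, True])

# Prefetch limit of 4: four failures are needed before the worker stalls.
picked_with_limit_4 = simulate(4, [False] * 4 + [True] * 3)
```

In this model a healthy backend never stalls the worker, since every slot is freed as soon as its result is stored.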

Note: this may be a bug in billiard. I don't know enough about the internals, but I'll try to bisect and will update here as I travel back in time.

@ShaheedHaque can you please try installing version 4.4.0rc4?

@auvipy - I think we can remove the 4.5 milestone, if Shaheed confirms it's fixed in 4.4

ShaheedHaque commented 4 years ago

A repro would be tricky here since my pursuit of the various failure modes I had thought might be involved (in Consul itself, or in the requests/urllib library used to talk to it) all ended inconclusively. I do have a PR queued up to make python-consul threadsafe by replacing requests/urllib with urllib3, but that has not been merged either. See https://github.com/cablehead/python-consul/pull/258.

ShaheedHaque commented 4 years ago

I can only confirm that on 4.3.0, I am no longer seeing the underlying failure (i.e. outside Celery) that was the original trigger of this issue for me, so I have no way to tell if 4.4.0rc4 fixes it.

ShaheedHaque commented 4 years ago

I suspect this has much in common with #4363. I'm NOT marking this as a duplicate however because Consul is a SyncBackend whereas #4363 relates to Redis which I believe is an AsyncBackend. I'll leave it to the devs to consider if this is, in fact, a dupe.

asfaltboy commented 4 years ago

I would only note that any exception that occurs while saving a task result, in 4.3 and prior, always causes a "worker starvation" state (where the worker will not consume further tasks), at least with task_acks_late enabled.

However, I've shown this to be true with multiple backends, and that it was fixed in 4.4. I am aware that various backends may still have various issues given the right circumstances, but at least these won't "starve the worker forever". We could probably create an integration test case to ensure we don't regress on this issue (though I don't know which change in 4.4 fixed it, or whether the fix might even be in billiard!).
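The shape of such a regression test might be sketched as follows. Everything here is a stand-in: `OutageBackend` and `process` are hypothetical and model the worker with a plain function, whereas a real integration test would need to drive an actual worker and stop and restart the real backend, as in the reproduction steps.

```python
import unittest


class OutageBackend:
    """Hypothetical backend stub that fails while `down` is True."""

    def __init__(self):
        self.down = False
        self.stored = {}

    def set(self, key, value):
        if self.down:
            raise ConnectionError("backend unavailable")
        self.stored[key] = value


def process(backend, task_id, func):
    """Stand-in for the worker: run the task, then try to store its result.

    Returns True if the result was stored, False if the backend failed.
    """
    result = func()
    try:
        backend.set(task_id, result)
    except Exception:
        return False
    return True


class WorkerSurvivesBackendOutage(unittest.TestCase):
    def test_worker_recovers_after_outage(self):
        backend = OutageBackend()

        # The first task finishes while the backend is down.
        backend.down = True
        self.assertFalse(process(backend, "t1", lambda: 1))

        # Once the backend is back, a second task must still be handled.
        backend.down = False
        self.assertTrue(process(backend, "t2", lambda: 2))
        self.assertEqual(backend.stored, {"t2": 2})
```

The key assertion is the second one: work submitted after the outage is still processed. The sketch can be run with `python -m unittest` against the file that contains it.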

auvipy commented 4 years ago

celery==4.4.0rc5 is on pypi

ShaheedHaque commented 4 years ago

@asfaltboy thanks for the helpful summary

thedrow commented 4 years ago

How would you plan such a test case for our integration suite?

ShaheedHaque commented 4 years ago

Just a note that the recent updates to #5605 mean that the root cause of the issue that triggers this for me is known.

auvipy commented 3 years ago

> Just a note that the recent updates to #5605 mean that the root cause of the issue that triggers this for me is known.

We moved to python-consul2 in master; can you check whether that fixed it for you? https://github.com/celery/celery/commit/ae463025c12d78c2b96a885aa4385ff33811c17a

ShaheedHaque commented 3 years ago

The move to python-consul2 does not actually fix anything; I believe the rationale for moving was simply that it seemed to be alive, whereas python-consul upstream seems inactive.

That said, I run with the ugly/hacky workaround I documented in #5605, so I don't see that problem or this problem any longer.

auvipy commented 3 years ago

Backends were made thread-safe in a recent PR. Did that not help either? Should we document the workaround you provided?

ShaheedHaque commented 3 years ago

I cannot find the PR you refer to right now, but I do remember looking at it, and IIRC, it does not address the problem of #5605. Actually, from my point of view, rather than document the workaround, I wonder if - in the absence of a better solution - my workaround should be committed on the grounds that correctness trumps any performance concerns?

ShaheedHaque commented 3 years ago

I retested against the current release and have not been able to see this issue. Closing. #5605 is still open of course.