celery / celery

Distributed Task Queue (development branch)
https://docs.celeryq.dev

Celery workers remain in defunct state & zombie processes #6656

Open alokgogate opened 3 years ago

alokgogate commented 3 years ago

Environment & Settings

Celery version: 4.4.4 (cliffs)

celery report Output:

```
celery -A task report

software -> celery:4.4.4 (cliffs) kombu:4.6.10 py:3.6.4
            billiard:3.6.3.0 redis:3.3.11
platform -> system:Linux arch:64bit
            kernel version:3.10.0-327.10.1.el7.x86_64 imp:CPython
loader   -> celery.loaders.app.AppLoader
settings -> transport:sentinel results:sentinel://:**@x.x.x.x:xxxx/1;sentinel://:xxxxx@x.x.x.x:xxxx/1;sentinel://:xxxx@x.x.x.x:xxx/1

broker_url: 'sentinel://:********@x.x.x.x:xxxx/0'
result_backend: 'sentinel://:********@x.x.x.x:xxx/1;sentinel://:xxxx@x.x.x.x:xxxx/1;sentinel://:xxxxx@x.x.x.x:xxxx/1'
broker_transport_options: {
    'master_name': 'xxxx',
    'visibility_timeout': 180}
result_backend_transport_options: {
    'master_name': 'xxxx'}
```

Steps to Reproduce

Required Dependencies

Python Packages

pip freeze Output:

``` alabaster==0.7.10 amqp==2.6.0 anaconda-client==1.6.9 anaconda-navigator==1.7.0 anaconda-project==0.8.2 asn1crypto==0.24.0 astroid==1.6.1 astropy==2.0.3 attrs==17.4.0 Babel==2.5.3 backports.shutil-get-terminal-size==1.0.0 beautifulsoup4==4.6.0 billiard==3.6.3.0 bitarray==0.8.1 bkcharts==0.2 blaze==0.11.3 bleach==2.1.2 bokeh==0.12.13 boto==2.48.0 Bottleneck==1.2.1 celery==4.4.4 certifi==2018.1.18 cffi==1.11.4 chardet==3.0.4 click==6.7 cloudpickle==0.5.2 clyent==1.2.2 colorama==0.3.9 conda==4.4.10 conda-build==3.4.1 conda-verify==2.0.0 contextlib2==0.5.5 cryptography==2.1.4 cycler==0.10.0 Cython==0.27.3 cytoolz==0.9.0 dask==0.16.1 datashape==0.5.4 decorator==4.2.1 distributed==1.20.2 docutils==0.14 entrypoints==0.2.3 et-xmlfile==1.0.1 fastcache==1.0.2 filelock==2.0.13 Flask==1.1.1 Flask-Cors==3.0.3 flower==0.9.4 future==0.18.2 gevent==1.2.2 glob2==0.6 gmpy2==2.0.8 greenlet==0.4.12 gunicorn==20.0.4 h5py==2.7.1 heapdict==1.0.0 html5lib==1.0.1 humanize==0.5.1 idna==2.6 imageio==2.2.0 imagesize==0.7.1 importlib-metadata==1.6.0 ipykernel==4.8.0 ipython==6.2.1 ipython-genutils==0.2.0 ipywidgets==7.1.1 isort==4.2.15 itsdangerous==0.24 jdcal==1.3 jedi==0.11.1 Jinja2==2.11.1 jsonschema==2.6.0 jupyter==1.0.0 jupyter-client==5.2.2 jupyter-console==5.2.0 jupyter-core==4.4.0 jupyterlab==0.31.5 jupyterlab-launcher==0.10.2 kombu==4.6.10 lapjv==1.3.1 lazy-object-proxy==1.3.1 llvmlite==0.21.0 locket==0.2.0 lxml==4.1.1 MarkupSafe==1.0 matplotlib==2.1.2 mccabe==0.6.1 mistune==0.8.3 mpmath==1.0.0 msgpack-python==0.5.1 multipledispatch==0.4.9 navigator-updater==0.1.0 nbconvert==5.3.1 nbformat==4.4.0 networkx==2.1 nltk==3.2.5 nose==1.3.7 notebook==5.4.0 numba==0.36.2 numexpr==2.6.4 numpy==1.14.5 numpydoc==0.7.0 odo==0.5.1 olefile==0.45.1 openpyxl==2.4.10 packaging==16.8 pandas==1.0.4 pandocfilters==1.4.2 parso==0.1.1 partd==0.3.8 path.py==10.5 pathlib2==2.3.0 patsy==0.5.0 pep8==1.7.1 pexpect==4.3.1 pickleshare==0.7.4 Pillow==5.0.0 pkginfo==1.4.1 pluggy==0.6.0 ply==3.10 
prompt-toolkit==1.0.15 psutil==5.4.3 ptyprocess==0.5.2 py==1.5.2 pycodestyle==2.3.1 pycosat==0.6.3 pycparser==2.18 pycrypto==2.6.1 pycurl==7.43.0.1 pyflakes==1.6.0 Pygments==2.2.0 pylint==1.8.2 pyodbc==4.0.22 pyOpenSSL==17.5.0 pyparsing==2.2.0 PySocks==1.6.7 pytest==3.3.2 python-dateutil==2.6.1 pytz==2017.3 PyWavelets==0.5.2 PyYAML==3.12 pyzmq==16.0.3 QtAwesome==0.4.4 qtconsole==4.3.1 QtPy==1.3.1 redis==3.3.11 requests==2.18.4 rope==0.10.7 ruamel-yaml==0.15.35 scikit-image==0.13.1 scikit-learn==0.19.1 scipy==1.0.0 seaborn==0.8.1 Send2Trash==1.4.2 simplegeneric==0.8.1 singledispatch==3.4.0.3 six==1.11.0 snowballstemmer==1.2.1 sortedcollections==0.5.3 sortedcontainers==1.5.9 Sphinx==1.6.6 sphinxcontrib-websupport==1.0.1 spyder==3.2.6 SQLAlchemy==1.2.1 statsmodels==0.8.0 sympy==1.1.1 tables==3.4.2 tblib==1.3.2 terminado==0.8.1 testpath==0.3.1 toolz==0.9.0 tornado==6.0.4 traitlets==4.3.2 typing==3.6.2 unicodecsv==0.14.1 urllib3==1.22 vine==1.3.0 wcwidth==0.1.7 webencodings==0.5.1 Werkzeug==1.0.0 widgetsnbextension==3.1.0 wrapt==1.10.11 xlrd==1.1.0 XlsxWriter==1.0.2 xlwt==1.3.0 zict==0.1.3 zipp==3.1.0 ```

Other Dependencies

N/A

Minimally Reproducible Test Case

```
celery worker --loglevel=DEBUG -A task -Ofair --time-limit 100 --concurrency=40
```

Expected Behavior

The Celery worker processes should shut down gracefully once the jobs have been processed.

Actual Behavior

Celery worker processes remain as "[celery] <defunct>" zombie processes.
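To check which processes are actually defunct and which parent should be reaping them, here is a minimal sketch for Linux. It reads `/proc/<pid>/stat` directly, so it does not depend on `ps` being installed, and prints each zombie with its parent PID. The output is roughly what `ps -eo pid,ppid,stat,comm` shows when filtered to state `Z`. `list_zombies` is a hypothetical helper name, not part of Celery.

```bash
#!/bin/bash
# Hypothetical helper: print every zombie (state Z) with its parent PID.
# Assumes the Linux /proc layout, where /proc/<pid>/stat begins
# "pid (comm) state ppid ...".
list_zombies() {
  local stat_file line rest pid comm state ppid
  for stat_file in /proc/[0-9]*/stat; do
    # A process can exit between the glob and the read; skip it if so.
    read -r line 2>/dev/null < "$stat_file" || continue
    pid=${line%% *}
    comm=${line#*\(}          # text after the first "("
    comm=${comm%\)*}          # ...up to the last ")"
    rest=${line##*\) }        # "state ppid pgrp ..."
    state=${rest%% *}
    rest=${rest#* }
    ppid=${rest%% *}
    if [ "$state" = "Z" ]; then
      echo "pid=$pid ppid=$ppid cmd=$comm"
    fi
  done
}

list_zombies
```

If the PPID is the main `celery worker` process, the pool children have exited but the worker has not reaped them. If the PPID is 1, they are orphans that PID 1 is not reaping, which can happen when PID 1 is not a real init process.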

auvipy commented 3 years ago

Hey, did you try the latest release of Celery instead of 4.4.4?

thedrow commented 3 years ago

This is currently not reproducible with the test case you provided. We need further information and a test case with code that reproduces your issue.

alokgogate commented 3 years ago

Hey @auvipy / @thedrow, this is an intermittent issue and arises randomly. As @auvipy suggested, I'll upgrade from 4.4.4 to the latest available version and see if the issue persists.

Yuruh commented 3 years ago

Did this fix your problem, @alokgogate? I'm running into the same issue with Celery 4.4.7, and I can't upgrade to 5.x.

thedrow commented 3 years ago

> This is an intermittent issue and arises randomly.

Which is precisely why it is hard to fix.

Yuruh commented 3 years ago

I may have found a way to reproduce it: for me, it happens when the worker runs a script that uses process substitution.

For example, the worker runs a bash script containing:

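```bash
#!/bin/bash
# Prefix every line read on stdin and write it to stderr.
add_prefix() { sed "s/^/[PREFIX]/" >&2; }
# Send the subshell's stderr to add_prefix through a process substitution.
(echo "test" >&2) 2> >(add_prefix)
```
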
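Below is a minimal sketch of the same reproduction without process substitution, assuming only the script's stderr needs prefixing. It uses a pipeline instead, because bash waits for all commands in a pipeline to terminate before continuing, so no child is left running once the line completes. This is a swapped-in workaround for the script itself, not a confirmed fix for the worker zombies discussed in this issue.

```bash
#!/bin/bash
# Same helper as above: prefix each stdin line and write it to stderr.
add_prefix() { sed "s/^/[PREFIX]/" >&2; }

# Process-substitution form: (echo "test" >&2) 2> >(add_prefix)
# The command returns without waiting for the >(...) child, so that child
# can outlive the script.
#
# Pipeline form: route the group's stderr into a pipe instead. bash waits for
# every member of the pipeline, so add_prefix has exited before the next line.
# Note: 2>&1 also sends the group's stdout through add_prefix.
{ echo "test" >&2; } 2>&1 | add_prefix
```

If stdout must stay unprefixed, as in the original, the usual idiom is `{ { cmd 2>&1 1>&3 3>&-; } | add_prefix; } 3>&1`. This swaps stderr into the pipe and sends stdout to the saved descriptor 3.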
Rajeshwar21 commented 1 year ago

Is this issue fixed?

claudinoac commented 1 year ago

Still observing this in v5.3.1. We have around 15 workers distributed across 3 servers, and every few hours some of them go unresponsive. We are using Redis as both broker and result backend.

ooyamatakehisa commented 1 year ago

I had the same problem when running a Linux AppImage command (https://musescore.org/ja/download).

I came across this issue in an application deployed in production. The zombies exhaust the available PIDs, so I have to restart the app periodically. Please, please fix this issue 🙇‍♂️

I also did some experiments, and the following bash script using process substitution did create a zombie process, as the earlier comment about process substitution mentioned.

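```bash
#!/bin/bash

# >(grep a) starts grep in the background and expands to a /dev/fd/N path,
# so echo prints "abc" followed by that path; the script does not wait for grep.
echo "abc" >(grep a)
```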
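For background, a zombie is a child that has exited but whose parent has not yet called wait() on it. Until that happens, it keeps its entry in the process table, including its PID, which is how enough of them can exhaust the available PIDs. Here is a minimal sketch that reproduces this state on Linux without Celery or process substitution. It assumes bash and a `sleep` that accepts fractional seconds (GNU coreutils does). `make_zombie` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: create a zombie on purpose and print its state letter.
make_zombie() {
  local pidfile parent child state
  pidfile=$(mktemp)
  # The subshell starts a short-lived child, records its PID, then replaces
  # itself with `sleep 3`. sleep never calls wait(), so once the child exits
  # it stays <defunct> for as long as this parent is alive.
  ( sleep 0.1 & echo "$!" > "$pidfile"; exec sleep 3 ) >/dev/null 2>&1 &
  parent=$!
  sleep 1                                   # give the child time to exit
  child=$(cat "$pidfile")
  # /proc/<pid>/stat begins "pid (comm) state ..."; "Z" means zombie.
  read -r _ _ state _ < "/proc/$child/stat"
  echo "$state"
  # Clean up: killing the parent lets the zombie be re-parented and reaped.
  kill "$parent" 2>/dev/null
  wait "$parent" 2>/dev/null
  rm -f "$pidfile"
}

make_zombie   # prints Z on Linux
```

Killing the parent is what clears the zombie here. The kernel re-parents it to PID 1 (or to a child subreaper, if one is set), and that process is then expected to reap it.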

implementation of process substitution


Version: 5.3.1, broker and backend: RabbitMQ