Open liangpinglk opened 2 years ago
Hey @liangpinglk :wave:, Thank you for opening an issue. We will get back to you as soon as we can. Also, check out our Open Collective and consider backing us - every little helps!
We also offer priority support for our sponsors. If you require immediate assistance please consider sponsoring us.
I found that this problem has been discussed for a long time, but it still exists now. How can I fix it? (I need multiprocessing.) https://github.com/celery/celery/issues/2958 https://groups.google.com/g/celery-users/c/E1kYCQySzuE?pli=1 https://stackoverflow.com/questions/56767461/celery-workerlosterror-worker-exited-prematurely-signal-6-sigabrt
The solution I found was to handle that exception internally and move on: wrap the call in a try/except and ignore the exception.
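A minimal sketch of that approach, assuming a made-up task and timeouts (the task name, module, and values are placeholders, not from this thread):
# Rough sketch of the "catch it and move on" approach described above.
from billiard.exceptions import WorkerLostError
from myapp.tasks import my_task  # hypothetical task module

def run_task_tolerantly(payload):
    """Run the task and swallow WorkerLostError if the pool process dies."""
    result = my_task.apply_async((payload,), expires=60)
    try:
        return result.get(timeout=120)
    except WorkerLostError:
        # The pool process died (OOM kill, SIGTERM, SIGABRT, ...).
        # Treat the result as missing instead of letting the error propagate.
        return None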
Got it today as: Worker exited prematurely: signal 15 (SIGTERM)
with:
software -> celery:5.2.7 (dawn-chorus) kombu:5.2.4 py:3.9.2
billiard:3.6.4.0 redis:4.5.4
platform -> system:Linux arch:64bit, ELF
kernel version:5.10.0-9-amd64 imp:CPython
Stack trace:
WorkerLostError: Worker exited prematurely: signal 15 (SIGTERM) Job: 11.
File "hkis/consumers.py", line 166, in answer
is_valid, message = await check_answer(
File "hkis/tasks.py", line 174, in check_answer
return await asyncio.get_running_loop().run_in_executor(
File "concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "hkis/tasks.py", line 172, in sync_celery_check_answer
return check_answer_task.apply_async((answer,), expires=60).get()
File "celery/result.py", line 224, in get
return self.backend.wait_for_pending(
File "celery/backends/asynchronous.py", line 223, in wait_for_pending
return result.maybe_throw(callback=callback, propagate=propagate)
File "celery/result.py", line 336, in maybe_throw
self.throw(value, self._to_remote_traceback(tb))
File "celery/result.py", line 329, in throw
self.on_ready.throw(*args, **kwargs)
File "vine/promises.py", line 234, in throw
reraise(type(exc), exc, tb)
File "vine/utils.py", line 30, in reraise
raise value
It turned out that a process had been killed by the OOM killer.
Here's what I'm seeing from a systemd point of view after reproducing the issue:
hkis-celery.service: A process of this unit has been killed by the OOM killer.
worker: Warm shutdown (MainProcess)
[2023-04-20 09:09:17,070: ERROR/MainProcess] Process 'ForkPoolWorker-1' pid:1057626 exited with 'signal 15 (SIGTERM)'
[2023-04-20 09:09:17,288: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 15 (SIGTERM) Job: 23.')
Traceback (most recent call last):
File "/opt/hkis-celery/venv/lib/python3.9/site-packages/celery/worker/worker.py", line 203, in start
self.blueprint.start(self)
File "/opt/hkis-celery/venv/lib/python3.9/site-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/opt/hkis-celery/venv/lib/python3.9/site-packages/celery/bootsteps.py", line 365, in start
return self.obj.start()
File "/opt/hkis-celery/venv/lib/python3.9/site-packages/celery/worker/consumer/consumer.py", line 332, in start
blueprint.start(self)
File "/opt/hkis-celery/venv/lib/python3.9/site-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/opt/hkis-celery/venv/lib/python3.9/site-packages/celery/worker/consumer/consumer.py", line 628, in start
c.loop(*c.loop_args())
File "/opt/hkis-celery/venv/lib/python3.9/site-packages/celery/worker/loops.py", line 97, in asynloop
next(loop)
File "/opt/hkis-celery/venv/lib/python3.9/site-packages/kombu/asynchronous/hub.py", line 295, in create_loop
tick_callback()
File "/opt/hkis-celery/venv/lib/python3.9/site-packages/kombu/transport/redis.py", line 1311, in on_poll_start
cycle_poll_start()
File "/opt/hkis-celery/venv/lib/python3.9/site-packages/kombu/transport/redis.py", line 532, in on_poll_start
self._register_BRPOP(channel)
File "/opt/hkis-celery/venv/lib/python3.9/site-packages/kombu/transport/redis.py", line 518, in _register_BRPOP
channel._brpop_start()
File "/opt/hkis-celery/venv/lib/python3.9/site-packages/kombu/transport/redis.py", line 950, in _brpop_start
self.client.connection.send_command(*command_args)
File "/opt/hkis-celery/venv/lib/python3.9/site-packages/redis/connection.py", line 841, in send_command
self._command_packer.pack(*args),
File "/opt/hkis-celery/venv/lib/python3.9/site-packages/redis/connection.py", line 554, in pack
buff = SYM_EMPTY.join((SYM_STAR, str(len(args)).encode(), SYM_CRLF))
File "/opt/hkis-celery/venv/lib/python3.9/site-packages/celery/apps/worker.py", line 299, in _handle_request
raise exc(exitcode)
celery.exceptions.WorkerShutdown: 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/hkis-celery/venv/lib/python3.9/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 15 (SIGTERM) Job: 23.
hkis-celery.service: Failed with result 'oom-kill'.
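Not a fix for the underlying memory use, but if the OOM killer is the root cause, one possible mitigation is to let Celery recycle pool processes before they grow too large; a hedged sketch (the limits and app/broker names are illustrative, not from this deployment):
# Sketch: recycle pool workers before they balloon; values are illustrative.
from celery import Celery

app = Celery("proj", broker="redis://localhost:6379/0")

# Replace each child process after it has handled this many tasks...
app.conf.worker_max_tasks_per_child = 100
# ...or once its resident memory exceeds roughly 200 MB (value is in KiB).
app.conf.worker_max_memory_per_child = 200_000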
Hi,
I've run into this error when trying to roll forward from python 3.10.9 (which works fine). I get it with both 3.10.10 and 3.10.11:
celery: 5.2.7
billiard: 3.6.4.0
Django: 3.2.16 (LTS)
MacOS: 12.6.3
python: 3.10.9 works; 3.10.10 & 3.10.11 have the problem
Some celery tasks complete successfully (eg: sending email). However, it appears to be the first interaction with a Django model that triggers the problem. Here's the line in my code that generates the exception. task_model can be one of 2 different models that are tracking the processing task. task_id is valid and the .get() should return a single instance:
begin_processing_task = task_model.objects.get(id=task_id)
here's the exception:
objc[70333]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[70333]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
[2023-05-08 20:00:04,742: ERROR/MainProcess] Process 'ForkPoolWorker-1' pid:70333 exited with 'signal 6 (SIGABRT)'
[2023-05-08 20:00:04,755: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 6 (SIGABRT) Job: 0.')
Traceback (most recent call last):
File "/Users/richard/VirtualEnvs/ontheday_heroku_3.10.10/lib/python3.10/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 6 (SIGABRT) Job: 0.
[2023-05-08 20:43:00,037:
Some more detail on this problem.
I'm running a project on Heroku which forces the use of psycopg2-binary instead of psycopg2. There was a recent update to v2.9.6 that was confusing me when I initially ran into the problem. Recent debugging has revealed the following:
Works: MacOS on Apple Silicon
- Celery 5.2.7 with python 3.10.9 with psycopg2-binary 2.9.5
Does not work: MacOS on Apple Silicon
- Celery 5.2.7 with python 3.10.9 with psycopg2-binary 2.9.6
Works: MacOS on Apple Silicon
- Celery 5.2.7 with python 3.10.9 with psycopg2 2.9.6 (no -binary)
Also works: MacOS on Intel silicon
- Celery 5.2.7 with python 3.10.9 with psycopg2-binary 2.9.6 (with -binary)
So, for some reason, celery with psycopg2-binary 2.9.6 on Apple silicon is causing problems.
Any ideas??
Hi,
I've run into this error when trying to roll forward from python 3.10.9 (which works fine). I get it with both 3.10.10 and 3.10.11:
celery: 5.2.7 billiard: 3.6.4.0 Django: 3.2.16 (LTS) MacOS: 12.6.3 python: 3.10.9 works, 3.10.10 & 3.10.11 have the problem
Some celery tasks complete successfully (eg: sending email). However, it appears to be the first interaction with a Django model that triggers the problem. Here's the line in my code that generates the exception. task_model can be one of 2 different models that are tracking the processing task. task_id is valid and the .get() should return a single instance:
begin_processing_task = task_model.objects.get(id=task_id)
here's the exception:
objc[70333]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[70333]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
[2023-05-08 20:00:04,742: ERROR/MainProcess] Process 'ForkPoolWorker-1' pid:70333 exited with 'signal 6 (SIGABRT)'
[2023-05-08 20:00:04,755: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 6 (SIGABRT) Job: 0.')
Traceback (most recent call last):
File "/Users/richard/VirtualEnvs/ontheday_heroku_3.10.10/lib/python3.10/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 6 (SIGABRT) Job: 0.
[2023-05-08 20:43:00,037:
@richardbrockie I am encountering the same problem with MacOS on Apple Silicon, Celery 5.2.7 with python 3.9.16, psycopg2 2.9.6 and psycopg2 2.9.5 (which I tried after reading this post). I didn't bother to try using a binary version of psycopg2 for it is not advised to be used for development.
This problem only occurred after I updated SQLAlchemy (to 2.0 from 1.4) and other related libraries.
This problem didn't occur if I run the Celery worker inside a docker container in the same Mac.
@candleindark I also reported the problem in the psycopg2 repo: https://github.com/psycopg/psycopg2/issues/1593, where they pointed out that this is likely to be due to how macOS forks processes. What OS are you running in your docker container? I'm pretty sure it won't be macOS?
After playing around on Saturday, I now have a satisfactory work-around. I've verified that the -binary version of v2.9.6 works fine in my production Heroku deployment, so I am now specifying different requirements based on the OS. I have updated to python 3.10.12 as part of my fiddling:
# different versions of psycopg2 for different platforms...
psycopg2==2.9.6; sys_platform == "darwin"
psycopg2-binary==2.9.6; sys_platform == "linux"
During this past weekend, I did manage to have both psycopg2 and psycopg2-binary v2.9.6 installed side-by-side in a venv which had me thinking at times that the non-binary was also having the same problem.
Later this year I'll be rolling forward from Django 3.2.x LTS to 4.2 and be able to move to psycopg3 which I expect to be better tested in all the possible development environments.
Heroku's example app (which looks up-to-date) seems to support my suspicion where they have this: https://github.com/heroku/python-getting-started/blob/main/requirements.txt.
# Uncomment these lines to use a Postgres database. Both are needed, since in production
# (which uses Linux) we want to install from source, so that security updates from the
# underlying Heroku stack image are picked up automatically, thanks to dynamic linking.
# On other platforms/in development, the precompiled binary package is used instead, to
# speed up installation and avoid errors from missing libraries/headers.
#psycopg; sys_platform == "linux"
#psycopg[binary]; sys_platform != "linux"
@richardbrockie I am encountering the same problem with MacOS on Apple Silicon, Celery 5.2.7 with python 3.9.16, psycopg2 2.9.6 and psycopg2 2.9.5 (which I tried after reading this post). I didn't bother to try using a binary version of psycopg2 for it is not advised to be used for development.
This problem only occurred after I updated SQLAlchemy (to 2.0 from 1.4) and other related libraries.
This problem didn't occur if I run the Celery worker inside a docker container in the same Mac.
@richardbrockie My docker container is running Debian 11.
This problem only affects me in development when I don't run the Celery worker in a container on my Mac. I guess I will just start running the Celery worker in a container even in development from now on.
I think I will consider switching to Psycopg 3 as well. I only use Psycopg2 through SQLAlchemy; I don't use it directly. Do you know if there is anything I need to pay attention to when making the switch from Psycopg2 to Psycopg 3?
Thanks
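Not an authoritative answer, but with SQLAlchemy 2.0 the switch is mostly a matter of installing the new driver and changing the dialect name in the database URL; a minimal sketch (credentials and database name are placeholders):
# pip install "psycopg[binary]"  (replaces psycopg2 / psycopg2-binary)
from sqlalchemy import create_engine

# psycopg2 (old dialect name):
#   engine = create_engine("postgresql+psycopg2://user:pass@localhost/mydb")

# psycopg 3 (new dialect name is just "psycopg"):
engine = create_engine("postgresql+psycopg://user:pass@localhost/mydb")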
Encountering the same issue.
Platform: macOS Ventura 13.4.1 (Intel)
Python 3.9.17 and 3.10.12
celery[redis]==5.2.7 and celery[redis]==5.3.1
objc[7037]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[7037]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
2023-07-04 16:11:12,075 | ERROR | MainProcess | Process 'ForkPoolWorker-2' pid:7037 exited with 'signal 6 (SIGABRT)'
2023-07-04 16:11:12,088 | ERROR | MainProcess | Task handler raised error: WorkerLostError('Worker exited prematurely: signal 6 (SIGABRT) Job: 0.')
Traceback (most recent call last):
File "/Users/user/project/.venv/lib/python3.9/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 6 (SIGABRT) Job: 0.
I found a workaround on Stack Overflow: setting OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES before executing celery solves the issue with multiprocessing on macOS.
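For reference, that looks something like the following (the app module name proj is a placeholder; note this only disables macOS's fork-safety check, it doesn't make forking Objective-C frameworks genuinely safe):
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
celery -A proj worker --loglevel=info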
I've been facing the same issue. I've upgraded to python 3.10 and celery 2.3.3 now. My python kept crashing and I thought it was a celery issue.
But @richardbrockie's observations helped. I was checking and downgrading all my packages except psycopg2-binary - I had 2.9.7 installed. On reverting back to 2.9.5 - the crash has stopped on my M1 Max (apple silicon mac - running OS Sonoma Public Beta).
So this works for me - python 3.10, celery 2.3.3 and psycopg2-binary 2.9.5.
Thanks all for the feedback and cross-checking.
I don't know that this issue should be closed. The psycopg2 folks seem to think this is an issue around how celery handles forking on macOS (as you can see in https://github.com/psycopg/psycopg2/issues/1593#issuecomment-1604096215), and the fact that downgrading psycopg2 fixes it doesn't necessarily mean they're wrong.
@auvipy Can you reopen the issue? I can reproduce it with python 3.8.18, celery 5.3.4 and psycopg2-binary 2.9.8.
It happens to me on Ubuntu 22.04 with psycopg2-binary and the newest stable Django + newest stable Celery. Trying to find the reason: it seems to occur while handling a massive amount of tasks (I've reproduced it by generating thumbnails for a few thousand images); it hasn't happened to me yet with any 'single' tasks. Maybe that's a clue? I can't find any other reason for now.
It seems that changing the systemd service type to simple instead of forking and disabling celery multi solves it (source: https://sam.hooke.me/note/2023/01/celery-and-systemd/); a minimal sketch is below.
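A rough sketch of that unit change, assuming an app module named proj and a venv under /opt/proj/venv (paths and names are made up; see the linked note for the full unit file):
[Service]
Type=simple
# run a single worker process directly instead of `celery multi` with Type=forking
ExecStart=/opt/proj/venv/bin/celery -A proj worker --loglevel=INFO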
It seems that changing the systemd service type to simple instead of forking and disabling celery multi solves it (source: https://sam.hooke.me/note/2023/01/celery-and-systemd/)
that is a great one!
Some more detail on this problem.
I'm running a project on Heroku which forces the use of psycopg2-binary instead of psycopg2. There was a recent update to v2.9.6 that was confusing me when I initially ran into the problem. Recent debugging has revealed the following:
Works: MacOS on Apple Silicon
- Celery 5.2.7 with python 3.10.9 with psycopg2-binary 2.9.5
Does not work: MacOS on Apple Silicon
- Celery 5.2.7 with python 3.10.9 with psycopg2-binary 2.9.6
Works: MacOS on Apple Silicon
- Celery 5.2.7 with python 3.10.9 with psycopg2 2.9.6 (no -binary)
Also works: MacOS on Intel silicon
- Celery 5.2.7 with python 3.10.9 with psycopg2-binary 2.9.6 (with -binary)
So, for some reason, celery with psycopg2-binary 2.9.6 on Apple silicon is causing problems.
Any ideas??
Hi, I've run into this error when trying to roll forward from python 3.10.9 (which works fine). I get it with both 3.10.10 and 3.10.11:
celery: 5.2.7
billiard: 3.6.4.0
Django: 3.2.16 (LTS)
MacOS: 12.6.3
python: 3.10.9 works; 3.10.10 & 3.10.11 have the problem
Some celery tasks complete successfully (eg: sending email). However, it appears to be the first interaction with a Django model that triggers the problem. Here's the line in my code that generates the exception. task_model can be one of 2 different models that are tracking the processing task. task_id is valid and the .get() should return a single instance:
begin_processing_task = task_model.objects.get(id=task_id)
here's the exception:
objc[70333]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[70333]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
[2023-05-08 20:00:04,742: ERROR/MainProcess] Process 'ForkPoolWorker-1' pid:70333 exited with 'signal 6 (SIGABRT)'
[2023-05-08 20:00:04,755: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 6 (SIGABRT) Job: 0.')
Traceback (most recent call last):
File "/Users/richard/VirtualEnvs/ontheday_heroku_3.10.10/lib/python3.10/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 6 (SIGABRT) Job: 0.
[2023-05-08 20:43:00,037:
Thanks a lot @richardbrockie
Hi there, I am very new to Python. I am using FastAPI with a Celery worker. Everything seems fine until I try to use PyGAD, which needs NumPy. The problem is that whenever I import NumPy in any Python file, this error occurs: "billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 6". If anyone could help, it would be great. Thanks in advance.
Later this year I'll be rolling forward from Django 3.2.x LTS to 4.2 and be able to move to psycopg3 which I expect to be better tested in all the possible development environments.
I'm seeing the same error with Django 4.2 and the latest version of psycopg3 (psycopg[binary] version 3.1.13). Downgrading to psycopg2-binary 2.9.5 fixes the problem.
I'm having the same issue with celery + ultralytics. But I found a workaround to keep developing by running celery with --concurrency 1 and --pool solo. Here's the config I used in the vscode launch.json:
"args": [
"-A",
"celery_task_app.worker",
"worker",
"-c",
"1",
"--pool",
"solo",
"--loglevel=info"
],
Platform: Apple Silicon
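For anyone not launching from vscode, the equivalent command line for that config should be:
celery -A celery_task_app.worker worker -c 1 --pool solo --loglevel=info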
Thanks for the pointer to the --pool flag: this reminded me that I still have eventlet in my requirements list from when I was developing on Windows. I've confirmed that both --pool solo and --pool eventlet avoid the problem on Apple Silicon.
I'm having the same issue with celery + ultralytics. But I found a workaround to keep developing by running celery with --concurrency 1 and --pool solo. Here's the config I used in the vscode launch.json:
"args": [ "-A", "celery_task_app.worker", "worker", "-c", "1", "--pool", "solo", "--loglevel=info" ],
Platform: Apple Silicon
Later this year I'll be rolling forward from Django 3.2.x LTS to 4.2 and be able to move to psycopg3 which I expect to be better tested in all the possible development environments.
I'm seeing the same error with Django 4.2 and the latest version of psycopg3 (psycopg[binary] version 3.1.13). Downgrading to psycopg2-binary 2.9.5 fixes the problem.
That's disappointing! :(
Can you comment whether setting the --pool solo option when running celery solves the problem?
@heyman As expected, I also see the problem with psycopg[binary]. Setting the --pool flag resolves the problem as it does for later versions of psycopg2.
Setting the --pool flag resolves the problem as it does for later versions of psycopg2.
I guess that works if you're fine with only running a single worker. For many projects I'm not though.
Also experiencing this issue during Django/Celery development (MacOS 13.6.4, python 3.11.6, psycopg[binary]==3.1.18, django==4.2.11, redis==5.0.3, celery==5.3.6).
...can confirm adding --concurrency 1 --pool solo to my celery start script when developing locally worked.
Some more detail on this problem. I'm running a project on Heroku which forces the use of psycopg2-binary instead of psycopg2. There was a recent update to v2.9.6 that was confusing me when I initially ran into the problem. Recent debugging has revealed the following:
Works: MacOS on Apple Silicon
- Celery 5.2.7 with python 3.10.9 with psycopg2-binary 2.9.5
Does not work: MacOS on Apple Silicon
- Celery 5.2.7 with python 3.10.9 with psycopg2-binary 2.9.6
Works: MacOS on Apple Silicon
- Celery 5.2.7 with python 3.10.9 with psycopg2 2.9.6 (no -binary)
Also works: MacOS on Intel silicon
- Celery 5.2.7 with python 3.10.9 with psycopg2-binary 2.9.6 (with -binary)
So, for some reason, celery with psycopg2-binary 2.9.6 on Apple silicon is causing problems. Any ideas??
Hi, I've run into this error when trying to roll forward from python 3.10.9 (which works fine). I get it with both 3.10.10 and 3.10.11:
celery: 5.2.7
billiard: 3.6.4.0
Django: 3.2.16 (LTS)
MacOS: 12.6.3
python: 3.10.9 works; 3.10.10 & 3.10.11 have the problem
Some celery tasks complete successfully (eg: sending email). However, it appears to be the first interaction with a Django model that triggers the problem. Here's the line in my code that generates the exception. task_model can be one of 2 different models that are tracking the processing task. task_id is valid and the .get() should return a single instance:
begin_processing_task = task_model.objects.get(id=task_id)
here's the exception:
objc[70333]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[70333]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
[2023-05-08 20:00:04,742: ERROR/MainProcess] Process 'ForkPoolWorker-1' pid:70333 exited with 'signal 6 (SIGABRT)'
[2023-05-08 20:00:04,755: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 6 (SIGABRT) Job: 0.')
Traceback (most recent call last):
File "/Users/richard/VirtualEnvs/ontheday_heroku_3.10.10/lib/python3.10/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 6 (SIGABRT) Job: 0.
[2023-05-08 20:43:00,037:
Thanks a lot @richardbrockie
I have macOS Sonoma 14.4.1, Intel silicon, 8 cores.
python 3.10.11, celery ^5.4.0 / 5.2.7, psycopg[binary] 2.9.5 / 2.9.6
Ran with -c 4. Tried all combinations and I still have the same issue. Although some jobs do not raise this error, most of them do.
I'm having the same issue with celery + ultralytics. But I found a workaround to keep developing by running celery with --concurrency 1 and --pool solo. Here's the config I used in the vscode launch.json:
"args": [ "-A", "celery_task_app.worker", "worker", "-c", "1", "--pool", "solo", "--loglevel=info" ],
Platform: Apple Silicon
Thank you! -c 1 and --pool solo resolved the error.
Hi guys, it's been 2 years; did anyone find a workaround without setting the pool size to 1?
Ran into this issue recently as well.
My temporary fix was to use --pool=threads
celery -A proj worker --pool=threads
However, it would be preferable to have similar multiprocessing in a local dev environment...
Has anyone found a better way to address it?