benoitc / gunicorn

gunicorn 'Green Unicorn' is a WSGI HTTP Server for UNIX, fast clients and sleepy applications.
http://www.gunicorn.org

Fork can eventually cause sigsegv on macos #2761

Open ScoreUnder opened 2 years ago

ScoreUnder commented 2 years ago

https://bugs.python.org/issue13829

If you fork without subsequently performing execve on macOS, there is a chance (apparently not every time, in my testing) that the _scproxy functions break, meaning that any HTTP requests made from the forked processes will crash.

I encountered this when debugging an issue on a macOS machine where the gunicorn server repeatedly booted new workers. There was no log output in between, and after figuring out that the worker never returned or threw, I tracked the code down to here, stuck a print inside, and noticed that the status was 11, i.e. signal.SIGSEGV. I enabled core dumps, and using those, lldb showed me that the crash was in the macOS networking libraries, which led to the Python bug I linked at the top of this issue.

To solve this, I have this crappy workaround in the main process:

try:
    import _scproxy

    # Cache the return values of _scproxy functions
    # This avoids calling them in subprocesses
    proxies = _scproxy._get_proxies()
    _scproxy._get_proxies = lambda: proxies
    proxy_settings = _scproxy._get_proxy_settings()
    _scproxy._get_proxy_settings = lambda: proxy_settings
except ImportError:
    pass

However, a more thorough fix would involve using execve after fork, or using spawn, or deferring to the multiprocessing library which has already been burned by this issue and changed macos defaults.
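
As a rough illustration of the "exec after fork" idea, the sketch below runs work in a freshly exec'd interpreter through the standard-library `subprocess` module instead of a bare `os.fork()`. The helper name `run_in_fresh_interpreter` is hypothetical, and this is not how gunicorn starts its workers; it only shows that a child started this way gets a new process image rather than a copy of the parent's in-memory state.

```python
import subprocess
import sys


def run_in_fresh_interpreter(code: str) -> str:
    """Run a Python snippet in a newly exec'd interpreter and return its stdout.

    Hypothetical helper, for illustration only.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit or a fatal signal
    )
    return result.stdout


if __name__ == "__main__":
    # The child reports its own PID, showing it is a separate process.
    print(run_in_fresh_interpreter("import os; print(os.getpid())"))
```

If the child crashed with a signal such as SIGSEGV, `check=True` would surface it as a `CalledProcessError` with a negative return code instead of failing silently.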

If I sound a little vague on some details, I am mostly unfamiliar with gunicorn and am touching an existing project for the first time to fix this bug.

aminzg commented 1 year ago

Loved your solution!

angangwa commented 8 months ago

Hello, I see the same issue. I am using FastAPI with gunicorn with uvicorn workers. In my observation, if I use uvicorn directly, there are no SIGSEGV issues with a given load. But I see around 5% error rate when using gunicorn. I am testing on Apple M1 Pro with MacOS Sonoma.

I am not sure about a way forward, so I would appreciate any suggestions. For the workaround mentioned here, where exactly does that code go? I am not sure what the "main process" would be in my case.

I am assuming that when I deploy to a Linux-based machine I shouldn't see this issue, but it's breaking my local testing.

ScoreUnder commented 8 months ago

@angangwa

I am not sure about a way forward, so I would appreciate any suggestions. For the workaround mentioned here, where exactly does that code go? I am not sure what the "main process" would be in my case.

Executing it any time before gunicorn forks should work.
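
One place that runs before the fork is the gunicorn configuration file, which the master process loads at startup. Below is a minimal sketch, assuming a `gunicorn.conf.py` file and gunicorn's `on_starting` server hook. The helper name `cache_scproxy_results` is made up for this example, and its body is the same workaround as in the first comment.

```python
# gunicorn.conf.py (assumed file name; it can also be passed explicitly with -c)


def cache_scproxy_results() -> bool:
    """Call the _scproxy functions once and replace them with cached versions.

    Returns True if the patch was applied, False where _scproxy is
    unavailable (e.g. on Linux).
    """
    try:
        import _scproxy
    except ImportError:
        return False

    # Cache the return values so forked workers never call into _scproxy.
    proxies = _scproxy._get_proxies()
    _scproxy._get_proxies = lambda: proxies
    proxy_settings = _scproxy._get_proxy_settings()
    _scproxy._get_proxy_settings = lambda: proxy_settings
    return True


def on_starting(server):
    # Server hook that gunicorn calls in the master process before workers are forked.
    cache_scproxy_results()
```

It could then be started with something like `gunicorn -c gunicorn.conf.py myapp:app`, where `myapp:app` is a placeholder for your own application.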

angangwa commented 8 months ago

@ScoreUnder that worked, thanks! I will now try to understand the details. Question: do you see any issue with leaving this in, e.g. when I move to a Linux-based machine?

ScoreUnder commented 8 months ago

Question: do you see any issue with leaving this in, e.g. when I move to a Linux-based machine?

It should be fine to leave it in. _scproxy does not exist on Linux, so it will hit the ImportError except block and do nothing. The side effect of having this workaround in place is that changes to system proxy settings on macOS will only take effect after you restart your app.

XiangyuL-Tursio commented 4 weeks ago

I am new to gunicorn. On my MacBook, when I tried to debug our code, I got this error:

[2024-10-15 19:51:15 -0700] [8605] [ERROR] Worker (pid:8606) was sent SIGSEGV!

I tried waitress, and everything seems to be OK.

This code has run successfully for years on AWS Elastic Beanstalk, which is known to run gunicorn on Linux. So I suspect the crash is caused by an incompatibility between gunicorn and macOS.

Is there any workaround or mitigation for such an issue?

Thank you very much.