anatolec opened this issue 2 years ago
I'm not familiar with these packages, would be great if someone with such expertise can help take a look.
@AnatolePledg do you still get the hang forever if you test with `web: gunicorn --preload -w 1 run:app`? Or if you use `inplace_predict` instead?
I started experiencing a similar hang with XGBClassifier after upgrading to 1.6.1 (also running inside a worker process on Heroku), but my predict calls were inside a `multiprocessing.Pool`, which I think is surfacing known issues with threading/workers and predict:
https://github.com/dmlc/xgboost/issues/4246 and https://github.com/dmlc/xgboost/issues/7044,
among a couple of others. I'm curious why this only started surfacing in 1.6 and not before, but I wonder if gunicorn's model for creating workers is creating a similar conflict for you.
Hi @josiahkhor,
`-w 1` and `inplace_predict` do not fix the problem. It still hangs forever. It's really the activation of the `--preload` option that causes the issue.
Hey, we're experiencing the same issue.

gunicorn uses `os.fork()` to spawn a new worker (https://github.com/benoitc/gunicorn/blob/master/gunicorn/arbiter.py#L567), which I suppose is a plain Unix fork if you're running Linux in Docker.

I was able to find this issue (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58378), which partly explains the problem. It also says it was addressed in https://bugs.python.org/issue8713 by adding the "forkserver" start method, but unfortunately only in `multiprocessing`, which gunicorn is not using :( since it calls the raw fork API directly.

I don't know what the fix should be; I don't see a way to request a forkserver through the `os` module in Python. In addition, the problem might lie somewhere else as well, since it's still unknown why this only became an issue in >1.6.0.

Let me know what you think.
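For reference, the forkserver start method mentioned above can be requested through the `multiprocessing` module itself. A minimal sketch (forkserver is only available on POSIX systems; with it, workers are forked from a clean helper process that never touched the OpenMP runtime, rather than from the loaded parent):

```python
import multiprocessing as mp


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    # Ask for the forkserver start method instead of the default fork().
    ctx = mp.get_context("forkserver")
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```

This does not help gunicorn directly, since gunicorn calls `os.fork()` itself, but it is the standard way to get fork-safety for your own worker pools.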
I think I found a workaround for this specific problem with Gunicorn and `xgboost>2.0.0`. OpenMP 5 introduced a function, `omp_pause_resource_all`, that frees up all OpenMP resources, which we can call before `os.fork()`: https://www.openmp.org/spec-html/5.0/openmpsu153.html

In particular, the following `pre_fork()` hook seems to work. Put it into a `hooks.py` file and pass the file as an argument to gunicorn with `"-c", "hooks.py"`:
```python
import ctypes

from xgboost.libpath import find_lib_path


def pre_fork(server, worker) -> None:  # type: ignore
    lib_xgboost = find_lib_path()
    print(f"Found libxgboost: {lib_xgboost}")
    if not lib_xgboost:
        print("Cannot release OpenMP resources before fork.")
    else:
        libc = ctypes.CDLL(lib_xgboost[0])
        OMP_PAUSE_SOFT = 1  # omp_pause_soft in the omp_pause_resource_t enum
        libc.omp_pause_resource_all.restype = ctypes.c_int
        libc.omp_pause_resource_all.argtypes = [ctypes.c_int]
        kind = ctypes.c_int(OMP_PAUSE_SOFT)
        result = libc.omp_pause_resource_all(kind)
        print(f"Called omp_pause_resource_all with kind={kind.value}, result={result}")
```
In fact, instead of doing it in a `pre_fork()` hook, it works fine for me if I call the above directly after loading the xgboost model. Loading the model is the last xgboost-related operation in the main process before the worker processes are created by `os.fork()`.

Maybe a function like `prepare_for_fork()` could be an addition to the `sklearn` interface, depending on OpenMP version 5 availability (OpenMP 5 release).
More generically (for other libraries that use OpenMP and have the same problem), `find_lib_path()` can be replaced with something like this function (for Linux; requires `psutil`):

```python
import psutil


def find_lib_gomp() -> list[str]:
    proc = psutil.Process()
    # Scan the current process's memory mappings for the GNU OpenMP runtime.
    return [lib.path for lib in proc.memory_maps() if "libgomp" in lib.path]
```
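On Linux the same information is available without the `psutil` dependency by reading `/proc/self/maps` directly. A sketch (assumes the Linux maps format, where the sixth whitespace-separated field of a file-backed mapping is the mapped file's path):

```python
def find_lib_gomp_stdlib() -> list[str]:
    """Return paths of mapped GNU OpenMP runtime libraries, if any."""
    paths = set()
    with open("/proc/self/maps") as maps:
        for line in maps:
            fields = line.split()
            # Anonymous mappings have only 5 fields; file-backed ones have 6.
            if len(fields) >= 6 and "libgomp" in fields[5]:
                paths.add(fields[5])
    return sorted(paths)
```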
Thank you for sharing. Yes, `fork()` can be a problem for both OpenMP and CUDA, as described in the related issues linked in the comment by @josiahkhor. The best way to work around it is simply to use a library like loky, or the forkserver start method shared by @maroshmka.
The workaround by @Raemi is interesting, but I suspect it's not quite robust for the long term.

This issue is a duplicate of https://github.com/dmlc/xgboost/issues/7044#issuecomment-1039912899.
We have an XGBoost model served via a Flask app on Heroku. This Flask app is launched using gunicorn. We are using the `--preload` option of gunicorn in order to share memory between the several workers that are launched (4). This setup was working well until the upgrade to version 1.6.0, when it stopped working: now the predict function of our XGBClassifier hangs forever.
Environment:
Gunicorn command:
web: gunicorn --preload -w 4 run:app
Error: