Closed ankane closed 3 years ago
That is indeed the case, and it's intentional, but the signals are not ignored: it does indeed interrupt the procedure in the C++ side when it receives an interrupt signal, but not on the Python side. Signals other than interrupt AFAIK shouldn't be ignored. Now, I'm not exactly sure what would happen when there is process forking along the way, as perhaps some processes would receive the signal and others won't - I'll have to investigate that.
If you don't like this behavior, you can remove the calls to signal(SIGINT, set_interrup_global_variable) in fit_model.cpp. Another alternative, which I perhaps should add, is to return the interrupt flag from C++ to Python/R and interrupt it there also.
I did some digging and found out that there shouldn't be any issues in the interrupt handling when there are process forks as with flask, as each sub-process should receive its own signal and handle it independently.
I also did some experimentation, changing how Python reacts when it receives the interrupt signal from the current
raise KeyboardInterrupt("Error: procedure was interrupted.")
to
os.kill(os.getpid(), signal.SIGTERM)
With that change it managed to terminate correctly when run from flask, BUT when run in a jupyter notebook, if the notebook sends an interrupt signal, it ends up killing the whole process, while the KeyboardInterrupt just throws an exception without killing the python process.
So I don't know how to make it work in such a way that it would just stop when interrupted from a python notebook and would still terminate the flask process when interrupted from flask. Suggestions welcome.
What's the reason for not having it propagate as normal?
The KeyboardInterrupt code is only executed if the interrupt happens during the execution of fit_iforest (and even in this case it prevents the program from handling SIGINT as normal). If it happens any other time, nothing is raised.
I don't know how to propagate it as normal while also interrupting the C++ code without killing the process.
The idea with the C++ code is that it would stop if it gets an interrupt signal from e.g. a jupyter notebook, without having to wait for the whole C++ code to finish.
Hmm, I guess I don't really understand what the library is doing differently than something like LightGBM (algorithm aside), which uses Python, C++, and optionally OpenMP and doesn't use special signal handling from what I can tell (although it uses FFI).
As far as I know (and I might be very wrong), LightGBM doesn't respond to interrupt signals inside the C++ code. If you run LGBMClassifier().fit(X,y) in a jupyter notebook and click the interrupt button while it is in a C++ block, it will wait until that block is done without terminating it earlier, and then throw the interrupt error once it goes back to Python; whereas this library will terminate the C++ part before it is done.
Ah, makes sense (and seems to be confirmed from these answers). LightGBM is composed of many smaller calls (one for each boosting iteration), which is why it's not needed there. That could be one possible direction. Will continue to look for other options.
Edit: Another option might be to restore the original signal handler before fit_iforest finishes execution.
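The "restore the original handler" idea can be sketched in Python, where signal.signal also returns the handler that was installed before the call. This is only an illustration: run_interruptible and its work_steps argument are hypothetical names, not part of isotree, and the real handling lives in the C++ code. It has to run in the main thread, since Python only allows signal handlers to be set there.

```python
import signal


def run_interruptible(work_steps):
    """Run callables in order, stopping early if SIGINT arrives.

    Hypothetical helper for illustration; not isotree's implementation.
    """
    interrupted = False

    def set_flag(signum, frame):
        # Only record the interrupt; the loop below decides when to stop.
        nonlocal interrupted
        interrupted = True

    # signal.signal returns the handler that was installed before this call.
    previous = signal.signal(signal.SIGINT, set_flag)
    completed = 0
    try:
        for step in work_steps:
            if interrupted:
                break
            step()
            completed += 1
    finally:
        # Put the caller's handler back no matter how the loop ended.
        signal.signal(signal.SIGINT, previous)
    return completed, interrupted
```

With this shape, SIGINT is only trapped while the function is running; outside of it, whatever handler the host application (flask, jupyter) installed is the one in effect.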
Also tried the following which didn't work:
- SA_NODEFER (the SIGINT still doesn't reach python).
- PyErr_CheckSignals within the C++ code, outside of the openmp loop (when there is a signal handler, sometimes it catches the signals, sometimes not, don't know what makes it work or fail).
- SIGINT through kill(getpid(), SIGINT) (in python: os.kill(os.getpid(), signal.SIGINT)) and then calling PyErr_CheckSignals.
Using PyErr_CheckSignals would make the flask app exit if one then presses Ctrl+C multiple times, but I don't think it's the best solution.
Thanks for trying out a few approaches. The commit above is definitely an improvement (but still traps the signal even when fit_iforest isn't running). For the Ruby library, I've removed the signal handling for now so it doesn't cause issues with web servers.
Based on the above, it seems like adding a method to fit each tree like XGBoost and LightGBM and avoiding signal handling is likely the cleanest method, but I totally understand if it's not something you'd like to add right now.
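A rough sketch of what per-tree fitting could look like from the Python side, assuming each tree is built by one short native call. The names fit_one_tree and fit_forest are hypothetical stand-ins, not functions from isotree, XGBoost, or LightGBM, and the "tree" here is a placeholder dict rather than a real isolation tree.

```python
import random


def fit_one_tree(data, rng):
    """Stand-in for one short native call; hypothetical, not a real API."""
    sample = rng.sample(data, k=min(len(data), 8))
    return {"split": sum(sample) / len(sample)}


def fit_forest(data, n_trees, seed=0):
    """Build the forest one tree at a time from Python."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Control returns to the interpreter after every tree, so a pending
        # Ctrl+C surfaces as KeyboardInterrupt between trees, with no custom
        # signal handler installed.
        trees.append(fit_one_tree(data, rng))
    return trees
```

The trade-off is that an interrupt is only noticed between calls, so responsiveness depends on how long a single tree takes to build.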
I'm finding out this causes more hassle than I expected, so for version 0.1.24 it now has an additional argument handle_interrupt to control whether or not to do this at the C++ level.
Great, thanks @david-cortes. Will retire my fork for the Ruby gem.
I just realized that the C function signal, when called, returns the signal handler that was installed before the call. This was a really stupid bug.
Looks much cleaner. I think the handle_interrupt option would still be useful to avoid trapping signals on web servers, fwiw.
@ankane But the signals aren't trapped now - if you try it with flask, it will stop successfully with Ctrl + C, and before exit it restores the signal handler that was there when the function was called. Also it throws an exception when it gets interrupted.
My bad, missed that it still propagates. Just tested and works great now. Well done, and thanks!
Edit: It looks like there's a delay between Ctrl + C and exiting and no longer seeing Error: procedure was interrupted, so I'm not sure the signal is being trapped with 0.1.27.
@ankane Could you try again with the last commit (https://github.com/david-cortes/isotree/commit/09bace27985f5a563067d2ce46b8e89febf7833d)?
Finally found a way to make it safely propagate the signal to the original signal handler as if it had never been trapped. In the C++ and R versions, however, it will throw an exception if it receives a SIGINT, so please add #define DONT_THROW_ON_INTERRUPT if that is problematic (e.g. the python process terminates when it encounters a C++ exception, whereas the R one only stops the current function).
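One way to picture "propagate to the original handler as if it were never trapped" is the following Python analogue. It is a sketch under assumptions, not the code from the linked commit: run_then_forward and work_steps are hypothetical names, and it relies on signal.raise_signal, which needs Python 3.8 or later and the main thread.

```python
import signal


def run_then_forward(work_steps):
    """Stop early on SIGINT, then hand the signal to the original handler.

    Hypothetical helper for illustration only.
    """
    received = False

    def remember(signum, frame):
        # Note that SIGINT arrived; do not act on it yet.
        nonlocal received
        received = True

    original = signal.signal(signal.SIGINT, remember)
    try:
        for step in work_steps:
            if received:
                break
            step()
    finally:
        # Restore whatever handler the caller had installed.
        signal.signal(signal.SIGINT, original)
    if received:
        # Re-deliver SIGINT so the restored handler reacts as usual.
        signal.raise_signal(signal.SIGINT)
```

If the restored handler is Python's default one, the re-delivered signal shows up as an ordinary KeyboardInterrupt in the caller; a web server's own handler would instead run its normal shutdown path.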
Great, both 0.1.28 and master seem to stop immediately and propagate the signal. Thanks!
Hi David, it looks like running fit_iforest adds a signal handler that causes signals to be ignored. You can test it out with a basic Flask app. Start the server and visit the page. When trying to exit with Ctrl+C, it prints: