david-cortes / isotree

(Python, R, C/C++) Isolation Forest and variations such as SCiForest and EIF, with some additions (outlier detection + similarity + NA imputation)
https://isotree.readthedocs.io
BSD 2-Clause "Simplified" License
197 stars 38 forks source link

Trapped signals #13

Closed ankane closed 3 years ago

ankane commented 4 years ago

Hi David, it looks like running fit_iforest adds a signal handler that causes signals to be ignored. You can test it out with a basic Flask app.

from flask import Flask
import numpy as np
from isotree import IsolationForest
app = Flask(__name__)

@app.route('/')
def hello_world():
    X = np.random.normal(size = (100, 2))
    iso = IsolationForest(ntrees = 10, ndim = 2, nthreads = 1)
    iso.fit(X)
    return 'Hello, World!'

Start the server and visit the page. When trying to exit with ctrl+C, it prints:

^CError: procedure was interrupted
^CError: procedure was interrupted
^CError: procedure was interrupted
^CError: procedure was interrupted
^CError: procedure was interrupted
david-cortes commented 4 years ago

That is indeed the case, and it's intentional, but the signals are not ignored: it does indeed interrupt the procedure in the C++ side when it receives an interrupt signal, but not on the Python side. Signals other than interrupt AFAIK shouldn't be ignored. Now, I'm not exactly sure what would happen when there is process forking along the way, as perhaps some processes would receive the signal and others won't - I'll have to investigate that.

If you don't like this behavior, you can remove the calls to signal(SIGINT, set_interrup_global_variable) in fit_model.cpp. Another alternative which I perhaps should add is to return the interrupt flag from C++ to Python/R and interrupt it also.

david-cortes commented 4 years ago

I did some digging and found out that there shouldn't be any issues in the interrupt handling when there are process forks as with flask, as each sub-process should receive its own signal and handle it independently.

Also did some experimentation changing the python reaction when it received the interrupt signal from the current

raise KeyboardInterrupt("Error: procedure was interrupted.")

to

os.kill(os.getpid(), signal.SIGTERM)

And it managed to terminate correctly when run from flask, BUT when run on a jupyter notebook, if the notebook sends an interrupt signal, it ends up killing the whole process that way, while the KeyboardInterrupt just throws an exception without killing the python process.

So I don't know how to make it work in such a way that it would just stop when interrupted from a python notebook and would still terminate the flask process when interrupted from flask. Suggestions welcome.

ankane commented 4 years ago

What's the reason for not having it propagate as normal?

The KeyboardInterrupt code is only executed if the interrupt happens during the execution of fit_iforest (and even in this case prevents the program from handling SIGINT as normal). If it happens any other time, nothing is raised.

david-cortes commented 4 years ago

I don't know how to propagate it as normal while also interrupting the C++ code without killing the process.

The idea with the C++ code is that it would stop if it gets an interrupt signal from e.g. a jupyter notebook, without having to wait for the whole C++ code to finish.

ankane commented 4 years ago

Hmm, I guess I don't really understand what the library is doing differently than something like LightGBM (algorithm aside), which uses Python, C++, and optionally OpenMP and doesn't use special signal handling from what I can tell (although it uses FFI).

david-cortes commented 4 years ago

As far as I know (and I might be very wrong), LightGBM doesn't respond to interrupt signals inside the C++ code. If you run LGBMClassifier().fit(X,y) on a jupyter notebook and click the interrupt button while it is on a C++ block, it will wait until that block is done without terminating it earlier, and then throw the interrupt error once it goes back to Python; whereas this library will terminate the C++ part before it is done.

ankane commented 4 years ago

Ah, makes sense (and seems to be confirmed from these answers). LightGBM is composed of many smaller calls (one for each boosting iteration), which is why it's not needed there. That could be one possible direction. Will continue to look for other options.

Edit: Another option might be to restore the original signal handler before fit_iforest finishes execution.

david-cortes commented 4 years ago

Also tried the following which didn't work:

Using PyErr_CheckSignals would make the flask app exit if then one presses Ctrl+C multiple times, but I don't think it's the best solution.

ankane commented 4 years ago

Thanks for trying out a few approaches. The commit above is definitely an improvement (but still traps the signal even when fit_iforest isn't running). For the Ruby library, I've removed the signal handling for now so it doesn't cause issues with web servers.

Based on the above, it seems like adding a method to fit each tree like XGBoost and LightGBM and avoiding signal handling is likely the cleanest method, but I totally understand if it's not something you'd like to add right now.

david-cortes commented 3 years ago

I'm finding out this causes more hassle than I expected so for version 0.1.24 it now has an additional argument handle_interrupt to control whether or not to do this at the C++ level.

ankane commented 3 years ago

Great, thanks @david-cortes. Will retire my fork for the Ruby gem.

david-cortes commented 3 years ago

I just realized that when calling the C function signal it returns the current signal handler. This was a really stupid bug.

ankane commented 3 years ago

Looks much cleaner. I think the handle_interrupt option would still be useful to avoid trapping signals on web servers, fwiw.

david-cortes commented 3 years ago

@ankane But the signals aren't trapped now - if you try it with flask, it will stop successfully with Ctrl + C, and before exit it restores the signal handler that was there when the function was called. Also it throws an exception when it gets interrupted.

ankane commented 3 years ago

My bad, missed that it still propagates. Just tested and works great now. Well done, and thanks!

Edit: It looks like there's a delay between Ctrl + C and exiting and no longer seeing Error: procedure was interrupted, so I'm not sure the signal is being trapped with 0.1.27.

david-cortes commented 3 years ago

@ankane Could you try again with the last commit (https://github.com/david-cortes/isotree/commit/09bace27985f5a563067d2ce46b8e89febf7833d)?

david-cortes commented 3 years ago

Finally found a way to make it safely propagate the signal to the original signal handler as if it never were trapped. In the C++ and R versions, it will however throw an exception if it receives a SIGINT, so please add #define DONT_THROW_ON_INTERRUPT is that problematic (e.g. the python process terminates when it encounters a C++ exception, whereas the R one only stops the current function).

ankane commented 3 years ago

Great, both 0.1.28 and master seem to stop immediately and propagate the signal. Thanks!