david-cortes / isotree

(Python, R, C/C++) Isolation Forest and variations such as SCiForest and EIF, with some additions (outlier detection + similarity + NA imputation)
https://isotree.readthedocs.io
BSD 2-Clause "Simplified" License
186 stars, 38 forks

What is the equivalent of the contamination parameter in sklearn? #38

Closed: po60nani closed this issue 2 years ago

po60nani commented 2 years ago

Thank you for providing this interesting package. I'd like to use it instead of scikit-learn; however, I'm not getting the same results as with scikit-learn. I used the following scikit-learn configuration:

IsolationForest(max_samples=100, random_state=self.rng, bootstrap=False, warm_start=True, n_jobs=None, contamination=0.003, verbose=0)

Contamination is, I believe, the most significant parameter for my purpose. Could you please tell me how I can achieve the same results?
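For reference, here is a minimal NumPy sketch of how scikit-learn turns `contamination` into labels when it is set to a number rather than "auto". It assumes scikit-learn's convention that `score_samples` is LOWER for more anomalous points; the function name `sklearn_style_labels` and the synthetic scores are hypothetical, for illustration only, and this is not scikit-learn's own code.

```python
import numpy as np

def sklearn_style_labels(score_samples, contamination):
    """Hypothetical helper mimicking scikit-learn's contamination rule.

    Assumes scikit-learn's convention: LOWER score_samples = more anomalous.
    The offset is the `contamination` percentile of the training scores;
    points scoring below it get -1 (outlier), the rest +1 (inlier).
    """
    score_samples = np.asarray(score_samples, dtype=float)
    offset = np.percentile(score_samples, 100.0 * contamination)
    decision = score_samples - offset          # analogous to decision_function
    labels = np.where(decision < 0, -1, 1)     # analogous to predict
    return offset, labels

# Hypothetical training scores (negated, as in scikit-learn), just to exercise the rule.
rng = np.random.default_rng(0)
train_scores = -rng.uniform(0.3, 0.8, size=10_000)
offset, labels = sklearn_style_labels(train_scores, contamination=0.003)
print(offset, np.sum(labels == -1))
```

With contamination=0.003 and 10,000 distinct training scores, 30 of them (0.3%) fall below the offset and are labelled -1.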

david-cortes commented 2 years ago

That's a scikit-learn-specific parameter that its developers introduced in order to make model predictions conform to scikit-learn's idioms. As the docs explain, all it does is set a threshold on the outlier scores at the quantile of the training-data scores given by the contamination fraction, so that roughly that fraction of the training points gets flagged as outliers. That should be very easy to mimic.
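One way to mimic it is sketched below, assuming isotree's convention that a HIGHER outlier score means more anomalous, so the cutoff becomes the (1 - contamination) quantile of the training scores. The helper names `fit_threshold` and `label_outliers` are hypothetical, not part of isotree's API, and the synthetic scores only stand in for real model output.

```python
import numpy as np

def fit_threshold(train_scores, contamination):
    """Hypothetical helper: pick an outlier-score cutoff from training scores.

    Assumes HIGHER score = more anomalous (isotree's convention), so the
    cutoff is the (1 - contamination) quantile of the training scores.
    """
    train_scores = np.asarray(train_scores, dtype=float)
    return float(np.quantile(train_scores, 1.0 - contamination))

def label_outliers(scores, threshold):
    """Hypothetical helper: scikit-learn-style labels, -1 outlier / +1 inlier."""
    return np.where(np.asarray(scores, dtype=float) > threshold, -1, 1)

# Synthetic stand-in for scores computed on the training data.
rng = np.random.default_rng(0)
train_scores = rng.uniform(0.3, 0.8, size=10_000)
threshold = fit_threshold(train_scores, contamination=0.003)
labels = label_outliers(train_scores, threshold)
print(threshold, np.sum(labels == -1))
```

With isotree, `train_scores` would come from `model.predict(X_train)` after `model.fit(X_train)`, since `predict` returns standardized outlier scores by default. The threshold is computed once on the training data and then reused unchanged for new data, which is what scikit-learn does with its stored offset.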