david-cortes / isotree

(Python, R, C/C++) Isolation Forest and variations such as SCiForest and EIF, with some additions (outlier detection + similarity + NA imputation)
https://isotree.readthedocs.io
BSD 2-Clause "Simplified" License
186 stars 38 forks source link

Importing regular Isolation Forest fails #18

Closed svenvanhal closed 3 years ago

svenvanhal commented 3 years ago

IsolationForest.import_model() has a critical bug which corrupts regular iForest models.

When importing a model, first a new IsolationForest object is created, whose constructor calls _reset_obj(). However, as this call is made before any loaded properties are set, it falls back to using default values.

When loading a regular iForest model, this causes property _is_extended_ to be set to True, because the default value for ndim is 3. Subsequent custom parameters overwrites from file in import_model() leave _is_extended_ untouched. Therefore, the resulting object now has both self.ndim == 1 and self._is_extended_ == True.

This renders the imported model useless.

To reproduce:

import numpy as np
from isotree import IsolationForest

def import_bug(model_export_path):

    # Create IsolationForest and fit with some dummy data
    iso = IsolationForest(ndim=1)
    iso.fit(np.random.rand(10, 10))

    # Export model to file
    iso.export_model(model_export_path)

    # Import model
    iso_loaded = IsolationForest.import_model(model_export_path)

    # Original object prints "Isolation Forest"
    print(iso)

    # Imported object prints "Extended Isolation Forest" 🤔
    print(iso_loaded)

    assert iso_loaded.ndim == iso.ndim  # ndim == 1
    assert iso_loaded._is_extended_ == iso._is_extended_  # Fails! _is_extended_ is True even though ndim == 1

if __name__ == "__main__":
    import_bug("/tmp/isotree_import_bug.pkl")