david-cortes / isotree

(Python, R, C/C++) Isolation Forest and variations such as SCiForest and EIF, with some additions (outlier detection + similarity + NA imputation)
https://isotree.readthedocs.io
BSD 2-Clause "Simplified" License

model.imputer file not exported when build_imputer == True #22

Closed zds0 closed 3 years ago

zds0 commented 3 years ago

When a model is instantiated and fitted as IsolationForest(build_imputer=True).fit(), a later call to export_model() does not save the required model.imputer file.

As a result, a model subsequently loaded with IsolationForest.import_model() does not have the imputing functionality it needs.

The fix appears to be simple: the build_imputer parameter needs to be passed along when serializing, i.e. _cpp_obj.serialize_obj(has_imputer=self.build_imputer)

To reproduce:

import numpy as np
from isotree import IsolationForest

def main():

    X = np.random.randn(10, 10)
    X_missing = np.random.randn(10, 10)
    X_missing[0, 0] = np.nan

    iso = IsolationForest(build_imputer=True)
    iso.fit(X)
    iso.export_model('example_model', use_cpp=True)

    assert not np.isnan(iso.transform(X_missing)).any() # properly imputes here

    iso_imported = IsolationForest.import_model('example_model', use_cpp=True)

    assert not np.isnan(iso_imported.transform(X_missing)).any() # fails because the example_model.imputer file was never written, so it cannot be imported

if __name__ == "__main__":
    main()
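
A quicker way to confirm the symptom, without going through import_model(), is to check the file system right after the export. The sketch below is only an illustration: the helper name missing_sidecar_files is hypothetical, and the ".imputer" suffix is taken from the file name mentioned in this report rather than from the library's documentation.

```python
import os


def missing_sidecar_files(base_path, suffixes=(".imputer",)):
    """Return the expected companion files that are absent next to base_path.

    `suffixes` defaults to the single suffix named in this report; any other
    companion files the library may write are not assumed here.
    """
    return [base_path + s for s in suffixes if not os.path.isfile(base_path + s)]


if __name__ == "__main__":
    # After running the reproduction script above, an empty list means the
    # imputer file was written; otherwise the missing paths are printed.
    print(missing_sidecar_files("example_model"))
```

With the behaviour described in this report, the check would be expected to list example_model.imputer as missing after the export step.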