david-cortes / isotree

(Python, R, C/C++) Isolation Forest and variations such as SCiForest and EIF, with some additions (outlier detection + similarity + NA imputation)
https://isotree.readthedocs.io
BSD 2-Clause "Simplified" License

problem saving (exporting) model with imputer #24

Closed tmontana closed 3 years ago

tmontana commented 3 years ago

Hi. I upgraded to the latest version (Successfully installed isotree-0.1.31) so the imputer model can get saved alongside the main model. I am getting the following error:

iso.export_model(model_save_folder + preprocess_missing_model_file_name, use_cpp=True)


RuntimeError                              Traceback (most recent call last)
in
      5     res = _files_fct.save_object_to_pkl(missing_imputer, model_save_folder + preprocess_missing_model_file_name)
      6 else:
----> 7     iso.export_model(model_save_folder + preprocess_missing_model_file_name, use_cpp=True)
      8 del iso
      9 gc.collect()

/anaconda/envs/isotree_2_missing/lib/python3.8/site-packages/isotree/__init__.py in export_model(self, file, use_cpp)
   1458         with open(file + ".metadata", "w") as of:
   1459             json.dump(metadata, of, indent=4)
-> 1460         self._cpp_obj.serialize_obj(file, use_cpp, self.ndim > 1, has_imputer=self.build_imputer)
   1461         return self
   1462

isotree/cpp_interface.pyx in isotree._cpp_interface.isoforest_cpp_obj.serialize_obj()

RuntimeError: Failed to write 3064 bytes to output stream! Wrote 864

Any idea? It's a large model - could this be the problem? Thank you

david-cortes commented 3 years ago

Are you running out of disk space?

BTW the latest version from pypi doesn't yet have the fix that allows saving the imputer. Did you install it from github?
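The disk-space question above can be checked from Python before exporting. The following is a minimal sketch using only the standard library; `free_bytes`, `has_room_for`, and the folder used here are illustrative names that do not come from the thread or from isotree. Free space is reported for the filesystem that actually holds the given folder, which may not be the largest disk attached to the machine.

```python
import os
import shutil


def free_bytes(folder: str) -> int:
    """Return the free space, in bytes, on the filesystem that holds `folder`."""
    return shutil.disk_usage(folder).free


def has_room_for(folder: str, needed_bytes: int) -> bool:
    """Return True if `folder`'s filesystem has at least `needed_bytes` free."""
    return free_bytes(folder) >= needed_bytes


# Hypothetical target folder; substitute the folder the model is exported to.
model_save_folder = os.getcwd()
print(f"{free_bytes(model_save_folder) / 1e9:.1f} GB free in {model_save_folder}")
```

Calling `has_room_for(model_save_folder, 20 * 10**9)` before the export would flag a target with less than roughly 20 GB free; the threshold is an assumption and depends on the model size.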

tmontana commented 3 years ago

I'm on Azure with very large disks, so I don't think that's the issue. Yes, I installed it from GitHub:

pip install -U git+https://github.com/david-cortes/isotree.git

tmontana commented 3 years ago

The imputer file is 16 GB on disk when it fails.

david-cortes commented 3 years ago

Does it still fail if you pass use_cpp=True?

tmontana commented 3 years ago

That's what I've been using.

tmontana commented 3 years ago

If I pass use_cpp=False I get another error:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
in
      5     res = _files_fct.save_object_to_pkl(missing_imputer, model_save_folder + preprocess_missing_model_file_name)
      6 else:
----> 7     iso.export_model(model_save_folder + preprocess_missing_model_file_name, use_cpp=False)
      8 del iso
      9 gc.collect()

/anaconda/envs/isotree_2_missing/lib/python3.8/site-packages/isotree/__init__.py in export_model(self, file, use_cpp)
   1458         with open(file + ".metadata", "w") as of:
   1459             json.dump(metadata, of, indent=4)
-> 1460         self._cpp_obj.serialize_obj(file, use_cpp, self.ndim > 1, has_imputer=self.build_imputer)
   1461         return self
   1462

isotree/cpp_interface.pyx in isotree._cpp_interface.isoforest_cpp_obj.serialize_obj()

isotree/cpp_interface.pyx in isotree._cpp_interface.isoforest_cpp_obj.serialize_obj()

OSError: [Errno 28] No space left on device

david-cortes commented 3 years ago

Well that last line is the problem...

tmontana commented 3 years ago

Yup. Not sure how that's possible, but it's definitely on my end. Thanks
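The `[Errno 28]` in the second traceback is `errno.ENOSPC`, so a full disk can be told apart from other I/O failures in code. The sketch below is a hedged illustration, not part of isotree: `export_with_cleanup` is a hypothetical helper that wraps any writer callable, removes the partial file when the disk fills up, and re-raises every other error. It only covers the `OSError` path; in the thread, the `use_cpp=True` export surfaced the same condition as a `RuntimeError` instead.

```python
import errno
import os


def export_with_cleanup(write_fn, path):
    """Call write_fn(path); on a full disk, delete the partial file.

    Returns True on success and False if the write failed with ENOSPC.
    Any other exception propagates unchanged.
    """
    try:
        write_fn(path)
        return True
    except OSError as exc:
        if exc.errno != errno.ENOSPC:
            raise
        # The write stopped part-way, so the file on disk is incomplete.
        if os.path.exists(path):
            os.remove(path)
        return False
```

With the call from the thread, usage might look like `export_with_cleanup(lambda p: iso.export_model(p, use_cpp=False), path)`, assuming `iso` and `path` are defined as in the original post.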