david-cortes / isotree

(Python, R, C/C++) Isolation Forest and variations such as SCiForest and EIF, with some additions (outlier detection + similarity + NA imputation)
https://isotree.readthedocs.io
BSD 2-Clause "Simplified" License

Saving models between interfaces #2

Closed AmanteguiCode closed 4 years ago

AmanteguiCode commented 4 years ago

Hi David, I'm now trying out your library and so far so good. But I would like to train my models in Python and later export them for inference in C++. Is there any way to accomplish this with your library's interfaces? Thanks, Borja.

david-cortes commented 4 years ago

For the time being, there isn't any such functionality, but I do plan to add it in the next version.

The way to do it is to modify the cython file cpp_interface.pyx. Basically, the model will internally have an object of the class isoforest_cpp_obj, and this class has members isoforest (if you use ndim=1) and ext_isoforest (if you use ndim>1), which are the objects you need to serialize.

The C++ code already has serialization defined using the cereal headers, but it's not activated by default in the Python version, so you need to define the macro _ENABLE_CEREAL in the setup.py file (or pass the argument -D_ENABLE_CEREAL to the compiler), and add the cereal headers to the include paths in setup.py if they're not in your default include directory.
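As a rough sketch of the setup.py change described above: setuptools extensions accept a `define_macros` list of `(name, value)` tuples and an `include_dirs` list. The helper name `cereal_build_options` and the path `/path/to/cereal/include` are made up for illustration and are not part of isotree.

```python
def cereal_build_options(cereal_include_dir=None):
    """Return keyword arguments that enable the cereal code path.

    `cereal_include_dir` is a hypothetical path to the folder that contains
    the `cereal/` headers; leave it as None if the headers are already on
    the compiler's default include path.
    """
    options = {
        # A value of None defines the macro without a value,
        # which is equivalent to passing -D_ENABLE_CEREAL.
        "define_macros": [("_ENABLE_CEREAL", None)],
        "include_dirs": [],
    }
    if cereal_include_dir is not None:
        options["include_dirs"].append(cereal_include_dir)
    return options


options = cereal_build_options("/path/to/cereal/include")
# In setup.py, these entries would be merged into the arguments of the
# existing extension definition (keeping whatever include_dirs it already has).
```

If the extension already sets `include_dirs` or `define_macros`, the new entries should be appended to those lists rather than replacing them.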

With that done, you then need to write code to serialize those objects into raw bytes files using cereal, ~either from Cython, or~ from C++ wrapped in Cython (you can check the R wrapper, which does this but into an R raw vector rather than a file), and then deserialize those raw bytes files in the C++ code where they will be used.

Here's an example about serialization and de-serialization with cereal: https://uscilab.github.io/cereal/quickstart.html


Alternatively, Cython already does the same thing as cereal with its auto-pickle functionality (which is activated in the Python version), and will have the objects already serialized into raw bytes when you use pickle or similar, so you can also check the __reduce__ method that it provides. However, I'm not sure how to extract a specific C++ object from the Cython serialization process, and I don't even know if it's possible without modifying the Cython code.
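To make the `__reduce__` mechanism concrete, here is a small pure-Python sketch of what pickle works with. `ToyModel` is a hypothetical stand-in, not an isotree class, and the `__reduce__` that Cython generates for a real extension type may return a different callable and state layout.

```python
import pickle


class ToyModel:
    """Hypothetical stand-in for an extension type; not part of isotree."""

    def __init__(self, ndim, thresholds):
        self.ndim = ndim
        self.thresholds = thresholds

    def __reduce__(self):
        # pickle calls this to learn how to rebuild the object:
        # a callable plus the arguments to call it with.
        return (ToyModel, (self.ndim, self.thresholds))


model = ToyModel(ndim=1, thresholds=[0.5, 1.5, 2.0])

# Inspect what pickle would be given for this object.
rebuild, args = model.__reduce__()

# The arguments are plain Python objects, so they turn into raw bytes directly.
raw = pickle.dumps(args)

# Rebuilding is just calling the callable on the de-serialized arguments.
restored = rebuild(*pickle.loads(raw))
```

Calling `pickle.dumps(model)` performs essentially these steps internally, additionally storing a reference to the rebuilding callable. The resulting bytes are in pickle's own format, so they are not something cereal could read on the C++ side.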


As another alternative, you could do it with the R version by writing the object model$cpp_obj$serialized into a raw bytes file - that's the model object serialized through cereal - and then deserializing that file in C++. ~You might also link to the same .so that R generates when installing it through -l:isotree.so -Wl,-rpath=/path/to/so/file/folder.~ You'll also have to add C++ functions to serialize to and deserialize from C++ iostreams instead of R raw vectors.

AmanteguiCode commented 4 years ago

Thank you for the fast response and edits! I'll let you know how it goes.

david-cortes commented 4 years ago

This is now implemented in the latest version. Please reopen if you encounter any issues with this functionality.