david-cortes / isotree

(Python, R, C/C++) Isolation Forest and variations such as SCiForest and EIF, with some additions (outlier detection + similarity + NA imputation)
https://isotree.readthedocs.io
BSD 2-Clause "Simplified" License
186 stars 38 forks

Training Isotree with Python on Windows then Deserializing with C++ on Raspberry Pi 3B+ (Linux) #49

Closed hmcd7 closed 1 year ago

hmcd7 commented 1 year ago

I am trying to serialize the model from example one of isotree_example.ipynb on a Windows system and then deserialize it using C++ on a Raspberry Pi 3B+ running Linux, but I am encountering some problems.

The model was serialized using the export_model method with add_metadata_file set to false. [screenshot]

The main file being executed is isotree_demo.cpp (shown below) which simply tries to deserialize the model.

[screenshot of isotree_demo.cpp]

Before deserializing with deserialize_combined, the model is checked with inspect_serialized_object, which gives the following values: [screenshot]

Running isotree_demo led to an "unexpected error" in serialize.cpp in the deserialize_model function: [screenshot]

In an attempt to debug, I printed the values being checked by the deserialize_model function: [screenshot]

I am unsure why saved_int_t and saved_size_t are being set to the PlatformSize enum value 4 (Other) when the Raspberry Pi 3B+ is 32-bit.
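When comparing what each platform reports, it can help to print the actual widths of `int` and `size_t` on both the training machine and the target. The following is a minimal sketch using only the standard library; `describe_width` and `describe_platform` are hypothetical helper names, not isotree functions, and the sketch makes no claim about how isotree maps these widths onto its PlatformSize values.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of isotree): formats a byte count as a bit width.
std::string describe_width(std::size_t n_bytes)
{
    assert(n_bytes > 0 && "a complete object type always has non-zero size");
    return std::to_string(n_bytes * CHAR_BIT) + "-bit";
}

// Hypothetical helper: summarizes the widths of 'int' and 'size_t'
// for the platform this code is compiled on.
std::string describe_platform()
{
    return "int: " + describe_width(sizeof(int)) +
           ", size_t: " + describe_width(sizeof(std::size_t));
}
```

Printing `describe_platform()` from a small program built on each machine shows whether the two builds agree on these sizes.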

I attempted to force the 32-bit check to true as shown in the screenshot below, but this caused a segmentation fault. [screenshot]

Any suggestions would be appreciated.

david-cortes commented 1 year ago

Thanks for the detailed bug report. There's indeed something going wrong when you saved the model on Windows, as it should have values 2 and 3 for int_t and size_t, respectively.

A couple questions:

david-cortes commented 1 year ago

Actually, I just realized there's a bug in the de-serialization. I've pushed a fix under branch alt_int_size. Could you try using that branch for the C++ build on the Raspberry Pi and confirm whether it fixes the issue?

git clone http://github.com/david-cortes/isotree.git
cd isotree
git checkout alt_int_size
mkdir build
cd build
cmake ..
make

hmcd7 commented 1 year ago

Thank you for your quick reply. It seems the change you made corrected the size_t and int_t values. However, there is still a segmentation fault. [screenshot]

david-cortes commented 1 year ago

Thanks for the info. I see in your code that you are passing unallocated pointers to deserialize_combined - these should be passed as already-constructed objects - i.e.:

IsoForest model;
ExtIsoForest model_ext;
Imputer imputer;
TreesIndexer indexer;
FILE *fin = fopen("serialized_file.bin", "rb"); /* binary mode for a binary file */
deserialize_combined(
    fin,
    &model,
    &model_ext,
    &imputer,
    &indexer,
    nullptr
);

If that still crashes, any chance that you could attach the .bin file here?
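The difference between an unallocated pointer and an already-constructed object can be illustrated without isotree at all. In the sketch below, `ToyModel`, `toy_load`, and `load_into_constructed_object` are hypothetical stand-ins written for this illustration. They only mimic the calling convention described above (the loader writes through the pointer it receives and skips `nullptr`) and are not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a model class; not an isotree type.
struct ToyModel {
    std::vector<int> trees;
};

// Hypothetical loader: fills the object that 'out' points to.
// A nullptr means "this object was not requested" and is skipped.
bool toy_load(ToyModel *out)
{
    if (out == nullptr) return false;
    out->trees.assign(3, 0);   // writes through the pointer, so it needs a live object
    return true;
}

// Correct usage: construct the object first, then pass its address.
std::size_t load_into_constructed_object()
{
    ToyModel model;                  // constructed on the stack
    bool loaded = toy_load(&model);  // address of a real object
    assert(loaded);
    (void)loaded;                    // avoids an unused warning when NDEBUG is set
    return model.trees.size();
}
```

Declaring `ToyModel *model;` without initializing it and passing `model` to such a loader would make it write through an indeterminate address, which is undefined behavior and commonly shows up as a segmentation fault.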

hmcd7 commented 1 year ago

Thank you, I've made those corrections but it unfortunately still crashes. Here is the bin file. Thank you. isotree_model.zip

david-cortes commented 1 year ago

Thanks. So it seems there's a bug with the model adding metadata when it shouldn't. Nevertheless, it's still possible to load this file if you allocate a buffer for the metadata from the inspect_serialized_object outputs, or if you use the C interface or the OOP interface from the other headers.

The following works for me if you pull the latest commit from the master branch - could you give it a try?

#include "isotree.hpp"
#include <stdio.h>
#include <iostream>
#include <memory>

int main()
{
    ExtIsoForest model_ext;
    FILE *fin = fopen("isotree_model.bin", "rb"); /* binary mode for a binary file */

    bool is_isotree_model;
    bool is_compatible;
    bool has_combined_objects;
    bool has_IsoForest;
    bool has_ExtIsoForest;
    bool has_Imputer;
    bool has_Indexer;
    bool has_metadata;
    size_t size_metadata;
    inspect_serialized_object(
        fin,
        is_isotree_model,
        is_compatible,
        has_combined_objects,
        has_IsoForest,
        has_ExtIsoForest,
        has_Imputer,
        has_Indexer,
        has_metadata,
        size_metadata
    );
    std::unique_ptr<char[]> buffer(new char[size_metadata]);

    deserialize_combined(
        fin,
        nullptr,
        &model_ext,
        nullptr,
        nullptr,
        buffer.get()
    );
    double Xnew[] = {1., 1.};
    double odepth;
    predict_iforest(
        Xnew, nullptr,
        false, 2, 0,
        nullptr, nullptr, nullptr,
        nullptr, nullptr, nullptr,
        1, 1, true,
        nullptr, &model_ext,
        &odepth, nullptr,
        nullptr,
        nullptr
    );
    std::cout << "score:" << odepth << std::endl;
    return 0;
}

hmcd7 commented 1 year ago

Thank you very much, that worked for me as well.
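For anyone adapting the snippets in this thread, one possible way to open the serialized file is sketched below, assuming nothing beyond the C++ standard library. `FileCloser`, `FilePtr`, and `open_binary` are hypothetical names introduced here, not part of isotree. The sketch opens the file in binary mode (text mode can alter bytes on Windows) and closes it automatically, and the caller can test the result before passing the handle on.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>

// Hypothetical deleter so that a std::FILE* is closed automatically.
struct FileCloser {
    void operator()(std::FILE *f) const { std::fclose(f); }
};

// Hypothetical alias: an owning handle to an open C stream.
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Opens 'path' for reading in binary mode.
// Returns an empty FilePtr if the file could not be opened.
FilePtr open_binary(const char *path)
{
    assert(path != nullptr);
    return FilePtr(std::fopen(path, "rb"));
}
```

A caller would check `if (!fin) { /* report the error */ }` and then pass `fin.get()` wherever a `FILE*` is expected.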