crflynn / skgrf

scikit-learn compatible Python bindings for grf (generalized random forests) C++ random forest library
https://skgrf.readthedocs.io/en/stable/
GNU General Public License v3.0
31 stars 7 forks source link

modify how serialize forest accesses trees #28

Closed crflynn closed 3 years ago

crflynn commented 3 years ago

In attempting to build wheels, linux runners manifest the following error:

https://github.com/crflynn/skgrf-wheels/pull/1/checks?check_run_id=2318777493

skgrf/ensemble/grf.cpp: In function ‘PyObject* __pyx_f_5skgrf_8ensemble_3grf_serialize_forest(grf::Forest*)’:
skgrf/ensemble/grf.cpp:9392:109: error: call of overloaded ‘move(__gnu_cxx::__alloc_traits<std::allocator<std::unique_ptr<grf::Tree> > >::value_type&)’ is ambiguous
     __pyx_v_tree = cython_std::move<std::unique_ptr<grf::Tree>  &>((__pyx_v_forest->get_trees_()[__pyx_v_t]));

This removes the move but it is then a bit challenging to otherwise simplify this code. Hopefully this eliminates the compilation errors.

erikcs commented 3 years ago

Hi @crflynn, you can probably drop the serialization step for your python bindings all-together if you want, the only reason it's done in the R package is for R's native save to work (doesn't have a hook like python's pickle).

An alternative where the wrapper just holds a pointer to the forest is here:https://github.com/grf-labs/grf/pull/598, assume you can do something similar in Cython, unless you want to implement pickle support you don't have to serialize.

crflynn commented 3 years ago

Support for pickling is important as it's the standard method for persisting models and getting them into production applications. Your suggestion however might make single predictions faster as the python-cpp boundary would be simpler. I'll see if there is an equivalent in Cython. Thanks.

erikcs commented 3 years ago

You might also want to increase the default number of trees in causal forest, 100 will typically not give great kernel weights. If you are using 100 as the default because of sklearn, I can imagine it's that low because of speed (which isn't much of an issue since grf is faster)

crflynn commented 3 years ago

Will address that as well. Feel free to open issues on anything else you notice and I'll do my best to address them.

erikcs commented 3 years ago

Also feel free to @ me if you have any questions.