NeuroDataDesign / manifold_random_forests

1 stars 0 forks source link

[WIP] Staring a new PR with building on top of sklearn. #6

Closed adam2392 closed 3 years ago

adam2392 commented 3 years ago

PR Description

Start of PR that aims to:

Summary

Summary of Python level API:

Summary of Cython level:

To test the sklearn's RF code: we will copy the sklearn/ensemble files which include native RF and no need to modify. Run on oblique_forests.RandomForestClassifier and sklearn.RandomForestClassifier and literally they should be identical cuz the code at Python level and Cython won't change.

^ This may not be the case if say sklearn ppl want us to incorporate the Oblique Cython changes into the existing _tree and _splitter files, but imo that doesn't make semantic sense and would complicate the existing codebase imo, rather then just using inheritence to enable a new API, which is what we're doing now.

Merge checklist

Maintainer, please confirm the following before merging:

adam2392 commented 3 years ago

One remaining challenge is implementation of pickling via the __getstate__ and __setstate__ functions, which needs some handling of the proj_vecs pointer pointer array.

This will enable cysporf to be passed be pickled when doing something like cross_val_score.

Note that joblib works when "building" forests. Not sure what makes one work but not the other.

adam2392 commented 3 years ago

Added in the Criterion files that sklearn v0.24.2 uses.

adam2392 commented 3 years ago

Added the ensemble set of files from v0.24.2 to verify that our modifications of the Cython level files under tree/ do not hurt the RandomForest Classifier/Regressor.

adam2392 commented 3 years ago

Good call on the refactoring. Do we need to keep the gradient boosting-related files under ensemble? They seem unrelated

Nah we can prolly get rid of those. I just copied everything in :p. But it seems that at least ensemble/_forest.py does not rely on those files, so we're okay.