NeuroDataDesign / manifold_random_forests

1 stars 0 forks source link

Sklearnization of splitter #4

Closed parthgvora closed 3 years ago

parthgvora commented 3 years ago

Thanks for contributing. If this is your first time, make sure to read contributing.md

PR Description

This is to PR in a sklearn style splitter. The main goal is to refactor the code in _split.pyx to be:

I will try to achieve the following:

  1. Put the entire splitter into cython; sample_proj_mat will be written in cython and the whole thing will just be a cython object. 2. No parts of it should be in Python a. I might have to keep the GIL just for random number generation, but this should be easily replaceable when merging with sklearn since we can use their utils
  2. Remove my impurity functions & replace with sklearn's criterion function
  3. Make my splitter return a SplitRecord struct like sklearn's splitter
  4. Make my splitter inherit from sklearn's splitter if possible.
  5. Document what else needs to be done to the builder & tree to make it fit sklearn's code structure

Merge checklist

Maintainer, please confirm the following before merging:

adam2392 commented 3 years ago

Would it helpful for us to just literally copy/paste the _utils.pxd/pyx files? I imagine they would be useful for the morf parts too.

parthgvora commented 3 years ago

Would it helpful for us to just literally copy/paste the _utils.pxd/pyx files? I imagine they would be useful for the morf parts too.

It could be useful, but utils.pyx has dependency on sklearn's utils:

from ..utils._random cimport our_rand_r

Im not sure if this would cause a mess of dependencies if we just try to use their function or not. What do you think?

adam2392 commented 3 years ago

Im not sure if this would cause a mess of dependencies if we just try to use their function or not. What do you think?

I just looked at sklearn and it looks like the utils that you need is pretty short. I ported it over, so that way we can use their tested util funcs + get to nogil faster.

We can probably continue down this "path" of copying code over as needed, and just noting which parts are sklearn and which parts are pure SPORF/morf?

Lmk if the current push here works for you? I didn't change any inheritence in _split.pyx, but you can probably do so now.

adam2392 commented 3 years ago

Some notes on what I was speccing out needs to change outside _oblique_splitter. files:

adam2392 commented 3 years ago

Pattern for allowing arbitrary users to define their own sampling projection: http://conference.scipy.org/proceedings/SciPy2009/paper_1/full_text.pdf page 4