cjekel / piecewise_linear_fit_py

fit piecewise linear data for a specified number of line segments
MIT License
300 stars 60 forks

Scikit Regressor Integration #55

Open mschmill opened 4 years ago

mschmill commented 4 years ago

It would be great if this package integrated with scikit-learn, identified itself as a regressor, and implemented the associated APIs. This would unlock scikit-learn workflows as well as packages that build on scikit-learn, like Yellowbrick. Overall the package is great; this would make it greater!

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py#L372

cjekel commented 4 years ago

Thanks! I agree that getting pwlf 100% compatible with scikit-learn would be a nice long-term goal. I think I want to merge the multi-dimensional support first: https://github.com/cjekel/piecewise_linear_fit_py/issues/52

The major difference between the APIs right now is that pwlf copies the x and y data on initialization, while scikit-learn would only need the data during a .fit(x, y) call. (And the n_segments option is backwards...) There was good reason for this distinction when the library first came out, but that's no longer the case.

I'd probably want to create a scikit-learn style class, without breaking backwards compatibility of the existing class.
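To make the API difference concrete, here is a minimal sketch of what a scikit-learn style class could look like. It is not pwlf code and not the maintainer's design: the class name `SketchPiecewiseRegressor` is hypothetical, the breakpoints are simply spaced evenly over the data range (pwlf optimizes their locations instead), and `get_params`/`set_params` are written by hand so that the sketch depends only on NumPy. The point is the shape of the interface: `n_segments` goes to the constructor, and the data is only seen in `fit(X, y)`.

```python
import numpy as np


class SketchPiecewiseRegressor:
    """Hypothetical scikit-learn style estimator; not part of pwlf."""

    def __init__(self, n_segments=2):
        # Only hyperparameters are stored here; no data is copied.
        self.n_segments = n_segments

    def get_params(self, deep=True):
        return {"n_segments": self.n_segments}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def _design(self, X):
        # Columns: intercept, x, and one hinge max(x - b, 0) per breakpoint.
        x = np.asarray(X, dtype=float).ravel()  # accepts shape (n,) or (n, 1)
        hinges = [np.maximum(x - b, 0.0) for b in self.breaks_]
        return np.column_stack([np.ones_like(x), x, *hinges])

    def fit(self, X, y):
        x = np.asarray(X, dtype=float).ravel()
        # Assumption: evenly spaced interior breakpoints, fixed rather than optimized.
        self.breaks_ = np.linspace(x.min(), x.max(), self.n_segments + 1)[1:-1]
        self.coef_, *_ = np.linalg.lstsq(
            self._design(x), np.asarray(y, dtype=float), rcond=None
        )
        return self  # scikit-learn convention: fit returns the estimator

    def predict(self, X):
        return self._design(X) @ self.coef_


# Usage: a V-shaped target whose kink sits on the single interior breakpoint.
x = np.linspace(0.0, 1.0, 21)
y = np.abs(x - 0.5)
model = SketchPiecewiseRegressor(n_segments=2).fit(x.reshape(-1, 1), y)
y_hat = model.predict(x.reshape(-1, 1))
```

A real integration would more likely subclass `sklearn.base.BaseEstimator` and `sklearn.base.RegressorMixin`, which provide `get_params`, `set_params` and `score`, and delegate the fitting to the existing pwlf class so that backwards compatibility is untouched.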

cerlymarco commented 3 years ago

This can be done automatically by Linear Trees.

Linear Trees differ from Decision Trees in that they compute linear approximations (instead of constant ones) by fitting simple linear models in the leaves.

A sklearn-compatible implementation is available here

joshdunnlime commented 1 year ago

@cerlymarco Kind of...

Linear Trees, from my understanding (amazing work nonetheless), don't honour the continuously-smooth constraint that MVPWLR offers. This is important in many contexts, not least because it reduces overfitting. I'm not sure there is a way to guarantee this constraint with Linear Trees, though I hope I am wrong :)

Of course, there are many applications where this isn't necessary. There are also applications where the lack of this constraint is desirable (steps in functions).
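The continuity point can be illustrated with a small NumPy sketch. It does not use pwlf or any Linear Trees package: the helper names are hypothetical, the split location is fixed by hand, and the "independent" fit only stands in for the leaf-wise behaviour described above (one separate linear model per region). The data contain a step of 0.5 at x = 1, so the two approaches visibly disagree there.

```python
import numpy as np


def fit_independent(x, y, split):
    """Fit one separate least-squares line on each side of `split`."""
    left = x <= split
    return np.polyfit(x[left], y[left], 1), np.polyfit(x[~left], y[~left], 1)


def predict_independent(models, split, x):
    left_coef, right_coef = models
    return np.where(x <= split, np.polyval(left_coef, x), np.polyval(right_coef, x))


def fit_continuous(x, y, split):
    """Fit intercept, slope and one hinge term, so both lines meet at `split`."""
    A = np.column_stack([np.ones_like(x), x, np.maximum(x - split, 0.0)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_continuous(coef, split, x):
    A = np.column_stack([np.ones_like(x), x, np.maximum(x - split, 0.0)])
    return A @ coef


def jump_at(predict, split, eps=1e-9):
    """Difference between the predictions just right and just left of `split`."""
    pts = np.array([split - eps, split + eps])
    lo, hi = predict(pts)
    return hi - lo


split = 1.0
x = np.linspace(0.0, 2.0, 40)          # grid chosen so that no point equals 1.0
y = np.where(x < split, x, x + 0.5)    # step of 0.5 at the split

models = fit_independent(x, y, split)
coef = fit_continuous(x, y, split)

jump_independent = jump_at(lambda p: predict_independent(models, split, p), split)
jump_continuous = jump_at(lambda p: predict_continuous(coef, split, p), split)
```

Here `jump_independent` recovers the step in the data, while `jump_continuous` is zero up to the tiny evaluation offset, because the hinge basis cannot represent a discontinuity. Which behaviour is preferable depends on whether the underlying function is expected to have steps.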