AlexanderFabisch / gmr

Gaussian Mixture Regression
https://alexanderfabisch.github.io/gmr/
BSD 3-Clause "New" or "Revised" License
168 stars 49 forks source link

Add GMMRegression scikit-learn RegressorMixin #29

Closed mralbu closed 3 years ago

mralbu commented 3 years ago

Adds a scikit-learn RegressorMixin. Addressing #27. Sample usage proposed:

import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import cross_validate

from gmr import GMMRegression

X, y = load_boston(return_X_y=True)

np.random.seed(2)

gmr = GMMRegression(n_components=2)
gmr.fit(X, y)
gmr.score(X, y)
>>  0.7427725113863999

np.random.seed(42)

cross_validate(GMMRegression(n_components=2), X, y)
>> {'fit_time': array([0.00853992, 0.00294971, 0.00394225, 0.00434303, 0.00185752]),
>>  'score_time': array([0.10137773, 0.08251572, 0.07629395, 0.07693887, 0.08095694]),
>>  'test_score': array([       nan, 0.77102495, 0.58159883, 0.0768928 ,        nan])}
AlexanderFabisch commented 3 years ago

Most of my comments have been addressed properly. Three inline comments are still open. In addition, the behavior of the class should be documented here. The code snippet from the top would be perfect.

AlexanderFabisch commented 3 years ago

Just checked the test coverage with nosetests --with-coverage (results are located in cover/index.html). We should add additional tests for the missing branches:


        if X.ndim == 1:
            X = np.expand_dims(X, 1)
        if y.ndim == 1:
            y = np.expand_dims(y, 1)

etc.

AlexanderFabisch commented 3 years ago

Ping @mralbu

My plan is to merge #28 before this one. Do you have time to address my suggestions or should I take over?

mralbu commented 3 years ago

Hi, @AlexanderFabisch

Sorry for taking this long.. I've noticed that the shape checks for X were covered by sklearn.utils.check_X_y. I haven't modified the README.rst instructions, I think you might do a better job than me. I just became aware of the issues with the scikit-learn boston dataset, so maybe another dataset would be a better choice for examples. Feel free to take over additional modifications. Thanks for reviewing and accepting this PR! :)

AlexanderFabisch commented 3 years ago

Sorry for taking this long..

No problem.

I haven't modified the README.rst instructions, I think you might do a better job than me.

OK, I will have it on my list.

I just became aware of the issues with the scikit-learn boston dataset, so maybe another dataset would be a better choice for examples.

There is fetch_california_housing. This dataset should be similar, right?

AlexanderFabisch commented 3 years ago

I merged this pull request to master. Thanks for your contribution @mralbu

I also added an example with sklearn's learning curve: https://github.com/AlexanderFabisch/gmr/blob/master/examples/plot_sklearn_learning_curve.py