Closed: AlexanderFabisch closed this PR 3 years ago.
@mralbu would be great if you could take a look at this PR.
Will take a look as soon as I can!
I tested a code snippet similar to the one in #27 against the proposed fix, after first running the following RegressorMixin class definition:
import numpy as np

from sklearn.base import BaseEstimator, RegressorMixin, MultiOutputMixin
from sklearn.utils import check_X_y
from sklearn.utils.validation import check_is_fitted, check_array, FLOAT_DTYPES

from gmr import GMM


class GaussianMixtureRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
    def __init__(self, n_components, priors=None, means=None, covariances=None,
                 verbose=0, random_state=None, R_diff=1e-4, n_iter=500,
                 init_params="random", oracle_approximating_shrinkage=False):
        self.n_components = n_components
        self.priors = priors
        self.means = means
        self.covariances = covariances
        self.verbose = verbose
        self.random_state = random_state
        self.R_diff = R_diff
        self.n_iter = n_iter
        self.init_params = init_params
        self.oracle_approximating_shrinkage = oracle_approximating_shrinkage

    def fit(self, X, y):
        self.gmm_ = GMM(self.n_components, priors=self.priors, means=self.means,
                        covariances=self.covariances, verbose=self.verbose,
                        random_state=self.random_state)

        X, y = check_X_y(X, y, estimator=self.gmm_, dtype=FLOAT_DTYPES,
                         multi_output=True)
        if y.ndim == 1:
            y = np.expand_dims(y, 1)

        self.indices_ = np.arange(X.shape[1])

        # Fit a joint GMM over the stacked (X, y) samples.
        self.gmm_.from_samples(
            np.hstack((X, y)), R_diff=self.R_diff, n_iter=self.n_iter,
            init_params=self.init_params,
            oracle_approximating_shrinkage=self.oracle_approximating_shrinkage)
        return self

    def predict(self, X):
        check_is_fitted(self, ["gmm_", "indices_"])
        X = check_array(X, estimator=self.gmm_, dtype=FLOAT_DTYPES)
        # Condition the joint GMM on the feature columns to predict y.
        return self.gmm_.predict(self.indices_, X)
Code snippet as in #27:
import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import cross_validate
X, y = load_boston(return_X_y=True)
np.random.seed(42)
cross_validate(GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=False), X, y)
>> {'fit_time': array([0.02808738, 0.00212717, 0.00335884, 0.00367188, 0.00159121]),
>> 'score_time': array([0.06866217, 0.06366682, 0.06335163, 0.06228876, 0.06565619]),
>> 'test_score': array([-1.77181999e+03, 7.71024953e-01, 5.81598835e-01, 7.68927960e-02, -1.41784818e+02])}
np.random.seed(42)
cross_validate(GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=True), X, y)
>> {'fit_time': array([0.01825643, 0.0023396 , 0.03778887, 0.012429 , 0.00279951]),
>> 'score_time': array([0.06537795, 0.05936241, 0.05965519, 0.06038404, 0.0568521 ]),
>> 'test_score': array([0.32767343, 0.5339876 , 0.27381149, 0.40139363, 0.38142808])}
def is_pos_def(x):
    # >= 0 actually checks positive semi-definiteness, which is what matters here
    return np.all(np.linalg.eigvals(x) >= 0)
np.random.seed(42)
gmr = GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=False)
gmr.fit(X, y)
print([is_pos_def(sigma) for sigma in gmr.gmm_.covariances])
>> [False, True]
np.random.seed(42)
gmr = GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=True)
gmr.fit(X, y)
print([is_pos_def(sigma) for sigma in gmr.gmm_.covariances])
>> [True, True]
With oracle approximating shrinkage enabled, all of the fitted covariances were positive semi-definite, and cross-validation on this dataset did not produce np.nan or negative scores either.
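For reference, here is a minimal sketch of the effect, using sklearn.covariance.oas (which may not be exactly the routine gmr uses internally): shrinking a nearly singular sample covariance pushes its smallest eigenvalue away from zero, so the result is strictly positive definite.

import numpy as np
from sklearn.covariance import oas

rng = np.random.RandomState(0)
X = rng.randn(10, 14)  # fewer samples than dimensions -> rank-deficient covariance
emp_cov = np.cov(X, rowvar=False)
shrunk_cov, _ = oas(X)

print(np.linalg.eigvalsh(emp_cov).min())     # ~0, or slightly negative due to round-off
print(np.linalg.eigvalsh(shrunk_cov).min())  # clearly positive after shrinkage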
I also noticed that, on the folds that converged, the R2 scores differ between the two settings ([0.771024953, 0.581598835, 0.0768927960] with oracle_approximating_shrinkage=False vs. [0.5339876, 0.27381149, 0.40139363] with oracle_approximating_shrinkage=True), but maybe this is expected given that the algorithm is stochastic, right?
The same happens in the following snippet.
np.set_printoptions(precision=2)

np.random.seed(2)
scores = []
for _ in range(10):
    gmr = GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=True)
    gmr.fit(X, y)
    scores.append(gmr.score(X, y))
print(np.array(scores))
>> [0.62 0.62 0.64 0.64 0.64 0.64 0.59 0.64 0.62 0.64]

np.random.seed(2)
scores = []
for _ in range(10):
    gmr = GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=False)
    gmr.fit(X, y)
    scores.append(gmr.score(X, y))
print(np.array(scores))
>> [ 0.74 0.74 0.75 -3.98 0.85 -14.96 0.69 0.74 0.83 0.74]
It seems that, when it converges, oracle_approximating_shrinkage=False might lead to better R2 scores.
Thank you for your analysis.
I also noticed that, on the folds that converged, the R2 scores differ between the two settings ([0.771024953, 0.581598835, 0.0768927960] with oracle_approximating_shrinkage=False vs. [0.5339876, 0.27381149, 0.40139363] with oracle_approximating_shrinkage=True), but maybe this is expected given that the algorithm is stochastic, right?
Yes, the initialization of the MVNs is of course stochastic, but even if it weren't I would expect results like these. I think shrinkage of the covariance has an effect similar to regularization in linear regression (not the same, although that would be interesting to investigate...).
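To make the analogy concrete, a toy sketch (the shrink_covariance helper below is purely illustrative, not gmr's actual implementation): shrinking toward a scaled identity is a convex combination (1 - rho) * Sigma + rho * mu * I, which plays much the same role as the lambda * I term that ridge regression adds to X^T X.

import numpy as np

def shrink_covariance(cov, rho):
    # Convex combination of the sample covariance and a scaled identity,
    # analogous to replacing X^T X with X^T X + lambda * I in ridge regression.
    mu = np.trace(cov) / cov.shape[0]
    return (1.0 - rho) * cov + rho * mu * np.eye(cov.shape[0])

cov = np.cov(np.random.randn(10, 14), rowvar=False)  # nearly singular
print(np.linalg.cond(cov))                            # huge (essentially singular)
print(np.linalg.cond(shrink_covariance(cov, 0.1)))    # well conditioned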
I did some additional comparisons with an experiment that uses sklearn.mixture.GaussianMixture machinery (via sklego's GMMRegressor) to fit the regressor.
from sklego.mixture import GMMRegressor

np.set_printoptions(precision=4)

np.random.seed(2)
scores = []
for _ in range(10):
    gmr = GMMRegressor(n_components=2)
    gmr.fit(X, y)
    scores.append(gmr.score(X, y))
print(np.array(scores))
>> [0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478]

np.random.seed(2)
scores = []
for _ in range(10):
    gmr = GMMRegressor(n_components=2, init_params='random', max_iter=100)
    gmr.fit(X, y)
    scores.append(gmr.score(X, y))
print(np.array(scores))
>> [0.8157 0.8061 0.8221 0.8152 0.8221 0.8192 0.8479 0.8282 0.8251 0.7792]
Maybe using the internal sklearn.mixture machinery would help ease the numerical issues, though it would introduce sklearn as a hard dependency and might be out of scope. On the other hand, it would make it easy to introduce other regressors such as a BayesianGMMRegressor, and the parameters would be familiar (the same as in sklearn.mixture.GaussianMixture). Do you think exploring the use of sklearn.mixture inner workings would be interesting for gmr?
Do you think exploring the use of sklearn.mixture inner workings would be interesting for gmr?
Yes, definitely.
I wouldn't like to introduce sklearn as a required dependency though. You can easily initialize a GMM object from the priors, means, and covariances of a fitted sklearn mixture model, which should make the implementation of a BayesianGMMRegressor easy. There are actually some examples that use sklearn's Bayesian GMM.
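For example, a sketch of the idea (using sklearn's BayesianGaussianMixture attributes and the same GMM constructor arguments as in the wrapper above; the exact wiring is just illustrative):

import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from gmr import GMM

XY = np.hstack((X, y[:, np.newaxis]))  # joint samples, as in the wrapper's fit()
bgmm = BayesianGaussianMixture(n_components=2, max_iter=500).fit(XY)

# Hand the fitted parameters over to gmr, so sklearn stays an optional dependency.
gmm = GMM(n_components=2,
          priors=bgmm.weights_,
          means=bgmm.means_,
          covariances=bgmm.covariances_)  # covariance_type="full" is the default

y_pred = gmm.predict(np.arange(X.shape[1]), X)  # condition on the feature columns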
I opened a new issue here: https://github.com/AlexanderFabisch/gmr/issues/30