SheffieldML / GPy

Gaussian processes framework in python
BSD 3-Clause "New" or "Revised" License

GP Regression retraining #891

Closed appletree999 closed 3 years ago

appletree999 commented 3 years ago

This is probably not an issue, but I have some questions and don't know where to ask.

So, I just started trying GPy. Before this I used sklearn, where you can retrain the GPRegressor by passing in the previous random state. Is there something similar in "GPy.models.GPRegression"? I didn't see such a parameter. https://gpy.readthedocs.io/en/deploy/GPy.models.html?highlight=gpregression#GPy.models.gp_regression.GPRegression

Also, after training, if you want to use the kernel of a GPRegressor "gpr", can you just access it as "gpr.kernel" to get the trained kernel?

Also, what are the available optimizers? They aren't listed in the documentation. What is the value of "self.preferred optimizer"? https://gpy.readthedocs.io/en/deploy/GPy.core.html?highlight=Optimize#GPy.core.gp.GP.optimize

Thanks

adamian commented 3 years ago

I am not sure what random_state does in sklearn's GPR initializer, but to get consistent behaviour across experiments in GPy you can simply set numpy's random seed before your experiments, e.g.:

import numpy as np
import GPy

np.random.seed(42)                         # fix the global numpy seed
x = np.random.uniform(-3., 3., (20, 1))    # toy data, just for illustration
y = np.sin(x) + 0.05 * np.random.randn(20, 1)
m = GPy.models.SparseGPRegression(x, y, num_inducing=10)
m.optimize(max_iters=2, messages=False)
print(m)

will always give you the same result for the same seed, whereas if you omit the np.random.seed call you'll get different results every time you run the snippet.

You can indeed use the instantiated/trained kernel of a model. E.g. continuing the above example you can further do this:

m_new = GPy.models.SparseGPRegression(x, y, num_inducing=10, kernel=m.kern.copy())  # reuse the trained kernel

to set a new model with the trained kernel from an old model.
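
As a quick check (continuing the same sketch; the exact hyperparameter names depend on the kernel, here the default RBF of SparseGPRegression), you can print both kernels to confirm the optimised values were carried over:

print(m.kern)       # trained kernel of the original model
print(m_new.kern)   # copied kernel held by the new model; same variance/lengthscale values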

Regarding the optimizer, from the docstring you can see that: :param optimizer: which optimizer to use (defaults to self.preferred optimizer), a range of optimisers can be found in :module:`~GPy.inference.optimization`, they include 'scg', 'lbfgs', 'tnc'.

and for an instantiated model you can see the preferred optimizer:

In []: m.preferred_optimizer   
Out[]: 'lbfgsb'
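
For example (a minimal sketch; the optimizer names are just the ones mentioned in the docstring and in preferred_optimizer above), you can pass one of those names directly to optimize():

m.optimize(optimizer='scg', max_iters=100, messages=False)      # scaled conjugate gradients
m.optimize(optimizer='lbfgsb', max_iters=100, messages=False)   # L-BFGS-B, the default
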
appletree999 commented 3 years ago

Hi adamian, thanks for your answer. That clarified the "kernel" question and "m.preferred_optimizer". So I took a look at "~GPy.inference.optimization" (here https://gpy.readthedocs.io/en/deploy/GPy.inference.optimization.html), but it doesn't look like there are any optimizers there?

A little more detail on the "retraining" question: for example, you first train the GPR for some rounds, then analyze the data and decide to keep training. I'd like the process to pick up where it left off in the previous training and continue (not repeat the previous training sequence). The "random_state" in sklearn's GPR initializer is for that purpose.

adamian commented 3 years ago

Regarding optimization: the optimization of a model is actually inherited from the paramz package; you can see it here: https://github.com/sods/paramz/blob/master/paramz/model.py

Regarding continuing training: in GPy the state is internal to the instantiated model object. If you call m.optimize() and then m.optimize() again, the second optimization will continue from where the first one left off. Note that in practice calling m.optimize(max_iters=100); m.optimize(max_iters=100) might not always be exactly equivalent to m.optimize(max_iters=200), depending on the optimizer you choose; e.g. an optimizer using linesearch might have to re-initialize some of its hyper-parameters the second time it's called.
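
A minimal sketch of that pattern, continuing the earlier example (log_likelihood() is only used here to show progress between rounds):

m.optimize(max_iters=100, messages=False)   # first training round
print(m.log_likelihood())                   # inspect the fit so far
# ... analyse the data, then decide to keep training ...
m.optimize(max_iters=100, messages=False)   # resumes from the current parameter values
print(m.log_likelihood())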

appletree999 commented 3 years ago

Thank you adamian for your answers (the link to the GitHub is also helpful).

adamian commented 3 years ago

Glad to help. I'll close this issue for now since it seems resolved.