Save and load kernel in GPy in a sparse gaussian process regression

nbeuchat commented 7 years ago

Hi!

I hope this is the right place for this question. I have built and optimized a Sparse Gaussian Process Regression model using the GPy library. The documentation recommends saving the model as follows:

To save a model it is best to save the m.param_array of it to disk (using numpy’s np.save). Additionally, you save the script, which creates the model.

I am able to save the parameters of the model and recreate the model from them. However, I need to know in advance the kernel architecture that was used to build the model (defined in the function create_kernel below). To create and save the model, I do the following:

import GPy
import numpy as np

def create_kernel():
    # This function could change
    return GPy.kern.RBF(4, ARD=True) + GPy.kern.White(4)

# X, y: training inputs and targets
gp = GPy.models.SparseGPRegression(X, y,
                                   num_inducing=100,
                                   kernel=create_kernel())

# optimization steps
# ...

# Save the model parameters
np.save('gp_params.npy', gp.param_array)
np.save('gp_y.npy', y)
np.save('gp_X.npy', X)  # the same X that was passed to the model above

To load the model, I am doing the following at the moment. The problem is that I might not have access to the create_kernel function.

# Load model
y_load = np.load('gp_y.npy')
X_load = np.load('gp_X.npy')
gp_load = GPy.models.SparseGPRegression(X_load, y_load, 
                                   initialize=False,
                                   num_inducing=100,
                                   kernel=create_kernel()) # Kernel is problematic here

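gp_load.update_model(False)            # skip the expensive algebra while loading
gp_load.initialize_parameter()         # connect the parameters up
gp_load[:] = np.load('gp_params.npy')  # restore the saved parameter values
gp_load.update_model(True)             # run the algebra once, with the loaded values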

What is the best way to store the kernel for later use? The parameters of the kernel and the inducing inputs are stored in the gp_params.npy file, but the structure of the kernel is not. At the moment, I have to know which function was used to create the model, which will not always be the case.

Thanks a lot for your help! Nicolas

mzwiessele commented 6 years ago

I believe this is being addressed by the new serialization framework mentioned in #547, which is still in progress. The relevant functions are to_dict and from_dict. @zhenwendai is there more to this for now?

blurLake commented 5 years ago

Can this be done for other models, like GPRegression, in a similar manner?

Amir-Arsalan commented 4 years ago

@nbeuchat Were you able to eventually save your model/kernel, either with one of the new methods such as save_model() or with the old-school method you showed here? The new methods seem pretty straightforward to use, but I have issues using them, as shown here. I also tried the old-school method of saving the parameters as numpy arrays, but I get pickling errors. Judging by the issues from about 2 years ago, this does not seem to have been the case in the past. I would appreciate it if you could help me figure out where I'm making a mistake.

@mzwiessele I would appreciate it if you could take a look at my issue and give me a clue about what I might be doing wrong. Thanks!

nbeuchat commented 4 years ago

@Amir-Arsalan I haven't used the new method at all as I haven't used the framework for a while now. However, back in August 2017, I could easily save the parameters as I've shown and I ended up creating a small module for that specific model containing just the create_kernel. Not ideal but that worked for our use-case back then.
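For readers who want to avoid shipping a module with create_kernel, one possible variation is to store a small description of the kernel structure next to gp_params.npy and rebuild the kernel from it at load time. The following is only a minimal sketch of that idea, not part of GPy: the helper names (save_kernel_spec, load_kernel_spec, build_kernel), the spec layout and the file name gp_kernel.json are all hypothetical, and it only handles sums of kernels.

```python
import json

def save_kernel_spec(spec, path):
    """Write a nested, JSON-serializable kernel description to disk."""
    with open(path, "w") as f:
        json.dump(spec, f, indent=2)

def load_kernel_spec(path):
    """Read a kernel description written by save_kernel_spec."""
    with open(path) as f:
        return json.load(f)

def build_kernel(spec, registry):
    """Rebuild a kernel from a spec (hypothetical format).

    `registry` maps the names used in the spec to kernel constructors;
    the caller decides which constructors are allowed.
    """
    if spec["type"] == "add":
        # Rebuild the parts in their saved order and sum them with `+`.
        parts = [build_kernel(part, registry) for part in spec["parts"]]
        kernel = parts[0]
        for part in parts[1:]:
            kernel = kernel + part
        return kernel
    return registry[spec["type"]](**spec.get("kwargs", {}))

# Assumed description of RBF(4, ARD=True) + White(4) from the question.
KERNEL_SPEC = {
    "type": "add",
    "parts": [
        {"type": "RBF", "kwargs": {"input_dim": 4, "ARD": True}},
        {"type": "White", "kwargs": {"input_dim": 4}},
    ],
}
```

With GPy, the registry would presumably be {"RBF": GPy.kern.RBF, "White": GPy.kern.White}, and the loading code would call build_kernel(load_kernel_spec('gp_kernel.json'), registry) in place of create_kernel(). The parts have to be rebuilt in the order they were saved, since the values in gp_params.npy are assigned to the parameters by position.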

SheffieldML / GPy

Save and load kernel in GPy in a sparse gaussian process regression #535