SheffieldML / GPyOpt

Gaussian Process Optimization using GPy
BSD 3-Clause "New" or "Revised" License

ValueError when saving model with ARD #125

Open ghost opened 7 years ago

ghost commented 7 years ago

Hi! When I save a model built with ARD=True using the 'models_file' argument, I get this error: ValueError: Wrong number of items passed 20, placement implies 4

This doesn't happen when ARD=False. My script is:

bo = GPyOpt.methods.BayesianOptimization(
    bb.group_eval,   
    domain=bounds,
    initial_design_numdata=5,
    model_type='GP',
    acquisition_type='MPI',
    normalize_Y=False,
    exact_feval=False, 
    ARD=True)

bo.run_optimization(
    n_iterations, 
    max_time,
    models_file='test.txt')

Maybe the saving function is not taking into account the larger number of parameters when ARD=True.

javiergonzalezh commented 7 years ago

Yep, that seems to be the issue. Do you mind having a look at it and making a PR? It should just be a check on the dimensions when saving the results.

ghost commented 7 years ago

Hi Javier! It may be trickier than that. In bo.py we have these two lines (373, 374):

header  = ['Iteration'] + self.model.get_model_parameters_names()
df_results = pd.DataFrame(results, columns = header)

The issue is in the header, which does not contain the correct number of parameter names. self.model.get_model_parameters_names() calls a method of the GPy object, which returns ['Mat52.variance', 'Mat52.lengthscale', 'Gaussian_noise.variance']. Together with 'Iteration' that gives 4 column names, but in my case the ARD model has 19 parameters, so each row of results holds 20 values. I think the problem should be fixed at the level of the GPy library rather than by finding a workaround in GPyOpt. What do you think?
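The mismatch can be checked with simple arithmetic. The sketch below is not GPyOpt code, and the per-name sizes are assumptions chosen so that the total matches the 19 parameters reported above (17 lengthscales would correspond to a 17-dimensional input, which is a guess).

```python
# Grouped names, as returned by get_model_parameters_names() in the report above.
names = ['Mat52.variance', 'Mat52.lengthscale', 'Gaussian_noise.variance']

# Assumed number of scalar values stored under each name (hypothetical:
# 17 ARD lengthscales, chosen so that the total is the 19 parameters reported).
sizes = [1, 17, 1]

header = ['Iteration'] + names   # column names handed to pd.DataFrame
row_width = 1 + sum(sizes)       # iteration counter + flat parameter values

# The two numbers differ, which is what the DataFrame constructor rejects.
print(len(header), row_width)
```

Under these assumptions the two counts are the 4 and 20 that appear in the error message.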

apaleyes commented 7 years ago

Can you please post the whole output of the error, with the stack trace? That would clearly indicate where the error is coming from.

ghost commented 7 years ago

Here is the output, but as I said the error actually comes from self.model.get_model_parameters_names().

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
   4616                 blocks = [make_block(values=blocks[0],
-> 4617                                      placement=slice(0, len(axes[0])))]
   4618 

~/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in make_block(values, placement, klass, ndim, dtype, fastpath)
   2951 
-> 2952     return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
   2953 

~/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in __init__(self, values, placement, ndim, fastpath)
    119                              'implies %d' % (len(self.values),
--> 120                                              len(self.mgr_locs)))
    121 

ValueError: Wrong number of items passed 20, placement implies 4

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-14-bd3ed2b27530> in <module>()
      8     eps=tolerance,
      9     context={},
---> 10     models_file='test_save')
     11 
     12 #save_models_parameters=True,

~/Articles/qahm_gateqc/code/GPyOpt/GPyOpt/core/bo.py in run_optimization(self, max_iter, max_time, eps, context, verbosity, save_models_parameters, report_file, evaluations_file, models_file)
    157             self.save_evaluations(self.evaluations_file)
    158         if self.models_file is not None:
--> 159             self.save_models(self.models_file)
    160 
    161 

~/Articles/qahm_gateqc/code/GPyOpt/GPyOpt/core/bo.py in save_models(self, models_file)
    372 
    373         header  = ['Iteration'] + self.model.get_model_parameters_names()
--> 374         df_results = pd.DataFrame(results,columns = header)
    375         df_results.to_csv(models_file,index=False, sep='\t')

~/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    359             else:
    360                 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 361                                          copy=copy)
    362         elif isinstance(data, (list, types.GeneratorType)):
    363             if isinstance(data, types.GeneratorType):

~/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in _init_ndarray(self, values, index, columns, dtype, copy)
    531             values = maybe_infer_to_datetimelike(values)
    532 
--> 533         return create_block_manager_from_blocks([values], [columns, index])
    534 
    535     @property

~/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
   4624         blocks = [getattr(b, 'values', b) for b in blocks]
   4625         tot_items = sum(b.shape[0] for b in blocks)
-> 4626         construction_error(tot_items, blocks[0].shape[1:], axes, e)
   4627 
   4628 

~/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4601         raise ValueError("Empty data passed with indices specified.")
   4602     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4603         passed, implied))
   4604 
   4605 

ValueError: Shape of passed values is (20, 3), indices imply (4, 3)

alansaul commented 7 years ago

Here is a minimal example:

import numpy as np
import GPy
from GPyOpt.methods import BayesianOptimization
from GPyOpt.models import GPModel

# Toy objective on the unit square
def f(x):
    return (6*x[:,0]-2)**2 * np.sin(12*x[:,1]-4)

bounds = [{'name': 'var_1', 'type': 'continuous', 'domain': (0,1)},
          {'name': 'var_2', 'type': 'continuous', 'domain': (0,1)}]

# ARD=True gives one lengthscale per input dimension (2 here)
k = GPy.kern.Matern52(input_dim=2, ARD=True)
m = GPModel(kernel=k)

myBopt = BayesianOptimization(f=f, domain=bounds, model=m)
myBopt.run_optimization(max_iter=15)
myBopt.save_models('models.txt')  # save_models needs a file name; 'models.txt' is a placeholder

I think @mozerfazer is correct in his diagnosis. The issue is that GPyOpt collects and saves the parameters by reading the underlying GPy model's param_array, which is a flat array of all the model's parameters. For models with ARD, a subset of this array holds the lengthscales. Unfortunately, these are all collected under a single parameter name, 'Mat52.lengthscale'.

This doesn't appear to be specific to ARD models; it would affect any model parameter that is stored as a vector.

print(m.model)

makes it clear what the issue is: you are currently unpacking 2 values of the lengthscale under one column name, 'Mat52.lengthscale'.

I'm not familiar with how the models are loaded in GPyOpt, but I believe simply replacing GPModel's function as below would help:

    def get_model_parameters_names(self):
        """
        Returns a list with the names of the parameters of the model
        """
        return self.model.parameter_names_flat()

I can make a PR if that is helpful.
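As a rough illustration of what a flat list of names needs to provide, here is a small sketch. flatten_names is a hypothetical helper, not GPy's implementation, and the index suffix format is an assumption; the names that parameter_names_flat() actually produces may look different.

```python
def flatten_names(names, sizes):
    """Expand grouped parameter names so there is one name per scalar value.

    Hypothetical helper for illustration only; the '[i]' suffix is an
    assumed format, not necessarily the one GPy uses.
    """
    flat = []
    for name, size in zip(names, sizes):
        if size == 1:
            flat.append(name)
        else:
            flat.extend('{}[{}]'.format(name, i) for i in range(size))
    return flat


names = ['Mat52.variance', 'Mat52.lengthscale', 'Gaussian_noise.variance']
sizes = [1, 2, 1]  # two lengthscales, as in the 2-D minimal example above

header = ['Iteration'] + flatten_names(names, sizes)
print(header)
```

With one name per value, the header has the same length as each saved row (5 here), so the DataFrame can be built regardless of how many entries a vector-valued parameter has.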

apaleyes commented 7 years ago

@alansaul sounds about right. Please go for it!

javiergonzalezh commented 7 years ago

Thanks @alansaul for noticing this! As @apaleyes says, a PR on this would be great! :)