Open ghost opened 7 years ago
Yep, that seems to be the issue. Do you mind having a look at it and making a PR? It should be just a check on the dimensions when saving the results.
Hi Javier! It may be trickier than that. In bo.py we have these two lines (373, 374):
header = ['Iteration'] + self.model.get_model_parameters_names()
df_results = pd.DataFrame(results, columns = header)
The issue is in the header, which does not contain the correct number of parameter names. self.model.get_model_parameters_names() calls a method of the GPy object.
The method returns:
['Mat52.variance', 'Mat52.lengthscale', 'Gaussian_noise.variance']
but in my case the ARD model has 19 parameters. I think the problem should be fixed at the level of the GPy library, rather than by finding a workaround in GPyOpt.
What do you think?
Can you please post the whole output of the error, with the stack trace? That would clearly indicate where the error is coming from.
Here is the output, but as I said, the error actually comes from self.model.get_model_parameters_names().
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
4616 blocks = [make_block(values=blocks[0],
-> 4617 placement=slice(0, len(axes[0])))]
4618
~/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in make_block(values, placement, klass, ndim, dtype, fastpath)
2951
-> 2952 return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
2953
~/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in __init__(self, values, placement, ndim, fastpath)
119 'implies %d' % (len(self.values),
--> 120 len(self.mgr_locs)))
121
ValueError: Wrong number of items passed 20, placement implies 4
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-14-bd3ed2b27530> in <module>()
8 eps=tolerance,
9 context={},
---> 10 models_file='test_save')
11
12 #save_models_parameters=True,
~/Articles/qahm_gateqc/code/GPyOpt/GPyOpt/core/bo.py in run_optimization(self, max_iter, max_time, eps, context, verbosity, save_models_parameters, report_file, evaluations_file, models_file)
157 self.save_evaluations(self.evaluations_file)
158 if self.models_file is not None:
--> 159 self.save_models(self.models_file)
160
161
~/Articles/qahm_gateqc/code/GPyOpt/GPyOpt/core/bo.py in save_models(self, models_file)
372
373 header = ['Iteration'] + self.model.get_model_parameters_names()
--> 374 df_results = pd.DataFrame(results,columns = header)
375 df_results.to_csv(models_file,index=False, sep='\t')
~/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
359 else:
360 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 361 copy=copy)
362 elif isinstance(data, (list, types.GeneratorType)):
363 if isinstance(data, types.GeneratorType):
~/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in _init_ndarray(self, values, index, columns, dtype, copy)
531 values = maybe_infer_to_datetimelike(values)
532
--> 533 return create_block_manager_from_blocks([values], [columns, index])
534
535 @property
~/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
4624 blocks = [getattr(b, 'values', b) for b in blocks]
4625 tot_items = sum(b.shape[0] for b in blocks)
-> 4626 construction_error(tot_items, blocks[0].shape[1:], axes, e)
4627
4628
~/anaconda/lib/python3.6/site-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
4601 raise ValueError("Empty data passed with indices specified.")
4602 raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4603 passed, implied))
4604
4605
ValueError: Shape of passed values is (20, 3), indices imply (4, 3)
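The mismatch reported in the traceback can be reproduced with pandas alone. This is a minimal sketch, not GPyOpt code: the array shape (3 rows of 20 values) is read off the error message, the zeros are placeholder data, and the header is the 4-name list from the earlier comment.

```python
import numpy as np
import pandas as pd

# Assumed shapes, read off the traceback: 3 saved rows, each holding an
# iteration number plus 19 model parameters (20 values in total).
results = np.zeros((3, 20))  # placeholder values, not real model parameters

# Header as built in save_models: 'Iteration' plus only 3 parameter names.
header = ['Iteration', 'Mat52.variance', 'Mat52.lengthscale',
          'Gaussian_noise.variance']

try:
    pd.DataFrame(results, columns=header)
    raised = False
except ValueError as err:
    # 20 data columns cannot be labelled with 4 names.
    raised = True
    print(err)
```

The exact wording of the message depends on the pandas version, but the cause is the same: the number of column names does not match the number of values per row.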
Here is a minimal example:
import GPy
import numpy as np
from GPyOpt.methods import BayesianOptimization
from GPyOpt.models import GPModel

def f(x):
    return (6*x[:,0]-2)**2*np.sin(12*x[:,1]-4)

bounds = [{'name': 'var_1', 'type': 'continuous', 'domain': (0,1)},
          {'name': 'var_2', 'type': 'continuous', 'domain': (0,1)}]

# Matern 5/2 kernel with ARD: one lengthscale per input dimension
k = GPy.kern.Matern52(input_dim=2, ARD=True)
m = GPModel(kernel=k)
myBopt = BayesianOptimization(f=f, domain=bounds, model=m)
myBopt.run_optimization(max_iter=15)
# save_models expects the output file name; this call raises the ValueError
myBopt.save_models('test_save')
I think @mozerfazer is correct in his diagnosis. The issue is that GPyOpt collects and saves the parameters by reading the underlying GPy model's param_array, which is a flat array of all the model's parameters. For models with ARD, a subset of this array holds the lengthscales of the model. Unfortunately, these are all collected under a single parameter name, 'Mat52.lengthscale'.
This doesn't appear to be specific to ARD models; it affects any model parameter that is stored as a vector.
print(m.model) makes it clear what the issue is: you are currently unpacking 2 values of the lengthscale under one column name, 'Mat52.lengthscale'.
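To illustrate the counting problem without GPy, here is a small sketch with made-up parameter values. The helper `expand_names` and the `name[i]` naming scheme are hypothetical and are not necessarily what GPy produces. It only shows that a vector-valued parameter needs one column name per entry for the header to match the flat parameter array.

```python
import numpy as np

def expand_names(named_params):
    """Return one column name per scalar entry of each parameter.

    Hypothetical helper for illustration; not part of GPy or GPyOpt.
    """
    flat_names = []
    for name, values in named_params:
        values = np.atleast_1d(values)
        if values.size == 1:
            flat_names.append(name)
        else:
            # A vector parameter gets one indexed name per entry.
            flat_names.extend(f"{name}[{i}]" for i in range(values.size))
    return flat_names

# Made-up values for a 2-input ARD kernel: the lengthscale is a vector.
params = [
    ("Mat52.variance", 1.0),
    ("Mat52.lengthscale", np.ones(2)),
    ("Gaussian_noise.variance", 1.0),
]

grouped_names = [name for name, _ in params]  # 3 names
param_array = np.concatenate([np.atleast_1d(v) for _, v in params])  # 4 values
flat_names = expand_names(params)  # 4 names, one per value
```

Here `grouped_names` has 3 entries while `param_array` has 4, which is the same kind of mismatch as in the report; `flat_names` has one entry per value.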
I'm not familiar with how the models are loaded in GPyOpt, but I believe simply replacing GPModel's function as below would help:
def get_model_parameters_names(self):
    """
    Returns a list with the names of the parameters of the model
    """
    return self.model.parameter_names_flat()
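A sketch of how the header could be built and checked with flat names. `build_header` and `StandInModel` are hypothetical names introduced here so the snippet runs without GPy. The stand-in only mimics the two attributes used (`param_array` and `parameter_names_flat()`), and its name strings are illustrative. It assumes the flat names may come back as a NumPy array, hence the `list(...)` conversion.

```python
import numpy as np

def build_header(gpy_model):
    """Build the CSV header with one column per entry of param_array."""
    # list(...) ensures plain list concatenation even if the names
    # come back as a NumPy array rather than a Python list.
    header = ['Iteration'] + list(gpy_model.parameter_names_flat())
    expected = 1 + gpy_model.param_array.size
    if len(header) != expected:
        raise ValueError(
            f"header has {len(header)} columns, expected {expected}")
    return header

class StandInModel:
    """Hypothetical stand-in exposing only the attributes used above."""
    param_array = np.array([1.0, 0.5, 2.0, 0.01])  # made-up values

    def parameter_names_flat(self):
        # Illustrative names; a real model generates its own.
        return np.array(['Mat52.variance', 'Mat52.lengthscale[[0]]',
                         'Mat52.lengthscale[[1]]', 'Gaussian_noise.variance'])

header = build_header(StandInModel())
```

With a real model, the object passed in would be the underlying GPy model (the one printed by print(m.model) above). The length check also covers the dimension check suggested at the start of the thread.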
I can make a PR if that is helpful.
@alansaul sounds about right. Please go for it!
Thanks @alansaul for noticing this! As @apaleyes says, a PR on this would be great! :)
Hi! When saving an ARD=True model using the 'models_file' argument, I get a ValueError that reads:
ValueError: Wrong number of items passed 20, placement implies 4
This doesn't happen when ARD=False. My script is:
Maybe the saving function is not taking into account the larger number of parameters when ARD=True.