SheffieldML / GPyOpt

Gaussian Process Optimization using GPy
BSD 3-Clause "New" or "Revised" License

Non-numeric categorical variables #161

Open fdtomasi opened 6 years ago

fdtomasi commented 6 years ago

I would like to use non-numeric categorical variables in GPyOpt. From the error, I understand that such variables are treated the same way as DiscreteVariables, but they cannot be converted to float.

Minimal example

from sklearn.svm import SVC
from GPyOpt.methods.bayesian_optimization import BayesianOptimization

domain = [{'name': 'kernel', 'type': 'categorical', 'domain': ('linear', 'poly')}]
BayesianOptimization(SVC(), domain=domain)

Output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-d90c79264e40> in <module>()
      2 
      3 domain = [{'name': 'kernel', 'type': 'categorical', 'domain': ('linear', 'poly')}]
----> 4 BayesianOptimization(SVC(), domain=domain)

/home/fede/src/GPyOpt/GPyOpt/methods/bayesian_optimization.pyc in __init__(self, f, domain, constraints, cost_withGradients, model_type, X, Y, initial_design_numdata, initial_design_type, acquisition_type, normalize_Y, exact_feval, acquisition_optimizer_type, model_update_interval, evaluator_type, batch_size, num_cores, verbosity, verbosity_model, maximize, de_duplication, **kwargs)
    115         self.initial_design_type  = initial_design_type
    116         self.initial_design_numdata = initial_design_numdata
--> 117         self._init_design_chooser()
    118 
    119         # --- CHOOSE the model type. If an instance of a GPyOpt model is passed (possibly user defined), it is used.

/home/fede/src/GPyOpt/GPyOpt/methods/bayesian_optimization.pyc in _init_design_chooser(self)
    189         # Case 1:
    190         if self.X is None:
--> 191             self.X = initial_design(self.initial_design_type, self.space, self.initial_design_numdata)
    192             self.Y, _ = self.objective.evaluate(self.X)
    193         # Case 2

/home/fede/src/GPyOpt/GPyOpt/experiment_design/__init__.pyc in initial_design(design_name, space, init_points_count)
     18         raise ValueError('Unknown design type: ' + design_name)
     19 
---> 20     return design.get_samples(init_points_count)

/home/fede/src/GPyOpt/GPyOpt/experiment_design/random_design.pyc in get_samples(self, init_points_count)
     17             return self.get_samples_with_constraints(init_points_count)
     18         else:
---> 19             return self.get_samples_without_constraints(init_points_count)
     20 
     21     def get_samples_with_constraints(self, init_points_count):

/home/fede/src/GPyOpt/GPyOpt/experiment_design/random_design.pyc in get_samples_without_constraints(self, init_points_count)
     57         samples = np.empty((init_points_count, self.space.dimensionality))
     58 
---> 59         self.fill_noncontinous_variables(samples)
     60 
     61         if self.space.has_continuous():

/home/fede/src/GPyOpt/GPyOpt/experiment_design/random_design.pyc in fill_noncontinous_variables(self, samples)
     44             if isinstance(var, DiscreteVariable) or isinstance(var, CategoricalVariable) :
     45                 sample_var = np.atleast_2d(np.random.choice(var.domain, init_points_count))
---> 46                 samples[:,idx] = sample_var.flatten()
     47 
     48             # sample in the case of bandit variables

ValueError: could not convert string to float: linear
joshring commented 6 years ago

Make a wrapper function that internally calls SVC, and inside that function map numbers to the attribute values you care about.

This way the optimisation method does not need to know the details of the arguments used to optimise the function.
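
For example, a minimal sketch of that wrapper idea (the iris data, the kernel list, and the cross-validation scoring are just illustrative assumptions, not part of the original example):

import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from GPyOpt.methods import BayesianOptimization

# Illustrative mapping from integer codes to the string kernels SVC expects.
KERNELS = ('linear', 'poly')
X_data, y_data = load_iris(return_X_y=True)

def svc_objective(x):
    # GPyOpt passes a 2D array; each row is one point to evaluate.
    scores = []
    for row in x:
        kernel = KERNELS[int(row[0])]  # decode the categorical index back to a string
        clf = SVC(kernel=kernel)
        # Negate accuracy because GPyOpt minimises by default.
        scores.append(-cross_val_score(clf, X_data, y_data, cv=3).mean())
    return np.array(scores).reshape(-1, 1)

# Use integer codes in the domain instead of the raw strings.
domain = [{'name': 'kernel', 'type': 'categorical', 'domain': (0, 1)}]

opt = BayesianOptimization(f=svc_objective, domain=domain)
opt.run_optimization(max_iter=5)
print(opt.x_opt, opt.fx_opt)
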

hmanz commented 6 years ago

I'm having the same issue when trying to use categorical variables with strings in their domain. @joshring If I understood correctly, you are suggesting writing a wrapper function so that integers (like a discrete variable) can be used instead of strings.

This brings me to my next question: are discrete variables fitted with a GP? Or are they assumed to be categorical with no local correlation? FYI, this question is similar to the one in this link, but nobody responded to it.

ahundt commented 6 years ago

This might do what you are looking for: https://github.com/SheffieldML/GPyOpt/issues/175