automl / SMAC3

SMAC3: A Versatile Bayesian Optimization Package for Hyperparameter Optimization
https://automl.github.io/SMAC3/v2.2.0/

NotImplementedError: Wrong number or type of arguments for overloaded function 'binary_rss_forest_predict_marginalized_over_instances_batch' #928

Open davidmurray opened 1 year ago

davidmurray commented 1 year ago

Description

Goal: Run SMAC to find best hyperparameters

Steps/Code to Reproduce

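    import numpy as np
    from ConfigSpace import Categorical, ConfigurationSpace, Float, Integer
    from ConfigSpace.hyperparameters import OrdinalHyperparameter
    from smac import HyperparameterOptimizationFacade, Scenario


    def build_smac_facade():  # placeholder name; the enclosing function was not shown
        # Search space for the genetic algorithm's hyperparameters.
        cs = ConfigurationSpace()
        cs.add_hyperparameters([
            OrdinalHyperparameter("npop", list(range(20, 500+1, 20)), default_value=100),
            Float("cxpb", (0.5, 1.0)),
            Float("mutpb", (0.01, 0.1)),
            OrdinalHyperparameter("n_elites_prop", list(np.arange(0.00, 0.15, 0.02)), default_value=0.04),
            Integer("tourn_size", (2, 10)),
            OrdinalHyperparameter("constraint_penalty", sequence=[50000, 100000, 150000, 200000, 250000, 300000], default_value=100000),
            Categorical("mate", ["probabilisticGeneCrossover", "cxOnePoint", "cxTwoPoint"])
        ])

        zones = [str(v) for v in range(1, 8)]  # 7 zones
        scenario = Scenario(cs,
            walltime_limit=167.5*60*60,
            deterministic=False,
            instances=zones,
            # Note: each feature value here is a string ("1", "2", ...), not a number.
            instance_features={v: [v] for v in zones},
            n_trials=5000)

        # _smac_run is the target function described below.
        smac_facade = HyperparameterOptimizationFacade(scenario, _smac_run)
        return smac_facade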

The _smac_run function performs the simulation for this config and returns the cost (i.e. value to minimize).

Expected Results

No crash

Actual Results

Traceback (most recent call last):
  File "/home/dmurray/masters-code/osm/algo/genetic_algorithm.py", line 444, in <module>
    incumbent = smac_facade.optimize()
  File "/home/dmurray/SMAC3/smac/facade/abstract_facade.py", line 289, in optimize
    incumbents = self._optimizer.optimize()
  File "/home/dmurray/SMAC3/smac/main/smbo.py", line 279, in optimize
    trial_info = self.ask()
  File "/home/dmurray/SMAC3/smac/main/smbo.py", line 151, in ask
    trial_info = next(self._trial_generator)
  File "/home/dmurray/SMAC3/smac/intensifier/intensifier.py", line 222, in __iter__
    config = next(self.config_generator)
  File "/home/dmurray/SMAC3/smac/main/config_selector.py", line 199, in __iter__
    x_best_array, best_observation = self._get_x_best(X_configurations)
  File "/home/dmurray/SMAC3/smac/main/config_selector.py", line 324, in _get_x_best
    costs = list(
  File "/home/dmurray/SMAC3/smac/main/config_selector.py", line 327, in <lambda>
    model.predict_marginalized(x.reshape((1, -1)))[0][0][0],  # type: ignore
  File "/home/dmurray/SMAC3/smac/model/random_forest/random_forest.py", line 278, in predict_marginalized
    dat_ = self._rf.predict_marginalized_over_instances_batch(X, X_feat, self._log_y)
  File "/home/dmurray/masters-code/lib/python3.10/site-packages/pyrfr/regression.py", line 2680, in predict_marginalized_over_instances_batch
    return _regression.binary_rss_forest_predict_marginalized_over_instances_batch(self, configuration_matrix, feature_matrix, log_y)
NotImplementedError: Wrong number or type of arguments for overloaded function 'binary_rss_forest_predict_marginalized_over_instances_batch'.
  Possible C/C++ prototypes are:
    rfr::forests::regression_forest< binary_full_tree_rss_t,num_t,response_t,index_t,rng_t >::predict_marginalized_over_instances_batch(std::vector< std::vector< num_t,std::allocator< num_t > >,std::allocator< std::vector< num_t,std::allocator< num_t > > > > const,std::vector< std::vector< num_t,std::allocator< num_t > >,std::allocator< std::vector< num_t,std::allocator< num_t > > > > const,bool const) const
    rfr::forests::regression_forest< binary_full_tree_rss_t,num_t,response_t,index_t,rng_t >::predict_marginalized_over_instances_batch(std::vector< std::vector< num_t,std::allocator< num_t > >,std::allocator< std::vector< num_t,std::allocator< num_t > > > > const,std::vector< std::vector< num_t,std::allocator< num_t > >,std::allocator< std::vector< num_t,std::allocator< num_t > > > > const) const

Versions

SMAC 2.0.0b1

davidmurray commented 1 year ago

This was with pyrfr v0.9.0. I also tried v0.8.3 and got a similar but different error:

Traceback (most recent call last):
  File "/home/dmurray/masters-code/osm/algo/genetic_algorithm.py", line 444, in <module>
    incumbent = smac_facade.optimize()
  File "/home/dmurray/SMAC3/smac/facade/abstract_facade.py", line 289, in optimize
    incumbents = self._optimizer.optimize()
  File "/home/dmurray/SMAC3/smac/main/smbo.py", line 279, in optimize
    trial_info = self.ask()
  File "/home/dmurray/SMAC3/smac/main/smbo.py", line 151, in ask
    trial_info = next(self._trial_generator)
  File "/home/dmurray/SMAC3/smac/intensifier/intensifier.py", line 222, in __iter__
    config = next(self.config_generator)
  File "/home/dmurray/SMAC3/smac/main/config_selector.py", line 199, in __iter__
    x_best_array, best_observation = self._get_x_best(X_configurations)
  File "/home/dmurray/SMAC3/smac/main/config_selector.py", line 324, in _get_x_best
    costs = list(
  File "/home/dmurray/SMAC3/smac/main/config_selector.py", line 327, in <lambda>
    model.predict_marginalized(x.reshape((1, -1)))[0][0][0],  # type: ignore
  File "/home/dmurray/SMAC3/smac/model/random_forest/random_forest.py", line 278, in predict_marginalized
    dat_ = self._rf.predict_marginalized_over_instances_batch(X, X_feat, self._log_y)
AttributeError: 'binary_rss_forest' object has no attribute 'predict_marginalized_over_instances_batch'

renesass commented 1 year ago

Please use the version of the development branch. The team is pushing hard to release v2.0.0b2 asap, which solves this problem.

Sorry for the inconvenience.

davidmurray commented 1 year ago

Hi @renesass,

Sorry, I made a mistake in my original message. I was actually running the latest version on the development branch, not 2.0.0b1.

I have found the source of the issue. Due to https://github.com/automl/SMAC3/issues/927, I must encode instance features as strings rather than integers. However, predict_marginalized_over_instances_batch expects X_feat to contain a numeric type and thus crashes when it sees a string.

So what can I do? It seems like issue #927 requires that I set instance features as strings but this issue requires that I set instance features as a numeric type.

Best regards, David

davidmurray commented 1 year ago

And also, @renesass: what is the purpose of the instance features? In my case, instances are just different geographical zones to test the algorithm on. There is no such thing as the mean (or other statistical measure) for those instances. They are separate problems.

In such a case, would it be problematic for the surrogate model if I did not include instance features? Does setting instance features as just the index of the instance (1, 2, 3, ...) have any value for the surrogate model?

Thanks, David

renesass commented 1 year ago

Hey David,

You can indeed leave out the instance features. Please read https://automl.github.io/SMAC3/v2.0.0b1/advanced_usage/1_components.html#surrogate-model to see how instance features influence the results. And you are right, the instance feature values need to be integers; it doesn't make sense otherwise anyway.
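
As an illustration of the point about numeric feature values, the sketch below keeps the instance names as strings but stores each feature as an integer, and checks the dictionary before it is handed to `Scenario`. The helper names `build_instance_features` and `check_numeric` are hypothetical and not part of SMAC. Whether this layout also avoids issue #927 is not confirmed in this thread.

```python
def build_instance_features(instances):
    """Map each instance name (a string) to a list of numeric features."""
    features = {}
    for index, name in enumerate(instances, start=1):
        # Feature = 1-based position of the zone; an int, never a str.
        features[name] = [index]
    return features


def check_numeric(features):
    """Raise early, with a readable message, if any feature is non-numeric."""
    for name, values in features.items():
        for value in values:
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(
                    f"instance {name!r} has non-numeric feature {value!r}"
                )


zones = [str(v) for v in range(1, 8)]  # 7 zones, as in the report
instance_features = build_instance_features(zones)
check_numeric(instance_features)  # passes: every value is an int
```

The resulting dictionary could then be passed as `instance_features=instance_features`. Running `check_numeric` on the original `{v: [v] for v in zones}` raises a `TypeError` up front instead of failing inside pyrfr.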

davidmurray commented 1 year ago

Hi @renesass, thanks for the documentation link. I hadn't seen it earlier. Now I understand better.

So in my case, do you think it is safe for me to ignore this warning?

Message: 'We strongly encourage to use instance features when using instances.'
Arguments: ('If no instance features are passed, the runhistory encoder can not distinguish between different instances and therefore returns the same data points with different values, all of which are used to train the surrogate model.\nConsider using instance indices as features.',)
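
What the warning describes can be reproduced with a toy example. The `encode` function below is a simplified stand-in and not SMAC's runhistory encoder, and the costs are made up. It only shows that, without instance features, one configuration evaluated on two zones yields identical input rows with different target values.

```python
def encode(config_vector, instance, instance_features=None):
    """Toy stand-in for a runhistory encoder: one input row per trial."""
    extra = instance_features[instance] if instance_features else []
    return tuple(config_vector) + tuple(extra)


config = [0.75, 0.05]           # one configuration, as a plain vector
costs = {"1": 10.0, "2": 42.0}  # made-up costs on two zones

# Without features: both trials encode to the same row but carry different costs.
rows_plain = {encode(config, zone) for zone in costs}

# With the instance index as a feature: the rows become distinguishable.
features = {"1": [1], "2": [2]}
rows_indexed = {encode(config, zone, features) for zone in costs}

print(len(rows_plain), len(rows_indexed))  # prints: 1 2
```

With a single distinct row, a surrogate model sees contradictory labels for the same input; with the index appended, it can at least separate the zones, which is what "Consider using instance indices as features" suggests.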