bailuding closed this issue 4 years ago.
I am getting the same error with a similar setup.
Could you please upload the full log file?
I am getting the same error with the following config. I call the classifier from my own code; when I run it on its own, it works fine.
AutoSklearnClassifier(delete_output_folder_after_terminate=False,
                      delete_tmp_folder_after_terminate=False,
                      disable_evaluator_output=False, ensemble_memory_limit=4096,
                      ensemble_nbest=3, ensemble_size=10, exclude_estimators=None,
                      exclude_preprocessors=None, get_smac_object_callback=None,
                      include_estimators=None, include_preprocessors=None,
                      initial_configurations_via_metalearning=25, logging_config=None,
                      ml_memory_limit=10240, n_jobs=10, output_folder=None,
                      per_run_time_limit=30, resampling_strategy='cv',
                      resampling_strategy_arguments={'folds': 5}, seed=1,
                      shared_mode=False, smac_scenario_args=None,
                      time_left_for_this_task=120, tmp_folder=None)
@mfeurer @bailuding
I know what causes this. The KeyError is not caused by the configuration of your classifier; it comes from your dataset.
s1: When you run smac.optimize(), it calls incumbent = self.solver.run() to get the current incumbent.
s2: Then the incumbent is checked. Inside self.solver.run(), the first step is self.start() in xxx/lib/python3.6/site-packages/smac/optimizer/smbo.py. Here is that code, with my debug prints added:
def start(self):
    self.stats.start_timing()
    # Initialization, depends on input
    print(self.stats.ta_runs, self.incumbent)  # debug print (added)
    if self.stats.ta_runs == 0 and self.incumbent is None:
        try:
            # Evaluates the initial design; a crash here is the real cause
            self.incumbent = self.initial_design.run()
            print("=" * 100)  # debug print (added)
            print(self.incumbent)
        except FirstRunCrashedException as err:
            # Debug prints (added) to surface the otherwise silent crash
            print("=" * 100)
            print(err)
            print("j" * 200)
            # Only re-raised when abort_on_first_run_crash is set;
            # otherwise self.incumbent stays None
            if self.scenario.abort_on_first_run_crash:
                raise
    elif self.stats.ta_runs > 0 and self.incumbent is None:
        raise ValueError("According to stats there have been runs performed, "
                         "but the optimizer cannot detect an incumbent. Did "
                         "you set the incumbent (e.g. after restoring state)?")
    elif self.stats.ta_runs == 0 and self.incumbent is not None:
        raise ValueError("An incumbent is specified, but there are no runs "
                         "recorded in the Stats-object. If you're restoring "
                         "a state, please provide the Stats-object.")
    else:
        # Restoring state!
        self.logger.info("State Restored! Starting optimization with "
                         "incumbent %s", self.incumbent)
        self.logger.info("State restored with following budget:")
        self.stats.print_stats()
The print() calls output debug information from start().
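Because of the abort_on_first_run_crash setting (the exception is only re-raised when it is True), the error raised by self.incumbent = self.initial_design.run() is caught and swallowed, so you never see it; self.incumbent simply stays None. In my case, the swallowed error was: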
ValueError('No feature in X meets the variance threshold 0.00000',)
The cause was the binning I did during preprocessing, which left the features with zero variance. However, this error does not show up in your terminal or in the log.
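Since this error never reaches the terminal, it can help to check the data before calling fit(). The sketch below is a hypothetical pre-flight helper (zero_variance_columns and check_features are not part of Auto-sklearn or scikit-learn). It uses only the standard library, assumes X is a list of rows, and flags columns whose population variance is at or below a threshold; when every column is flagged, it raises the same message as the error above.

```python
from statistics import pvariance


def zero_variance_columns(X, threshold=0.0):
    """Hypothetical helper: indices of columns in X (a list of rows) with variance <= threshold."""
    columns = list(zip(*X))
    return [i for i, col in enumerate(columns) if pvariance(col) <= threshold]


def check_features(X, threshold=0.0):
    """Hypothetical helper: fail early if no column has variance above threshold."""
    low = zero_variance_columns(X, threshold)
    if len(low) == len(X[0]):
        raise ValueError(
            "No feature in X meets the variance threshold {0:.5f}".format(threshold)
        )
    return low


# Example: after coarse binning, every value of both features falls into one bin.
binned = [[1, 0], [1, 0], [1, 0], [1, 0]]
print(zero_variance_columns(binned))  # [0, 1]
```

For a NumPy array, X.tolist() gives the list-of-rows form this sketch expects.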
So, please add prints like these to smac/optimizer/smbo.py and check which exception is raised.
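The failure mode described above can be reproduced without SMAC. This is a toy model, not SMAC's code: ToyRunHistory, initial_design and start are hypothetical stand-ins that only mimic the structure of SMBO.start() and RunHistory.get_cost() seen in this thread. It shows how a swallowed first-run exception leaves the incumbent as None, which later fails as a KeyError on None, and how logging.Logger.exception() from the standard library would record the real cause with its traceback instead of print().

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("toy_smbo")


class ToyRunHistory:
    """Hypothetical stand-in for a run history keyed by configuration."""

    def __init__(self):
        self.config_ids = {}       # config -> integer id
        self.cost_per_config = {}  # integer id -> cost

    def add(self, config, cost):
        config_id = self.config_ids.setdefault(config, len(self.config_ids) + 1)
        self.cost_per_config[config_id] = cost

    def get_cost(self, config):
        # Plain dict lookup: raises KeyError(config) if config was never recorded.
        config_id = self.config_ids[config]
        return self.cost_per_config[config_id]


def initial_design():
    """Hypothetical first run that crashes, like the variance-threshold error."""
    raise ValueError("No feature in X meets the variance threshold 0.00000")


def start(abort_on_first_run_crash):
    """Toy version of the start() logic: returns the incumbent, or None."""
    incumbent = None
    try:
        incumbent = initial_design()
    except ValueError:
        # SMAC catches its own FirstRunCrashedException here; the toy catches
        # ValueError directly. logger.exception() logs the full traceback.
        logger.exception("Initial design crashed")
        if abort_on_first_run_crash:
            raise
    return incumbent


runhistory = ToyRunHistory()
incumbent = start(abort_on_first_run_crash=False)  # the crash is swallowed
try:
    runhistory.get_cost(incumbent)
except KeyError as err:
    print(type(err), err, err.args)  # <class 'KeyError'> None (None,)
```

The printed line has the same shape as the `<class 'KeyError'> None (None,)` line in the traceback reported in this thread, which is consistent with get_cost() being called with an incumbent of None.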
I'm also having the same problem. It's very difficult to pin down because it depends on which model is used. For example, in my experiments kNN worked fine, but random forest did not:
Traceback (most recent call last):
File "script_autosklearn_dublin.py", line 123, in
In addition, when I increase the sample size, the problem appears with other models too.
I tried to investigate this with the latest version of Auto-sklearn (0.7.0) but could not reproduce it. If this issue still exists with 0.7.0, please let us know; otherwise, I hope that the latest version of SMAC actually fixed this.
Closing this as there seems to be no such issue with the latest release of SMAC which is used by Auto-sklearn >= 0.7.0.
I consistently get a KeyError exception when running AutoSklearnClassifier on OpenML dataset 258 (did 258).
The error is similar to Issue #456, but with two differences: (1) I use all the classifiers, and (2) the exception only happens when the time budget is sufficiently long. For example, the exception happens with time_left_for_this_task = 1200 and per_run_time_limit = 120, but the run completes fine with time_left_for_this_task = 600 and per_run_time_limit = 60.
It is also worth noting that I get a lot of warnings about '[WARNING] [2019-01-05 01:59:19,621:EnsembleBuilder(1):63cfe65a70e3c23913ca224abee4a84c] No models better than random - using Dummy Score!'. I am not sure if this is relevant.
Here is the configuration of the AutoSklearnClassifier I use:
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=1200,
    per_run_time_limit=120,
    tmp_folder=$tmp_folder,
    output_folder=$output_folder,
    delete_tmp_folder_after_terminate=False,
    resampling_strategy='holdout',
    ensemble_memory_limit=50000,
    ml_memory_limit=50000,
)
And here is the exception:
<class 'KeyError'> None (None,)
Traceback (most recent call last):
  File "/home/user/test.py", line 102, in run
    automl.fit(X_train.copy(), y_train.copy())
  File "/home/user/env/lib/python3.5/site-packages/autosklearn/estimators.py", line 500, in fit
    dataset_name=dataset_name,
  File "/home/user/env/lib/python3.5/site-packages/autosklearn/estimators.py", line 267, in fit
    self._automl.fit(*args, **kwargs)
  File "/home/user/env/lib/python3.5/site-packages/autosklearn/automl.py", line 965, in fit
    only_return_configuration_space=only_return_configuration_space,
  File "/home/user/env/lib/python3.5/site-packages/autosklearn/automl.py", line 203, in fit
    only_return_configuration_space,
  File "/home/user/env/lib/python3.5/site-packages/autosklearn/automl.py", line 468, in _fit
    _proc_smac.run_smbo()
  File "/home/user/env/lib/python3.5/site-packages/autosklearn/smbo.py", line 501, in run_smbo
    smac.optimize()
  File "/home/user/env/lib/python3.5/site-packages/smac/facade/smac_facade.py", line 400, in optimize
    incumbent = self.solver.run()
  File "/home/user/env/lib/python3.5/site-packages/smac/optimizer/smbo.py", line 180, in run
    challengers = self.choose_next(X, Y)
  File "/home/user/env/lib/python3.5/site-packages/smac/optimizer/smbo.py", line 247, in choose_next
    incumbent_value = self.runhistory.get_cost(self.incumbent)
  File "/home/user/env/lib/python3.5/site-packages/smac/runhistory/runhistory.py", line 271, in get_cost
    config_id = self.config_ids[config]
KeyError