automl / auto-sklearn

Automated Machine Learning with scikit-learn
https://automl.github.io/auto-sklearn
BSD 3-Clause "New" or "Revised" License

KeyError exception when running AutoSklearnClassifier #608

Closed bailuding closed 4 years ago

bailuding commented 5 years ago

I consistently get a KeyError exception when running AutoSklearnClassifier on OpenML dataset 258 (did 258).

The error is similar to issue #456, but there are two differences: (1) I use all the classifiers, and (2) the exception only happens when the time budget is sufficiently long. For example, the exception happens with time_left_for_this_task = 1200 and per_run_time_limit = 120, but the run completes fine with time_left_for_this_task = 600 and per_run_time_limit = 60.

It is also worth noting that I get a lot of warnings such as '[WARNING] [2019-01-05 01:59:19,621:EnsembleBuilder(1):63cfe65a70e3c23913ca224abee4a84c] No models better than random - using Dummy Score!'. I am not sure whether this is relevant.

Here is the configuration of the AutoSklearnClassifier I use:

import autosklearn.classification

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=1200,
    per_run_time_limit=120,
    tmp_folder=tmp_folder,  # path placeholder; actual value not given
    output_folder=output_folder,  # path placeholder; actual value not given
    delete_tmp_folder_after_terminate=False,
    resampling_strategy='holdout',
    ensemble_memory_limit=50000,
    ml_memory_limit=50000,
)

And here is the exception:

<class 'KeyError'> None (None,)
Traceback (most recent call last):
  File "/home/user/test.py", line 102, in run
    automl.fit(X_train.copy(), y_train.copy())
  File "/home/user/env/lib/python3.5/site-packages/autosklearn/estimators.py", line 500, in fit
    dataset_name=dataset_name,
  File "/home/user/env/lib/python3.5/site-packages/autosklearn/estimators.py", line 267, in fit
    self._automl.fit(*args, **kwargs)
  File "/home/user/env/lib/python3.5/site-packages/autosklearn/automl.py", line 965, in fit
    only_return_configuration_space=only_return_configuration_space,
  File "/home/user/env/lib/python3.5/site-packages/autosklearn/automl.py", line 203, in fit
    only_return_configuration_space,
  File "/home/user/env/lib/python3.5/site-packages/autosklearn/automl.py", line 468, in _fit
    _proc_smac.run_smbo()
  File "/home/user/env/lib/python3.5/site-packages/autosklearn/smbo.py", line 501, in run_smbo
    smac.optimize()
  File "/home/user/env/lib/python3.5/site-packages/smac/facade/smac_facade.py", line 400, in optimize
    incumbent = self.solver.run()
  File "/home/user/env/lib/python3.5/site-packages/smac/optimizer/smbo.py", line 180, in run
    challengers = self.choose_next(X, Y)
  File "/home/user/env/lib/python3.5/site-packages/smac/optimizer/smbo.py", line 247, in choose_next
    incumbent_value = self.runhistory.get_cost(self.incumbent)
  File "/home/user/env/lib/python3.5/site-packages/smac/runhistory/runhistory.py", line 271, in get_cost
    config_id = self.config_ids[config]
KeyError
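The first output line, `<class 'KeyError'> None (None,)`, looks like the exception's type, its string form and its `args` printed one after another, which would mean the lookup in `get_cost` was done with the key `None`. A minimal sketch, assuming a plain dict stands in for SMAC's `config_ids` mapping (the `get_cost` and `describe_lookup_failure` functions below are hypothetical simplifications, not SMAC's code), reproduces that output:

```python
def get_cost(config_ids, costs, config):
    # Hypothetical stand-in for RunHistory.get_cost: the configuration must
    # already have been recorded, otherwise the dict lookup raises KeyError.
    config_id = config_ids[config]
    return costs[config_id]


def describe_lookup_failure(config_ids, costs, config):
    # Format a KeyError the way the first line of the report appears to:
    # type, str(exception), exception args.
    try:
        get_cost(config_ids, costs, config)
    except KeyError as e:
        return f"{type(e)} {e} {e.args}"
    return None


if __name__ == "__main__":
    config_ids = {"config_a": 0}  # only configurations that were actually run
    costs = [0.3]
    incumbent = None  # no incumbent was ever set
    print(describe_lookup_failure(config_ids, costs, incumbent))
```

Looking up a configuration that was recorded works normally; only a missing key (here `None`) triggers the error.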

howlinghuffy commented 5 years ago

I am getting the same error with a similar setup.

mfeurer commented 5 years ago

Could you please upload the full log file?

lai-bluejay commented 5 years ago

I am getting the same error with the configuration below, and I call the classifier from my own code.

When I run it on its own, it runs fine.

AutoSklearnClassifier(delete_output_folder_after_terminate=False,
           delete_tmp_folder_after_terminate=False,
           disable_evaluator_output=False, ensemble_memory_limit=4096,
           ensemble_nbest=3, ensemble_size=10, exclude_estimators=None,
           exclude_preprocessors=None, get_smac_object_callback=None,
           include_estimators=None, include_preprocessors=None,
           initial_configurations_via_metalearning=25, logging_config=None,
           ml_memory_limit=10240, n_jobs=10, output_folder=None,
           per_run_time_limit=30, resampling_strategy='cv',
           resampling_strategy_arguments={'folds': 5}, seed=1,
           shared_mode=False, smac_scenario_args=None,
           time_left_for_this_task=120, tmp_folder=None)

lai-bluejay commented 5 years ago

@mfeurer @bailuding I know what is happening here. The KeyError is not caused by the configuration of your classifier but by your dataset.

Step 1: When you run smac.optimize(), it calls incumbent = self.solver.run() to get the current incumbent.

Step 2: It then checks the incumbent. self.solver.run() first calls self.start() in xxx/lib/python3.6/site-packages/smac/optimizer/smbo.py. The code looks like this:

# Excerpt of SMBO.start() from smac/optimizer/smbo.py, with debug print() calls added
def start(self):
    self.stats.start_timing()
    # Initialization, depends on input
    print(self.stats.ta_runs, self.incumbent)  # debug print
    if self.stats.ta_runs == 0 and self.incumbent is None:
        try:
            self.incumbent = self.initial_design.run()
            print("=" * 100)  # debug print
            print(self.incumbent)  # debug print
        except FirstRunCrashedException as err:
            # debug prints: show the exception that is otherwise hidden
            print("=" * 100)
            print(err)
            print("j" * 200)
            if self.scenario.abort_on_first_run_crash:
                raise
            # otherwise the exception is swallowed and self.incumbent stays None
    elif self.stats.ta_runs > 0 and self.incumbent is None:
        raise ValueError("According to stats there have been runs performed, "
                         "but the optimizer cannot detect an incumbent. Did "
                         "you set the incumbent (e.g. after restoring state)?")
    elif self.stats.ta_runs == 0 and self.incumbent is not None:
        raise ValueError("An incumbent is specified, but there are no runs "
                         "recorded in the Stats-object. If you're restoring "
                         "a state, please provide the Stats-object.")
    else:
        # Restoring state!
        self.logger.info("State Restored! Starting optimization with "
                         "incumbent %s", self.incumbent)
        self.logger.info("State restored with following budget:")
        self.stats.print_stats()

The print() calls are debug statements I added to show what happens inside start().

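Because abort_on_first_run_crash is not enabled in this setup, the exception raised by self.incumbent = self.initial_design.run() is caught and never re-raised, so you don't see it. self.incumbent then stays None, and the later call self.runhistory.get_cost(self.incumbent) in choose_next() fails with a KeyError for None, which matches the traceback above. In my case, the hidden error was: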

ValueError('No feature in X meets the variance threshold 0.00000',)

The error occurred because I binned the features during preprocessing, which left them with (almost) no variance. However, this error does not show up in your terminal or in the log.
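Since the underlying error is hidden, one option is to check the data for constant columns before calling fit. A minimal sketch in plain Python; `equal_width_bin`, `zero_variance_columns` and the toy data are hypothetical illustrations of how coarse binning can collapse every feature to a single value:

```python
from statistics import pvariance


def equal_width_bin(values, width):
    # Hypothetical binning step: replace each value by the index of its bin.
    return [int(v // width) for v in values]


def zero_variance_columns(rows):
    # Indices of columns whose population variance is exactly zero.
    columns = list(zip(*rows))
    return [i for i, col in enumerate(columns) if pvariance(col) == 0]


raw = [[1.0, 12.0], [3.0, 15.0], [7.0, 18.0]]
binned = [equal_width_bin(row, 10) for row in raw]

if __name__ == "__main__":
    print(zero_variance_columns(raw))     # no constant columns
    print(zero_variance_columns(binned))  # every column is constant
```

This only checks the full dataset; with holdout or cross-validation the pipeline may fit on a subset, where a column that varies overall could still turn out constant.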

So, add print statements for the stats and the caught exception in smac/optimizer/smbo.py and check which exception is actually raised.
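The failure chain described in this comment can be modelled in a few lines. This is a simplified sketch, not SMAC's source: `FirstRunCrashedException` is redefined locally, and `MiniSMBO`, `crashing_design` and `working_design` are hypothetical stand-ins. It shows how a crash in the initial design, swallowed because `abort_on_first_run_crash` is false, leaves the incumbent as `None`, so the later cost lookup raises a KeyError for `None`. Logging the exception inside the `except` block (instead of print() calls) keeps the real cause visible:

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("mini_smbo")


class FirstRunCrashedException(Exception):
    """Local stand-in for SMAC's exception of the same name."""


class MiniSMBO:
    """Hypothetical, heavily simplified model of the start()/run() flow."""

    def __init__(self, initial_design, abort_on_first_run_crash):
        self.initial_design = initial_design
        self.abort_on_first_run_crash = abort_on_first_run_crash
        self.incumbent = None
        self.config_ids = {}  # configuration -> id, filled only by successful runs
        self.costs = []       # cost of each recorded run, indexed by id

    def start(self):
        try:
            config, cost = self.initial_design()
            self.config_ids[config] = len(self.costs)
            self.costs.append(cost)
            self.incumbent = config
        except FirstRunCrashedException as err:
            # Log the real cause so it is not silently lost.
            logger.warning("Initial design crashed: %r", err)
            if self.abort_on_first_run_crash:
                raise
            # Otherwise the exception is swallowed and self.incumbent stays None.

    def run(self):
        self.start()
        # Mirrors runhistory.get_cost(self.incumbent): raises KeyError(None)
        # when no incumbent was ever recorded.
        return self.costs[self.config_ids[self.incumbent]]


def crashing_design():
    # Hypothetical initial design whose first run fails, as with the
    # variance-threshold error reported above.
    raise FirstRunCrashedException(
        "No feature in X meets the variance threshold 0.00000")


def working_design():
    # Hypothetical initial design that returns (configuration, cost).
    return "config_a", 0.25


if __name__ == "__main__":
    try:
        MiniSMBO(crashing_design, abort_on_first_run_crash=False).run()
    except KeyError as e:
        print("KeyError:", e)  # the real cause only appears in the log
```

With `abort_on_first_run_crash=True` the original exception propagates instead, which is the behaviour that makes the real cause visible without editing library code.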

michaeloc commented 4 years ago

I'm also having the same problem. It's very difficult to understand, because it depends on which models we include. For example, in my experiments k-NN worked quite well, but random forest did not:

Traceback (most recent call last):
  File "script_autosklearn_dublin.py", line 123, in
    automl.fit(X_train_final, y_train_final)
  File "/home/mobility/anaconda3/envs/michael_environment2/lib/python3.7/site-packages/autosklearn/estimators.py", line 664, in fit
    dataset_name=dataset_name,
  File "/home/mobility/anaconda3/envs/michael_environment2/lib/python3.7/site-packages/autosklearn/estimators.py", line 399, in fit
    load_models=True,
  File "/home/mobility/anaconda3/envs/michael_environment2/lib/python3.7/site-packages/autosklearn/estimators.py", line 15, in _fit_automl
    return automl.fit(load_models=load_models, **kwargs)
  File "/home/mobility/anaconda3/envs/michael_environment2/lib/python3.7/site-packages/autosklearn/automl.py", line 996, in fit
    load_models=load_models,
  File "/home/mobility/anaconda3/envs/michael_environment2/lib/python3.7/site-packages/autosklearn/automl.py", line 208, in fit
    only_return_configuration_space=only_return_configuration_space,
  File "/home/mobility/anaconda3/envs/michael_environment2/lib/python3.7/site-packages/autosklearn/automl.py", line 489, in _fit
    _proc_smac.run_smbo()
  File "/home/mobility/anaconda3/envs/michael_environment2/lib/python3.7/site-packages/autosklearn/smbo.py", line 504, in run_smbo
    smac.optimize()
  File "/home/mobility/anaconda3/envs/michael_environment2/lib/python3.7/site-packages/smac/facade/smac_facade.py", line 400, in optimize
    incumbent = self.solver.run()
  File "/home/mobility/anaconda3/envs/michael_environment2/lib/python3.7/site-packages/smac/optimizer/smbo.py", line 180, in run
    challengers = self.choose_next(X, Y)
  File "/home/mobility/anaconda3/envs/michael_environment2/lib/python3.7/site-packages/smac/optimizer/smbo.py", line 247, in choose_next
    incumbent_value = self.runhistory.get_cost(self.incumbent)
  File "/home/mobility/anaconda3/envs/michael_environment2/lib/python3.7/site-packages/smac/runhistory/runhistory.py", line 271, in get_cost
    config_id = self.config_ids[config]
KeyError: None

In addition, when I increase the sample size, the problem appears with other models too.

mfeurer commented 4 years ago

I tried to investigate this with the latest version of Auto-sklearn (0.7.0) but could not reproduce it. If this issue still exists with 0.7.0, please let us know; otherwise, I hope that the latest version of SMAC fixed it.

mfeurer commented 4 years ago

Closing this as there seems to be no such issue with the latest release of SMAC which is used by Auto-sklearn >= 0.7.0.
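Since the fix is tied to Auto-sklearn >= 0.7.0, it can help to confirm the installed version first. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+), assuming the distribution is installed under the PyPI name `auto-sklearn`; the helper names are hypothetical, and the naive numeric comparison ignores pre-release ordering, so `packaging.version.Version` is the more robust choice when available:

```python
from importlib import metadata


def version_tuple(version):
    # Keep only the leading numeric components, e.g. "0.7.0" -> (0, 7, 0).
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(installed, minimum):
    # Pad with zeros so that "0.7" compares equal to "0.7.0".
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(dist_name):
    # Return the installed version string, or None if the package is missing.
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("auto-sklearn")
    if version is None:
        print("auto-sklearn is not installed")
    else:
        print(version, "is at least 0.7.0:", is_at_least(version, "0.7.0"))
```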