Closed smilesun closed 5 years ago
If I change refit to fit_ensemble, I get the following error:
[WARNING] [2019-02-19 17:57:37,088:EnsembleBuilder(1):digits] Error loading /tmp/autosklearn_tmp_8001_9848/.auto-sklearn/predictions_ensemble/predictions_ensemble_1_5.npy: Traceback (most recent call last):
  File "/home/sunxd/anaconda3/lib/python3.6/site-packages/autosklearn/ensemble_builder.py", line 321, in read_ensemble_preds
    all_scoring_functions=False)
  File "/home/sunxd/anaconda3/lib/python3.6/site-packages/autosklearn/metrics/__init__.py", line 262, in calculate_score
    if task_type not in TASK_TYPES:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
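For readers hitting the same message: this ValueError is what NumPy raises when an array with more than one element is tested for membership in a plain Python list, which is what the `if task_type not in TASK_TYPES:` line in the traceback would do if `task_type` were an array. One plausible cause, stated here as an assumption and not confirmed in this thread, is that swapping refit(X, y) for fit_ensemble(X, y) passes the label array into a positional parameter that expects a task constant. The sketch below only reproduces the error mechanism; `TASK_TYPES` and `check_task_type` are hypothetical stand-ins, not auto-sklearn's actual code.

```python
import numpy as np

# Hypothetical stand-in for a list of integer task constants.
TASK_TYPES = [1, 2, 3]


def check_task_type(task_type):
    """Mimic the membership test seen in the traceback."""
    if task_type not in TASK_TYPES:
        raise NotImplementedError(task_type)
    return task_type


# A scalar task constant passes the check.
check_task_type(1)

# An array (e.g. labels passed by position into the wrong parameter) does not:
# the list membership test compares element-wise and then needs a single bool.
labels = np.array([0, 1, 2, 3])
try:
    check_task_type(labels)
except ValueError as exc:
    message = str(exc)
    print(message)
```

Passing the labels as the first argument and everything else by keyword, as in fit_ensemble(y_train.copy(), ensemble_size=1), avoids this kind of positional mix-up.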
@mfeurer, is it OK to first call fit_ensemble with ensemble_size=1 and then call refit, as follows? It works, at least.
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
import autosklearn.classification
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=50000,
    per_run_time_limit=30,
    # tmp_folder='/tmp/autosklearn_cv_example_tmp',
    # output_folder='/tmp/autosklearn_cv_example_out',
    # delete_tmp_folder_after_terminate=False,
    resampling_strategy='cv',
    initial_configurations_via_metalearning=0,
    ensemble_size=0,
    smac_scenario_args={'runcount_limit': 5},
    resampling_strategy_arguments={'folds': 5}
)
# fit() changes the data in place, but refit needs the original data. We
# therefore copy the data. In practice, one should reload the data
automl.fit(X_train.copy(), y_train.copy(), dataset_name='digits')
# During fit(), models are fit on individual cross-validation folds. To use
# all available data, we call refit() which trains all models in the
# final ensemble on the whole dataset.
# fit() was run with ensemble_size=0, so build an ensemble of size 1 first.
automl.fit_ensemble(y_train.copy(), ensemble_size=1)
automl.refit(X_train.copy(), y_train.copy())
print(automl.show_models())
predictions = automl.predict(X_test)
print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))
Sorry for the slow response.
Yes, your latest example is the correct way to go. What happens is that during fit() you don't build an ensemble, so refit() has nothing to be applied to. By calling fit_ensemble() you build an ensemble, which can then be refit on the full training data (including the validation set which was split off during the hyperparameter optimization process).
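The idea of refitting on the full training data can be illustrated with plain scikit-learn. This is only an analogy under stated assumptions, not auto-sklearn's implementation: `KNeighborsClassifier` is an arbitrary stand-in model, and `cross_validate` stands in for the per-fold fitting that happens during the search. Each fold model sees only part of the training data, while the refit model sees all of it.

```python
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Stand-in model; any scikit-learn estimator would do for the illustration.
model = KNeighborsClassifier()

# During cross-validation, one copy of the model is fit per fold,
# each on roughly 4/5 of the training data.
cv_result = cross_validate(model, X_train, y_train, cv=5, return_estimator=True)
fold_models = cv_result["estimator"]
fold_sizes = [m.n_samples_fit_ for m in fold_models]

# "Refitting" trains a fresh copy on the whole training set,
# including the samples that were held out in each fold.
refit_model = clone(model).fit(X_train, y_train)

print("samples seen per fold model:", fold_sizes)
print("samples seen by refit model:", refit_model.n_samples_fit_)
```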
@mfeurer So what is the difference between refit() and fit_ensemble()? Do you mean that refit() uses the training portion only (with the default holdout, 67% of the split) and tests against the validation portion (the other 33%)? If I run the example from @smilesun's last post, calling fit(), fit_ensemble(), and refit(), does it actually take 3 times the time limit (3*3600 sec by default)?
Continuing on #451: since time_left_for_this_task is not a very sensible budget in our application scenario, due to differences in hardware and workload, we decided to use runcount_limit. Following the example below, we get the following error: