automl / auto-sklearn

Automated Machine Learning with scikit-learn
https://automl.github.io/auto-sklearn
BSD 3-Clause "New" or "Revised" License

Is there another method of finding the best algorithm other than leaderboard #1442

Open asmgx opened 2 years ago

asmgx commented 2 years ago

Using leaderboard() has been giving me a hard time and raising errors (I created a ticket for them).

Is there any other way to find the best algorithm in my model?

P.S. using AutoSklearn version 0.14.6

eddiebergman commented 2 years ago

You can use show_models, but it's likely to give you similar errors. You can use these three lines to get the models with their weights, but be a bit safer about it.

FYI, the underlying issue hasn't been fixed, but the development version at least won't raise an error. I'm not sure how long it will be until a release is made.

Best, Eddie

eddiebergman commented 2 years ago

Here's the fixed version for reference, if you need it: https://github.com/automl/auto-sklearn/blob/development/autosklearn/estimators.py#L938

asmgx commented 2 years ago

Shall I update this file (estimators.py#L938) manually with the code you provided? Will that fix the issue?

eddiebergman commented 2 years ago

You can do that if you like, or, instead of calling leaderboard(), you can use the same logic as the code provided to get out what you need.

asmgx commented 2 years ago

Still getting an error:


Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/estimators.py", line 764, in leaderboard
    for rkey, rval in self.automl_.runhistory_.data.items()
AttributeError: 'NoneType' object has no attribute 'runhistory_'
>>>
eddiebergman commented 2 years ago

You'll have to read and understand the code. This seems like a general Python error and not auto-sklearn specific. I'm not sure if you modified the actual estimators file, but you don't need to.

You can do something like:

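# Your file where you are using auto-sklearn
from autosklearn.classification import AutoSklearnClassifier

my_classifier = AutoSklearnClassifier(...)
my_classifier.fit(X, y)

# The traceback above suggests automl_ is still None at that point,
# so read the run history only after fit() has completed.
runhistory = my_classifier.automl_.runhistory_

# ... do stuff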
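
One possible way to fill in the "do stuff" step is sketched below: walk the run history, keep only the runs whose status is SUCCESS, and sort them by cost so the cheapest configuration comes first. The Status, RunKey and RunValue types here are hypothetical stand-ins that mirror only the fields visible in a printed run history, and the demo values are made up. The ids_config mapping (config_id to configuration) and the "classifier:__choice__" key are assumptions about how the algorithm name can be looked up, so check them against your installed version.

```python
from collections import namedtuple
from enum import Enum


# Hypothetical stand-ins mirroring only the fields seen in a printed run
# history; with a fitted estimator you would pass the real objects instead.
class Status(Enum):
    SUCCESS = 1
    TIMEOUT = 2
    CRASHED = 3


RunKey = namedtuple("RunKey", "config_id instance_id seed budget")
RunValue = namedtuple("RunValue", "cost status additional_info")


def successful_runs(data, ids_config, choice_key="classifier:__choice__"):
    """Return (cost, config_id, algorithm) tuples for successful runs, best first."""
    rows = []
    for run_key, run_value in data.items():
        # Compare by name so the check works for any enum with a SUCCESS member.
        if run_value.status.name != "SUCCESS":
            continue
        # Assumption: ids_config maps config_id to a dict-like configuration.
        config = ids_config[run_key.config_id]
        rows.append((run_value.cost, run_key.config_id, config[choice_key]))
    # Lower cost is better, so an ascending sort puts the best run first.
    return sorted(rows)


# Made-up demo data, not taken from a real run.
demo_data = {
    RunKey(1, None, 0, 6.25): RunValue(1.0, Status.CRASHED, {}),
    RunKey(2, None, 0, 6.25): RunValue(0.25, Status.SUCCESS, {}),
    RunKey(3, None, 0, 6.25): RunValue(0.125, Status.SUCCESS, {}),
    RunKey(4, None, 0, 6.25): RunValue(1.0, Status.TIMEOUT, {}),
}
demo_configs = {
    1: {"classifier:__choice__": "algorithm_a"},
    2: {"classifier:__choice__": "algorithm_a"},
    3: {"classifier:__choice__": "algorithm_b"},
    4: {"classifier:__choice__": "algorithm_c"},
}

best_first = successful_runs(demo_data, demo_configs)
for cost, config_id, algorithm in best_first:
    print(f"config {config_id}: {algorithm} (cost {cost})")
```

With a fitted estimator the call would presumably look like successful_runs(runhistory.data, runhistory.ids_config), under the assumptions above. Crashed and timed-out runs are skipped because their cost of 1.0 is only a placeholder; runs with other statuses (for example DONOTADVANCE) are skipped too, which you may want to relax.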
eddiebergman commented 2 years ago

Closing due to inactivity, and because the latest error is not auto-sklearn specific.

asmgx commented 2 years ago

Sorry for the delay, I have been trying to get to the bottom of this.

This method still does not show which algorithm has been used.

This is the output I got from clf.automl_.runhistory_.data.items():


odict_items([
(RunKey(config_id=1, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=32.70566391944885, status=<StatusType.CRASHED: 3>, starttime=1650754177.2785182, endtime=1650754210.0102727, additional_info={'traceback': 'Traceback (most recent call last):\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/__init__.py", line 42, in fit_predict_try_except_decorator\n    return ta(queue=queue, **kwargs)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 1408, in eval_iterative_cv\n    eval_cv(\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 1383, in eval_cv\n    evaluator.fit_predict_and_loss(iterative=iterative)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 329, in fit_predict_and_loss\n    model.iterative_fit(\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/base.py", line 127, in iterative_fit\n    self._final_estimator.iterative_fit(X, y, n_iter=n_iter,\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/components/classification/__init__.py", line 149, in iterative_fit\n    return self.choice.iterative_fit(X, y, n_iter=n_iter, **fit_params)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/components/classification/gradient_boosting.py", line 125, in iterative_fit\n    self.estimator.fit(X, y, sample_weight=sample_weight)\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py", line 270, in fit\n    sample_weight_val) = train_test_split(\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 2197, in train_test_split\n    train, test = next(cv.split(X=arrays[0], y=stratify))\n  File 
"/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 1387, in split\n    for train, test in self._iter_indices(X, y, groups):\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 1715, in _iter_indices\n    raise ValueError("The least populated class in y has only 1"\nValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.\n', 'error': "ValueError('The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.')", 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=2, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=0.0024999999999999914, time=59.02157926559448, status=<StatusType.DONOTADVANCE: 7>, starttime=1650754177.2943952, endtime=1650754237.3593912, additional_info={'duration': 53.925941705703735, 'num_run': 3, 'train_loss': 0.0, 'subprocess_stdout': 'The least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\n', 'info': 'Run stopped because of timeout.', 'learning_curve': [0.0024999999999999914, 0.0024999999999999914, 0.0024999999999999914], 'learning_curve_runtime': [43.16014623641968, 48.51548194885254, 53.925941705703735], 'train_learning_curve': [0.0, 0.00031250000000000445, 0.0], 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=3, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=59.029137134552, status=<StatusType.TIMEOUT: 2>, starttime=1650754177.3097827, endtime=1650754237.377532, additional_info={'error': 'Timeout', 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=4, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=0.013749999999999997, time=59.02944850921631, status=<StatusType.DONOTADVANCE: 7>, starttime=1650754177.332573, endtime=1650754237.6695342, additional_info={'duration': 53.8226203918457, 'num_run': 5, 'train_loss': 0.0015625000000000222, 'subprocess_stdout': 'The least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\n', 'info': 'Run stopped because of timeout.', 'learning_curve': [0.01874999999999998, 0.013749999999999997], 'learning_curve_runtime': [41.331947803497314, 53.8226203918457], 'train_learning_curve': [0.007812500000000023, 0.0015625000000000222], 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=5, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=23.6397864818573, status=<StatusType.CRASHED: 3>, starttime=1650754177.6396298, endtime=1650754201.3307548, additional_info={'traceback': 'Traceback (most recent call last):\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/__init__.py", line 42, in fit_predict_try_except_decorator\n    return ta(queue=queue, **kwargs)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 1408, in eval_iterative_cv\n    eval_cv(\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 1383, in eval_cv\n    evaluator.fit_predict_and_loss(iterative=iterative)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 329, in fit_predict_and_loss\n    model.iterative_fit(\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/base.py", line 127, in iterative_fit\n    self._final_estimator.iterative_fit(X, y, n_iter=n_iter,\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/components/classification/__init__.py", line 149, in iterative_fit\n    return self.choice.iterative_fit(X, y, n_iter=n_iter, **fit_params)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/components/classification/mlp.py", line 144, in iterative_fit\n    self.estimator.fit(X, y)\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/neural_network/_multilayer_perceptron.py", line 673, in fit\n    return self._fit(X, y, incremental=False)\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/neural_network/_multilayer_perceptron.py", line 399, in _fit\n    self._fit_stochastic(X, y, activations, deltas, coef_grads,\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/neural_network/_multilayer_perceptron.py", line 525, 
in _fit_stochastic\n    X, X_val, y, y_val = train_test_split(\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 2197, in train_test_split\n    train, test = next(cv.split(X=arrays[0], y=stratify))\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 1387, in split\n    for train, test in self._iter_indices(X, y, groups):\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 1715, in _iter_indices\n    raise ValueError("The least populated class in y has only 1"\nValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.\n', 'error': "ValueError('The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.')", 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=6, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=0.010000000000000009, time=59.06597280502319, status=<StatusType.DONOTADVANCE: 7>, starttime=1650754177.6630983, endtime=1650754237.8171875, additional_info={'duration': 47.305580377578735, 'num_run': 7, 'train_loss': 0.005312500000000009, 'subprocess_stdout': 'The least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\n', 'info': 'Run stopped because of timeout.', 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=7, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=59.04720687866211, status=<StatusType.TIMEOUT: 2>, starttime=1650754177.6820188, endtime=1650754237.8177464, additional_info={'error': 'Timeout', 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=8, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=0.037499999999999985, time=59.08307766914368, status=<StatusType.DONOTADVANCE: 7>, starttime=1650754177.7040098, endtime=1650754237.8338203, additional_info={'duration': 57.0889732837677, 'num_run': 9, 'train_loss': 0.0353125, 'subprocess_stdout': 'The least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\n', 'info': 'Run stopped because of timeout.', 'learning_curve': [0.0425, 0.05124999999999999, 0.037499999999999985], 'learning_curve_runtime': [34.96041703224182, 45.19434571266174, 57.0889732837677], 'train_learning_curve': [0.03656249999999999, 0.04406250000000001, 0.0353125], 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=9, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=0.0024999999999999914, time=46.72268342971802, status=<StatusType.SUCCESS: 1>, starttime=1650754177.74943, endtime=1650754224.496036, additional_info={'duration': 46.21944880485535, 'num_run': 10, 'train_loss': 0.0, 'learning_curve': [0.003749999999999987, 0.0024999999999999914, 0.0024999999999999914, 0.0024999999999999914, 0.0024999999999999914], 'learning_curve_runtime': [22.935990571975708, 29.194799423217773, 34.853703022003174, 40.19394636154175, 46.21944880485535], 'train_learning_curve': [0.0006250000000000089, 0.0, 0.0, 0.0, 0.0], 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=10, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=16.99842596054077, status=<StatusType.CRASHED: 3>, starttime=1650754177.88176, endtime=1650754194.9761496, additional_info={'traceback': 'Traceback (most recent call last):\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/__init__.py", line 42, in fit_predict_try_except_decorator\n    return ta(queue=queue, **kwargs)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 1408, in eval_iterative_cv\n    eval_cv(\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 1383, in eval_cv\n    evaluator.fit_predict_and_loss(iterative=iterative)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 329, in fit_predict_and_loss\n    model.iterative_fit(\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/base.py", line 127, in iterative_fit\n    self._final_estimator.iterative_fit(X, y, n_iter=n_iter,\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/components/classification/__init__.py", line 149, in iterative_fit\n    return self.choice.iterative_fit(X, y, n_iter=n_iter, **fit_params)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/components/classification/gradient_boosting.py", line 125, in iterative_fit\n    self.estimator.fit(X, y, sample_weight=sample_weight)\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py", line 270, in fit\n    sample_weight_val) = train_test_split(\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 2197, in train_test_split\n    train, test = next(cv.split(X=arrays[0], y=stratify))\n  File 
"/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 1387, in split\n    for train, test in self._iter_indices(X, y, groups):\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 1715, in _iter_indices\n    raise ValueError("The least populated class in y has only 1"\nValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.\n', 'error': "ValueError('The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.')", 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=11, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=0.0024999999999999914, time=51.11537718772888, status=<StatusType.SUCCESS: 1>, starttime=1650754177.971318, endtime=1650754229.162319, additional_info={'duration': 50.44809103012085, 'num_run': 12, 'train_loss': 0.002500000000000036, 'learning_curve': [0.0024999999999999914, 0.0024999999999999914, 0.0024999999999999914, 0.0024999999999999914, 0.0024999999999999914, 0.0024999999999999914], 'learning_curve_runtime': [16.837047815322876, 22.513781547546387, 29.39050579071045, 35.171164751052856, 41.45954966545105, 50.44809103012085], 'train_learning_curve': [0.002500000000000036, 0.002500000000000036, 0.002500000000000036, 0.002500000000000036, 0.002500000000000036, 0.002500000000000036], 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=12, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=29.34589171409607, status=<StatusType.CRASHED: 3>, starttime=1650754178.4252083, endtime=1650754207.8164613, additional_info={'traceback': 'Traceback (most recent call last):\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/__init__.py", line 42, in fit_predict_try_except_decorator\n    return ta(queue=queue, **kwargs)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 1408, in eval_iterative_cv\n    eval_cv(\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 1383, in eval_cv\n    evaluator.fit_predict_and_loss(iterative=iterative)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 329, in fit_predict_and_loss\n    model.iterative_fit(\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/base.py", line 127, in iterative_fit\n    self._final_estimator.iterative_fit(X, y, n_iter=n_iter,\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/components/classification/__init__.py", line 149, in iterative_fit\n    return self.choice.iterative_fit(X, y, n_iter=n_iter, **fit_params)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/components/classification/gradient_boosting.py", line 125, in iterative_fit\n    self.estimator.fit(X, y, sample_weight=sample_weight)\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py", line 270, in fit\n    sample_weight_val) = train_test_split(\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 2197, in train_test_split\n    train, test = next(cv.split(X=arrays[0], y=stratify))\n  File 
"/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 1387, in split\n    for train, test in self._iter_indices(X, y, groups):\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 1715, in _iter_indices\n    raise ValueError("The least populated class in y has only 1"\nValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.\n', 'error': "ValueError('The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.')", 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=13, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=16.431341409683228, status=<StatusType.CRASHED: 3>, starttime=1650754195.1183324, endtime=1650754211.6001222, additional_info={'traceback': 'Traceback (most recent call last):\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/__init__.py", line 42, in fit_predict_try_except_decorator\n    return ta(queue=queue, **kwargs)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 1408, in eval_iterative_cv\n    eval_cv(\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 1383, in eval_cv\n    evaluator.fit_predict_and_loss(iterative=iterative)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 329, in fit_predict_and_loss\n    model.iterative_fit(\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/base.py", line 127, in iterative_fit\n    self._final_estimator.iterative_fit(X, y, n_iter=n_iter,\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/components/classification/__init__.py", line 149, in iterative_fit\n    return self.choice.iterative_fit(X, y, n_iter=n_iter, **fit_params)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/components/classification/gradient_boosting.py", line 125, in iterative_fit\n    self.estimator.fit(X, y, sample_weight=sample_weight)\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py", line 270, in fit\n    sample_weight_val) = train_test_split(\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 2197, in train_test_split\n    train, test = next(cv.split(X=arrays[0], y=stratify))\n  File 
"/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 1387, in split\n    for train, test in self._iter_indices(X, y, groups):\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 1715, in _iter_indices\n    raise ValueError("The least populated class in y has only 1"\nValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.\n', 'error': "ValueError('The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.')", 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=14, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=59.04702353477478, status=<StatusType.TIMEOUT: 2>, starttime=1650754201.3973906, endtime=1650754261.4712338, additional_info={'error': 'Timeout', 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=15, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=59.046088457107544, status=<StatusType.TIMEOUT: 2>, starttime=1650754207.8768454, endtime=1650754267.9492505, additional_info={'error': 'Timeout', 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=16, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=0.0024999999999999914, time=56.79266715049744, status=<StatusType.SUCCESS: 1>, starttime=1650754210.2605114, endtime=1650754267.0842903, additional_info={'duration': 56.431211709976196, 'num_run': 17, 'train_loss': 0.0, 'learning_curve': [0.003749999999999987, 0.003749999999999987, 0.003749999999999987, 0.0024999999999999914, 0.0024999999999999914], 'learning_curve_runtime': [34.58225774765015, 39.83625030517578, 45.36500835418701, 50.46578550338745, 56.431211709976196], 'train_learning_curve': [0.0009374999999999912, 0.0, 0.0, 0.0, 0.0], 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=17, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=59.04437851905823, status=<StatusType.TIMEOUT: 2>, starttime=1650754211.654649, endtime=1650754271.7208927, additional_info={'error': 'Timeout', 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=18, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=59.05082106590271, status=<StatusType.TIMEOUT: 2>, starttime=1650754224.535896, endtime=1650754284.617626, additional_info={'error': 'Timeout', 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=19, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=0.9975, time=55.06376552581787, status=<StatusType.DONOTADVANCE: 7>, starttime=1650754229.2518373, endtime=1650754285.3876128, additional_info={'duration': 48.31077289581299, 'num_run': 20, 'train_loss': 0.9975, 'subprocess_stdout': 'The least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\n', 'subprocess_stderr': '/home/my/.local/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:574: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.\n  warnings.warn("Maximum number of iteration reached before "\n/home/my/.local/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:574: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.\n  warnings.warn("Maximum number of iteration reached before "\n/home/my/.local/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:574: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.\n  warnings.warn("Maximum number of iteration reached before "\n/home/my/.local/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:574: ConvergenceWarning: Maximum number of iteration reached before convergence. 
Consider increasing max_iter to improve the fit.\n  warnings.warn("Maximum number of iteration reached before "\n/home/my/.local/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:574: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.\n  warnings.warn("Maximum number of iteration reached before "\n', 'info': 'Run stopped because of timeout.', 'learning_curve': [0.9975, 0.9975, 0.9975, 0.9975], 'learning_curve_runtime': [23.58888077735901, 31.481527090072632, 39.974218130111694, 48.31077289581299], 'train_learning_curve': [0.9975, 0.9975, 0.9975, 0.9975], 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=20, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=47.03621745109558, status=<StatusType.TIMEOUT: 2>, starttime=1650754237.4181778, endtime=1650754285.515705, additional_info={'error': 'Timeout', 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=21, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=0.12, time=47.034883975982666, status=<StatusType.DONOTADVANCE: 7>, starttime=1650754237.569879, endtime=1650754285.6265893, additional_info={'duration': 46.12961411476135, 'num_run': 22, 'train_loss': 0.002812499999999996, 'subprocess_stdout': 'The least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\n', 'subprocess_stderr': '/home/my/.local/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:574: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.\n  warnings.warn("Maximum number of iteration reached before "\n/home/my/.local/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:574: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.\n  warnings.warn("Maximum number of iteration reached before "\n/home/my/.local/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:574: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.\n  warnings.warn("Maximum number of iteration reached before "\n/home/my/.local/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:574: ConvergenceWarning: Maximum number of iteration reached before convergence. 
Consider increasing max_iter to improve the fit.\n  warnings.warn("Maximum number of iteration reached before "\n/home/my/.local/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:574: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.\n  warnings.warn("Maximum number of iteration reached before "\n', 'info': 'Run stopped because of timeout.', 'learning_curve': [0.25125, 0.20875000000000005, 0.19375, 0.15625000000000003, 0.12], 'learning_curve_runtime': [22.080286502838135, 27.89290499687195, 33.33868098258972, 38.730738162994385, 46.12961411476135], 'train_learning_curve': [0.15468750000000003, 0.07156250000000001, 0.02312500000000002, 0.00812499999999996, 0.002812499999999996], 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=22, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=20.62538194656372, status=<StatusType.CRASHED: 3>, starttime=1650754237.697609, endtime=1650754258.3523226, additional_info={'traceback': 'Traceback (most recent call last):\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/__init__.py", line 42, in fit_predict_try_except_decorator\n    return ta(queue=queue, **kwargs)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 1408, in eval_iterative_cv\n    eval_cv(\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 1383, in eval_cv\n    evaluator.fit_predict_and_loss(iterative=iterative)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/evaluation/train_evaluator.py", line 329, in fit_predict_and_loss\n    model.iterative_fit(\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/base.py", line 127, in iterative_fit\n    self._final_estimator.iterative_fit(X, y, n_iter=n_iter,\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/components/classification/__init__.py", line 149, in iterative_fit\n    return self.choice.iterative_fit(X, y, n_iter=n_iter, **fit_params)\n  File "/home/my/anaconda3/lib/python3.8/site-packages/autosklearn/pipeline/components/classification/gradient_boosting.py", line 125, in iterative_fit\n    self.estimator.fit(X, y, sample_weight=sample_weight)\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py", line 270, in fit\n    sample_weight_val) = train_test_split(\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 2197, in train_test_split\n    train, test = next(cv.split(X=arrays[0], y=stratify))\n  File 
"/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 1387, in split\n    for train, test in self._iter_indices(X, y, groups):\n  File "/home/my/.local/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 1715, in _iter_indices\n    raise ValueError("The least populated class in y has only 1"\nValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.\n', 'error': "ValueError('The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.')", 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=23, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=47.07687187194824, status=<StatusType.TIMEOUT: 2>, starttime=1650754238.030686, endtime=1650754286.1327865, additional_info={'error': 'Timeout', 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=24, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=46.08737850189209, status=<StatusType.TIMEOUT: 2>, starttime=1650754238.504264, endtime=1650754285.610624, additional_info={'error': 'Timeout', 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=25, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=0.115, time=46.0707311630249, status=<StatusType.DONOTADVANCE: 7>, starttime=1650754238.620648, endtime=1650754285.7274203, additional_info={'duration': 43.283092737197876, 'num_run': 26, 'train_loss': 0.1109375, 'subprocess_stdout': 'The least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\nThe least populated class in y has only 2 members, which is less than n_splits=5.\n', 'subprocess_stderr': '/home/my/.local/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:574: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.\n  warnings.warn("Maximum number of iteration reached before "\n/home/my/.local/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:574: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.\n  warnings.warn("Maximum number of iteration reached before "\n/home/my/.local/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:574: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.\n  warnings.warn("Maximum number of iteration reached before "\n/home/my/.local/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:574: ConvergenceWarning: Maximum number of iteration reached before convergence. 
Consider increasing max_iter to improve the fit.\n  warnings.warn("Maximum number of iteration reached before "\n/home/my/.local/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:574: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.\n  warnings.warn("Maximum number of iteration reached before "\n', 'info': 'Run stopped because of timeout.', 'learning_curve': [0.31375000000000003, 0.315, 0.22, 0.115], 'learning_curve_runtime': [21.78280019760132, 27.698118925094604, 34.59027099609375, 43.283092737197876], 'train_learning_curve': [0.390625, 0.37312500000000004, 0.27812500000000007, 0.1109375], 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=26, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=26.051013231277466, status=<StatusType.TIMEOUT: 2>, starttime=1650754258.3928027, endtime=1650754285.4660196, additional_info={'error': 'Timeout', 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=27, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=23.03747320175171, status=<StatusType.TIMEOUT: 2>, starttime=1650754261.5237162, endtime=1650754285.5842803, additional_info={'error': 'Timeout', 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=28, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=18.074486017227173, status=<StatusType.TIMEOUT: 2>, starttime=1650754267.144992, endtime=1650754286.260646, additional_info={'error': 'Timeout', 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=29, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=17.031450510025024, status=<StatusType.TIMEOUT: 2>, starttime=1650754267.999331, endtime=1650754286.0448165, additional_info={'error': 'Timeout', 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=9, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=25.0), RunValue(cost=1.0, time=13.043877124786377, status=<StatusType.TIMEOUT: 2>, starttime=1650754271.7868109, endtime=1650754285.8542476, additional_info={'error': 'Timeout', 'configuration_origin': 'Initial design'})), 
(RunKey(config_id=30, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=0.0, status=<StatusType.STOP: 8>, starttime=1650754284.6560502, endtime=1650754284.6560507, additional_info={})), 
(RunKey(config_id=31, instance_id='{"task_id": "9a9e1fbd-c357-11ec-a24f-192ff4cebbef"}', seed=0, budget=6.25), RunValue(cost=1.0, time=0.0, status=<StatusType.STOP: 8>, starttime=1650754284.7680821, endtime=1650754284.7680821, additional_info={}))])
>>>
eddiebergman commented 2 years ago

So this is a known issue (#1190) and was (hopefully) fixed in development a while ago.

raise ValueError("The least populated class in y has only 1"

If you are able to, could you try the development branch?

Essentially, when a class has only one instance, the default sklearn splitter can't handle it: it has to place members of every class in both the train and the test split, and a single sample can't go in both.
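To check whether your labels would run into this before fitting, you can count the members of each class. The following is a minimal sketch using only the standard library; `rare_classes` is a hypothetical helper written for illustration and is not part of auto-sklearn or scikit-learn.

```python
from collections import Counter


def rare_classes(y, min_members=2):
    """Return {label: count} for every label with fewer than `min_members` samples.

    Hypothetical helper for illustration only; `y` is any iterable of
    hashable class labels.
    """
    counts = Counter(y)
    return {label: n for label, n in counts.items() if n < min_members}


# Example: class "c" has a single sample, so a stratified split cannot
# place it in both the train and the test part.
y = ["a", "a", "a", "b", "b", "c"]
print(rare_classes(y))
```

An empty result means every class has at least `min_members` samples. Your log also contains the message "The least populated class in y has only 2 members, which is less than n_splits=5", so calling `rare_classes(y, min_members=5)` would list the classes behind that message as well.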

asmgx commented 2 years ago

I tried the development code that was provided in this link https://github.com/automl/auto-sklearn/pull/1250/files#diff-e0f50e611431c0f3f5a735ebea3a582b6271ea02bf758de9a82a6c46dfa3a36cR60

but I'm still getting the same error.