automl / auto-sklearn

Automated Machine Learning with scikit-learn
https://automl.github.io/auto-sklearn
BSD 3-Clause "New" or "Revised" License

Only dummy predictions in custom metric #1639

Open konstantin-doncov opened 1 year ago

konstantin-doncov commented 1 year ago

I want to use my own metric, but I am running into a lot of problems while implementing it. Many of them are related to each other, so I hope I can solve all of them. E.g. if I use this code with a 5 minute max runtime (time_left_for_this_task=5*60):

import autosklearn as askl
import autosklearn.classification
import autosklearn.metrics

# Custom metric that also receives the feature matrix (needs_X=True).
def metric_which_needs_x(solution, prediction, X_data):
    print(prediction)
    print(len(X_data))
    return 1

accuracy_scorer = askl.metrics.make_scorer(
    name="accu_X",
    score_func=metric_which_needs_x,
    optimum=1,
    greater_is_better=True,
    needs_proba=True,
    needs_X=True,
    needs_threshold=False
)

# logo and groups are defined elsewhere (a group-based resampling splitter
# and the per-sample group labels).
automl = askl.classification.AutoSklearnClassifier(
    ensemble_size=1,  # deprecated in newer versions, see the warning below
    time_left_for_this_task=5*60,
    per_run_time_limit=5*60,
    metric=accuracy_scorer,
    resampling_strategy=logo,
    resampling_strategy_arguments={"groups": groups}
)
automl.fit(x, y)

Then everything is fine and my metric function gets real predictions (not 0.5 0.5):

:3: DeprecationWarning: `ensemble_size` has been deprecated, please use `ensemble_kwargs = {'ensemble_size': 1}`. Inserting `ensemble_size` into `ensemble_kwargs` for now. `ensemble_size` will be removed in auto-sklearn 0.16.
  automl = askl.classification.AutoSklearnClassifier(
[WARNING] [2023-01-06 15:18:13,967:Client-AutoML(1):52f808ae-8dd5-11ed-840e-0242ac1c000c] Time limit for a single run is higher than total time limit. Capping the limit for a single run to the total time given to SMAC (294.777947)
[WARNING] [2023-01-06 15:18:13,967:Client-AutoML(1):52f808ae-8dd5-11ed-840e-0242ac1c000c] Capping the per_run_time_limit to 147.0 to have time for a least 2 models in each process.
[WARNING] [2023-01-06 15:18:14,003:Client-AutoMLSMBO(1)::52f808ae-8dd5-11ed-840e-0242ac1c000c] Could not find meta-data directory /usr/local/lib/python3.8/dist-packages/autosklearn/metalearning/files/accu_X_binary.classification_dense
[[0.5 0.5]
 [0.5 0.5]
 [0.5 0.5]
 ...
 [0.5 0.5]
 [0.5 0.5]
 [0.5 0.5]]
227226
[WARNING] [2023-01-06 15:20:20,378:Client-EnsembleBuilder] No runs were available to build an ensemble from
[[0.5 0.5]
 [0.5 0.5]
 [0.5 0.5]
 ...
 [0.5 0.5]
 [0.5 0.5]
 [0.5 0.5]]
227226
[[0.5 0.5]
 [0.5 0.5]
 [0.5 0.5]
 ...
 [0.5 0.5]
 [0.5 0.5]
 [0.5 0.5]]
227226
[[0.2602794  0.7397206 ]
 [0.2947102  0.7052898 ]
 [0.26641977 0.73358023]
 ...
 [0.8857727  0.11422727]
 [0.83059156 0.16940844]
 [0.8350615  0.16493851]]
227226
[WARNING] [2023-01-06 15:22:11,886:Client-EnsembleBuilder] No models better than random - using Dummy losses!
	Models besides current dummy model: 0
	Dummy models: 1
[WARNING] [2023-01-06 15:22:11,930:smac.runhistory.runhistory2epm.RunHistory2EPM4LogCost] Got cost of smaller/equal to 0. Replace by 0.000010 since we use log cost.
[[0.2602794  0.7397206 ]
 [0.2947102  0.7052898 ]
 [0.26641977 0.73358023]
 ...
 [0.8857727  0.11422727]
 [0.83059156 0.16940844]
 [0.8350615  0.16493851]]
227226
[WARNING] [2023-01-06 15:22:53,545:Client-EnsembleBuilder] No models better than random - using Dummy losses!
	Models besides current dummy model: 0
	Dummy models: 1
[WARNING] [2023-01-06 15:22:53,608:smac.runhistory.runhistory2epm.RunHistory2EPM4LogCost] Got cost of smaller/equal to 0. Replace by 0.000010 since we use log cost.

But if I use a 4 minute max runtime, then I get only dummy predictions (only 0.5 0.5).

You may say 'Well, then just use more time', but that is not a cure: when I use more complicated and time-consuming metrics (1-2 minutes for a single metric run), even one hour is not enough (and I don't know in advance how much time it will take). So, how can I fix this?

eddiebergman commented 1 year ago

Sorry for the delay, I think the solution here is actually to remove it altogether. The logs say it only gets to try two models and both are worse than the dummy model, so it seems like it needs to try more of them.
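
A rough sketch of what that could look like (assuming "it" is the explicit per_run_time_limit, which the warnings above were capping anyway; all other names are from the original snippet):

# Sketch only: drop the explicit per_run_time_limit so auto-sklearn falls back
# to its default per-run cap (a fraction of the total budget), letting more than
# two configurations be evaluated within the same overall time limit.
automl = askl.classification.AutoSklearnClassifier(
    time_left_for_this_task=5 * 60,
    metric=accuracy_scorer,
    resampling_strategy=logo,
    resampling_strategy_arguments={"groups": groups},
)
automl.fit(x, y)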

It could be that the logo resampling strategy (which I guess is Leave One Group Out) is creating many subsets of the data, which means there is simply too much data to fit if the number of groups is too large. Say for example you have 1_000_000 samples with 10 groups. My impression of logo is that you would need to fit 10 models, each on 900_000 samples, i.e. 9_000_000 data points in total, to get a single model evaluation. This gets amplified further as the number of groups increases. Have you tried simple holdout just to test this hypothesis, as in the sketch below?
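
A minimal sketch of that check (reusing the scorer and data names from the original snippet; automl_holdout is just an illustrative name):

# Sketch: same setup as above, but with a plain holdout split instead of logo,
# to check whether the leave-one-group-out cost is what starves the search.
automl_holdout = askl.classification.AutoSklearnClassifier(
    time_left_for_this_task=5 * 60,
    metric=accuracy_scorer,
    resampling_strategy="holdout",
    resampling_strategy_arguments={"train_size": 0.67},
)
automl_holdout.fit(x, y)

If this gets past the dummy model within the same time budget, the cost of the group-based resampling is the likely bottleneck.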