Dummy prediction fails with an out-of-memory error no matter how much memory is allocated

Hello,

I am running auto-sklearn on a Google Cloud machine in Jupyter. I keep getting an out-of-memory error no matter how much memory I assign to ml_memory_limit. This is the error message:

ValueError: Dummy prediction failed with run state StatusType.MEMOUT and additional output: {'error': 'Memout (used more than 5000 MB).', 'configuration_origin': 'DUMMY'}.

The following is my initialization code:

import autosklearn.classification

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,
    per_run_time_limit=30,
    ml_memory_limit=5000,
    ensemble_size=0,
    include_preprocessors=["no_preprocessing"])

automl.fit(X_train.values, y_train.index.values)

X_train has 400K rows and 5 columns. y_train is a vector with 400K rows. I am using auto-sklearn==0.10.0. I have raised ml_memory_limit beyond 5000 MB, but the program still returns quickly with the same error, so ml_memory_limit doesn't seem to be honored. I have tried the suggestions in issue #520, to no avail.
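
For scale, a 400K x 5 array of 64-bit floats is only about 15 MB, so the raw data alone does not explain a multi-gigabyte memout. Below is a minimal sketch of two sanity checks on synthetic stand-in data (`X_demo`, `y_labels`, `y_index` and both helper functions are hypothetical, not part of the report). The second check counts distinct target values, because the fit call above passes `y_train.index.values`: if that index is a unique row identifier rather than the class labels, a classifier would see one class per row, and any dense per-class probability array would grow with rows x classes. That is an assumption worth ruling out, not a confirmed cause.

```python
import numpy as np


def summarize_inputs(X, y):
    """Return (raw size of X in MB, number of distinct target values)."""
    X = np.asarray(X)
    y = np.asarray(y)
    size_mb = X.nbytes / 1024 ** 2
    n_distinct = int(np.unique(y).size)
    return size_mb, n_distinct


def dense_proba_gb(n_samples, n_classes, bytes_per_value=8):
    """Size in GB of a dense (n_samples, n_classes) float64 array."""
    return n_samples * n_classes * bytes_per_value / 1024 ** 3


# Synthetic stand-ins with the shape described in the report (hypothetical data).
rng = np.random.default_rng(0)
X_demo = rng.random((400_000, 5))
y_labels = rng.integers(0, 2, size=400_000)  # what binary class labels might look like
y_index = np.arange(400_000)                 # what a unique row index looks like

size_mb, n_label_classes = summarize_inputs(X_demo, y_labels)
_, n_index_classes = summarize_inputs(X_demo, y_index)

print(f"X size: {size_mb:.1f} MB")
print(f"distinct values in labels: {n_label_classes}")
print(f"distinct values in index:  {n_index_classes}")
print(f"dense probabilities, labels as target: {dense_proba_gb(400_000, n_label_classes):.3f} GB")
print(f"dense probabilities, index as target:  {dense_proba_gb(400_000, n_index_classes):.0f} GB")
```

If the distinct-value count of the real target is close to the number of rows, it is worth checking whether the labels or the index were passed to fit.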

I tried to run the following example in the Jupyter notebook to make sure I am using the library correctly:

import autosklearn.classification
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30
)
automl.fit(X_train, y_train, dataset_name='breast_cancer')

It finished training successfully.

I would appreciate any help from the community!

Environment:

Python version: 3.7.8
Scikit-learn version: 0.22.2.post1
OS: Debian 9
auto-sklearn: 0.10.0

It turned out that I hadn't assigned enough memory; ml_memory_limit does work. However, even after I increased the memory limit to more than 600 GB, I could not get very far before hitting the error again.

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=3600,
    per_run_time_limit=1900,
    ml_memory_limit=600*1024,
    ensemble_size=1,
    ensemble_memory_limit=7*1024,
    initial_configurations_via_metalearning=0,
    include_preprocessors=["no_preprocessing"],
    tmp_folder='./tmp/',
    output_folder='./out/',
    delete_output_folder_after_terminate=False,
    delete_tmp_folder_after_terminate=False,
)

I am surprised that auto-sklearn consumes so much memory for 400K rows of data. A single XGBoost instance can finish training on it fairly quickly on a medium-sized machine. I can see the value of auto-sklearn, but it is discouraging that it requires so much memory for a dataset that is not very large.

I would like to give it another try if someone can point out how I can save some memory or if I am doing something wrong.

Hi @shihgianlee thanks a lot for reporting this issue. I'm really unsure why this happens as 6GB for 400k instances sounds sufficient.

Two steps to move forward:

1. We currently run the dummy prediction in a subprocess, which copies the memory state of the main process. That might be the issue here, and unfortunately we cannot get around it at the moment.
2. Could you please try the other way round and subsample your dataset to see whether it works at some size?
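
The subsampling experiment could be sketched as follows. This is only an illustration: `subsample`, `X_full`, `y_full` and the list of sizes are hypothetical, and the actual auto-sklearn fit is left as a comment so that the snippet runs on its own.

```python
import numpy as np


def subsample(X, y, n_rows, seed=0):
    """Return n_rows randomly chosen rows of X with the matching entries of y."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_rows, replace=False)
    return X[idx], y[idx]


# Hypothetical stand-in data; replace with the real arrays.
rng = np.random.default_rng(1)
X_full = rng.random((400_000, 5))
y_full = rng.integers(0, 2, size=400_000)

# Grow the sample step by step to find the size at which memouts start.
for n_rows in (5_000, 20_000, 80_000, 320_000):
    X_sub, y_sub = subsample(X_full, y_full, n_rows)
    print(n_rows, X_sub.shape, y_sub.shape)
    # Fit a fresh estimator on (X_sub, y_sub) here and record whether
    # the run succeeds or ends in a memout.
```

Using a fresh estimator for each size keeps the runs independent, so the first failing size gives a rough threshold.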

Also, out of curiosity, how many attributes does your dataset have?

Hi @mfeurer, if I remember correctly, I subsampled 5K rows of data and used 10 GB of memory. It didn't throw a memory error, but it was taking a long time to complete, and I gave up waiting after about an hour. I only have 5 attributes.

Hello @mfeurer. I am facing the same issue while tuning auto-sklearn on Kaggle. The dataset is only 2.2 GB, with about 400k rows as well but only 4 columns. Locally I have seen sklearn handle bigger datasets with less memory. I don't know whether this is a cloud-related issue.

Hi @shihgianlee, @ach4l,

Sorry it's been a while. To clarify, it seems these issues only happen on cloud-based infrastructure like GCP and Kaggle? Do they also happen locally?

While we don't test on cloud infrastructure beyond unit testing on GitHub Actions, it would be interesting to find out what the root cause of these memory issues is.

Hi @shihgianlee, @ach4l, I faced a very similar issue. Here are the details:

Dataset:

Size: 197.9 MB
Columns: 89
Rows: 501808

Init params:

automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=3600,
    per_run_time_limit=360,
    memory_limit=27000
)

df_cv_results

    mean_test_score  mean_fit_time                                             params  rank_test_scores   status  budgets  ... param_regressor:libsvm_svr:gamma param_regressor:mlp:validation_fraction param_regressor:sgd:epsilon param_regressor:sgd:eta0 param_regressor:sgd:l1_ratio param_regressor:sgd:power_t
1          0.001069     206.981234  {'data_preprocessing:categorical_transformer:c...                 1  Success      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
12         0.000141      21.207443  {'data_preprocessing:categorical_transformer:c...                 2  Success      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
7          0.000014      12.729235  {'data_preprocessing:categorical_transformer:c...                 3  Success      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
0          0.000000     360.100346  {'data_preprocessing:categorical_transformer:c...                 4  Timeout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
15         0.000000       9.028285  {'data_preprocessing:categorical_transformer:c...                 4   Memout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
25         0.000000       4.793600  {'data_preprocessing:categorical_transformer:c...                 4   Memout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
24         0.000000       6.720857  {'data_preprocessing:categorical_transformer:c...                 4   Memout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
23         0.000000     360.019049  {'data_preprocessing:categorical_transformer:c...                 4  Timeout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
22         0.000000      31.379792  {'data_preprocessing:categorical_transformer:c...                 4   Memout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
21         0.000000      16.599984  {'data_preprocessing:categorical_transformer:c...                 4   Memout      0.0  ...                              NaN                                     0.1                         NaN                      NaN                          NaN                         NaN
20         0.000000     360.116118  {'data_preprocessing:categorical_transformer:c...                 4  Timeout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
19         0.000000       9.361809  {'data_preprocessing:categorical_transformer:c...                 4   Memout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
18         0.000000       5.814345  {'data_preprocessing:categorical_transformer:c...                 4   Memout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
17         0.000000     360.115730  {'data_preprocessing:categorical_transformer:c...                 4  Timeout      0.0  ...                         0.032332                                     NaN                         NaN                      NaN                          NaN                         NaN
16         0.000000       6.615313  {'data_preprocessing:categorical_transformer:c...                 4   Memout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
14         0.000000     360.080400  {'data_preprocessing:categorical_transformer:c...                 4  Timeout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
13         0.000000     360.043842  {'data_preprocessing:categorical_transformer:c...                 4  Timeout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
11         0.000000       6.372612  {'data_preprocessing:categorical_transformer:c...                 4   Memout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
9          0.000000      19.444851  {'data_preprocessing:categorical_transformer:c...                 4   Memout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
8          0.000000       8.804391  {'data_preprocessing:categorical_transformer:c...                 4   Memout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
6          0.000000     360.117929  {'data_preprocessing:categorical_transformer:c...                 4  Timeout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
5          0.000000       8.347663  {'data_preprocessing:categorical_transformer:c...                 4   Memout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
3          0.000000       5.497032  {'data_preprocessing:categorical_transformer:c...                 4   Memout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
2          0.000000       5.126574  {'data_preprocessing:categorical_transformer:c...                 4   Memout      0.0  ...                         0.002623                                     NaN                         NaN                      NaN                          NaN                         NaN
27         0.000000     217.114209  {'data_preprocessing:categorical_transformer:c...                 4  Timeout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
4         -0.002910      62.450895  {'data_preprocessing:categorical_transformer:c...                26  Success      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
26        -0.007268      20.792192  {'data_preprocessing:categorical_transformer:c...                27  Success      0.0  ...                              NaN                                     NaN                    0.000047                      NaN                     0.018917                         NaN
10      -456.128305     257.670972  {'data_preprocessing:categorical_transformer:c...                28  Success      0.0  ...                              NaN                                     0.1                         NaN                      NaN                          NaN                         NaN

automl.leaderboard

          rank  ensemble_weight               type        cost    duration  config_id  train_loss  seed    start_time      end_time  budget              status                                 data_preprocessors                 feature_preprocessors balancing_strategy           config_origin
model_id                                                                                                                                                                                                                                                                                          
3            1             0.68  gradient_boosting    0.998931  206.981234          2    0.950685     0  1.631515e+09  1.631515e+09     0.0  StatusType.SUCCESS           [one_hot_encoding, no_coalescense, none]             [select_rates_regression]               None          Initial design
14           2             0.32  gradient_boosting    0.999859   21.207443         13    0.999518     0  1.631516e+09  1.631516e+09     0.0  StatusType.SUCCESS  [one_hot_encoding, minority_coalescer, robust_...                    [no_preprocessing]               None          Initial design
9            3             0.00  gradient_boosting    0.999986   12.729235          8    0.999978     0  1.631516e+09  1.631516e+09     0.0  StatusType.SUCCESS     [one_hot_encoding, minority_coalescer, minmax]             [select_rates_regression]               None          Initial design
6            4             0.00  gradient_boosting    1.002910   62.450895          5    0.964131     0  1.631515e+09  1.631515e+09     0.0  StatusType.SUCCESS       [no_encoding, no_coalescense, robust_scaler]               [feature_agglomeration]               None          Initial design
28           5             0.00                sgd    1.007268   20.792192         27    1.007495     0  1.631518e+09  1.631518e+09     0.0  StatusType.SUCCESS           [one_hot_encoding, no_coalescense, none]             [select_rates_regression]               None  Random Search (sorted)
12           6             0.00                mlp  457.128305  257.670972         11    0.981957     0  1.631516e+09  1.631516e+09     0.0  StatusType.SUCCESS    [one_hot_encoding, no_coalescense, standardize]  [extra_trees_preproc_for_regression]               None          Initial design

My system and versions:

This machine runs in a VirtualBox. Host: Windows Guest: Linux

auto-sklearn = "==0.13.0"
python_version = "3.7"

$ uname -a

Linux i 5.4.0-58-generic #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ lscpu

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          14
On-line CPU(s) list:             0-13
Thread(s) per core:              1
Core(s) per socket:              14
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           158
Model name:                      Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
Stepping:                        13
CPU MHz:                         3600.006
BogoMIPS:                        7200.01
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       448 KiB
L1i cache:                       448 KiB
L2 cache:                        3,5 MiB
L3 cache:                        224 MiB

I see lots of memouts even though memory_limit is 27000. Am I doing something wrong?

I have a Linux laptop as well, so I will try the same run on Linux without any virtualization and post my findings here.

Hi @eddiebergman, @mfeurer, I tested this on the other physical machine I have and ran into the same issue on that Linux machine with no virtualization at all. Could you please take a look and check what I'm doing wrong? I'm also happy to have a call and show the issue if needed.

Otherwise I won't be able to use this lib and have to switch to something else.

Regards, Stefan

Hi @f-istvan,

Sorry for the delay. I can't immediately see anything wrong with your setup. One general recommendation, once the memout issues are fixed, is to use more of your available cores.

For some context, the fact that so many memouts occur indicates to me a few possible reasons:

1. Is your data highly categorical? If so, OneHotEncoding might balloon the dataset to significantly more than the 290MB.
2. There is some very memory-hungry pipeline step that SMAC keeps selecting. However, SMAC should move away from it once memouts occur, so I think this is unlikely to be the case.
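
To put a number on the first point, here is a rough sketch that estimates the dense size of a one-hot encoding from per-column cardinalities. The helper names, the toy sample, and the figure of 10 categories per column are assumptions for illustration. Note that scikit-learn's OneHotEncoder returns a sparse matrix by default, so this is an upper bound that only applies once something in the pipeline densifies the data, and only to columns that are actually treated as categorical.

```python
import numpy as np


def dense_one_hot_mb(n_rows, cardinalities, bytes_per_value=8):
    """Size in MB of a dense one-hot encoding with one column per category."""
    n_columns = sum(cardinalities)
    return n_rows * n_columns * bytes_per_value / 1024 ** 2


def column_cardinalities(X):
    """Count the distinct values in each column of a 2-D array."""
    X = np.asarray(X)
    return [int(np.unique(X[:, j]).size) for j in range(X.shape[1])]


# Small hypothetical sample: 3 columns drawn from 4 possible values each.
rng = np.random.default_rng(0)
X_small = rng.choice([0.0, 0.25, 0.5, 0.75], size=(1_000, 3))
cards = column_cardinalities(X_small)
print(cards)

# Rows and columns as reported above; 10 categories per column is an assumption.
estimate_mb = dense_one_hot_mb(501_808, [10] * 89)
print(f"{estimate_mb:.0f} MB")
```

With those assumed cardinalities the dense encoding comes to roughly 3.4 GB, which is many times the size of the original file but still well below a 27000 MB limit on its own.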

These can be diagnosed if you post the output of df_cv_results['params'], as this essentially contains the high-level model definition of each configuration tried by SMAC (our underlying optimizer).
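
As a possible way to summarize that output before posting it, the sketch below cross-tabulates run status against the algorithm named in each configuration. It assumes that each entry of `params` is a dict and that the algorithm is stored under the key `regressor:__choice__`; check the key against your own output. The function name and `toy_results` are hypothetical stand-ins, not real results.

```python
import pandas as pd


def status_by_choice(cv_results, choice_key="regressor:__choice__"):
    """Cross-tabulate run status against the algorithm named in each config."""
    choices = [params.get(choice_key, "unknown") for params in cv_results["params"]]
    frame = pd.DataFrame({"choice": choices, "status": list(cv_results["status"])})
    return pd.crosstab(frame["choice"], frame["status"])


# Hypothetical stand-in for automl.cv_results_ (made-up runs, not real results).
toy_results = {
    "params": [
        {"regressor:__choice__": "gradient_boosting"},
        {"regressor:__choice__": "libsvm_svr"},
        {"regressor:__choice__": "libsvm_svr"},
        {"regressor:__choice__": "mlp"},
    ],
    "status": ["Success", "Memout", "Timeout", "Memout"],
}

table = status_by_choice(toy_results)
print(table)
```

If the memouts cluster on one or two algorithms, that points at specific pipeline components rather than at the data handling in general.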

Do the same issues appear at smaller timescales, e.g. 600s total time and 60s per model?

If you could provide this extra information, hopefully that will be enough to diagnose it.

@f-istvan,

Did you try setting memory_limit=None?

Hi,

Sorry for the late response. First of all, here is a full example with generated training data with results:

import numpy as np
import pandas as pd
import autosklearn.regression

value_set = [0.0, 0.25, 0.5, 0.75, 1.0]

col = 89
row = 501808
training_data = np.random.choice(value_set, col * row).reshape(row, col)

df = pd.DataFrame(data=training_data)
print(df)

automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=3600,
    per_run_time_limit=360,
    memory_limit=27000
)

col = 1
row = 501808
target = np.random.choice(value_set, col * row).reshape(row, col)

print('start fit')
automl.fit(training_data, target, dataset_name='github_issue')
print('end fit')

df_cv_results = pd.DataFrame(automl.cv_results_).sort_values(by = 'mean_test_score', ascending = False)

print('df_cv_results')
print(df_cv_results)

print('automl.leaderboard')
print(automl.leaderboard(detailed = True, ensemble_only=False))

print('automl.get_models_with_weights')
print(automl.get_models_with_weights())

print('automl.sprint_statistics')
print(automl.sprint_statistics())

Output:

         0     1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16    17    18    19    20    21    22    23    24  ...    64    65    66    67    68    69    70    71    72    73    74    75    76    77    78    79    80    81    82    83    84    85    86    87    88
0       0.00  0.25  0.50  0.00  0.25  0.50  0.25  0.00  0.00  0.25  1.00  0.00  1.00  1.00  1.00  0.00  0.25  0.50  0.50  0.25  0.50  0.75  0.00  1.00  1.00  ...  0.75  0.75  0.25  0.00  0.50  0.50  0.50  0.50  0.75  0.75  0.75  0.25  1.00  1.00  1.00  1.00  0.25  0.50  0.25  0.50  0.75  0.75  1.00  0.75  0.00
1       0.25  1.00  0.25  0.00  0.75  0.25  0.50  0.00  0.50  0.50  0.25  0.50  0.00  0.50  0.25  0.50  0.75  0.75  0.75  0.25  0.00  0.25  1.00  0.00  0.50  ...  1.00  1.00  0.00  1.00  0.25  0.75  0.50  1.00  0.25  1.00  1.00  0.50  0.50  0.75  0.25  0.00  0.75  0.75  1.00  1.00  0.00  0.00  0.25  0.50  0.75
2       0.75  0.25  0.00  1.00  0.50  0.50  0.25  0.50  0.75  0.25  0.50  0.25  0.50  0.75  0.25  0.25  0.00  0.75  0.00  0.50  0.50  0.25  0.75  0.75  0.75  ...  0.25  0.25  0.25  1.00  0.25  0.75  0.75  0.00  0.75  0.25  0.25  0.25  1.00  0.50  0.75  0.50  0.25  0.25  0.25  0.00  0.00  0.50  1.00  0.50  0.25
3       0.00  0.50  0.25  0.25  0.50  0.75  0.50  0.25  0.00  0.75  0.50  0.50  0.25  1.00  0.00  0.75  0.00  0.50  0.50  0.75  0.00  0.75  0.50  0.50  0.75  ...  0.00  0.00  0.25  0.25  0.50  0.75  0.75  0.00  0.00  0.00  0.25  0.25  0.50  0.25  0.25  0.00  0.75  0.50  0.00  0.50  0.75  0.25  0.50  1.00  0.50
4       0.50  0.50  1.00  0.25  0.50  0.25  0.50  0.75  0.25  0.00  1.00  0.75  0.50  0.25  0.50  1.00  0.00  1.00  0.25  0.25  0.25  0.00  0.25  1.00  0.75  ...  0.75  1.00  0.25  0.75  0.50  1.00  0.50  0.75  1.00  0.75  0.00  0.25  0.25  0.25  0.75  1.00  0.00  0.00  0.00  0.00  0.50  0.00  0.50  0.25  0.00
...      ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...  ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...
501803  0.00  0.75  0.25  0.75  0.00  0.00  0.00  0.00  0.25  1.00  0.25  0.00  1.00  0.00  1.00  0.00  1.00  0.75  0.75  0.00  0.25  1.00  1.00  0.00  0.50  ...  1.00  1.00  0.50  0.75  1.00  0.25  1.00  0.25  0.75  1.00  0.25  1.00  0.00  0.00  0.25  1.00  1.00  0.00  0.00  0.75  0.00  0.50  0.25  0.50  0.75
501804  0.50  0.50  1.00  1.00  0.00  1.00  0.50  0.00  0.00  1.00  0.00  1.00  1.00  1.00  0.25  0.75  0.50  0.75  0.25  0.50  0.50  0.00  0.00  0.50  0.25  ...  0.75  1.00  0.00  1.00  0.00  0.75  0.00  0.25  1.00  0.25  0.00  0.50  1.00  0.50  1.00  0.25  0.25  0.00  0.00  0.25  0.75  0.25  1.00  0.50  1.00
501805  0.75  0.75  0.25  0.50  1.00  0.25  0.00  0.25  0.00  0.50  1.00  0.25  0.00  0.25  1.00  0.50  0.25  0.75  1.00  0.25  0.50  0.75  0.00  0.00  1.00  ...  0.00  0.50  0.25  0.00  0.00  0.00  0.25  0.00  0.50  0.25  1.00  1.00  0.50  0.25  0.00  1.00  0.75  0.25  0.00  0.50  0.00  1.00  1.00  0.00  0.75
501806  0.25  0.25  0.75  0.75  0.75  0.00  0.50  0.75  0.25  0.50  0.25  0.25  0.50  0.00  0.75  0.50  0.50  0.75  1.00  0.00  1.00  0.25  0.00  0.25  0.50  ...  0.50  0.50  1.00  1.00  1.00  1.00  1.00  0.25  1.00  0.75  0.75  0.25  0.75  0.00  0.50  0.00  0.00  1.00  0.50  0.75  0.75  1.00  0.00  0.50  1.00
501807  0.25  1.00  0.50  0.25  0.00  0.75  1.00  0.50  0.75  0.00  0.00  0.75  0.50  0.00  0.25  1.00  0.50  0.00  0.25  0.75  0.00  0.00  0.00  0.00  0.75  ...  0.75  0.00  0.75  0.25  0.75  1.00  0.75  0.50  0.00  1.00  0.25  0.00  0.25  0.75  0.75  0.50  0.25  0.75  0.00  0.00  0.25  0.25  0.75  1.00  1.00

[501808 rows x 89 columns]
start fit
[WARNING] [2021-09-20 21:00:20,620:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 1. Number of dummy models: 1
[WARNING] [2021-09-20 21:06:22,006:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 1. Number of dummy models: 1
[WARNING] [2021-09-20 21:08:59,789:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 2. Number of dummy models: 1
[WARNING] [2021-09-20 21:09:03,401:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 2. Number of dummy models: 1
[WARNING] [2021-09-20 21:15:04,783:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 2. Number of dummy models: 1
[WARNING] [2021-09-20 21:21:06,217:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 2. Number of dummy models: 1
[WARNING] [2021-09-20 21:21:40,158:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 3. Number of dummy models: 1
[WARNING] [2021-09-20 21:27:41,444:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 3. Number of dummy models: 1
[WARNING] [2021-09-20 21:27:44,680:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 4. Number of dummy models: 1
[WARNING] [2021-09-20 21:27:48,724:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 4. Number of dummy models: 1
[WARNING] [2021-09-20 21:32:48,108:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 5. Number of dummy models: 1
[WARNING] [2021-09-20 21:32:55,098:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 6. Number of dummy models: 1
[WARNING] [2021-09-20 21:33:17,063:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 6. Number of dummy models: 1
[WARNING] [2021-09-20 21:39:18,457:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 6. Number of dummy models: 1
[WARNING] [2021-09-20 21:39:22,160:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 6. Number of dummy models: 1
[WARNING] [2021-09-20 21:45:23,584:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 6. Number of dummy models: 1
[WARNING] [2021-09-20 21:51:25,029:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 6. Number of dummy models: 1
[WARNING] [2021-09-20 21:51:28,140:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 6. Number of dummy models: 1
[WARNING] [2021-09-20 21:53:47,552:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 6. Number of dummy models: 1
end fit
df_cv_results
    mean_test_score  mean_fit_time                                             params  rank_test_scores   status  budgets  ... param_regressor:libsvm_svr:gamma param_regressor:mlp:validation_fraction param_regressor:sgd:epsilon param_regressor:sgd:eta0 param_regressor:sgd:l1_ratio param_regressor:sgd:power_t
0          0.000000     360.106616  {'data_preprocessing:categorical_transformer:c...                 1  Timeout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
8          0.000000     360.011791  {'data_preprocessing:categorical_transformer:c...                 1  Timeout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
18         0.000000       1.826367  {'data_preprocessing:categorical_transformer:c...                 1   Memout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
17         0.000000     360.108457  {'data_preprocessing:categorical_transformer:c...                 1  Timeout      0.0  ...                         0.032332                                     NaN                         NaN                      NaN                          NaN                         NaN
16         0.000000     360.104102  {'data_preprocessing:categorical_transformer:c...                 1  Timeout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
15         0.000000       2.424781  {'data_preprocessing:categorical_transformer:c...                 1   Memout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
14         0.000000     360.104379  {'data_preprocessing:categorical_transformer:c...                 1  Timeout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
13         0.000000      20.691383  {'data_preprocessing:categorical_transformer:c...                 1   Memout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
10         0.000000       2.746277  {'data_preprocessing:categorical_transformer:c...                 1   Memout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
6          0.000000     360.105433  {'data_preprocessing:categorical_transformer:c...                 1  Timeout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
5          0.000000     360.104351  {'data_preprocessing:categorical_transformer:c...                 1  Timeout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
4          0.000000       2.180909  {'data_preprocessing:categorical_transformer:c...                 1   Memout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
2          0.000000     360.104890  {'data_preprocessing:categorical_transformer:c...                 1  Timeout      0.0  ...                         0.002623                                     NaN                         NaN                      NaN                          NaN                         NaN
19         0.000000     138.103932  {'data_preprocessing:categorical_transformer:c...                 1  Timeout      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
3         -0.000005     157.529032  {'data_preprocessing:categorical_transformer:c...                15  Success      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
9         -0.000006       2.959799  {'data_preprocessing:categorical_transformer:c...                16  Success      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
12        -0.000013       6.571774  {'data_preprocessing:categorical_transformer:c...                17  Success      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
1         -0.000317      12.087915  {'data_preprocessing:categorical_transformer:c...                18  Success      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
7         -0.001563      33.671417  {'data_preprocessing:categorical_transformer:c...                19  Success      0.0  ...                              NaN                                     NaN                         NaN                      NaN                          NaN                         NaN
11        -0.002651     299.073780  {'data_preprocessing:categorical_transformer:c...                20  Success      0.0  ...                              NaN                                     0.1                         NaN                      NaN                          NaN                         NaN

[20 rows x 161 columns]
automl.leaderboard
Traceback (most recent call last):
  File "app.py", line 34, in <module>
    print(automl.leaderboard(detailed = True, ensemble_only=False))
  File "/home/i/dev/sources/mytest/.venv/lib/python3.7/site-packages/autosklearn/estimators.py", line 741, in leaderboard
    model_runs[model_id]['ensemble_weight'] = weight
KeyError: 1

@eddiebergman I tried setting n_jobs to 2, 3, 4, and so on up to 8. In every case I got a Killed message on my console and the program just stopped. Based on this Stack Overflow question, I think this is the same kind of memory issue: https://stackoverflow.com/questions/19189522/what-does-killed-mean-when-a-processing-of-a-huge-csv-with-python-which-sudde

Do the same issues appear at smaller timescales? i.e. 600s total time and 60s per model? -> no, with smaller timescales it finishes successfully. I think 120s total was successful once when I tried to play with this.

Did you try setting memory_limit=None? -> not yet, I will try to do that and post the df_cv_results['params'] too.

Thank you so much!

Hmm so let me address this in a few points:

- warnings: These are more or less expected, because the mapping from inputs to outputs is random. That is okay, but perhaps we should just hide them in the log rather than display them as big warnings. If the issue still persists at the end of the run, then we can give that warning.
- memouts: What's interesting is that these memouts occur quite quickly, despite the dataset being quite small. A further note: this memory=27GB is split between n_jobs, which should perhaps be made clearer.

Anyway, this could be indicative of two things:
- We use the temporary directory by default. I had an issue recently where my system storage was fine (200GB+) but the partition that housed /tmp only had 1GB of free space, causing containers to not build properly. To diagnose this, you can use a graphical interface but also the command df -H. If your /tmp dir doesn't have 27GB available then this would explain it and we should document this behavior more clearly or perform a check before running. If this is not the issue then we have a memory issue somewhere and we would love to find it.
  
  Four possible workarounds in the meantime:
  - One, run auto-sklearn with the environment variable TMPDIR set: TMPDIR=/path/to/custom/temp python myscript.py
  - Two, you can use the two parameters tmp_folder and delete_tmp_folder_after_terminate. If you go with this workaround and set delete_tmp_folder_after_terminate = False, then you should be able to inspect what is consuming the most space. Note: we will likely change this to just a single parameter working_dir for version 0.15.0, as these parameters are often set together.
  - Three, use the parameter max_models_on_disc, which defaults to 50. While I think 50 models should easily fit in 27GB, it's just another tunable parameter I can point you to that might help the issue.
  - Four, make sure /tmp has the 27GB of space expected.
- The second possible issue could be that some model configurations are eating up way more memory than expected. A typical memory-hungry model is KNN, but as stated, the optimizer should move away from these models once a few failures occur.
  
  To diagnose this, it would be helpful for me to see the csv output of df_cv_results. In the meantime, if you can see one particular model is causing this issue (filter df_cv_results by status == memout) then that's indicative of something going wrong on our side and we would be glad to fix it.
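The filter suggested above could be sketched roughly as follows. This is only an illustration, not part of auto-sklearn: the helper name `memout_counts` is made up, the column names `status` and `params` and the status value `Memout` are taken from the output posted in this thread, and the key `classifier:__choice__` is an assumption about how the configuration dict names the model type, so adjust it to whatever your `params` dicts actually contain.

```python
from collections import Counter


def memout_counts(rows, status_col="status", params_col="params",
                  choice_key="classifier:__choice__"):
    """Count runs whose status is 'Memout', grouped by model type.

    `rows` is an iterable of dicts, one per run. `choice_key` is an
    assumed name for the model-type entry inside each params dict.
    """
    counts = Counter()
    for row in rows:
        # Compare case-insensitively so 'Memout' and 'memout' both match.
        if str(row[status_col]).lower() != "memout":
            continue
        params = row.get(params_col) or {}
        counts[params.get(choice_key, "unknown")] += 1
    return counts


# Synthetic rows for illustration only; these are NOT results from this issue.
example_rows = [
    {"status": "Memout", "params": {"classifier:__choice__": "k_nearest_neighbors"}},
    {"status": "Timeout", "params": {"classifier:__choice__": "random_forest"}},
    {"status": "Memout", "params": {"classifier:__choice__": "k_nearest_neighbors"}},
    {"status": "Success", "params": {"classifier:__choice__": "sgd"}},
    {"status": "Memout", "params": {}},
]

print(memout_counts(example_rows).most_common())
```

With a real run, the rows could come from `df_cv_results.to_dict("records")`. If one model type dominates the counts, that is the configuration worth reporting.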

In general it's quite difficult to allocate resources for runs as long as yours, but it seems like something we should try testing soon. We appreciate your time and effort, and hopefully we can figure this out.
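For the temporary-directory check described in the first point, a minimal Python equivalent of `df -H` for a single path might look like the sketch below. The helper names `free_gb` and `has_room` are hypothetical, and the threshold passed to `has_room` is whatever amount of space you expect the run to need; nothing here is an auto-sklearn API.

```python
import shutil
import tempfile


def free_gb(path):
    """Free space on the filesystem holding `path`, in GB (powers of 1000, like df -H)."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path, required_gb):
    """True if the filesystem holding `path` has at least `required_gb` GB free."""
    return free_gb(path) >= required_gb


# tempfile.gettempdir() honours TMPDIR, so this reports the directory that
# would be picked up when the script is launched with TMPDIR=/path/to/custom/temp.
tmp_dir = tempfile.gettempdir()
print(f"{tmp_dir}: {free_gb(tmp_dir):.1f} GB free")
```

Running this right before constructing the classifier shows whether the partition behind the temporary directory is much smaller than the rest of the system storage.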

Seeing as there has been no response, we're not sure if this has been solved so closing the issue for now. Feel free to re-open this if anything reoccurs.

Getting out of memory from Dummy prediction no matter how much memory is allocated. #978