JustinDoIt opened this issue 2 years ago
Hi @JustinDoIt,
We seem to have general issues with large numbers of available CPUs. Our infrastructure does not let us test these problems very well.
A common issue is documented in #1236. It is odd that this bug does not happen in your case; thank you for reporting it.
We use SMAC as our optimizer and ConfigSpace as our search space builder. We believe the bottleneck is in one of these two places, but we need to spend some dedicated time figuring out why new processes are not started.
In the meantime, it would help to get a better idea of when the bottleneck is reached. For example, auto-sklearn can effectively use around 6 cores on my machine with no problem.
Best, Eddie
Hi @eddiebergman
Actually, I also encounter the error mentioned in #1236 (details below). With n_jobs=8 (and memory_limit=None, max_models_on_disc=None), the issue still remains. In one specific run, all cores worked normally for the first 30 iterations, but after about 30 iterations only one core was working. At iteration 263, the memory consumption of two processes suddenly became very large (I didn't notice whether it was already large before; I missed it). Subsequently, the error mentioned in #1236 occurred and the program crashed.
@JustinDoIt, just out of curiosity, have you also used additional arguments when instantiating the class? I used to have the same issue, but I solved it when I realized I was messing around with the metric and scoring_functions arguments. I can go into more detail if you want.
@e3vela Here is my code, and thanks for the comments:
# -*- coding: utf-8 -*-
import autosklearn.classification
import autosklearn
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, accuracy_score, recall_score, precision_score, confusion_matrix, matthews_corrcoef, cohen_kappa_score, mean_absolute_error, mean_squared_error, r2_score
from sklearn.inspection import plot_partial_dependence, permutation_importance
import matplotlib.pyplot as plt
from autosklearn.metrics import balanced_accuracy, precision, recall, f1
import os
import ast
import logging
import time
from time import strftime, gmtime
import random
import sys
from datetime import datetime
from joblib import dump, load


def automl_feat_comb(feat_comb, exp_name, runtime):
    now_time = str(datetime.now()).replace(' ', '-').replace('.', '-')
    # create_logger is a helper from the reporter's own code base (not shown in this thread)
    log = create_logger(
        name=exp_name,
        silent=False,
        to_disk=True,
        log_file=f'{exp_name}_{now_time}.txt',
    )
    logging_config = {
        'version': 1,
        'disable_existing_loggers': False,
        'formatters': {
            'custom': {
                # More format options are available in the official
                # `documentation <https://docs.python.org/3/howto/logging-cookbook.html>`_
                'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
            }
        },
        # Any INFO level msg will be printed to the console
        'handlers': {
            'console': {
                'level': 'INFO',
                'formatter': 'custom',
                'class': 'logging.StreamHandler',
                'stream': 'ext://sys.stdout',
            },
        },
        'loggers': {
            '': {  # root logger
                'level': 'DEBUG',
            },
            'Client-EnsembleBuilder': {
                'level': 'DEBUG',
                'handlers': ['console'],
            },
        },
    }

    # Data
    df = pd.read_csv('xxxxx.csv')  # sorry
    X = df.loc[:, feat_comb]
    y = df.loc[:, 'Status']
    x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    # Model
    per_runtime = min(runtime // 10, 1800)
    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=runtime,
        per_run_time_limit=per_runtime,
        initial_configurations_via_metalearning=25,
        ensemble_size=50,
        ensemble_nbest=50,
        max_models_on_disc=None,
        memory_limit=None,
        tmp_folder='./tmp_folder' + str(random.randrange(1, 1000)),
        delete_tmp_folder_after_terminate=True,
        n_jobs=-1,  # use all available cores
        seed=42,
        logging_config=logging_config,
    )

    # Train
    automl.fit(x_train, y_train)

    # Save
    dump(automl, 'september.joblib')

    # Evaluation
    predictions_x_train = automl.predict(x_train)
    predictions = automl.predict(x_test)

    # Logging
    log.info("[INFO] start training....")


def main():
    feat_combs = {
        'ga10000': ['XXX', ..., 'XXXX'],
    }
    for exp_name, feat_comb in feat_combs.items():
        runtime = 600
        automl_feat_comb(feat_comb, exp_name=f"{exp_name}_feat_{len(feat_comb)}", runtime=runtime)


if __name__ == '__main__':
    main()
I haven't tried the code myself, but I don't see anything wrong with your implementation. I might run it later when I have time.
This is most likely due to https://github.com/automl/SMAC3/issues/774, which basically says that getting new configurations (i.e. which model with which hyperparameters to try next) is not executed in parallel. When running in parallel, and evaluating configurations is faster than the suggestion mechanism, you'll observe the pattern reported here, namely that auto-sklearn uses only a single core. Up to iteration 30, auto-sklearn suggests configurations via meta-learning (in a single batch), which explains why parallelism works in the beginning. Unfortunately, there is not really anything that can be done about this. In such cases you might be better off using random search, as it can make full use of the parallel setting.
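To make the effect described above concrete, here is a toy simulation. It is a sketch under stated assumptions, not SMAC's actual scheduler: a warm-start batch of configurations is available for free at t=0 (standing in for meta-learning, sized 25 to mirror initial_configurations_via_metalearning=25 in the script above), and afterwards every configuration must first be suggested. With one shared suggester, free workers queue up behind it; with independent sampling, as random search allows, each worker draws its own configuration. The function name average_busy_workers and the timings used below (10 s per evaluation, 20 s per suggestion) are hypothetical.

```python
import heapq


def average_busy_workers(n_workers, eval_time, suggest_time, initial_batch,
                         horizon, shared_suggester=True):
    """Average number of workers evaluating a configuration over [0, horizon].

    Toy model (an assumption, not SMAC's real scheduler):
    - `initial_batch` configurations are ready at t=0 at no cost
      (a stand-in for the meta-learning warm start).
    - After that, each configuration costs `suggest_time` to produce.
      shared_suggester=True: one suggester serves requests one at a time.
      shared_suggester=False: each worker samples its own configuration
      in parallel, as random search can.
    - Every evaluation takes `eval_time`.
    """
    free_at = [0.0] * n_workers          # min-heap: time each worker becomes idle
    heapq.heapify(free_at)
    suggester_free_at = 0.0              # when the shared suggester is next available
    remaining_initial = initial_batch
    busy = 0.0                           # worker-seconds spent evaluating inside [0, horizon]

    while free_at[0] < horizon:
        t = heapq.heappop(free_at)       # earliest idle worker
        if remaining_initial > 0:
            remaining_initial -= 1       # warm-start config: evaluate immediately
            start = t
        elif shared_suggester:
            # Wait until the single suggester is free, then occupy it.
            start = max(t, suggester_free_at) + suggest_time
            suggester_free_at = start
        else:
            start = t + suggest_time     # local suggestion, no waiting on others
        end = start + eval_time
        # Count only the part of the evaluation that falls inside the horizon.
        busy += max(0.0, min(end, horizon) - min(start, horizon))
        heapq.heappush(free_at, end)
    return busy / horizon


if __name__ == "__main__":
    # First 30 s: the warm-start batch keeps all 8 workers busy.
    print(average_busy_workers(8, 10, 20, 25, horizon=30))    # 8.0
    # Longer run, one slow shared suggester: below one busy worker on average.
    print(average_busy_workers(8, 10, 20, 25, horizon=1000))  # 0.73
    # Independent, cheap sampling (random-search-like): close to 8.
    print(average_busy_workers(8, 10, 0.1, 25, horizon=1000,
                               shared_suggester=False))
```

In this model, once the warm start is used up, a single shared suggester caps throughput at one configuration per suggest_time, so roughly eval_time / suggest_time workers stay busy (0.5 here) no matter how large n_jobs is, which matches the "all cores, then one core" pattern. Independent sampling instead scales with the number of workers.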
OK, I only know a little about meta-learning, but it sounds like once meta-learning has found good configurations, there is no need to keep running in parallel. Is that right?
In my tests and in my case, parallelism does not affect the accuracy, so (combined with my understanding above) I think this is not a real bug and doesn't need attention. Therefore, I will close this issue in 3 days if there is no objection.
By the way, I do actually encounter the bug mentioned in #1236 (but not every time). I think there may be some relationship between the two issues (meta-learning?).
Finally, thanks to the auto-sklearn team for your contribution. Auto-sklearn is really awesome, a real nuclear weapon. :D
Re-opening this to track that we still have an issue with parallelism when the dataset is small.
I trained my model on a 36-core CPU and set n_jobs=-1, and it worked. However, judging from htop, auto-sklearn only occupies one or two cores most of the time. Is there any way to improve CPU utilization?