automl / auto-sklearn

Automated Machine Learning with scikit-learn
https://automl.github.io/auto-sklearn
BSD 3-Clause "New" or "Revised" License

Running multiple AutoML evaluations in parallel? #759

Closed chengrunyang closed 4 years ago

chengrunyang commented 4 years ago

Hi, I encountered some problems when trying to use multiprocessing to run multiple auto-sklearn processes in parallel: each process does not behave the way it does when started individually. I wonder if you have a hint as to why this happens; below is an MWE that might help.

Below, I try to run the function run in parallel. This function is adapted from your example at https://github.com/automl/auto-sklearn/blob/master/examples/example_sequential.py.

import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
import numpy as np
import autosklearn.classification
import multiprocessing as mp

def run(i): # i just serves as an index here
    print(i)
    X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=120,
        per_run_time_limit=30,
#         tmp_folder='/tmp/autosklearn_sequential_example_tmp',
#         output_folder='/tmp/autosklearn_sequential_example_out',
        # Do not construct ensembles in parallel to avoid using more than one
        # core at a time. The ensemble will be constructed after auto-sklearn
        # finished fitting all machine learning models.
        ensemble_size=0,
#         delete_tmp_folder_after_terminate=False,
    )
    automl.fit(X_train, y_train, dataset_name='breast_cancer')
    # This call to fit_ensemble uses all models trained in the previous call
    # to fit to build an ensemble which can be used with automl.predict()
    automl.fit_ensemble(y_train, ensemble_size=50)

    print(automl.show_models())
    predictions = automl.predict(X_test)
    print(automl.sprint_statistics())
    print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))

p1 = mp.Pool(100)
result = [p1.apply_async(run, args=[i]) for i in np.arange(500)]
p1.close()
p1.join()

When running the above code, I just get a series of i's printed out and the program quickly terminates without producing any AutoML results. However, when I run the function run() individually, for example by replacing the last four lines with run(2), the AutoML fitting and prediction process behaves normally. Any ideas? Thanks :)

mfeurer commented 4 years ago

In general, Auto-sklearn heavily depends on the tmp_folder. The tmp_folder can be shared by multiple instances of Auto-sklearn running in parallel, but then they need to have different seeds. However, Auto-sklearn uses a default seed of 1 (see here). Because the directory is deleted after each run, your script works fine in the sequential case; in the parallel case, however, the different instances of Auto-sklearn get in each other's way. Two potential solutions (a combined sketch follows the list):

  1. Use different seeds by passing seed=i to the constructor.
  2. Use different values for tmp_folder.
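
For illustration, a minimal sketch of a constructor call that applies both suggestions at once (not from the original thread; it assumes each parallel worker receives a distinct index i, as in the run(i) function above, and the tmp_folder path is just an example):

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    ensemble_size=0,
    # Option 1: give every instance its own seed.
    seed=i,
    # Option 2: give every instance its own temporary directory.
    tmp_folder='/tmp/autosklearn_parallel_tmp_{}'.format(i),
    delete_tmp_folder_after_terminate=True,
)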

I would go for option 1. Please reopen if this doesn't help.

chengrunyang commented 4 years ago

Thanks for the prompt response! I tried both options by changing the class constructor call to either

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    ensemble_size=0,
    seed=i,
    delete_tmp_folder_after_terminate=True,
)

or

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    tmp_folder='/tmp/autosklearn_sequential_example_tmp_{}'.format(i),
    ensemble_size=0,
    delete_tmp_folder_after_terminate=True,
)

However, the program still finishes promptly without giving any results. Am I doing something wrong?

mfeurer commented 4 years ago

I just tried it myself and replaced the process pool with joblib, as it gives nicer output:

import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
import numpy as np
import autosklearn.classification
from joblib import Parallel, delayed

def run(i): # i just serves as an index here
    print(i)
    X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=120,
        per_run_time_limit=30,
        ensemble_size=0,
        seed=i,
        delete_tmp_folder_after_terminate=True,
    )
    automl.fit(X_train, y_train, dataset_name='breast_cancer')
    # This call to fit_ensemble uses all models trained in the previous call
    # to fit to build an ensemble which can be used with automl.predict()
    automl.fit_ensemble(y_train, ensemble_size=50)

    print(automl.show_models())
    predictions = automl.predict(X_test)
    print(automl.sprint_statistics())
    print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))
    return sklearn.metrics.accuracy_score(y_test, predictions)

if __name__ == '__main__':
    result = Parallel(n_jobs=4, backend='multiprocessing')(delayed(run)(i) for i in np.arange(500))
    print(result)

and the output is

/home/feurerm/miniconda/3-4.5.4/envs/autosklearn/bin/python3.7 /home/feurerm/sync_dir/projects/automl_competition_2015/auto-sklearn/issues/0759.py
/home/feurerm/miniconda/3-4.5.4/envs/autosklearn/lib/python3.7/site-packages/pyparsing.py:2681: FutureWarning: Possible set intersection at position 3
  self.re = re.compile( self.reString )
0
1
2
3
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/feurerm/miniconda/3-4.5.4/envs/autosklearn/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/feurerm/miniconda/3-4.5.4/envs/autosklearn/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 561, in __call__
    return self.func(*args, **kwargs)
  File "/home/feurerm/miniconda/3-4.5.4/envs/autosklearn/lib/python3.7/site-packages/joblib/parallel.py", line 224, in __call__
    for func, args, kwargs in self.items]
  File "/home/feurerm/miniconda/3-4.5.4/envs/autosklearn/lib/python3.7/site-packages/joblib/parallel.py", line 224, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/feurerm/sync_dir/projects/automl_competition_2015/auto-sklearn/issues/0759.py", line 21, in run
    automl.fit(X_train, y_train, dataset_name='breast_cancer')
  File "/home/feurerm/sync_dir/projects/automl_competition_2015/auto-sklearn/autosklearn/estimators.py", line 664, in fit
    dataset_name=dataset_name,
  File "/home/feurerm/sync_dir/projects/automl_competition_2015/auto-sklearn/autosklearn/estimators.py", line 337, in fit
    self._automl[0].fit(**kwargs)
  File "/home/feurerm/sync_dir/projects/automl_competition_2015/auto-sklearn/autosklearn/automl.py", line 996, in fit
    load_models=load_models,
  File "/home/feurerm/sync_dir/projects/automl_competition_2015/auto-sklearn/autosklearn/automl.py", line 208, in fit
    only_return_configuration_space=only_return_configuration_space,
  File "/home/feurerm/sync_dir/projects/automl_competition_2015/auto-sklearn/autosklearn/automl.py", line 384, in _fit
    num_run = self._do_dummy_prediction(datamanager, num_run)
  File "/home/feurerm/sync_dir/projects/automl_competition_2015/auto-sklearn/autosklearn/automl.py", line 306, in _do_dummy_prediction
    ta.run(1, cutoff=self._time_for_task)
  File "/home/feurerm/sync_dir/projects/automl_competition_2015/auto-sklearn/autosklearn/evaluation/__init__.py", line 211, in run
    obj(**obj_kwargs)
  File "/home/feurerm/miniconda/3-4.5.4/envs/autosklearn/lib/python3.7/site-packages/pynisher/limit_function_call.py", line 218, in __call__
    subproc.start()
  File "/home/feurerm/miniconda/3-4.5.4/envs/autosklearn/lib/python3.7/multiprocessing/process.py", line 110, in start
    'daemonic processes are not allowed to have children'
AssertionError: daemonic processes are not allowed to have children
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/feurerm/sync_dir/projects/automl_competition_2015/auto-sklearn/issues/0759.py", line 34, in <module>
    result = Parallel(n_jobs=4, backend='multiprocessing')(delayed(run)(i) for i in np.arange(500))
  File "/home/feurerm/miniconda/3-4.5.4/envs/autosklearn/lib/python3.7/site-packages/joblib/parallel.py", line 962, in __call__
    self.retrieve()
  File "/home/feurerm/miniconda/3-4.5.4/envs/autosklearn/lib/python3.7/site-packages/joblib/parallel.py", line 865, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/feurerm/miniconda/3-4.5.4/envs/autosklearn/lib/python3.7/multiprocessing/pool.py", line 683, in get
    raise self._value
AssertionError: daemonic processes are not allowed to have children

Process finished with exit code 1

I assume this is the issue with the regular multiprocessing pool, too. Unfortunately, I don't know how to solve this except by handling parallelism at a different level, for example in the shell.
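
For example, here is a minimal sketch of driving the runs from a separate top-level process instead of a pool (this is only an illustration; run_one.py is a hypothetical script that wraps the run(i) function above and takes the index as its command-line argument):

import subprocess
import sys

# Each Auto-sklearn run gets its own top-level (non-daemonic) Python process,
# so it is free to spawn the child processes it needs internally.
n_parallel = 4
indices = list(range(8))
for batch_start in range(0, len(indices), n_parallel):
    batch = indices[batch_start:batch_start + n_parallel]
    procs = [subprocess.Popen([sys.executable, 'run_one.py', str(i)]) for i in batch]
    for p in procs:
        # Wait for the whole batch to finish before starting the next one.
        p.wait()

The same effect can be achieved directly from the shell by starting several such single-run invocations in the background.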

chengrunyang commented 4 years ago

I see! It looks like the issue is that auto-sklearn itself uses multiprocessing for its subroutines (the pool's worker processes are daemonic and are not allowed to spawn children, as the traceback shows), so it cannot be run inside another multiprocessing pool. I will try something like shell parallelization. Thanks!