automl / auto-sklearn

Automated Machine Learning with scikit-learn
https://automl.github.io/auto-sklearn
BSD 3-Clause "New" or "Revised" License

Seems like Auto-sklearn deletes pynisher std.out file too soon and causes the pipeline config evaluation crash #1016

Closed krzischp closed 3 years ago

krzischp commented 3 years ago

Describe the bug

I'm using the "Obtain run information" example to monitor configuration trials: https://automl.github.io/auto-sklearn/master/examples/40_advanced/example_get_pipeline_components.html#sphx-glr-examples-40-advanced-example-get-pipeline-components-py

I don't understand some of the StatusType.CRASHED logs.

RunValue(cost=1.0, time=0.0, status=<StatusType.CRASHED: 3>, starttime=1606347167.2583704, endtime=1606347218.3419402, additional_info={'traceback': 'Traceback (most recent call last):\n File "/home/container/t732787/TESTS AUTOSKLEARN/auto-sklearn-master/autosklearn/evaluation/__init__.py", line 291, in run\n obj(**obj_kwargs)\n File "/opt/miniconda/lib/python3.6/site-packages/pynisher/limit_function_call.py", line 287, in __call__\n with open(os.path.join(tmp_dir.name, \'std.out\'), \'r\') as fh:\nFileNotFoundError: [Errno 2] No such file or directory: \'/tmp/tmp65tdmi7p/std.out\'\n', 'error': "FileNotFoundError(2, 'No such file or directory')"})

Configuration:
  balancing:strategy, Value: 'weighting'
  classifier:__choice__, Value: 'extra_trees'
  classifier:extra_trees:bootstrap, Value: 'False'
  classifier:extra_trees:criterion, Value: 'entropy'
  classifier:extra_trees:max_depth, Constant: 'None'
  classifier:extra_trees:max_features, Value: 0.5176100810181034
  classifier:extra_trees:max_leaf_nodes, Constant: 'None'
  classifier:extra_trees:min_impurity_decrease, Constant: 0.0
  classifier:extra_trees:min_samples_leaf, Value: 1
  classifier:extra_trees:min_samples_split, Value: 3
  classifier:extra_trees:min_weight_fraction_leaf, Constant: 0.0
  data_preprocessing:categorical_transformer:categorical_encoding:__choice__, Value: 'one_hot_encoding'
  data_preprocessing:categorical_transformer:category_coalescence:__choice__, Value: 'minority_coalescer'
  data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction, Value: 0.11663986926201173
  data_preprocessing:numerical_transformer:imputation:strategy, Value: 'median'
  data_preprocessing:numerical_transformer:rescaling:__choice__, Value: 'none'
  feature_preprocessor:__choice__, Value: 'fast_ica'
  feature_preprocessor:fast_ica:algorithm, Value: 'deflation'
  feature_preprocessor:fast_ica:fun, Value: 'logcosh'
  feature_preprocessor:fast_ica:n_components, Value: 187
  feature_preprocessor:fast_ica:whiten, Value: 'True'

I also see the same log with a gradient boosting pipeline configuration.

To Reproduce

Steps to reproduce the behavior:

I'm using the Iris dataset

I'm running this script

import autosklearn.classification

cls = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=400,
    per_run_time_limit=100,
)
cls.fit(train_X, train_Y)

I'm printing the logs with the following snippet:

# iterate over every run recorded in the run history of the fitted classifier
for run_key in cls.automl_.runhistory_.data:
    print(cls.automl_.runhistory_.data[run_key])
    print("@@@@")
    print(cls.automl_.runhistory_.ids_config[run_key.config_id])
    print("****")

Expected behavior

I expect pipeline configurations to crash only because of out-of-memory errors.

Actual behavior, stacktrace or logfile

No such file or directory

Environment and installation:

Details about my installation:

mfeurer commented 3 years ago

Hi @krzischp, thanks a lot for reporting this issue. It seems that we can also observe this in a recent unit test failure: https://github.com/automl/auto-sklearn/pull/1011/checks?check_run_id=1453406333

Unfortunately, this doesn't explain why it happens or how to fix it as at least I cannot reproduce this on my machine. Can you reproduce this reliably? If yes, it would be great if you could put a few print statements into the pynisher at the following places and report the results:

CC @sfalkner @LMZimmer @franchuterivera have any of you seen this issue before, and can you reproduce it locally?

mfeurer commented 3 years ago

Not sure if related or not, the files are created inside the subprocess (https://github.com/automl/pynisher/blob/master/pynisher/limit_function_call.py#L80) and are never closed - not sure if this is an issue. We should create those files outside and pass them into the subprocess if necessary. (In general, too much setup is done in the subprocess...)
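
To make the idea of creating the files outside the subprocess concrete, here is a minimal sketch. It is an illustration only, not pynisher's implementation: it swaps in the standard library's subprocess module for pynisher's multiprocessing-based child, and run_with_captured_output is a hypothetical helper name.

```python
import os
import subprocess
import sys
import tempfile


def run_with_captured_output(code):
    """Run `code` in a child interpreter and return (returncode, stdout, stderr).

    Hypothetical helper: the parent creates std.out and std.err itself and
    hands the open files to the child, so both files exist even if the child
    dies before writing anything.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        stdout_path = os.path.join(tmp_dir, "std.out")
        stderr_path = os.path.join(tmp_dir, "std.err")

        # The parent owns the files; the handles are closed when this block exits.
        with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
            proc = subprocess.run([sys.executable, "-c", code], stdout=out, stderr=err)

        # Read the captured output back before the directory is cleaned up.
        with open(stdout_path) as fh:
            stdout = fh.read()
        with open(stderr_path) as fh:
            stderr = fh.read()

    return proc.returncode, stdout, stderr
```

With this layout, reading std.out in the parent cannot raise FileNotFoundError, because the parent created the file before the child was started.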

krzischp commented 3 years ago

Thanks for answering so quickly!

I inserted these print statements into pynisher/limit_function_call.py (l. 225 and l. 286) and reproduced the issue.

l. 225:

if self.capture_output:
    tmp_dir = tempfile.TemporaryDirectory()
    tmp_dir_name = tmp_dir.name
else:
    tmp_dir_name = None
print('****************@@@@****************created temporary directory****************@@@@****************', tmp_dir_name )
if tmp_dir_name and os.path.exists(tmp_dir_name):
    print('****************@@@@****************EXISTS****************@@@@****************', tmp_dir_name )
# create and start the process

l. 286:

# recover stdout and stderr if requested
if self.capture_output:
    if tmp_dir.name and os.path.exists(tmp_dir.name):
        print('****************$$$$****************L. 289: STILL EXISTS****************$$$$****************', tmp_dir.name )
    else:
        print('****************$$$$****************L. 289: DOESNT EXISTS ANYMORE****************$$$$****************', tmp_dir.name )
    if tmp_dir.name and os.path.exists(os.path.join(tmp_dir.name, 'std.out')):
        print('****************$$$$****************L. 289: FILE std.out STILL EXISTS****************$$$$****************', tmp_dir.name )
    else:
        print('****************$$$$****************L. 289: FILE std.out DOESNT EXISTS ANYMORE****************$$$$****************', tmp_dir.name )
    if tmp_dir.name and os.path.exists(os.path.join(tmp_dir.name, 'std.err')):
        print('****************$$$$****************L. 289: FILE std.err STILL EXISTS****************$$$$****************', tmp_dir.name )
    else:
        print('****************$$$$****************L. 289: FILE std.err DOESNT EXISTS ANYMORE****************$$$$****************', tmp_dir.name )
    with open(os.path.join(tmp_dir.name, 'std.out'), 'r') as fh:
        self2.stdout = fh.read()
    with open(os.path.join(tmp_dir.name, 'std.err'), 'r') as fh:
        self2.stderr = fh.read()

    tmp_dir.cleanup()

# don't leave zombies behind

It seems that your suspicions are correct. Here are the results for the following crashed configuration:

RunValue(cost=1.0, time=0.0, status=<StatusType.CRASHED: 3>, starttime=1606406533.7877831, endtime=1606406633.932273, additional_info={'traceback': 'Traceback (most recent call last):\n  File "/opt/miniconda/lib/python3.6/site-packages/autosklearn/evaluation/__init__.py", line 291, in run\n    obj(**obj_kwargs)\n  File "/opt/miniconda/lib/python3.6/site-packages/pynisher/limit_function_call.py", line 301, in __call__\n    with open(os.path.join(tmp_dir.name, \'std.out\'), \'r\') as fh:\nFileNotFoundError: [Errno 2] No such file or directory: \'/tmp/tmpp58hwckf/std.out\'\n', 'error': "FileNotFoundError(2, 'No such file or directory')"})

Configuration:
  balancing:strategy, Value: 'none'
  classifier:__choice__, Value: 'extra_trees'
  classifier:extra_trees:bootstrap, Value: 'True'
  classifier:extra_trees:criterion, Value: 'entropy'
  classifier:extra_trees:max_depth, Constant: 'None'
  classifier:extra_trees:max_features, Value: 0.7304811343030777
  classifier:extra_trees:max_leaf_nodes, Constant: 'None'
  classifier:extra_trees:min_impurity_decrease, Constant: 0.0
  classifier:extra_trees:min_samples_leaf, Value: 2
  classifier:extra_trees:min_samples_split, Value: 3
  classifier:extra_trees:min_weight_fraction_leaf, Constant: 0.0
  data_preprocessing:categorical_transformer:categorical_encoding:__choice__, Value: 'one_hot_encoding'
  data_preprocessing:categorical_transformer:category_coalescence:__choice__, Value: 'no_coalescense'
  data_preprocessing:numerical_transformer:imputation:strategy, Value: 'mean'
  data_preprocessing:numerical_transformer:rescaling:__choice__, Value: 'robust_scaler'
  data_preprocessing:numerical_transformer:rescaling:robust_scaler:q_max, Value: 0.7202654714984674
  data_preprocessing:numerical_transformer:rescaling:robust_scaler:q_min, Value: 0.21555973677200435
  feature_preprocessor:__choice__, Value: 'feature_agglomeration'
  feature_preprocessor:feature_agglomeration:affinity, Value: 'euclidean'
  feature_preprocessor:feature_agglomeration:linkage, Value: 'average'
  feature_preprocessor:feature_agglomeration:n_clusters, Value: 149
  feature_preprocessor:feature_agglomeration:pooling_func, Value: 'median'

Printed output:

****************@@@@****************created temporary directory****************@@@@**************** /tmp/tmpp58hwckf
****************@@@@****************EXISTS****************@@@@**************** /tmp/tmpp58hwckf
****************$$$$****************L. 289: STILL EXISTS****************$$$$**************** /tmp/tmpp58hwckf
****************$$$$****************L. 289: FILE std.out DOESNT EXISTS ANYMORE****************$$$$**************** /tmp/tmpp58hwckf
****************$$$$****************L. 289: FILE std.err DOESNT EXISTS ANYMORE****************$$$$**************** /tmp/tmpp58hwckf

The directory still exists, but the files std.out and std.err have been deleted somehow.

Do you have any idea how to fix this?

mfeurer commented 3 years ago

Thanks a lot for the swift response and the logs.

The directory still exists, but the files std.out and std.err have been deleted somehow.

I'm actually wondering whether that's really the case. Did std.out and std.err ever exist? Would you mind pasting the following code into https://github.com/automl/pynisher/blob/master/pynisher/limit_function_call.py#L222 and running again?

                    # create the files to capture output
                    with open(os.path.join(tmp_dir_name, 'std.out'), 'a', buffering=1):
                        pass
                    with open(os.path.join(tmp_dir_name, 'std.err'), 'a', buffering=1):
                        pass
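
For readers unfamiliar with the idiom in the snippet above: opening a file in append mode creates it if it is missing and leaves existing content untouched. A minimal standalone sketch follows; the touch helper and the temporary directory are illustrative and not part of pynisher.

```python
import os
import tempfile


def touch(path):
    # Append mode creates the file if needed and never truncates it.
    with open(path, "a", buffering=1):
        pass


with tempfile.TemporaryDirectory() as tmp_dir_name:
    std_out = os.path.join(tmp_dir_name, "std.out")

    touch(std_out)
    created_empty = os.path.exists(std_out) and os.path.getsize(std_out) == 0

    # Simulate the subprocess writing captured output.
    with open(std_out, "a") as fh:
        fh.write("captured line\n")

    # Touching the file again must keep what was already written.
    touch(std_out)
    with open(std_out) as fh:
        content = fh.read()
```

If std.out exists after this step but is missing when it is read back later, something removed it in between; if it never appears, the capture files were never created in the first place.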

krzischp commented 3 years ago

Hi @mfeurer, I uninstalled and reinstalled the exact same version of Auto-sklearn and could no longer reproduce the issue. The missing-file problem doesn't appear anymore. I really don't understand why, since I didn't change anything else.

Next time I run into this issue, I will test adding your snippet to limit_function_call.py.

Thank you!

krzischp commented 3 years ago

Hi @mfeurer, I just managed to reproduce the issue with the following dataset: https://www.kaggle.com/c/santander-customer-satisfaction

Here is my initialization and training script:

import autosklearn.classification
from autosklearn.metrics import roc_auc

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=3600,
    per_run_time_limit=1000,
    memory_limit=20000,
    metric=roc_auc,
    n_jobs=1,
)
automl.fit(train_X, train_Y, dataset_name='santander_customer_satisfaction')

And here is one of the crashed pipelines I get:

RunValue(cost=1.0, time=0.0, status=<StatusType.CRASHED: 3>, starttime=1606759660.289351, endtime=1606760660.3664262, additional_info={'traceback': 'Traceback (most recent call last):\n  File "/opt/miniconda/lib/python3.6/site-packages/autosklearn/evaluation/__init__.py", line 291, in run\n    obj(**obj_kwargs)\n  File "/opt/miniconda/lib/python3.6/site-packages/pynisher/limit_function_call.py", line 287, in __call__\n    with open(os.path.join(tmp_dir.name, \'std.out\'), \'r\') as fh:\nFileNotFoundError: [Errno 2] No such file or directory: \'/tmp/tmp49tnauo_/std.out\'\n', 'error': "FileNotFoundError(2, 'No such file or directory')"})

Configuration:
  balancing:strategy, Value: 'weighting'
  classifier:__choice__, Value: 'passive_aggressive'
  classifier:passive_aggressive:C, Value: 0.15115687152536414
  classifier:passive_aggressive:average, Value: 'True'
  classifier:passive_aggressive:fit_intercept, Constant: 'True'
  classifier:passive_aggressive:loss, Value: 'hinge'
  classifier:passive_aggressive:tol, Value: 0.009954631623104506
  data_preprocessing:categorical_transformer:categorical_encoding:__choice__, Value: 'no_encoding'
  data_preprocessing:categorical_transformer:category_coalescence:__choice__, Value: 'minority_coalescer'
  data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction, Value: 0.0013410356664535208
  data_preprocessing:numerical_transformer:imputation:strategy, Value: 'most_frequent'
  data_preprocessing:numerical_transformer:rescaling:__choice__, Value: 'quantile_transformer'
  data_preprocessing:numerical_transformer:rescaling:quantile_transformer:n_quantiles, Value: 1293
  data_preprocessing:numerical_transformer:rescaling:quantile_transformer:output_distribution, Value: 'uniform'
  feature_preprocessor:__choice__, Value: 'no_preprocessing'

I'm currently running my script with your suggested snippet added to the pynisher code. I will send you the new output as soon as it finishes running.

krzischp commented 3 years ago

Hi,

With this snippet added, I now get these two crash errors (with the Santander Customer Satisfaction dataset):

ValueError: Bug in scikit-learn: https://github.com/scikit-learn/scikit-learn/pull/2738

RunValue(cost=1.0, time=7.9786248207092285, status=<StatusType.CRASHED: 3>, starttime=1606770283.0365896, endtime=1606770291.0558894, additional_info={'traceback': 'Traceback (most recent call last):\n  File "/opt/miniconda/lib/python3.6/site-packages/autosklearn/pipeline/components/feature_preprocessing/fast_ica.py", line 41, in fit\n    self.preprocessor.fit(X)\n  File "/opt/miniconda/lib/python3.6/site-packages/sklearn/decomposition/_fastica.py", line 576, in fit\n    self._fit(X, compute_sources=False)\n  File "/opt/miniconda/lib/python3.6/site-packages/sklearn/decomposition/_fastica.py", line 511, in _fit\n    W, n_iter = _ica_par(X1, **kwargs)\n  File "/opt/miniconda/lib/python3.6/site-packages/sklearn/decomposition/_fastica.py", line 109, in _ica_par\n    - g_wtx[:, np.newaxis] * W)\n  File "/opt/miniconda/lib/python3.6/site-packages/sklearn/decomposition/_fastica.py", line 56, in _sym_decorrelation\n    s, u = linalg.eigh(np.dot(W, W.T))\n  File "/opt/miniconda/lib/python3.6/site-packages/scipy/linalg/decomp.py", line 374, in eigh\n    a1 = _asarray_validated(a, check_finite=check_finite)\n  File "/opt/miniconda/lib/python3.6/site-packages/scipy/_lib/_util.py", line 239, in _asarray_validated\n    a = toarray(a)\n  File "/opt/miniconda/lib/python3.6/site-packages/numpy/lib/function_base.py", line 496, in asarray_chkfinite\n    "array must not contain infs or NaNs")\nValueError: array must not contain infs or NaNs\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File "/opt/miniconda/lib/python3.6/site-packages/autosklearn/evaluation/__init__.py", line 31, in fit_predict_try_except_decorator\n    return ta(queue=queue, **kwargs)\n  File "/opt/miniconda/lib/python3.6/site-packages/autosklearn/evaluation/train_evaluator.py", line 1075, in eval_holdout\n    evaluator.fit_predict_and_loss(iterative=iterative)\n  File "/opt/miniconda/lib/python3.6/site-packages/autosklearn/evaluation/train_evaluator.py", line 465, in fit_predict_and_loss\n    add_model_to_self=self.num_cv_folds == 1,\n  File "/opt/miniconda/lib/python3.6/site-packages/autosklearn/evaluation/train_evaluator.py", line 806, in _partial_fit_and_predict_standard\n    self.Y_train[train_indices],\n  File "/opt/miniconda/lib/python3.6/site-packages/autosklearn/evaluation/abstract_evaluator.py", line 105, in _fit_and_suppress_warnings\n    model.fit(X, y)\n  File "/opt/miniconda/lib/python3.6/site-packages/autosklearn/pipeline/base.py", line 91, in fit\n    X, fit_params = self.fit_transformer(X, y, **fit_params)\n  File "/opt/miniconda/lib/python3.6/site-packages/autosklearn/pipeline/classification.py", line 98, in fit_transformer\n    X, y, fit_params=fit_params)\n  File "/opt/miniconda/lib/python3.6/site-packages/autosklearn/pipeline/base.py", line 101, in fit_transformer\n    Xt, fit_params = self._fit(X, y, **fit_params)\n  File "/opt/miniconda/lib/python3.6/site-packages/sklearn/pipeline.py", line 315, in _fit\n    **fit_params_steps[name])\n  File "/opt/miniconda/lib/python3.6/site-packages/joblib/memory.py", line 355, in __call__\n    return self.func(*args, **kwargs)\n  File "/opt/miniconda/lib/python3.6/site-packages/sklearn/pipeline.py", line 730, in _fit_transform_one\n    res = transformer.fit(X, y, **fit_params).transform(X)\n  File "/opt/miniconda/lib/python3.6/site-packages/autosklearn/pipeline/components/base.py", line 429, in fit\n    return self.choice.fit(X, y, **kwargs)\n  File "/opt/miniconda/lib/python3.6/site-packages/autosklearn/pipeline/components/feature_preprocessing/fast_ica.py", line 44, in fit\n    raise ValueError("Bug in scikit-learn: "\nValueError: Bug in scikit-learn: https://github.com/scikit-learn/scikit-learn/pull/2738\n', 'error': "ValueError('Bug in scikit-learn: https://github.com/scikit-learn/scikit-learn/pull/2738',)", 'configuration_origin': 'Initial design'})

Configuration:
  balancing:strategy, Value: 'none'
  classifier:__choice__, Value: 'random_forest'
  classifier:random_forest:bootstrap, Value: 'True'
  classifier:random_forest:criterion, Value: 'entropy'
  classifier:random_forest:max_depth, Constant: 'None'
  classifier:random_forest:max_features, Value: 0.49782482408932727
  classifier:random_forest:max_leaf_nodes, Constant: 'None'
  classifier:random_forest:min_impurity_decrease, Constant: 0.0
  classifier:random_forest:min_samples_leaf, Value: 1
  classifier:random_forest:min_samples_split, Value: 7
  classifier:random_forest:min_weight_fraction_leaf, Constant: 0.0
  data_preprocessing:categorical_transformer:categorical_encoding:__choice__, Value: 'one_hot_encoding'
  data_preprocessing:categorical_transformer:category_coalescence:__choice__, Value: 'minority_coalescer'
  data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction, Value: 0.0085685196603325
  data_preprocessing:numerical_transformer:imputation:strategy, Value: 'mean'
  data_preprocessing:numerical_transformer:rescaling:__choice__, Value: 'quantile_transformer'
  data_preprocessing:numerical_transformer:rescaling:quantile_transformer:n_quantiles, Value: 891
  data_preprocessing:numerical_transformer:rescaling:quantile_transformer:output_distribution, Value: 'normal'
  feature_preprocessor:__choice__, Value: 'fast_ica'
  feature_preprocessor:fast_ica:algorithm, Value: 'parallel'
  feature_preprocessor:fast_ica:fun, Value: 'logcosh'
  feature_preprocessor:fast_ica:whiten, Value: 'False'

And

Result queue is empty

RunValue(cost=1.0, time=93.10029578208923, status=<StatusType.CRASHED: 3>, starttime=1606770877.0391214, endtime=1606770971.1559799, additional_info={'error': 'Result queue is empty', 'exit_status': "<class 'pynisher.limit_function_call.AnythingException'>", 'subprocess_stdout': '', 'subprocess_stderr': '', 'exitcode': -11, 'configuration_origin': 'Initial design'})

Configuration:
  balancing:strategy, Value: 'weighting'
  classifier:__choice__, Value: 'liblinear_svc'
  classifier:liblinear_svc:C, Value: 3.231854403235891
  classifier:liblinear_svc:dual, Constant: 'False'
  classifier:liblinear_svc:fit_intercept, Constant: 'True'
  classifier:liblinear_svc:intercept_scaling, Constant: 1
  classifier:liblinear_svc:loss, Value: 'squared_hinge'
  classifier:liblinear_svc:multi_class, Constant: 'ovr'
  classifier:liblinear_svc:penalty, Value: 'l2'
  classifier:liblinear_svc:tol, Value: 0.00013833036477206613
  data_preprocessing:categorical_transformer:categorical_encoding:__choice__, Value: 'one_hot_encoding'
  data_preprocessing:categorical_transformer:category_coalescence:__choice__, Value: 'no_coalescense'
  data_preprocessing:numerical_transformer:imputation:strategy, Value: 'median'
  data_preprocessing:numerical_transformer:rescaling:__choice__, Value: 'robust_scaler'
  data_preprocessing:numerical_transformer:rescaling:robust_scaler:q_max, Value: 0.9724376038343914
  data_preprocessing:numerical_transformer:rescaling:robust_scaler:q_min, Value: 0.24054446611700375
  feature_preprocessor:__choice__, Value: 'polynomial'
  feature_preprocessor:polynomial:degree, Value: 2
  feature_preprocessor:polynomial:include_bias, Value: 'False'
  feature_preprocessor:polynomial:interaction_only, Value: 'False'

mfeurer commented 3 years ago

Some updates. I started testing online whether the error on CI goes away by using the previously mentioned fix (https://github.com/automl/pynisher/pull/7, https://github.com/automl/auto-sklearn/pull/1022), but it turns out that the error is that the pynisher cannot start the subprocess at all. The error is:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'accuracy' on <module '__main__' (built-in)>
WARNING: /home/runner/work/auto-sklearn/auto-sklearn/examples/40_advanced/example_metrics.py failed to execute correctly: Traceback (most recent call last):
  File "/home/runner/work/auto-sklearn/auto-sklearn/examples/40_advanced/example_metrics.py", line 107, in <module>
    cls.fit(X_train, y_train)
  File "/home/runner/work/auto-sklearn/auto-sklearn/autosklearn/estimators.py", line 583, in fit
    super().fit(
  File "/home/runner/work/auto-sklearn/auto-sklearn/autosklearn/estimators.py", line 348, in fit
    self.automl_.fit(load_models=self._load_models, **kwargs)
  File "/home/runner/work/auto-sklearn/auto-sklearn/autosklearn/automl.py", line 1276, in fit
    return super().fit(
  File "/home/runner/work/auto-sklearn/auto-sklearn/autosklearn/automl.py", line 584, in fit
    self._do_dummy_prediction(datamanager, num_run)
  File "/home/runner/work/auto-sklearn/auto-sklearn/autosklearn/automl.py", line 412, in _do_dummy_prediction
    raise ValueError(
ValueError: Dummy prediction failed with run state StatusType.CRASHED and additional output: {'traceback': 'Traceback (most recent call last):\n  File "/home/runner/work/auto-sklearn/auto-sklearn/autosklearn/evaluation/__init__.py", line 294, in run\n    obj(**obj_kwargs)\n  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/pynisher/limit_function_call.py", line 287, in __call__\n    with open(os.path.join(tmp_dir.name, \'std.out\'), \'r\') as fh:\nFileNotFoundError: [Errno 2] No such file or directory: \'/tmp/tmpwgnpw8en/std.out\'\n', 'error': "FileNotFoundError(2, 'No such file or directory')"}.
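
As background on the AttributeError in the first traceback: pickle serializes a function by reference (module name plus qualified name), so a process that unpickles it needs a module of that name that actually defines the function. The sketch below reproduces that mechanism in isolation with a throwaway module named fake_main, a hypothetical stand-in for the script's __main__. It illustrates the error class only and is not a diagnosis of the CI failure.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the parent's __main__ module, where the metric lives.
parent_main = types.ModuleType("fake_main")
exec("def accuracy(solution, prediction):\n    return 1.0\n", parent_main.__dict__)
sys.modules["fake_main"] = parent_main

# The payload records only "fake_main" and "accuracy", not the function body.
payload = pickle.dumps(parent_main.accuracy)

# Simulate the spawned child: a module of that name exists, but the function
# was never defined in it.
del parent_main.accuracy


def try_unpickle(data):
    """Return the unpickled object, or the name of the exception raised."""
    try:
        return pickle.loads(data)
    except AttributeError as exc:
        return type(exc).__name__


result = try_unpickle(payload)
print(result)  # AttributeError
```

With the spawn start method the child re-imports the main module instead of inheriting the parent's memory, which is why objects defined only in the parent's __main__ can be missing there.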

Do you observe any similar output?

Also, if you add the fix, does the number of successful configurations change or does only the number of failing configurations reduce?

krzischp commented 3 years ago

Hi, I'm no longer observing the FileNotFoundError, but instead I'm getting the log I posted before, "Result queue is empty": 'error': 'Result queue is empty', 'exit_status': "<class 'pynisher.limit_function_call.AnythingException'>", 'subprocess_stdout': '', 'subprocess_stderr': '', 'exitcode': -11, as if there were no subprocess at all.
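
A note on reading 'exitcode': -11 in that log: for Python's multiprocessing, a negative exit code -N means the child process was terminated by signal N, so the subprocess did start and was then killed. The sketch below decodes such a value; describe_exitcode is a hypothetical helper, not part of pynisher or Auto-sklearn.

```python
import signal


def describe_exitcode(exitcode):
    """Translate a multiprocessing-style exit code into a readable string."""
    if exitcode is None:
        return "still running"
    if exitcode < 0:
        # Negative values encode the number of the terminating signal.
        try:
            name = signal.Signals(-exitcode).name
        except ValueError:
            name = f"signal {-exitcode}"
        return f"terminated by {name}"
    return f"exited with status {exitcode}"


print(describe_exitcode(-11))  # terminated by SIGSEGV
```

Signal 11 is SIGSEGV, a segmentation fault, which would also be consistent with the empty subprocess_stdout and subprocess_stderr fields.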

The number of failing configurations didn't change much. I still get more than 3 crashed configurations almost every time I fit the model. So far I have only tried Auto-sklearn on 3 datasets, so I cannot really confirm whether anything changed since I added your fix.

You managed to reproduce this issue reliably, right? If not, feel free to give me some debugging code to add. Pynisher failed in the ensemble_builder module, right? Is your unit test (test_ensemble?) still catching the error after you added the fix in limit_function_call?

franchuterivera commented 3 years ago

Hello, we want to test a theory that a collision in the directory structure might be causing this problem. We created a version of Auto-sklearn/pynisher that isolates the stdout/stderr files for each job. Sadly, I also cannot reproduce this problem on my laptop, so it would help us if you could try this out.
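
To illustrate what isolating the stdout/stderr files per job can look like, here is a minimal sketch. It is not the change made in the branches below; make_job_output_dir and the job_ prefix are hypothetical names chosen for this example.

```python
import os
import shutil
import tempfile


def make_job_output_dir(job_id, base_dir=None):
    """Create a directory unique to one job and pre-create its capture files."""
    # mkdtemp guarantees a fresh directory, so two jobs never share one path.
    job_dir = tempfile.mkdtemp(prefix=f"job_{job_id}_", dir=base_dir)
    paths = {name: os.path.join(job_dir, name) for name in ("std.out", "std.err")}
    for path in paths.values():
        with open(path, "a"):
            pass
    return job_dir, paths


# Even two jobs that reuse the same id get separate directories.
first_dir, first_paths = make_job_output_dir(7)
second_dir, second_paths = make_job_output_dir(7)
dirs_differ = first_dir != second_dir
files_exist = all(os.path.exists(p) for p in list(first_paths.values()) + list(second_paths.values()))

# Clean up the example directories.
shutil.rmtree(first_dir)
shutil.rmtree(second_dir)
```

Because each job writes only inside its own directory, cleaning up one job cannot remove another job's std.out or std.err.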

Can you please install and run using:

pip install  git+https://github.com/franchuterivera/pynisher.git@temporary_dict
pip install  git+https://github.com/franchuterivera/auto-sklearn.git@temporary_dict

Can you also run with delete_tmp_folder_after_terminate set to False, so we can check the main log for errors (you can also specify a folder for this log file via the tmp_folder argument)?

mfeurer commented 3 years ago

@franchuterivera was able to reproduce and we're further investigating a fix in https://github.com/automl/pynisher/pull/10

krzischp commented 3 years ago

Really sorry, I was busy with other things and I forgot to test this...

But great that you could reproduce the issue on your laptop!
Do you already know when this bugfix is going to be released?

mfeurer commented 3 years ago

I just pushed pynisher 0.6.4 to pypi. Could you please check whether it works for you?

wbjorndahl commented 3 years ago

Hi,

I had the same error as in the original post. Upgrading pynisher to 0.6.4 resulted in the error described here: https://github.com/automl/auto-sklearn/issues/1066

I fixed that error by upgrading to auto-sklearn 0.12.3 (originally on 0.12.1):

sudo pip3 install auto-sklearn --upgrade

OS: MacOS
Installation in a conda environment
Python version: 3.8
Auto-sklearn version: 0.12.1

mfeurer commented 3 years ago

Thanks for reporting this @wmbjo. @franchuterivera do you see any connection between the two issues?

krzischp commented 3 years ago

Hi, I tested on the same dataset with Auto-sklearn 0.12.3 and pynisher 0.6.4. I don't get the StatusType.CRASHED error anymore!

Thank you very much!
I will close the issue after my comment.