automl / auto-sklearn

Automated Machine Learning with scikit-learn
https://automl.github.io/auto-sklearn
BSD 3-Clause "New" or "Revised" License

Models never stop training with v 0.7.0 #871

Closed jdprusa closed 4 years ago

jdprusa commented 4 years ago

I have upgraded from v0.6.0 to v0.7.0, but this has introduced a bug where the auto-sklearn estimator never stops training, even on small toy datasets with a maximum time of 60 seconds.

My code to reproduce this behavior is as follows:

import os

import autosklearn.classification
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Use all cores, and 80% of physical memory (in MB) as the memory limit.
n_jobs = os.cpu_count()
mem_bytes = os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES')
ml_memory_limit = int(mem_bytes / (1024. ** 2) * 0.8)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=60,
    per_run_time_limit=20,
    resampling_strategy='cv',
    resampling_strategy_arguments={'folds': 5},
    n_jobs=n_jobs,
    ml_memory_limit=ml_memory_limit,
)

automl.fit(X, y)

With v0.6.0 this issue is not present: the estimator trains for the allocated time, then stops.

OS and Environment:

franchuterivera commented 4 years ago

Hello! I tried to reproduce this (in a standalone .py file and in the console), but I am not able to reproduce it. I used both Python v3.7.3 and v3.6.10 on the aforementioned Ubuntu version.

Time to fit the estimator 54.729639291763306

So I am wondering if it has to do with the number of jobs and the memory per job you assigned.

Can you clarify the n_jobs and ml_memory_limit values used above? By the way, ml_memory_limit specifies the memory per job, so please take that into account.
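
For illustration, here is a minimal sketch of one way to split the overall memory budget across jobs, given that ml_memory_limit is interpreted per job. The 80%-of-RAM budget and the even split are assumptions carried over from the snippet above, not something auto-sklearn does automatically:

import os

import autosklearn.classification

# Overall budget: 80% of physical memory, expressed in MB (as in the original snippet).
n_jobs = os.cpu_count()
mem_bytes = os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES')
total_budget_mb = int(mem_bytes / (1024. ** 2) * 0.8)

# ml_memory_limit applies to each job, so divide the budget by the number of
# parallel jobs instead of handing every job 80% of the machine's RAM.
per_job_mb = total_budget_mb // n_jobs

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=60,
    per_run_time_limit=20,
    n_jobs=n_jobs,
    ml_memory_limit=per_job_mb,
)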

ouwyukha commented 4 years ago

> Hello! I tried to reproduce this (in a standalone .py file and in the console), but I am not able to reproduce it. I used both Python v3.7.3 and v3.6.10 on the aforementioned Ubuntu version.
>
> Time to fit the estimator 54.729639291763306
>
> So I am wondering if it has to do with the number of jobs and the memory per job you assigned.
>
> Can you clarify the n_jobs and ml_memory_limit values used above? By the way, ml_memory_limit specifies the memory per job, so please take that into account.

Hi, I have the same problem on WSL Ubuntu 18.04 with Python 3.7.7; no errors or warnings were shown during installation. I tried running the same code (classification, iris dataset, 180-second limit) on Google Colab and it works fine there. I do not set n_jobs or ml_memory_limit, so they should be at their defaults. The only warning shown while training is: [WARNING] [2020-06-16 09:49:47,196:smac.runhistory.runhistory2epm.RunHistory2EPM4LogCost] Got cost of smaller/equal to 0. Replace by 0.000010 since we use log cost.

By the way, is there any option to print verbose output to the console instead of a log file? Thank you.

franchuterivera commented 4 years ago

Hello, for a custom logging configuration you can use the logging_config argument of the estimator (more information in the API documentation: https://automl.github.io/auto-sklearn/master/api.html).
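
For example, here is a minimal sketch of a console-oriented configuration, assuming logging_config accepts a standard Python logging dictConfig-style dictionary; the handler and formatter names below are illustrative, not the project's defaults:

import autosklearn.classification

# Illustrative dictConfig: route all log records to stdout via a StreamHandler
# instead of the default file-based configuration.
console_logging_config = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'simple': {'format': '[%(levelname)s] [%(asctime)s:%(name)s] %(message)s'},
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'simple',
            'stream': 'ext://sys.stdout',
        },
    },
    'root': {'level': 'DEBUG', 'handlers': ['console']},
}

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=180,
    logging_config=console_logging_config,
)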

Regarding the run not honoring the user-specified time limit: I am not able to reproduce it under the following conditions:

lsb_release  -a
Ubuntu 18.04.4 LTS
python --version 
Python 3.7.7

With the code:

from sklearn.datasets import load_iris
import time
import os
import autosklearn.classification

X, y = load_iris(return_X_y=True)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=180,
    delete_tmp_folder_after_terminate=False,
    tmp_folder='debug_folder',
)

start = time.time()
automl.fit(X, y)
end = time.time()
print(f" Time to fit the estimator {end - start}") --> prints  Time to fit the estimator 175.0197412967682

With the above flags, in a folder called 'debug_folder' you should find a log file with debug information of the run.

It would be helpful to us if you could diff the log file from your Google Colab run against the one from your failing WSL run, to see whether it highlights an unexpected execution path. So far, we have not been able to reproduce this problem. Thanks!
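
For instance, a small sketch of such a comparison with Python's difflib (the file names below are placeholders for wherever the two logs are saved locally):

import difflib

# Placeholder file names; point these at the actual Colab and WSL log files.
with open('colab_run.log') as colab_log, open('wsl_run.log') as wsl_log:
    colab_lines = colab_log.readlines()
    wsl_lines = wsl_log.readlines()

diff = difflib.unified_diff(
    colab_lines, wsl_lines,
    fromfile='colab_run.log', tofile='wsl_run.log',
)
print(''.join(diff))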

ouwyukha commented 4 years ago

Here are the log files. On WSL, I notice that no predictions are created even though there are 27 models in the .auto-sklearn folder; the log says: Done reading 0 new prediction files. Loaded 27 predictions in total.

WSL_IRIS.log COLAB_IRIS.log

By the way, I think this problem also exists on auto-sklearn 0.6.0. The log file shows the same message, and my Jupyter console returns a warning every 2 seconds: [WARNING] [2020-06-17 12:35:02,463:EnsembleBuilder(1):a542df78d9166e53c1f38890b5b364dc] No models better than random - using Dummy Score! This happens on my custom dataset, which has 5M rows x 157 features (down-sampling to 250k rows and 57 features does not help).

UPDATE: I reinstalled everything, including SMAC, by following this guide: https://blog.csdn.net/potun7890/article/details/106350912 . My auto-sklearn 0.7 run is now able to finish its job after the 600-second budget.

However, the predictors and ensemble builders still produce no predictions: 0.7.log

mfeurer commented 4 years ago

Hey @ouwyukha, that second problem appears to be related to #764. Please try using a per_run_time_limit lower than the total time limit, and also make sure to print some additional info as displayed here.
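
For reference, a minimal sketch of such a setting (the concrete numbers are only illustrative):

import autosklearn.classification

# Keep per_run_time_limit well below the overall budget so that a single
# model fit cannot consume the whole task time.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=600,  # total optimization budget in seconds
    per_run_time_limit=60,        # cap for each individual model run
)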

Regarding the bug about Auto-sklearn not stopping: could everyone who encounters this problem please stop Auto-sklearn with a keyboard interrupt (CTRL+C) and paste the output so that we know where it was hanging?

ouwyukha commented 4 years ago

> Hey @ouwyukha, that second problem appears to be related to #764. Please try using a per_run_time_limit lower than the total time limit, and also make sure to print some additional info as displayed here.

Thanks for the advice, mine works fine now.

> Regarding the bug about Auto-sklearn not stopping: could everyone who encounters this problem please stop Auto-sklearn with a keyboard interrupt (CTRL+C) and paste the output so that we know where it was hanging?

Fortunately, after creating a new environment I can no longer reproduce this problem, sorry. It might have been a dependency problem.

mfeurer commented 4 years ago

@jdprusa is this still relevant to you? If yes, could you please paste the output after making a keyboard interrupt?

mfeurer commented 4 years ago

Closing this issue as it stalled, we cannot reproduce it, and we have since released two new versions of Auto-sklearn. Please re-open or add a comment if this problem still occurs with auto-sklearn 0.9 or higher.

whoisltd commented 1 year ago

I still have this problem on v0.15.0 @mfeurer

After Ctrl + C:

[WARNING] [2023-09-19 00:32:58,141:Client-AutoMLSMBO(1)::65e4915d-5649-11ee-9b5f-bb3dfebd4d23] Configuration 452 not found
[WARNING] [2023-09-19 00:32:58,141:Client-AutoMLSMBO(1)::65e4915d-5649-11ee-9b5f-bb3dfebd4d23] Configuration 102 not found
[WARNING] [2023-09-19 00:32:58,141:Client-AutoMLSMBO(1)::65e4915d-5649-11ee-9b5f-bb3dfebd4d23] Configuration 120 not found
[WARNING] [2023-09-19 00:32:58,141:Client-AutoMLSMBO(1)::65e4915d-5649-11ee-9b5f-bb3dfebd4d23] Configuration 150 not found
^CProcess ForkProcess-19:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/media/data/works/ml_pipeline/venv/lib/python3.10/site-packages/autosklearn/util/logging_.py", line 317, in start_log_server
    receiver.serve_until_stopped()
  File "/media/data/works/ml_pipeline/venv/lib/python3.10/site-packages/autosklearn/util/logging_.py", line 347, in serve_until_stopped
    rd, wr, ex = select.select([self.socket.fileno()], [], [], self.timeout)
KeyboardInterrupt