automl / HPOBenchExperimentUtils

Experiment code to run large-scale experiments with HPOBench
Apache License 2.0

OSError: [Errno 39] Directory not empty: 'attribute_lock' #27

Closed KEggensperger closed 3 years ago

KEggensperger commented 3 years ago

When running autogluon on some benchmarks, the following error occurs at the end of the optimization procedure (unfortunately before the trajectory is rewritten):

See also the complete log here: run_NAS1SHOT1_autogluon_32_errlog.txt run_NAS1SHOT1_autogluon_32.cmd_out.txt

@PhMueller: Can we fix this or safely try/except this error since the optimization completed?

[INFO] autogluon.core.searcher.bayesopt.tuning_algorithms.bo_algorithm at 2021-03-21 16:57:35,315 --- BO Algorithm: Selecting final set of candidates.
Exception ignored in: <function Bookkeeper.__del__ at 0x7fb3a8c30dd0>                                                                                                                                       
Traceback (most recent call last):
  File "/home/eggenspk/2020_Hpolib2/HPOBenchExperimentUtils/HPOBenchExperimentUtils/core/bookkeeper.py", line 328, in __del__    
    shutil.rmtree(self.lock_dir)
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/shutil.py", line 494, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/shutil.py", line 436, in _rmtree_safe_fd
    onerror(os.rmdir, fullname, sys.exc_info())                                                       
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/shutil.py", line 434, in _rmtree_safe_fd
    os.rmdir(entry.name, dir_fd=topfd)                                                                
OSError: [Errno 39] Directory not empty: 'attribute_lock'                             
Exception ignored in: <function Bookkeeper.__del__ at 0x7fb3a8c30dd0>                                                                                                                                       
Traceback (most recent call last):
  File "/home/eggenspk/2020_Hpolib2/HPOBenchExperimentUtils/HPOBenchExperimentUtils/core/bookkeeper.py", line 328, in __del__
    shutil.rmtree(self.lock_dir)
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/shutil.py", line 498, in rmtree                                
    onerror(os.rmdir, path, sys.exc_info())
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/shutil.py", line 496, in rmtree                               
    os.rmdir(path) 
OSError: [Errno 39] Directory not empty: '/home/eggenspk/2020_Hpolib2/HPOBenchExperimentUtils/exp_outputs/NASBench1shot1SearchSpace1Benchmark/autogluon/run-1/lock_dir'
[ERROR] autogluon.core.scheduler.hyperband at 2021-03-21 16:57:58,288 --- Traceback (most recent call last):
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/multiprocessing/managers.py", line 811, in _callmethod                                                                              
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'  

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/site-packages/autogluon/core/utils/custom_process.py", line 16, in run
    mp.Process.run(self)
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/site-packages/autogluon/core/scheduler/scheduler.py", line 157, in _worker
    ret = fn(**args)
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/site-packages/autogluon/core/decorator.py", line 60, in __call__
    output = self.f(args, **kwargs)
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/site-packages/autogluon/core/decorator.py", line 143, in wrapper_call
    return func(*args, **kwargs)
  File "/home/eggenspk/2020_Hpolib2/HPOBenchExperimentUtils/HPOBenchExperimentUtils/optimizer/autogluon_optimizer.py", line 150, in objective_function
    **self.settings_for_sending)
  File "/home/eggenspk/2020_Hpolib2/HPOBenchExperimentUtils/HPOBenchExperimentUtils/core/bookkeeper.py", line 40, in wrapped
    self.increase_total_tae_used(1)
  File "/home/eggenspk/2020_Hpolib2/HPOBenchExperimentUtils/HPOBenchExperimentUtils/core/bookkeeper.py", line 290, in increase_total_tae_used
    self.total_tae_calls_proxy.value = self.total_tae_calls_proxy.value + total_tae_used
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/multiprocessing/managers.py", line 1138, in get
    return self._callmethod('get')
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/multiprocessing/managers.py", line 815, in _callmethod
    self._connect()
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/multiprocessing/managers.py", line 802, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/multiprocessing/connection.py", line 492, in Client
    c = SocketClient(address)
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/multiprocessing/connection.py", line 620, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
NoneType: None
Traceback (most recent call last):
  File ".//HPOBenchExperimentUtils/run_benchmark.py", line 195, in <module>
    run_benchmark(**vars(args), **benchmark_params) 
  File ".//HPOBenchExperimentUtils/run_benchmark.py", line 157, in run_benchmark
    and not tae_exceeds_limit(benchmark.get_total_tae_used(), settings['tae_limit']) \
  File "/home/eggenspk/2020_Hpolib2/HPOBenchExperimentUtils/HPOBenchExperimentUtils/core/bookkeeper.py", line 251, in get_total_tae_used
    with lock:
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/site-packages/oslo_concurrency/lockutils.py", line 270, in lock
    ext_lock.acquire(delay=delay)
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/site-packages/fasteners/process_lock.py", line 156, in acquire
    self._do_open()
  File "/home/eggenspk/miniconda3CLUSTER/envs/hpobench_37/lib/python3.7/site-packages/fasteners/process_lock.py", line 128, in _do_open
    self.lockfile = open(self.path, 'a')
FileNotFoundError: [Errno 2] No such file or directory: b'/home/eggenspk/2020_Hpolib2/HPOBenchExperimentUtils/exp_outputs/NASBench1shot1SearchSpace1Benchmark/autogluon/run-1/lock_dir/attribute_lock/attribute_lock'
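
Regarding the try/except question above: one possible way to keep the cleanup in `__del__` from failing is sketched below. This is a minimal illustration, not the actual `Bookkeeper` code. The class `LockDirOwner` and its `cleanup` method are hypothetical names; only `shutil.rmtree` and `self.lock_dir` come from the traceback. It relies on the `ignore_errors=True` argument of `shutil.rmtree`, plus a broad `try/except` so that nothing escapes from the finalizer.

```python
import shutil
import tempfile
from pathlib import Path


class LockDirOwner:
    """Hypothetical stand-in for the Bookkeeper; only the cleanup logic is shown."""

    def __init__(self, lock_dir):
        self.lock_dir = Path(lock_dir)
        self.lock_dir.mkdir(parents=True, exist_ok=True)

    def cleanup(self):
        # ignore_errors=True makes rmtree swallow failures such as
        # "[Errno 39] Directory not empty" or a directory that is already gone.
        shutil.rmtree(self.lock_dir, ignore_errors=True)

    def __del__(self):
        try:
            self.cleanup()
        except Exception:
            # A finalizer should never raise; cleanup here is best effort only.
            pass


def demo():
    base = Path(tempfile.mkdtemp())
    lock_dir = base / "lock_dir"
    owner = LockDirOwner(lock_dir)
    # Mimic the nested layout seen in the log: lock_dir/attribute_lock/attribute_lock
    (lock_dir / "attribute_lock").mkdir()
    (lock_dir / "attribute_lock" / "attribute_lock").touch()
    owner.cleanup()
    removed = not lock_dir.exists()
    owner.cleanup()  # a second call on the missing directory must not raise
    shutil.rmtree(base, ignore_errors=True)
    return removed
```

Note that this only silences the two `Exception ignored in: <function Bookkeeper.__del__ ...>` tracebacks. The later `FileNotFoundError` in `get_total_tae_used` suggests that the lock directory may have been removed while the main process still needed it, which this sketch does not address.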