easybuilders / easybuild-framework

EasyBuild is a software installation framework in Python that allows you to install software in a structured and robust way.
https://easybuild.io
GNU General Public License v2.0

EasyBuild may loop forever when out of disk space #3531

Open zao opened 3 years ago

zao commented 3 years ago

While building R/4.0.3 on several machines, I ran into two distinct failures when /tmp, where my build directory was located, ran out of space.

In one scenario, an infinite series of this exception was printed:

IOError: [Errno 28] No space left on device
Logged from file filetools.py, line 1688
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 892, in emit
    self.flush()
  File "/usr/lib/python2.7/logging/__init__.py", line 852, in flush
    self.stream.flush()

In the other scenario, two exceptions alternated:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/__init__.py", line 1085, in emit
    self.flush()
  File "/usr/lib/python3.8/logging/__init__.py", line 1065, in flush
    self.stream.flush()
OSError: [Errno 28] No space left on device
Call stack:
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/eb/develop/easybuild-framework/easybuild/main.py", line 540, in <module>
    main()
  File "/eb/develop/easybuild-framework/easybuild/main.py", line 507, in main
    ecs_with_res = build_and_install_software(ordered_ecs, init_session_state, exit_on_failure=exit_on_failure)
  File "/eb/develop/easybuild-framework/easybuild/main.py", line 117, in build_and_install_software
    (ec_res['success'], app_log, err) = build_and_install_one(ec, init_env)
  File "/eb/develop/easybuild-framework/easybuild/framework/easyblock.py", line 3330, in build_and_install_one
    result = app.run_all_steps(run_test_cases=run_test_cases)
  File "/eb/develop/easybuild-framework/easybuild/framework/easyblock.py", line 3235, in run_all_steps
    remove_lock(lock_name)
  File "/eb/develop/easybuild-framework/easybuild/tools/filetools.py", line 1688, in remove_lock
    _log.info("Lock removed: %s", lock_path)
Message: 'Lock removed: %s'
Arguments: ('/eb/software/.locks/_eb_software_R_4.0.3-foss-2020b.lock',)
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/__init__.py", line 1085, in emit
    self.flush()
  File "/usr/lib/python3.8/logging/__init__.py", line 1065, in flush
    self.stream.flush()
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/eb/develop/easybuild-framework/easybuild/framework/easyblock.py", line 3229, in run_all_steps
    self.run_step(step_name, step_methods)
  File "/eb/develop/easybuild-framework/easybuild/framework/easyblock.py", line 3084, in run_step
    step_method(self)()
  File "/eb/develop/easybuild-framework/easybuild/framework/easyblock.py", line 2244, in extensions_step
    inst = cls(self, ext)
  File "/eb/develop/easybuild-easyblocks/easybuild/easyblocks/generic/rpackage.py", line 83, in __init__
    super(RPackage, self).__init__(*args, **kwargs)
  File "/eb/develop/easybuild-framework/easybuild/framework/extensioneasyblock.py", line 79, in __init__
    Extension.__init__(self, *args, **kwargs)
  File "/eb/develop/easybuild-framework/easybuild/framework/extension.py", line 99, in __init__
    self.cfg = self.master.cfg.copy(validate=False)
  File "/eb/develop/easybuild-framework/easybuild/framework/easyconfig/easyconfig.py", line 567, in copy
    ec = EasyConfig(self.path, validate=validate, hidden=self.hidden, rawtxt=self.rawtxt)
  File "/eb/develop/easybuild-framework/easybuild/framework/easyconfig/easyconfig.py", line 496, in __init__
    self.parse()
  File "/eb/develop/easybuild-framework/easybuild/framework/easyconfig/easyconfig.py", line 727, in parse
    self._finalize_dependencies()
  File "/eb/develop/easybuild-framework/easybuild/framework/easyconfig/easyconfig.py", line 1582, in _finalize_dependencies
    tc = robot_find_subtoolchain_for_dep(dep, self.modules_tool)
  File "/eb/develop/easybuild-framework/easybuild/framework/easyconfig/easyconfig.py", line 2204, in robot_find_subtoolchain_for_dep
    mod_name = ActiveMNS().det_full_module_name(newdep, require_result=False)
  File "/eb/develop/easybuild-framework/easybuild/framework/easyconfig/easyconfig.py", line 2536, in det_full_module_name
    mod_name = self._det_module_name_with(self.mns.det_full_module_name, ec, force_visible=force_visible,
  File "/eb/develop/easybuild-framework/easybuild/framework/easyconfig/easyconfig.py", line 2509, in _det_module_name_with
    mod_name = mns_method(ec)
  File "/eb/develop/easybuild-framework/easybuild/tools/module_naming_scheme/hierarchical_mns.py", line 89, in det_full_module_name
    return os.path.join(self.det_module_subdir(ec), self.det_short_module_name(ec))
  File "/eb/develop/easybuild-framework/easybuild/tools/module_naming_scheme/hierarchical_mns.py", line 150, in det_module_subdir
    tc_mpi = det_toolchain_mpi(ec)
  File "/eb/develop/easybuild-framework/easybuild/tools/module_naming_scheme/toolchain.py", line 145, in det_toolchain_mpi
    tc_elems = ec.toolchain.definition()
  File "/eb/develop/easybuild-framework/easybuild/tools/toolchain/toolchain.py", line 584, in definition
    self.log.debug("Toolchain definition for %s: %s", self.as_dict(), tc_elems)
  File "/usr/lib/python3.8/logging/__init__.py", line 1422, in debug
    self._log(DEBUG, msg, args, **kwargs)
  File "/usr/lib/python3.8/logging/__init__.py", line 1577, in _log
    self.handle(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 1587, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 1649, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 950, in handle
    self.emit(record)
  File "/usr/lib/python3.8/logging/handlers.py", line 71, in emit
    logging.FileHandler.emit(self, record)
  File "/usr/lib/python3.8/logging/__init__.py", line 1183, in emit
    StreamHandler.emit(self, record)
  File "/usr/lib/python3.8/logging/__init__.py", line 1089, in emit
    self.handleError(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 1012, in handleError
    traceback.print_stack(frame, file=sys.stderr)
  File "/usr/lib/python3.8/traceback.py", line 190, in print_stack
    print_list(extract_stack(f, limit=limit), file=file)
  File "/usr/lib/python3.8/traceback.py", line 211, in extract_stack
    stack = StackSummary.extract(walk_stack(f), limit=limit)
  File "/usr/lib/python3.8/traceback.py", line 362, in extract
    linecache.checkcache(filename)
  File "/usr/lib/python3.8/linecache.py", line 74, in checkcache
    stat = os.stat(fullname)
  File "/eb/develop/easybuild-framework/easybuild/tools/filetools.py", line 1710, in clean_up_locks_signal_handler
    raise KeyboardInterrupt("keyboard interrupt")
KeyboardInterrupt: keyboard interrupt

I'm not sure whether the latter happened only after I interrupted the output that was endlessly scrolling on that machine: my scrollback is too short, and I have unfortunately already cleared everything EasyBuild-related from /tmp.
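
For reference, here is a minimal sketch of the mechanism visible in the tracebacks above. It is not a reproduction of the EasyBuild run itself. In the standard library, `logging.StreamHandler.emit()` catches an exception raised while writing or flushing and passes the record to `handleError()`, which by default prints a report to stderr. Every later log call fails the same way, so each one yields another report. The names `FullDiskStream`, `CountingHandler` and `demo` are hypothetical and exist only for this illustration. The full disk is simulated by a stream whose `flush()` always raises `ENOSPC`. This sketch does not show whether the effect explains a truly endless loop in EasyBuild.

```python
import errno
import io
import logging
import sys


class FullDiskStream(io.StringIO):
    """Hypothetical stand-in for a log file on a full filesystem."""

    def flush(self):
        # Simulate the failure seen in the tracebacks (Errno 28).
        raise OSError(errno.ENOSPC, "No space left on device")


class CountingHandler(logging.StreamHandler):
    """Records emit() failures instead of printing them to stderr."""

    def __init__(self, stream):
        super().__init__(stream)
        self.errors = []

    def handleError(self, record):
        # The stock handleError() would print "--- Logging error ---" and
        # the call stack here; this sketch only collects the exception.
        self.errors.append(sys.exc_info()[1])


def demo(n_messages=3):
    """Log n_messages records to the failing stream; return the errors."""
    log = logging.getLogger("enospc_demo")
    log.propagate = False
    log.setLevel(logging.INFO)
    handler = CountingHandler(FullDiskStream())
    log.handlers = [handler]
    for _ in range(n_messages):
        log.info("Lock removed: %s", "/hypothetical/path/to.lock")
    return handler.errors


if __name__ == "__main__":
    errors = demo()
    print(len(errors), "log calls failed;", errors[0])
```

Each log call fails independently, so the number of error reports grows with the number of messages logged after the disk fills up. The stock `handleError()` only prints its report while `logging.raiseExceptions` is true (the default); setting it to `False` makes such failures silent.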

zao commented 3 years ago

This was with the develop branch of EasyBuild as of Dec 10th (5d8747ac3bb4374fb32988881f3d59cb3fd3fd3e) on two different OSes: the first, with Python 2.7, is Ubuntu 18.04, and the second, with Python 3.8, is Ubuntu 20.04.