equinor / ert

ERT (Ensemble based Reservoir Tool) is designed for running ensembles of dynamical models, such as reservoir models, to perform sensitivity analysis and data assimilation. ERT supports data assimilation using the Ensemble Smoother (ES), Ensemble Smoother with Multiple Data Assimilation (ES-MDA) and Iterative Ensemble Smoother (IES).
https://ert.readthedocs.io/en/latest/
GNU General Public License v3.0
101 stars, 104 forks

Acquiring LSF jobid is flaky when submitting the job #8089

Closed: xjules closed this issue 3 months ago

xjules commented 3 months ago

When fetching the job ID after bsub has completed, we get far too many entries like this in the log:

Exception in scheduler task job-7_task: Could not understand '' from bsub
Traceback: Traceback (most recent call last):
  File "/opt/rh/rh-python38/root/usr/lib64/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/prog/res/komodo/2024.06.00-py38-rhel7/root/lib64/python3.8/site-package
.... went into
raise RuntimeError(f"Could not understand '{process_message}' from bsub")
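The error above is raised when bsub's stdout does not match the expected submission line. A minimal sketch of such a parser, assuming bsub prints the standard LSF confirmation `Job <ID> is submitted to queue <QUEUE>.`; `parse_bsub_job_id` is a hypothetical name, not ert's actual function:

```python
import re

# Standard LSF bsub confirmation line, e.g.
# "Job <12345> is submitted to queue <normal>."
_JOB_ID_PATTERN = re.compile(r"Job <(\d+)> is submitted")


def parse_bsub_job_id(process_message: str) -> str:
    """Extract the job ID from bsub's stdout (hypothetical helper)."""
    match = _JOB_ID_PATTERN.search(process_message)
    if match is None:
        # An empty stdout falls through to here and produces the logged error.
        raise RuntimeError(f"Could not understand '{process_message}' from bsub")
    return match.group(1)
```

For example, `parse_bsub_job_id("Job <12345> is submitted to queue <normal>.")` returns `"12345"`, while `parse_bsub_job_id("")` raises `RuntimeError: Could not understand '' from bsub`.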

At first glance, it looks like bsub returned exit code 0 while its stdout was empty. That might happen occasionally, but hundreds of such entries within a few hours of running the scheduler in stable warrant some investigation.
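If the empty stdout is transient, one possible mitigation is to retry instead of failing immediately. The sketch below assumes a hypothetical `run_bsub` coroutine that performs one submission and returns `(exit code, stdout)`; `submit_with_retry`, `max_attempts` and `delay` are illustrative names, not ert's API. Note the assumption that a silent success did not actually submit the job: if bsub queued the job but printed nothing, a blind retry could submit it twice.

```python
import asyncio
import re
from typing import Awaitable, Callable, Tuple

_JOB_ID_PATTERN = re.compile(r"Job <(\d+)> is submitted")

# Hypothetical: a coroutine function running bsub once, returning (exit code, stdout).
BsubRunner = Callable[[], Awaitable[Tuple[int, str]]]


async def submit_with_retry(
    run_bsub: BsubRunner, max_attempts: int = 3, delay: float = 1.0
) -> str:
    """Retry when bsub exits 0 but prints no recognizable job ID."""
    last_output = ""
    for attempt in range(1, max_attempts + 1):
        returncode, stdout = await run_bsub()
        if returncode != 0:
            # A real failure: do not retry, report it directly.
            raise RuntimeError(f"bsub failed with exit code {returncode}: {stdout!r}")
        match = _JOB_ID_PATTERN.search(stdout)
        if match is not None:
            return match.group(1)
        last_output = stdout
        if attempt < max_attempts:
            # Back off briefly before asking bsub again.
            await asyncio.sleep(delay)
    raise RuntimeError(f"Could not understand '{last_output}' from bsub")
```

With a fake runner that first returns `(0, "")` and then `(0, "Job <42> is submitted to queue <normal>.")`, the call returns `"42"` on the second attempt; a runner that always returns empty output still ends in the original error after `max_attempts`.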

sondreso commented 3 months ago

Needs to be backported as well

xjules commented 3 months ago


Backported to ert==10.1