equinor / ert

ERT - Ensemble based Reservoir Tool - is designed for running ensembles of dynamical models such as reservoir models, in order to do sensitivity analysis and data assimilation. ERT supports data assimilation using the Ensemble Smoother (ES), Ensemble Smoother with Multiple Data Assimilation (ES-MDA) and Iterative Ensemble Smoother (IES).
https://ert.readthedocs.io/en/latest/
GNU General Public License v3.0

Realization marked as failed, but all fm steps completed (2024.04) #7715

Closed larsevj closed 5 months ago

larsevj commented 7 months ago

Ran a Drogon case, and on iteration 3, four realizations were marked as failed in the GUI even though all forward model steps were marked as successful and the OK file was written. The following error message was found in the logs:

status from done callback: Error reading GEN_DATA: R_A2_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/RFT_R_A2_1']
Error reading GEN_DATA: R_A3_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/RFT_R_A3_1']
Error reading GEN_DATA: R_A4_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/RFT_R_A4_1']
Error reading GEN_DATA: R_A5_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/RFT_R_A5_1']
Error reading GEN_DATA: R_A6_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/RFT_R_A6_1']
Error reading GEN_DATA: TRACER_SIM, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/tracer/drogon_tracer_sim_1.txt']
Error reading GEN_DATA: AMP_2020_2018_TOP, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/share/results/points/topvolantis_amplitude_mean_20200701_20180101_1.txt']
Error reading GEN_DATA: AMP_2020_2018_BASE, errors: ['Missing output file: /scratch/fmu/levje/01_drogon_ahm_test_license/realization-19/iter-3/share/results/points/basevolantis_amplitude_mean_20200701_20180101_1.txt']
ERROR    Realization: 60 failed after reaching max submit (1):
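To triage a log like the one above, it can help to pull out every path reported as missing and check whether it is on disk now. The following is a minimal sketch, not part of ERT: the function names `missing_output_paths` and `split_by_presence` are hypothetical, and the regular expression assumes the `Missing output file: <path>` wording shown in the log.

```python
import re
from pathlib import Path

# Captures the path in "Missing output file: <path>'" as printed in the log.
# Assumption: the path itself contains no single quote or closing bracket.
MISSING_FILE_RE = re.compile(r"Missing output file: ([^'\]]+)")


def missing_output_paths(log_text: str) -> list[Path]:
    """Return every path reported as a missing output file, in log order."""
    return [Path(match) for match in MISSING_FILE_RE.findall(log_text)]


def split_by_presence(paths: list[Path]) -> tuple[list[Path], list[Path]]:
    """Split paths into (present on disk now, still missing)."""
    present = [p for p in paths if p.is_file()]
    missing = [p for p in paths if not p.is_file()]
    return present, missing
```

If most of the reported paths turn out to be present after the run, that points towards a timing problem rather than a forward model that never wrote its output.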

To reproduce

Steps to reproduce the behaviour:

  1. pip install ert
  2. ert gui my_config.ert
  3. Run experiment (IES/Smoother/ESMDA/Test)

Expected behaviour

A clear and concise description of what you expected to happen.

Environment

larsevj commented 7 months ago

The same error was also seen in ert-internal-examples when building the 2024.04.04 release: https://github.com/equinor/komodo-releases/actions/runs/8755056479/job/24055171964

sondreso commented 7 months ago

We need to check whether the file is on disk. If it is, we should consider adding a short wait in the callback to allow for disk synchronisation.
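A short wait of this kind could be sketched as a bounded polling loop. This is only an illustration of the idea, not ERT's implementation and not necessarily what any fix does: `wait_for_file`, `timeout`, and `poll_interval` are hypothetical names and defaults chosen for the example.

```python
import time
from pathlib import Path


def wait_for_file(path, timeout: float = 10.0, poll_interval: float = 0.5) -> bool:
    """Poll until `path` exists as a regular file or `timeout` seconds pass.

    Returns True as soon as the file is visible, False if the deadline
    is reached first. The file is always checked at least once.
    """
    path = Path(path)
    deadline = time.monotonic() + timeout
    while True:
        if path.is_file():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A callback could call `wait_for_file(expected_output)` before reading the result and only report a missing output file when it returns False. The trade-off is that a genuinely missing file delays the failure by up to `timeout` seconds per file.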

larsevj commented 7 months ago

In the case of ert-internal-examples the file does seem to be on disk:

cat RFT_RWI_3_1
271.8949890136719
268.4920349121094
275.8153991699219

eivindjahren commented 6 months ago

Would be solved by #7788