equinor / fmu-ensemble

Python objectification of reservoir model ensembles left on disk by ERT.
GNU General Public License v3.0
12 stars 19 forks source link

Summary loading crashes on mangling smspec data #251

Open berland opened 3 months ago

berland commented 3 months ago

The lines in https://github.com/equinor/fmu- ensemble/blob/af1bc17a7b60b8e8ec0808412eaade73136fac8e/src/fmu/ensemble/realization.py#L983-L990 is supposed to catch summary loading failures, but if resdata exits with util_abort() then nothing can be done, it will not be caught by IOError.

For realizations where Eclipse has crashed with a License error, the SMSPEC files and UNSMRY files are left in a state where it will trigger an util_abort(). This situation is unfortunately common enough that this situation should be catched. fmu_ensemble must then detect this situation before it happens to avoid calling the C-code that calls util_abort.

Example of error in this scenario (reproduced with res2csv):

$ res2csv summary SOMETHING_WITH_LICENSE_ERROR.DATA

Error message: rd_smspec_fread_alloc: Sorry the SMSPEC file seems to lack all time information, need either TIME, or DAY/MONTH/YEAR information. Can not proceed.
See file: /tmp/ert_abort_dump.havb.20240510-092807.log for more details of the crash.
Setting the environment variable "ERT_SHOW_BACKTRACE" will show the backtrace on stderr.
Aborted (core dumped)
berland commented 3 months ago

One way to detect this scenario that will trigger the crash can be done in Python with:

import resfo
smspec = resfo.read("SOMETHING_WITH_LICENSE_ERROR.SMSPEC")
if not len([val[1] for val in a if val[0] == "KEYWORDS"][0]):
    print(f"There is no TIME information in the SMSPEC file: {smspec_filename}"

@eivindjahren might have other ideas?

alifbe commented 3 months ago

Can't we fixed it from resdata and make it failed silently just like when it failed to read smspec file? Technically it's the same issue.

Then we don't need to fix it in all tools that using resdata

https://github.com/equinor/resdata/blob/55a2fe6ed3a1c562532c18641e965724d28ec81a/lib/resdata/rd_smspec.cpp#L1259-L1261

eivindjahren commented 3 months ago

@alifbe Probably a good idea to make resdata not use util_abort but signal with exception. However, it was never designed to do that so it can be complicated as it needs to clean up before signaling an error.

The code in question is this https://github.com/eivindjahren/ecl/blob/8f443d1b2771239c9661b98c60769956ef6c0c8c/lib/resdata/rd_smspec.cpp#L1211-L1264

if it returns NULL then rd_sum_fread will return false: https://github.com/eivindjahren/ecl/blob/8f443d1b2771239c9661b98c60769956ef6c0c8c/lib/resdata/rd_sum.cpp#L190-L197

And then rd_sum_fread_case will return false: https://github.com/eivindjahren/ecl/blob/8f443d1b2771239c9661b98c60769956ef6c0c8c/lib/resdata/rd_sum.cpp#L221-L238

And rd_sum_fread_alloc_case2__ will return NULL: https://github.com/eivindjahren/ecl/blob/8f443d1b2771239c9661b98c60769956ef6c0c8c/lib/resdata/rd_sum.cpp#L439-L457

which ends with an excption in rd_sum.py: https://github.com/eivindjahren/ecl/blob/8f443d1b2771239c9661b98c60769956ef6c0c8c/python/resdata/summary/rd_sum.py#L248-L254

I think it makes sense to make the signaling method for resdata.Summary consistent. Recovering from the abort signal was never a part of the public API so replacing it with a recoverable error is not a breaking change, but it is a minor version increment.