OPM / opm-common

Common components for OPM, in particular build system (cmake).
http://www.opm-project.org
GNU General Public License v3.0

Issue with output during ensemble simulations #1557

Open totto82 opened 4 years ago

totto82 commented 4 years ago

A colleague of mine is running Flow for history matching on a server with 72 cores. He runs 72 realizations at the same time with non-unified output, i.e., many output files are created very often. Some of the simulations produce this error message:

EclOutput.cpp:162: fstream fileH not open for writing.

and many empty summary and restart files.

Any clue?

joakim-hove commented 4 years ago

Well, it means that the open statement here has failed; exactly why it failed I do not know. We would need to print some error messages from the open call: operating system / NFS / fd exhaustion / ????

> Some of the simulations produce this error message.

Errors encountered during output are to a large extent "masked" by a catch-all handler somewhere; the situation you have encountered should, in my opinion, have produced an immediate exit(1).
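A minimal sketch of what "print some error messages from the open call" could look like. The helper name openForWriting is hypothetical and not part of opm-common. It assumes a POSIX-like system where a failed std::ofstream open leaves a meaningful value in errno; the C++ standard does not guarantee this, so the reported reason is a hint rather than a definitive diagnosis.

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not opm-common code): open a file for writing and
// fail loudly, with a reason, instead of continuing with a closed stream.
std::ofstream openForWriting(const std::string& filename)
{
    errno = 0;
    std::ofstream file(filename, std::ios::out | std::ios::binary);

    if (!file.is_open()) {
        // Assumption: errno was set by the underlying open call.
        // Typical values here would be EMFILE (per-process fd limit),
        // ENFILE (system-wide limit), ENOENT or EACCES.
        const int err = errno;
        throw std::runtime_error("Could not open '" + filename +
                                 "' for writing: " + std::strerror(err));
    }

    return file;
}
```

If the exception is allowed to propagate instead of being swallowed by a catch-all handler, the simulation stops at the first failed open rather than leaving empty summary and restart files behind.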

blattms commented 4 years ago

Well, there is usually a limit on the number of open files on a Linux system. Maybe this limit is reached, and opening another file therefore fails?

BTW: I could not find a place where we actually close EclipseOutput::ofileH, but I might just have missed it. Maybe we should close it?
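The open-file limit mentioned above can be inspected from inside a process with the POSIX getrlimit call. The sketch below is only an illustration; the struct and function names (OpenFileLimits, queryOpenFileLimits) are made up for this example. Note that RLIMIT_NOFILE is a per-process limit, so it would apply to each realization separately; it is the same soft limit that `ulimit -n` reports in a shell.

```cpp
#include <sys/resource.h>

#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper type: soft and hard limits on open file descriptors.
struct OpenFileLimits {
    rlim_t soft;  // limit currently enforced for this process
    rlim_t hard;  // ceiling up to which the soft limit may be raised
};

// Query the per-process open-file limits (POSIX only).
OpenFileLimits queryOpenFileLimits()
{
    struct rlimit limits{};
    if (getrlimit(RLIMIT_NOFILE, &limits) != 0) {
        throw std::runtime_error(std::string("getrlimit failed: ") +
                                 std::strerror(errno));
    }
    return {limits.rlim_cur, limits.rlim_max};
}
```

Logging these two values at simulator start-up would be one way to check whether the failing runs are anywhere near the limit.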

bska commented 4 years ago

> I could not find a place where we actually close EclipseOutput::ofileH

The stream is implicitly closed when destroying the data member as part of running the EclipseOutput::~EclipseOutput() destructor. That, in turn, happens at simulation shut-down and when we reassign the output stream (i.e., when opening a new stream) here. Of course, as of C++17 we're guaranteed that the right-hand side of the

this->stream_ = /* ... */;

assignment is evaluated/sequenced before the assignment itself, which means that we open the new stream/file before closing the previous one. If we're particularly unlucky we could end up in a case where all, or a large majority of, the ensemble simulations attempt to create a new summary output stream at the same time, which would increase the risk of exhausting the open file limits.

As a quick check it would be interesting to see if the problem is abated by inserting

this->stream_.reset(); // Close existing stream, flush output

before opening a new stream using

this->stream_ = /* ... */;
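The ordering effect described above can be demonstrated with a small stand-alone sketch. It assumes that stream_ behaves like a std::unique_ptr; FakeStream, events and the two functions are hypothetical names used only for illustration, with the constructor standing in for "open" and the destructor for "close".

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Global log of "open"/"close" events, in the order they happen.
std::vector<std::string> events;

// Hypothetical stand-in for an output stream: the constructor "opens",
// the destructor "closes".
struct FakeStream {
    explicit FakeStream(std::string n) : name(std::move(n))
    {
        events.push_back("open " + name);
    }
    ~FakeStream() { events.push_back("close " + name); }

    std::string name;
};

// Plain reassignment: B is created while A is still alive.
std::vector<std::string> reassignDirectly()
{
    events.clear();
    auto stream = std::make_unique<FakeStream>("A");
    stream = std::make_unique<FakeStream>("B");
    return events;  // copied before 'stream' itself is destroyed
}

// Reset first: A is closed before B is created.
std::vector<std::string> resetThenAssign()
{
    events.clear();
    auto stream = std::make_unique<FakeStream>("A");
    stream.reset();
    stream = std::make_unique<FakeStream>("B");
    return events;  // copied before 'stream' itself is destroyed
}
```

With plain reassignment the recorded order is open A, open B, close A, so two streams are briefly open at once; with the reset() call it is open A, close A, open B, so at most one stream is open at any time.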