Write test for summary handling with uneven time-scale lengths

berland commented 4 years ago

There should be a test setup to verify correctness when one realization has failed, say in the middle of the schedule period. Perturb a schedule file to end prematurely, run it, and save the UNSMRY file with a different filename that can be injected temporarily by the test-code.

Test that we can filter out by date the realization that failed.

load_smry() on that realization should not have any dates past the crash point, for any time_index argument.

get_smry() at ensemble level should behave similarly, with a different last date per realization.
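A minimal sketch of the checks described above, using plain Python data rather than fmu-ensemble itself. The helper names (`last_date`, `realizations_ending_before`) and the sample dates are hypothetical; a real test would load the perturbed UNSMRY file through the ensemble API instead.

```python
from datetime import date

# Hypothetical per-realization date vectors; realization 1 crashed mid-schedule.
smry_dates = {
    0: [date(2000, 1, 1), date(2001, 1, 1), date(2002, 1, 1), date(2003, 1, 1)],
    1: [date(2000, 1, 1), date(2001, 1, 1)],
    2: [date(2000, 1, 1), date(2001, 1, 1), date(2002, 1, 1), date(2003, 1, 1)],
}


def last_date(dates):
    """Return the last simulated date of one realization."""
    return max(dates)


def realizations_ending_before(smry_dates, required_end):
    """Return indices of realizations whose summary data stop before required_end."""
    return sorted(
        real for real, dates in smry_dates.items() if last_date(dates) < required_end
    )


crash_point = date(2001, 1, 1)
failed = realizations_ending_before(smry_dates, date(2003, 1, 1))

# The two properties the test should verify:
assert failed == [1]  # the crashed realization can be filtered out by date
assert all(d <= crash_point for d in smry_dates[1])  # no dates past the crash
```

The same two assertions could be repeated for each time_index argument once the data come from the real loader.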

The behaviour of get_smry_stats() is undefined. Uneven DATE ranges per realization must be padded when realizations have not been excluded, since a simulation that ends earlier does not necessarily represent an error rather than intent. It is conceivable to add an option to get_smry_stats() that requires the same end date for all realizations.

Delta profiles should possibly exclude profiles with differing DATE columns.

asnyv commented 4 years ago

First: I don't think delta profiles should necessarily be required to have the same dates; e.g. when evaluating timing of drilling, prolonged lifetime, etc., different date columns could come into play. But it would be good to make sure that delta profiles are not calculated beyond the latest data point of a crashed realization.
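One way to picture that last point is a delta that is only evaluated on dates both profiles actually contain, so nothing is computed past the crash. This is a sketch under assumptions: `delta_profile` is a hypothetical helper, the profiles are plain {date: value} dicts, and no interpolation between differing date columns is attempted.

```python
from datetime import date


def delta_profile(minuend, subtrahend):
    """Subtract two {date: value} profiles on the dates they share.

    Dates present in only one profile (e.g. after a crash in the other)
    are left out instead of being padded.
    """
    shared = sorted(minuend.keys() & subtrahend.keys())
    return {d: minuend[d] - subtrahend[d] for d in shared}


# Hypothetical cumulative profiles; the second run crashed after 2001.
full_run = {date(2000, 1, 1): 0.0, date(2001, 1, 1): 100.0, date(2002, 1, 1): 180.0}
crashed_run = {date(2000, 1, 1): 0.0, date(2001, 1, 1): 90.0}

delta = delta_profile(full_run, crashed_run)
```

Here `delta` stops at 2001-01-01, the last date the crashed run wrote.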

Secondly: I am quite sure that the current behavior is a consequence of this (sorry for messy explanation):

ScratchEnsemble.get_smry() called with a frequency of e.g. 'monthly' will run the method _get_smry_dates, which in the case of undefined start and end dates will run the following lines of code (L1033 and 1024 in ensemble.py):

start_smry = min([min(x) for x in eclsumsdates])
end_smry = max([max(x) for x in eclsumsdates])

where eclsumsdates is a list containing the date vectors for all the realizations, hence the start and end points become the earliest start and latest end in the ensemble. The new vector of time steps is then given to realization.get_smry, which in turn calls self.get_eclsum(args).pandas_frame(time_index_arg, column_keys). In ecl.summary it is clearly stated that if the time points in the time_index vector are outside the simulated range, rates will be returned as 0, while "not rate" vectors will be returned as the first or last simulated value (only the last is relevant for the crash scenario).

Interestingly, ensemble.get_smry and ensemble.load_smry seem to behave differently: ensemble.get_smry will give the same time range for all realizations, while ensemble.load_smry will load only the available dates for each realization. EnsembleCombination only has get_smry, and the issue of unrealistic deltas after a simulation crash therefore seems unavoidable unless you drop realizations first. Of course, ensuring that all realizations have the same dates, as get_smry does, makes the combination easier to calculate, so that would have to be handled if get_smry switched to a load_smry-like behavior.
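The mechanism can be reproduced without the library. The sketch below assumes hypothetical date vectors and a hypothetical helper `pad_beyond_end` that imitates the extrapolation described above; it is not the ecl.summary implementation, and it does not interpolate inside the simulated range.

```python
from datetime import date

# Hypothetical date vectors: realization 1 stops two years early.
eclsumsdates = [
    [date(2000, 1, 1), date(2001, 1, 1), date(2002, 1, 1), date(2003, 1, 1)],
    [date(2000, 1, 1), date(2001, 1, 1)],
]

# Same logic as the two lines quoted above: earliest start, latest end.
start_smry = min(min(x) for x in eclsumsdates)
end_smry = max(max(x) for x in eclsumsdates)

# A yearly time index spanning the whole ensemble.
time_index = [date(y, 1, 1) for y in range(start_smry.year, end_smry.year + 1)]


def pad_beyond_end(profile, time_index, is_rate):
    """Imitate the described extrapolation past the last simulated date:
    rates become 0, other vectors keep their last simulated value."""
    last = max(profile)
    padded = {}
    for d in time_index:
        if d in profile:
            padded[d] = profile[d]
        elif d > last:
            padded[d] = 0.0 if is_rate else profile[last]
    return padded


# Example vectors for the crashed realization (a cumulative and a rate).
fopt_crashed = {date(2000, 1, 1): 0.0, date(2001, 1, 1): 90.0}
fopr_crashed = {date(2000, 1, 1): 0.0, date(2001, 1, 1): 250.0}

padded_total = pad_beyond_end(fopt_crashed, time_index, is_rate=False)
padded_rate = pad_beyond_end(fopr_crashed, time_index, is_rate=True)
```

After padding, the crashed realization appears to produce nothing from 2002 onwards while its cumulative stays flat, which is what makes deltas against a completed realization look unrealistic.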

asnyv commented 4 years ago

Would it e.g. be possible to catch when an Eclipse run has stopped with errors (either from the LOG/PRT files, or e.g. if the number of ERRORS is written to UNSMRY and is > 0), and in that case avoid padding with further dates (and only calculate the delta until the last written timestep before the crash, or alternatively skip it)? And if the realizations seem to have finished properly, we extrapolate as in the current behavior (rates=0, others constant at last value)?
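A rough sketch of that proposal, assuming a hypothetical `finished_ok` flag that some other code has derived from the LOG/PRT files or an error count; how to detect the crash reliably is exactly the open question, so the flag is simply passed in here. `extend_profile` is likewise a hypothetical name.

```python
from datetime import date


def extend_profile(profile, time_index, is_rate, finished_ok):
    """Pad a {date: value} profile past its last date only if the run finished.

    finished_ok is a hypothetical flag supplied by the caller; crashed runs
    are returned unchanged so that no data appear after the crash.
    """
    result = dict(profile)
    if finished_ok:
        last = max(profile)
        fill = 0.0 if is_rate else profile[last]
        for d in time_index:
            if d > last:
                result[d] = fill
    return result


time_index = [date(y, 1, 1) for y in range(2000, 2004)]
short_profile = {date(2000, 1, 1): 0.0, date(2001, 1, 1): 90.0}

# Intentionally short run: extrapolate as today (cumulative held constant).
intended = extend_profile(short_profile, time_index, is_rate=False, finished_ok=True)

# Crashed run: keep only what was written before the crash.
crashed = extend_profile(short_profile, time_index, is_rate=False, finished_ok=False)
```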

berland commented 4 years ago

If you filter on OK (that could/should be default in applications), wouldn't that provide this behavior?

asnyv commented 4 years ago

It will provide that behavior if you want to skip crashed realizations completely, but not if you want to keep the data up to the time of the crash. It could be that the runs that tend to crash also tend to gather in a specific part of your final statistics; in that case, removing them completely could "hide" that from the general user if filtering is baked into the default settings of the application, resulting in a bias. Whereas if it is visible that the realizations have stopped at an earlier stage, I would think it is probably easier for the user to catch that bias?

And even if filtering on OK would return the same results for ensemble.load_smry and ensemble.get_smry, I think it is a bit confusing that they do not return the same result. At least I thought before now that the only difference was supposed to be whether you wanted the data internalized or not? Might of course be me that misunderstood ;)

berland commented 4 years ago

Added issue #97

equinor / fmu-ensemble

Write test for summary handling with uneven time-scale lengths #94