jakobtorben commented 2 months ago

Implement Convergence Monitoring

This PR introduces a new convergence monitoring feature to improve the robustness and efficiency of the simulator. It is based on the following publication:

Lie, K., Moyner, O., Klemetsdal, Ø., Skaflestad, B., Moncorgé, A., & Kippe, V. (2024). Enhancing Performance of Complex Reservoir Models via Convergence Monitors. ECMOR, 2024(1), 1-9. (https://doi.org/10.3997/2214-4609.202437057)

The convergence monitoring system tracks the convergence behaviour across iterations, applying penalties for non-convergence. If the total penalty count exceeds the specified cut-off limit, the simulator will cut the timestep.

This feature lets the simulator exit early from timesteps that are not converging, saving iterations and assembly work that would otherwise be wasted.

This is a first version that will be iterated on before it is considered for merging.

jakobtorben commented 2 months ago

This first version implements the following convergence monitors:

Distance decay: Define the distance from convergence as a vector with entries d_i = max(log(r_i), 0), one for each of the reservoir convergence metrics. Compute the L1 norm of the distance vector, and add a penalty card if the current norm is greater than the previous norm multiplied by a decay factor σ (default 0.75): d^k > σ d^(k−1).
Degradation of reservoir metrics: Add a penalty card for each metric (here CNV and MB) that has increased since the previous iteration.
Unconverged wells: Add a penalty card if there are unconverged wells.

If the total number of penalty cards is above a given cut-off limit (default 30), cut the timestep.
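The three monitors and the cut-off rule can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class and argument names are hypothetical, and it assumes each r_i is scaled so that r_i ≤ 1 means the metric has converged. The base of the logarithm does not change the decay test, because both sides scale by the same constant.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PenaltyCardMonitor:
    """Hypothetical penalty-card monitor (names are illustrative, not OPM's)."""
    decay_factor: float = 0.75   # sigma in d^k > sigma * d^(k-1)
    cutoff: int = 30             # cut the timestep once cards exceed this
    cards: int = 0
    prev_distance: Optional[float] = None
    prev_metrics: Optional[Dict[str, float]] = None

    def update(self, metrics: Dict[str, float], unconverged_wells: int) -> bool:
        """Record one Newton iteration; return True if the timestep should be cut."""
        # Distance from convergence: L1 norm of d_i = max(log(r_i), 0).
        distance = sum(max(math.log10(r), 0.0) for r in metrics.values() if r > 0.0)

        # Distance decay: card if the distance did not shrink by the decay factor.
        if self.prev_distance is not None and distance > self.decay_factor * self.prev_distance:
            self.cards += 1

        # Metric degradation: one card per metric that increased.
        if self.prev_metrics is not None:
            self.cards += sum(
                1 for name, r in metrics.items()
                if name in self.prev_metrics and r > self.prev_metrics[name]
            )

        # Unconverged wells: a single card if any well is unconverged.
        if unconverged_wells > 0:
            self.cards += 1

        self.prev_distance = distance
        self.prev_metrics = dict(metrics)
        return self.cards > self.cutoff
```

A caller would create one monitor per timestep attempt and call `update` after each nonlinear iteration, for example `monitor.update({"CNV": 100.0, "MB": 10.0}, unconverged_wells=0)`.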

I tested this on Norne, where we observe a slight decrease in nonlinear and linear iterations. Other cases are probably better suited, since Norne does not fail much to begin with. (Ignore the zero wasted iterations; I am not yet exiting gracefully when the timestep is cut.)

jakobtorben commented 2 months ago

In the second version, the implementation should match the paper, provided that the tolerance for adding a card for a too-large well residual is the same as the OPM default at which a well counts as unconverged. The reporting has also been fixed, so that the wasted iterations caused by convergence monitoring are counted. (This fix also involved making sure that failed iterations from NaN and too-large-residual errors are counted.)

When tested on Norne, the results are currently similar to those without convergence monitoring.

jakobtorben commented 2 months ago

Fixed some bugs and added the penalty counts to the INFOITER file for analysis.

The INFOITER file was used to analyse the convergence behaviour and cut-off values, using a tool similar to the one in the paper.

[Figure: Norne, step 275]

Using this tool, we can also estimate the number of iterations saved by convergence monitoring, which can be used to find the optimal parameters for a specific case:

[Figure: Norne, fraction of iterations remaining after early exit]

And the number of incorrectly aborted timesteps:

[Figure: Norne, number of incorrectly aborted timesteps]

The optimal parameters were found at a cut-off of 14 and a distance decay factor of 0.60, which should reduce the estimated number of Newton iterations to a factor of 0.989 of the original. The small reduction is likely because Norne does not fail much to begin with in OPM.
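The kind of replay behind these estimates can be sketched as below. This is not the analysis tool itself: the record layout is a hypothetical stand-in for data extracted from the INFOITER file, and the estimate ignores the cost of redoing an incorrectly aborted step with a smaller timestep. Scanning the distance decay factor would also require recomputing the penalties, which this sketch takes as given.

```python
from itertools import accumulate
from typing import List, Tuple

# Hypothetical record per timestep attempt:
# (penalty cards added at each iteration, whether the attempt converged)
StepRecord = Tuple[List[int], bool]


def replay_cutoff(steps: List[StepRecord], cutoff: int) -> Tuple[float, int]:
    """Return (fraction of iterations remaining, incorrectly aborted steps)."""
    total = kept = wrongly_aborted = 0
    for penalties, converged in steps:
        n = len(penalties)
        total += n
        # First iteration (1-based) where the running card count exceeds the
        # cut-off; if that never happens, the attempt runs all n iterations.
        exit_at = next(
            (k for k, c in enumerate(accumulate(penalties), start=1) if c > cutoff),
            n,
        )
        kept += exit_at
        # An early exit from an attempt that would have converged is a false abort.
        if converged and exit_at < n:
            wrongly_aborted += 1
    fraction = kept / total if total else 1.0
    return fraction, wrongly_aborted
```

Calling `replay_cutoff(steps, c)` for a range of cut-offs `c` gives the trade-off between iterations saved and steps aborted by mistake.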

Using these optimal parameters, we can run OPM with convergence monitoring to see whether we get any improvement. Here the run uses relaxed CNV and MB tolerances equal to the original tolerances, to match the setup used in the analysis tool.

From the results, we see a small reduction in Newton iterations and runtime, as expected from our analysis. Better results are likely on cases with more failed timesteps.

jakobtorben commented 1 month ago

Depends on https://github.com/OPM/opm-common/pull/4244.

jakobtorben commented 1 month ago

Note that the well convergence metric is not used at the moment. The plan is to include well convergence metrics in the convergence monitoring as well, but this requires two things:

WellConvergenceMetric must be extended to multi-segment wells.
Logic must be added to handle the number of wells increasing during the simulation, which matters when comparing the number of unconverged residuals to the previous iteration.
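One possible way to handle a changing well count, shown only as an illustration and not as what this PR does, is to compare per-well counts by name and skip wells that were absent from the previous iteration. The function name and the dictionary layout are hypothetical.

```python
from typing import Dict, Optional


def well_degradation_cards(prev: Optional[Dict[str, int]], curr: Dict[str, int]) -> int:
    """Count wells whose number of unconverged residuals grew.

    `prev` and `curr` map a well name to its number of unconverged residuals.
    Wells missing from `prev` (for example newly opened wells) are skipped
    rather than penalised, so a growing well count adds no cards by itself.
    """
    if prev is None:
        return 0  # nothing to compare against on the first iteration
    return sum(1 for name, n in curr.items() if name in prev and n > prev[name])
```

Resetting the stored previous state whenever the set of wells changes would be an alternative with the same effect for newly added wells.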

jakobtorben commented 1 month ago

jenkins build this opm-common=4244 please

jakobtorben commented 1 month ago

jenkins build this please

atgeirr commented 1 month ago

All good, all green! Merging.

OPM / opm-simulators

Convergence monitors #5590
