Update second stage preconditioner for CPR instead of recreate

jakobtorben commented 6 days ago

Two-level (CPR-like) preconditioners in OPM consist of two parts:

1. Extract the pressure system and apply an AMG cycle to the scalar system.
2. Apply a block preconditioner to the complete system.
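
To make the two parts concrete, here is a minimal sketch of one common multiplicative two-stage form: apply the pressure stage, recompute the residual, then apply the full-system stage as a correction. This is not the OPM implementation. `applyTwoStage`, `Op` and `Vec` are hypothetical names, and the pressure stage is assumed to wrap the extraction, the AMG cycle and the mapping back to the full system in a single callable.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Op = std::function<Vec(const Vec&)>; // hypothetical operator type

// Illustrative two-stage application:
//   x  = M1^{-1} r                (pressure stage, e.g. AMG on the scalar system)
//   x += M2^{-1} (r - A x)        (full-system stage, e.g. ILU0)
Vec applyTwoStage(const Op& A, const Op& pressureStage, const Op& fullStage, const Vec& r)
{
    Vec x = pressureStage(r);
    const Vec Ax = A(x);
    assert(Ax.size() == r.size() && x.size() == r.size());

    // Residual left over after the first stage.
    Vec res(r.size());
    for (std::size_t i = 0; i < r.size(); ++i) {
        res[i] = r[i] - Ax[i];
    }

    // Second stage corrects the first-stage result.
    const Vec dx = fullStage(res);
    assert(dx.size() == x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        x[i] += dx[i];
    }
    return x;
}
```

In this form the second stage only sees what the pressure stage left unresolved, so its factorisation has to follow the current matrix values.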

The second part is typically an ILU0 preconditioner, where we must perform a factorisation based on the values of the matrix. When the matrix values change (but not the sparsity pattern), this factorisation needs to be recomputed.

At the moment, this ILU0 preconditioner is recreated in every single Newton iteration, which involves reallocating memory and redoing other optimisations. This is not necessary; only an update is needed, and that is what this PR changes. This is especially important when we start using GPU preconditioners with CPR, as these GPU preconditioners typically involve expensive GPU allocations, matrix analysis and autotuning.
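
The difference between recreating and updating can be illustrated with a toy example. The sketch below is not OPM code: `TriDiag` and `ToyIlu0` are hypothetical, and a tridiagonal matrix is assumed so that the sparsity pattern is fixed and ILU0 coincides with an exact LU factorisation. The constructor allocates storage once, while `update()` only recomputes the factors in place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy matrix: tridiagonal, so the sparsity pattern depends only on n.
struct TriDiag {
    std::vector<double> lower, diag, upper; // lower[0] and upper[n-1] are unused
};

class ToyIlu0 {
public:
    static int allocations; // counts how often factor storage is allocated

    // Construction: allocate storage once, then factorise.
    explicit ToyIlu0(const TriDiag& A)
        : l_(A.diag.size()), d_(A.diag.size()), u_(A.diag.size())
    {
        ++allocations;
        update(A);
    }

    // Update: same sparsity, new values. Refactorise in place, no allocation.
    void update(const TriDiag& A)
    {
        const std::size_t n = d_.size();
        assert(n > 0 && A.diag.size() == n && A.lower.size() == n && A.upper.size() == n);
        d_[0] = A.diag[0];
        u_[0] = A.upper[0];
        for (std::size_t i = 1; i < n; ++i) {
            assert(d_[i - 1] != 0.0); // no pivoting in this sketch
            l_[i] = A.lower[i] / d_[i - 1];
            d_[i] = A.diag[i] - l_[i] * A.upper[i - 1];
            u_[i] = A.upper[i];
        }
    }

    // Solve L U x = r by forward and backward substitution.
    std::vector<double> apply(const std::vector<double>& r) const
    {
        const std::size_t n = d_.size();
        assert(r.size() == n);
        std::vector<double> x(r);
        for (std::size_t i = 1; i < n; ++i) {
            x[i] -= l_[i] * x[i - 1];
        }
        x[n - 1] /= d_[n - 1];
        for (std::size_t i = n - 1; i-- > 0;) {
            x[i] = (x[i] - u_[i] * x[i + 1]) / d_[i];
        }
        return x;
    }

private:
    std::vector<double> l_, d_, u_;
};

int ToyIlu0::allocations = 0;
```

A Newton loop in this picture would construct `ToyIlu0` once and call `update(A)` after each reassembly. For a GPU preconditioner the constructor is also where allocations, matrix analysis and autotuning would live, which is why skipping it matters more there.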

After this update the linear setup time should be reduced for all simulations, since this is the default preconditioner setup in OPM. Below is a comparison on SPE10 before and after this fix, which shows a reduction of 20 % for the linear setup time.

jakobtorben commented 6 days ago

jenkins build this please

akva2 commented 6 days ago

benchmark please

ytelses commented 2 days ago

Benchmark result overview:

| Test | Configuration | Relative |
|---|---|---|
| opm-git | OPM Benchmark: drogon - Threads: 1 | 1.283 |
| opm-git | OPM Benchmark: drogon - Threads: 8 | 1.014 |
| opm-git | OPM Benchmark: punqs3 - Threads: 1 | 1.002 |
| opm-git | OPM Benchmark: punqs3 - Threads: 8 | 0.993 |
| opm-git | OPM Benchmark: smeaheia - Threads: 1 | 0.983 |
| opm-git | OPM Benchmark: smeaheia - Threads: 8 | 0.997 |
| opm-git | OPM Benchmark: spe10_model_1 - Threads: 1 | 0.996 |
| opm-git | OPM Benchmark: spe10_model_1 - Threads: 8 | 0.961 |
| opm-git | OPM Benchmark: flow_mpi_extra - Threads: 1 | 0.99 |
| opm-git | OPM Benchmark: flow_mpi_extra - Threads: 8 | 1 |
| opm-git | OPM Benchmark: flow_mpi_norne - Threads: 1 | 0.998 |
| opm-git | OPM Benchmark: flow_mpi_norne - Threads: 8 | 0.996 |
| opm-git | OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 1 - FOPT (Total Oil Production At End Of Run) | 0.992 |
| opm-git | OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 8 - FOPT (Total Oil Production At End Of Run) | 1.001 |

Speed-up = Total time master / Total time pull request. Above 1.0 is an improvement. *

View result details @ https://www.ytelses.com/opm/?page=result&id=2652

atgeirr commented 1 day ago

jenkins build this serial please

blattms commented 1 day ago

I have one request. Could we please also post performance benchmarks for more relevant test cases than SPE10?

jakobtorben commented 13 hours ago

Yes, sure. I was hoping that the benchmark would do this for me, but it gave quite mixed results.

Here is Norne

and here is Sleipner

both with defaults in OPM (CPRW with ILU0).

Here you can see a reduction in the linear setup time for both cases. I struggled to get a consistently good reduction in overall simulation time, as other parts would fluctuate. But I think not recreating the preconditioner is never a bad thing. With the ILU0 in OPM there isn't that much difference between update and construction, mostly just extra memory allocation (there are probably some optimisations we can do in this direction).

For me, this change is most important for other preconditioners where there is a big difference between what we do on construction and on update, such as GPUILU0 and GPUDILU, where we do expensive memory allocation and autotuning in the construction.

OPM / opm-simulators

Update second stage preconditioner for CPR instead of recreate #5758