jakobtorben closed this pull request 1 day ago
jenkins build this please
benchmark please
Benchmark result overview:
Test | Configuration | Relative |
---|---|---|
opm-git | OPM Benchmark: drogon - Threads: 1 | 1.283 |
opm-git | OPM Benchmark: drogon - Threads: 8 | 1.014 |
opm-git | OPM Benchmark: punqs3 - Threads: 1 | 1.002 |
opm-git | OPM Benchmark: punqs3 - Threads: 8 | 0.993 |
opm-git | OPM Benchmark: smeaheia - Threads: 1 | 0.983 |
opm-git | OPM Benchmark: smeaheia - Threads: 8 | 0.997 |
opm-git | OPM Benchmark: spe10_model_1 - Threads: 1 | 0.996 |
opm-git | OPM Benchmark: spe10_model_1 - Threads: 8 | 0.961 |
opm-git | OPM Benchmark: flow_mpi_extra - Threads: 1 | 0.99 |
opm-git | OPM Benchmark: flow_mpi_extra - Threads: 8 | 1 |
opm-git | OPM Benchmark: flow_mpi_norne - Threads: 1 | 0.998 |
opm-git | OPM Benchmark: flow_mpi_norne - Threads: 8 | 0.996 |
opm-git | OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 1 - FOPT (Total Oil Production At End Of Run) | 0.992 |
opm-git | OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 8 - FOPT (Total Oil Production At End Of Run) | 1.001 |
View result details @ https://www.ytelses.com/opm/?page=result&id=2652
jenkins build this serial please
I have one request. Could we please also post performance benchmarks for a more relevant test case than SPE10?
Yes, sure. I was hoping the benchmark would do this for me, but it gave quite mixed results.
Here is Norne
and here is Sleipner
both with defaults in OPM (CPRW with ILU0).
Here you can see a reduction in the linear setup time for both cases. I struggled to get a consistently good reduction in overall simulation time, as other parts would fluctuate, but not recreating the preconditioner is never a bad thing. With the ILU0 in OPM there isn't that much difference between update and construction, mostly just the extra memory allocation (there are probably some optimisations we can do in this direction).
For me this change is most important for other preconditioners where there is a big difference between what we do on construction and on update, such as GPUILU0 and GPUDILU, where we do expensive memory allocation and autotuning during construction.
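To illustrate the point, here is a minimal, hypothetical C++ sketch of a preconditioner that separates the expensive, pattern-dependent construction from a cheap value update. The class and function names are illustrative only, not the actual OPM interfaces:

```cpp
// Hypothetical sketch: expensive construction vs. cheap update.
struct Matrix {};  // sparse matrix: sparsity pattern fixed, values change per iteration

class ReusablePreconditioner
{
public:
    // Construction: one-time work that depends only on the sparsity pattern,
    // e.g. allocating buffers, analysing the pattern and, for GPU variants,
    // autotuning kernel launches.
    explicit ReusablePreconditioner(const Matrix& A)
    {
        allocateBuffers(A);
        analyseSparsity(A);
        autotuneKernels(A);
        update(A);  // compute the initial factorisation
    }

    // Update: recompute the ILU0 factors from the new matrix values,
    // reusing all previously allocated storage.
    void update(const Matrix& A) { refactorise(A); }

private:
    void allocateBuffers(const Matrix&) {}
    void analyseSparsity(const Matrix&) {}
    void autotuneKernels(const Matrix&) {}
    void refactorise(const Matrix&) {}
};
```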
When using two-level preconditioners in OPM (CPR-like), the preconditioner consists of two parts:

- a coarse-level correction that solves the pressure subsystem, and
- a fine-level smoother applied to the full system.

The second part is typically an ILU0 preconditioner, for which we must compute a factorisation based on the values of the matrix. When the matrix values change (but the sparsity pattern does not), this factorisation needs to be recomputed.
Currently, this ILU0 preconditioner is recreated every single Newton iteration, which involves reallocating memory and redoing other setup work. This is not necessary; only an update of the factorisation is needed, which is what this PR changes it to do. This is especially important when we start using GPU preconditioners with CPR, as these GPU preconditioners typically involve expensive GPU allocations, matrix analysis and autotuning.
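To make the change concrete, here is a minimal, hypothetical sketch of the Newton loop before and after, reusing the illustrative ReusablePreconditioner class from the sketch above (again, these are not the actual OPM call sites or class names):

```cpp
#include <memory>

// Illustrative stand-ins for the real matrix, preconditioner and solver.
struct Matrix {};
struct ReusablePreconditioner
{
    explicit ReusablePreconditioner(const Matrix&) { /* expensive setup */ }
    void update(const Matrix&) { /* cheap refactorisation */ }
};
void assembleJacobian(Matrix&) {}
void solveLinearSystem(const Matrix&, ReusablePreconditioner&) {}

void newtonLoopBefore(Matrix& A, int maxIter)
{
    // Before: a new preconditioner object every Newton iteration, paying the
    // full construction cost (allocations, analysis, autotuning) each time.
    for (int it = 0; it < maxIter; ++it) {
        assembleJacobian(A);  // values change, sparsity pattern does not
        auto precond = std::make_unique<ReusablePreconditioner>(A);
        solveLinearSystem(A, *precond);
    }
}

void newtonLoopAfter(Matrix& A, int maxIter)
{
    // After: construct once, then only refresh the factorisation when the
    // matrix values change.
    ReusablePreconditioner precond(A);
    for (int it = 0; it < maxIter; ++it) {
        assembleJacobian(A);
        precond.update(A);
        solveLinearSystem(A, precond);
    }
}
```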
After this update, the linear setup time should be reduced for all simulations, since this is the default preconditioner setup in OPM. Below is a comparison on SPE10 before and after this fix, which shows a reduction of 20% in the linear setup time.