Closed nairr closed 7 years ago
"NaN residual for phase water" is also stopping the run of Model 2.2 with flow_ebos. It is not yet clear how that was triggered.
hm. can you try to compile everything using -Og and without -DNDEBUG? the runtime should roughly double with this, but you will get the assertions and you can reasonably inspect it using gdb.
(I'd do this myself, but so far I don't have that deck.)
I've just noticed that even before the breakdown your timesteps are very small (10^-8 days), probably this points to the problem.
Yes. Maybe @nairr can post the residual history a little bit before the crash.
I've created a gist of the simulation step prior to failure here: https://gist.github.com/nairr/eeb7952c0029e05d4c14f4f1129ba7e7
@andlaus - I will try building with -Og and get back to you
strange: I just successfully completed a full run of (my version of) Model 2.2. can you somehow give me access to your versions? (send me a private mail.)
That is really great news. If you can share the summary file, the .DEBUG file and the PRT file with me, that would be great. (kai.bao (AT) sintef.no)
I will find out if I can share with you the Model 2.2 file tomorrow.
From my side, although on a different deck, the NaN residual for the water phase comes from NaN values of R_sum. Not sure how this happens yet.
In my run, the NaN residual for the water phase comes from a BHP value that is too large.
do you have any idea if this is caused by the well model or by the reservoir model? in both cases it is possible that the PVT extrapolation in opm-material goes belly-up...
The problem I am focusing on now is THP related. Obviously, the THP control part is not completed yet. We are investigating how we got such an obviously wrong BHP value (5000 or 6000 Barsa or so) through the VFPProd table (the estimate should be a few hundred bar judging by the table). Hopefully we will find something tomorrow.
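For illustration only: one generic way a table lookup can return a value far outside the tabulated range is unguarded linear extrapolation. The sketch below is not the VFPProd implementation and the thread does not confirm that extrapolation is the cause here; the function interp_linear and all table values are hypothetical.

```python
from bisect import bisect_right


def interp_linear(xs, ys, x, extrapolate=True):
    """Piecewise-linear lookup in a 1D table (xs must be strictly increasing).

    With extrapolate=True the first/last segment is extended beyond the table;
    with extrapolate=False the query is clamped to the table range first.
    """
    if not extrapolate:
        x = min(max(x, xs[0]), xs[-1])
    # Index of the segment [xs[i], xs[i + 1]] used for the lookup.
    i = bisect_right(xs, x) - 1
    i = min(max(i, 0), len(xs) - 2)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


# Hypothetical table: flow rate -> BHP in bar (values invented for this sketch).
rates = [0.0, 1000.0, 2000.0]
bhps = [150.0, 200.0, 300.0]

far_outside = 55000.0  # a query far beyond the last tabulated rate
print(interp_linear(rates, bhps, far_outside))                     # extrapolated
print(interp_linear(rates, bhps, far_outside, extrapolate=False))  # clamped
```

With these made-up numbers the extrapolated result is in the thousands of bar while every tabulated value is at most 300 bar, which is the kind of mismatch described above.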
ok, cool. if I can help, let me know.
I get the same when I run realization 0 with the current master. Using tolerance_wells=1e-3 or 1e-5 helped. The reason for the NaNs is the small timesteps caused by convergence problems.
GitHub's auto-closing was incorrect in this case. Reopening.
Small timesteps means we switch to float, right? Might this have caused problems?
Small timesteps means we switch to float, right?
"Small" used to mean < 20 days for legacy flow. For flow_ebos I am not sure where the boundary is, @andlaus or @dr-robertk?
I don't really know, because to me the flow_ebos linear solver is a black box. given that single precision floating point values exhibit a precision of about 7 decimal digits, it would make sense. (The NaNs are likely not the root of the problem, though.)
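As a side note, the "about 7 decimal digits" figure can be checked with a few lines of Python. This is only a sketch of single-precision rounding using the standard struct module; the helper name to_float32 and the sample values (day 1827 and a 1e-8 day step, both borrowed from numbers mentioned in this thread) are illustrative and say nothing about what the simulator actually does internally.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Relative spacing of single-precision numbers near 1.0: 2**-23, about 1.2e-7,
# which corresponds to roughly 7 significant decimal digits.
float32_epsilon = 2.0 ** -23

day = 1827.0  # simulation time in days
dt = 1e-8     # a very small timestep in days

# In single precision the increment is lost entirely.
lost_in_single = to_float32(day + dt) == to_float32(day)
# In double precision the increment is still resolved.
kept_in_double = (day + dt) != day

print(float32_epsilon, lost_in_single, kept_in_double)
```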
Yes, I agree, NaNs typically are not the root of the problem. I am investigating a situation where it looks like some NaNs are generated during the linear solve.
outputting the wellSolutions before updateWellState
2.43964e+07
2.91488e+07
3.62471e+07
0.822454
1
1
0.09896
0
0
dx2_limited 0.142907
dx3_limited -0.127961
dx1_limited 216357
dx2_limited nan
dx3_limited nan
dx1_limited nan
dx2_limited nan
dx3_limited nan
dx1_limited nan
The reason for the NaN looks to be that resWell_ contains NaN. So there is some other, deeper reason.
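A generic way to narrow this kind of thing down is to scan each vector for the first non-finite entry right after it is computed, since any arithmetic on a NaN yields NaN again. The following is a minimal sketch in Python, not code from the simulator; first_non_finite and the sample numbers are made up for illustration.

```python
import math


def first_non_finite(values):
    """Return the index of the first NaN or infinite entry, or None if all are finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None


# Hypothetical well residual with one NaN entry.
res_well = [1.2e-5, float("nan"), 3.0e-7]

# Any update computed from the residual inherits the NaN at the same position.
dx = [0.5 * r for r in res_well]

print(first_non_finite(res_well), first_non_finite(dx))
```

Checking the residual before the update step, rather than the update itself, points at the earlier of the two places where the NaN shows up.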
Guys, flow_ebos uses the exact same linear solver as flow_legacy. So no more black box here. Also, single precision is disabled, because otherwise the matrix would have to be copied.
single precision is disabled
So that cannot be the reason for failures. Good!
So no black box here.
just wanted to say that I treat it as a black box because I do not understand that code, i.e. my statements about it should be taken with a pinch of salt ;)
It looks like combining PR #1083 and PR #1091 will fix the problem. The run has not finished yet, but it has already gotten further. PR #1091 alone will not fix it. I did not test with PR #1083 only.
please close the issue if it is fixed. (I somehow failed in completely reproducing it...)
Did you also try that realization (realization-9)? I reproduced the problem in the same time step, with a slightly different symptom (no NaN involved), probably due to some recent change.
Let us wait a little bit until PR #1083 and PR #1091 get merged, then we can consider it settled.
Did you also try that realization (realization-9)?
no, I haven't realized that it is that realization (sic). great that you fixed it, though.
Hi, @nairr , from my side, it looks like PR #1083 fixed this problem. Could you please help to verify it?
Hi, I still experience the same convergence issue with NaN residual for water phase with PR #1083
That is weird. Did you update all the modules to the latest version of master?
Basically, I reproduced your problem in a slightly different way with the latest master branch and I applied PR #1083, and it could run through.
I did update all the modules to the latest master branch. However I did not apply PR #1091
Okay. I will try again. In my previous experience, it is PR #1083 that affects this issue.
Hi, @nairr , I tested again, with PR #1083 and master branches, the issue is fixed.
Time step 115 at day 1827/5997, date = 01-Jan-2005
Substep 0, stepsize 31 days.
Error: [/home/kaib/OPM-master-test/debug/opm-simulators/opm/autodiff/NonlinearSolver_impl.hpp:154] Failed to complete a time step within 15 iterations.
Problem: Solver convergence failed, restarting solver with new time step (10.230000 days).
Substep 0, stepsize 10.23 days.
Substep summary: well iterations = 4, newton iterations = 7, linearizations = 8 (8.4966 sec), linear iterations = 187 (11.2829 sec)
Substep 1, stepsize 20.77 days.
Error: [/home/kaib/OPM-master-test/debug/opm-simulators/opm/autodiff/NonlinearSolver_impl.hpp:154] Failed to complete a time step within 15 iterations.
Problem: Solver convergence failed, restarting solver with new time step (6.854100 days).
Substep 1, stepsize 6.8541 days.
Substep summary: well iterations = 5, newton iterations = 12, linearizations = 14 (14.9858 sec), linear iterations = 314 (18.9499 sec)
Substep 2, stepsize 13.9159 days.
Error: [/home/kaib/OPM-master-test/debug/opm-simulators/opm/autodiff/NonlinearSolver_impl.hpp:154] Failed to complete a time step within 15 iterations.
Problem: Solver convergence failed, restarting solver with new time step (4.592247 days).
Substep 2, stepsize 4.59225 days.
Substep summary: well iterations = 8, newton iterations = 16, linearizations = 19 (20.2524 sec), linear iterations = 411 (24.553 sec)
Substep 3, stepsize 9.32365 days.
Substep summary: well iterations = 9, newton iterations = 28, linearizations = 32 (33.1254 sec), linear iterations = 591 (35.4222 sec)
Time step 116 at day 1858/5997, date = 01-Feb-2005
Substep 0, stepsize 20 days.
Substep summary: well iterations = 3, newton iterations = 8, linearizations = 9 (8.92022 sec), linear iterations = 224 (13.2843 sec)
Time step 117 at day 1878/5997, date = 21-Feb-2005
Substep 0, stepsize 6 days.
Substep summary: well iterations = 2, newton iterations = 5, linearizations = 6 (6.01052 sec), linear iterations = 94 (5.84578 sec)
My bad, the issue is indeed fixed.
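For readers trying to follow the substep sizes in the log above: the numbers are consistent with cutting a failed step to 0.33 times its size (31 -> 10.23, 20.77 -> 6.8541, 13.9159 -> 4.592247) and then attempting whatever remains of the report step. The sketch below just replays that arithmetic. The factor 0.33 is inferred from the log, not looked up in the simulator source, and replay_substeps is a made-up helper.

```python
def replay_substeps(report_step, first_attempt_fails, chop=0.33):
    """Replay the accepted substep sizes within one report step.

    first_attempt_fails holds one boolean per substep: True means the first
    attempt (the whole remaining interval) failed and the step was cut to
    chop * remaining before being accepted.
    """
    remaining = report_step
    accepted = []
    for fails in first_attempt_fails:
        dt = chop * remaining if fails else remaining
        accepted.append(dt)
        remaining -= dt
    return accepted


# Time step 115 in the log: three substeps needed a restart, the fourth did not.
steps = replay_substeps(31.0, [True, True, True, False])
print(steps)  # close to 10.23, 6.8541, 4.592247, 9.323653 (up to rounding)
```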
Looks like the problem is back again, not sure which change caused the problem. Somehow the CNV for the oil phase really likes the number 1.594e-01
Substep 12, stepsize 4.46778e-05 days.
Iter W-FLUX(water) W-FLUX(oil) W-FLUX(gas)
0 1.256e-05 1.736e-06 6.123e-07
Iter MB(W) MB(O) MB(G) CNV(W) CNV(O) CNV(G) W-FLUX(W) W-FLUX(O) W-FLUX(G)
0 4.889e-10 8.672e-10 1.215e-10 1.268e-06 3.941e-07 1.590e-06 1.256e-05 1.736e-06 6.123e-07
1 1.686e-13 1.640e-06 1.204e-06 6.008e-10 1.594e-01 1.170e-01 6.508e-09 2.125e-08 3.804e-11
2 2.976e-07 1.364e-06 4.094e-07 6.393e-02 1.594e-01 3.530e-02 2.936e-13 9.558e-13 6.675e-06
3 8.955e-10 1.517e-06 1.330e-07 8.336e-04 1.594e-01 9.226e-03 7.185e-18 4.880e-16 6.099e-07
4 9.052e-11 1.469e-06 5.055e-09 1.341e-05 1.594e-01 4.963e-04 1.437e-17 5.830e-16 2.915e-07
5 3.625e-11 1.463e-06 1.687e-11 3.984e-06 1.594e-01 7.007e-06 1.796e-17 5.599e-16 4.647e-09
6 4.051e-11 1.463e-06 2.286e-12 4.006e-06 1.594e-01 6.990e-06 2.155e-17 3.544e-16 1.409e-12
7 4.083e-11 1.463e-06 2.707e-12 4.007e-06 1.594e-01 6.990e-06 2.066e-17 4.880e-16 5.906e-15
8 4.086e-11 1.463e-06 2.735e-12 4.007e-06 1.594e-01 6.990e-06 2.697e-17 4.469e-16 3.573e-15
9 4.086e-11 1.463e-06 2.737e-12 4.007e-06 1.594e-01 6.990e-06 1.841e-17 1.464e-16 1.739e-15
10 4.086e-11 1.463e-06 2.737e-12 4.007e-06 1.594e-01 6.990e-06 2.697e-17 5.316e-16 6.855e-16
11 4.086e-11 1.463e-06 2.737e-12 4.007e-06 1.594e-01 6.990e-06 1.078e-17 5.419e-16 1.050e-15
12 4.086e-11 1.463e-06 2.737e-12 4.007e-06 1.594e-01 6.990e-06 2.155e-17 3.030e-16 6.855e-16
13 4.086e-11 1.463e-06 2.737e-12 4.007e-06 1.594e-01 6.990e-06 2.155e-17 4.263e-16 1.369e-15
14 4.086e-11 1.463e-06 2.737e-12 4.007e-06 1.594e-01 6.990e-06 2.697e-17 3.082e-16 2.195e-15
15 4.086e-11 1.463e-06 2.737e-12 4.007e-06 1.594e-01 6.990e-06 1.437e-17 4.058e-16 1.541e-15
[/home/kaib/OPM-master-test/debug/opm-simulators/opm/autodiff/NonlinearSolver_impl.hpp:154] Failed to complete a time step within 15 iterations.
Caught Exception: [/home/kaib/OPM-master-test/debug/opm-simulators/opm/autodiff/NonlinearSolver_impl.hpp:154] Failed to complete a time step within 15 iterations.
Solver convergence failed, restarting solver with new time step (0.000015 days).
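The stuck CNV(O) column in the log above is the kind of pattern that is easy to flag automatically when scanning long output files. A minimal sketch, assuming the per-iteration values have already been parsed into a list; the function stagnated, its window and its tolerance are arbitrary choices for illustration and are not part of the simulator.

```python
def stagnated(history, window=5, rel_tol=1e-3):
    """True if the last `window` values all agree with each other to within rel_tol."""
    if len(history) < window:
        return False
    tail = history[-window:]
    ref = tail[0]
    return all(abs(v - ref) <= rel_tol * abs(ref) for v in tail)


# CNV(O) per Newton iteration, copied from the log above (iterations 0 to 15).
cnv_oil = [3.941e-07] + [1.594e-01] * 15

# CNV(G) for iterations 0 to 5 from the same log, which is still decreasing.
cnv_gas = [1.590e-06, 1.170e-01, 3.530e-02, 9.226e-03, 4.963e-04, 7.007e-06]

print(stagnated(cnv_oil), stagnated(cnv_gas))
```

A check like this only says that a residual stopped moving; it does not say why, so it is a filter for finding the interesting substeps rather than a diagnosis.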
Looks like the problem is back again,
Does reverting OPM/opm-core#1147 change these results?
Thanks for the suggestion. I was testing OPM/opm-parser#1051 . Will also test OPM/opm-core#1147 soon later.
It is just trying to get some clues for future reference. I do not think we should spend effort on this. The simulator still shows some random/unpredictable behavior from time to time. For example, OPM/opm-simulators#1112 fixes this problem. There are many more changes related to the simulator that make some convergence problems appear/disappear from time to time.
It looks like reverting either OPM/opm-parser#1051 or OPM/opm-core#1147 will fix the running of model 2, realization 9.
@totto82 are the PRs OPM/opm-parser#1051 and OPM/opm-core#1147 related or not? Do these two PRs interact in some way?
No. They should not interact directly. The first one is necessary for the simulator to apply the scaled capillary pressures due to SWATINIT. The second fixes initial rs and rv values. I am not surprised that the first one affects the simulator, but the last one should only have minor impact on the simulator.
For one of the realizations of model2 (realization-9), the solver fails to converge at time = 1827 days due to NaN residual for the water phase, when using the current build of flow_ebos. A version of flow_ebos compiled on 29/01/'17 could run the model successfully.