Closed MFraters closed 2 weeks ago
What a subtle bug. Thanks for figuring this out! I can see some other places that would assume that the solution
vector actually contains the solution and not the update to the solution. I would suggest a more radical fix: Put the following at the end of do_one_defect_correction_Stokes_step()
:
// The rest of ASPECT assumes 'solution' contains the solution values, not the update
// to the solution. Make sure the outside never sees the solution update we computed.
solution.block(introspection.block_indices.pressure) = current_linearization_point.block(introspection.block_indices.pressure);
solution.block(introspection.block_indices.velocities) = current_linearization_point.block(introspection.block_indices.velocities);
This has several benefits:
solution
vector always contains exactly what it is supposed to - the solution - except for inside do_one_defect_correction_Stokes_step()
where the function is well aware of what it is using the vector for.Also this PR warrants a changelog entry.
Do you have an idea why this affects so many tests?
I agree that that is a better solution. I change the code, and I checked that it still solves the issue. I will have to look in a bit more detail at the test results. I didn't necessarily expect that. Although that might be the iteration between the composition not being update with the converged velocity and pressure. I do this this change makes it better. I will add a changelog entry after I looked in more detail at the tests.
Thanks for the review @tjhei, I addressed your comments. Will look tomorrow in more detail at the test results.
20 Tests and probably all minor differences.
Some of the test results changed quite a bit. A number of DC or Newton tests now need 50-100% more nonlinear iterations. This looks to me as if something else outside that function assumes the solution
vector contains the update and not the solution variables, but I didnt see anything while looking around. It seems to affect both GMG and AMG solver.
Yes, I agree. I have been looking into what is causing this. It seems that the convergence is completely destroyed when updating the solution vector every iteratation. I tried several ways of doing this, and they don't work. I am now testing whether backing up the "internal" solution at the end of do_one_defect_correction_Stokes_step, overwriting the solution with the current linearization point and then restoring the "internal" solution at the beginning of the next "do_one_defect_correction_Stokes_step" works, and this seems to restore the convergence. I am now retesting @naliboff setup, whether that still works, and (if that still works) will then investigate further if this dependency can be untangled somehow.
So I think the issue is that the solution is being used as an initial guess somewhere. The defect correction type solvers should normally, after the first nonlinear iteration, have an initial guess close to zero or zero, because you expect the update to be small. By setting the solution to the full solution, our initial guess was way off. My first approach of just storing the DC solution works, but it requires storing more data. This solution just sets it to zero. In my testing there is only a small difference in the screen-output, where it requires one or two less or more iterations in the nonlinear channel flow. I am now running Johns setup to see if it still works and if it changes anything related to performance.
Anyway, lets see what the tester thinks about this ;)
hmm, although it performs a lot better now then before, looking at the tester results the number of linear iterations are usually a lot higher. Although sometimes it requires a few less nonliner itertations than before, it seems like it usually takes a few more nonlinear iterations (sometimes 7 instead of 1). So it seems that assuming a zero is worse than assuming the previous update to be applied again to start the iteration. So maybe copying information into a temp variable and restoring it, might not be a bad idea. (We could maybe even scale that new vector with the previous reduction in nonlinear residual to get a better guess for the next update.)
So it seems that assuming a zero is worse than assuming the previous update to be applied again to start the iteration.
That is surprising because why would the next update be going in the same direction? Is this maybe related to the adiabatic pressure that we normally put into the initial guess?
So, nonlinear iterations are higher. What about linear iterations? Maybe you need to adjust tolerances.
So it seems that assuming a zero is worse than assuming the previous update to be applied again to start the iteration.
That is surprising because why would the next update be going in the same direction? Is this maybe related to the adiabatic pressure that we normally put into the initial guess?
Yes, I find it also a bit surprising. I see no reason why the previous update would be a good indicator for the next one, besides that the norm will probably be the same order of magnitude. But for some reason it seems to be. I am not sure I understand your second point. Are you saying that even if I set the pressure DC guess to zero, it will change it to the adiabatic pressure?
So, nonlinear iterations are higher. What about linear iterations? Maybe you need to adjust tolerances.
The linear iterations al all higher as far as I could see. We could adjust the tolerances, but then you are not really comparing like for like anymore...
Doing a diff between the tester diffs between one where I set the solution to 0 (https://github.com/geodynamics/aspect/actions/runs/11613448580?pr=6116) and the solution to the previous solution (https://github.com/geodynamics/aspect/actions/runs/11653429467?pr=6116) shows that there is no difference in the results. So unless I made a mistake when comparing, the difference is fully caused by setting the correct solution at the and of the nonlinear iteration. So I have gone back to setting the solution to zero.
I am not sure why the nonlinear channel flow is so affected, since there is no composition and no real temperature. Any ideas?
hmm, two tests completely failed...
I can't reproduce these issues locally with dealii 9.6.0. For me they just pass without issue. I could try dealii 9.5.1 later to see if that is doing something different.
After the meeting today, I did a few more test. If I put the solution = 0 statements after the assemble_stokes_system function, I get the same behavior as if I just keep the full solution around, while if I put it before, I get the same behavior as if I put it at the beginning. I think it is an issue that the DC and NS solvers require a solution to be an update, while some material models using the solution vectors expect it to be the full solution.
The strange thing is that the nonlinear channel flow uses a simple nonlinear material model (https://github.com/geodynamics/aspect/blob/main/benchmarks/newton_solver_benchmark_set/nonlinear_channel_flow/simple_nonlinear.cc), which is not dependent on the solution vector, and the benchmark has no composition and a uniform zero temperature. So I still find it surprising that setting the solution to the full solution at every nonlinear iteration is changes the convergence behavior so much in this case.
Do you mean that although the simple_nonlinear.cc Material Model you linked to above has a viscosity that depends on the strain rate, the capping by the min/max viscosity remove this dependence on the solution?
I think I found the main culprit for the confusing results, although finding the correct fix needs a bit more thought. Inside assemble_stokes_system
we call compute_pressure_scaling_factor
, which uses the solution
vector to compute the current viscosity, which is used for the pressure scaling factor. If solution
is set to 0 we get the wrong pressure scaling factor. However, simply changing the use to the current_linearization_point
while better is not a complete solution either. It fixes most calls of assemble_stokes_system
, but not the one from compute_initial_newton_residual
where we on purpose set the current linearization point velocity to 0.
But @MFraters I think the first step is to remove the additional lines you added to set the solution vector to 0. They are not needed, because the solution vector is actually not used as the first guess for the solver (the first guess is taken from linearized_stokes_initial_guess
search for that in solve_stokes()
and you will see it is already initialized to 0 for the Newton solver).
I checked the solve_stokes()
function in solver.cc (https://github.com/geodynamics/aspect/blob/4d576b284ee7ad72e629b83076f975235d3ddf6f/source/simulator/solver.cc#L978-L989) and I found the lines you were talking about, where it sets the guess to zero.
I have tested it your branch fix_unnecessary_pressure_normalization with and without setting the values to zero, and there is still a significant difference (the same as before). But the the reported Initial Newton Stokes residual = 1.85171e+12, v = 1.85171e+12, p = 0
is exactly the same for both and as before removing the pressure scaling. So I guess there must still be another place the solution is used.
superseded by #6135
In issue #6046, a problem is shown where there is a significant difference between the normal Stokes solvers on the one side and the Defect Correction (DC) and Newton Stokes solvers on the other side. We managed to constrain the issue to a difference in the non-initial strain field. Looking more closely at the DC code, I realized that maybe the compostion advection function (assemble_and_solve_composition) is using the solution instead of the current linearization point. This means that the solution needs to be updated every nonlinear iteration when using an iterated advection scheme.
For now I changed it to just always update the solution to the current linearization point, even though for the stokes and for non-iterated advection schemes it might not be needed. I think I prefer to error on the causious side this time, but I am happy to make special if statements that that would be preferred.
There are still some small differences between the Stokes and DC stokes, but I would attribute that to the fact that the normal stokes doesn't always converge.
@naliboff can you test is this solve the issue for you?