Always update the solution in the DC and NS solvers after every nonlinear iteration.

MFraters commented 3 weeks ago

In issue #6046, a problem is shown where there is a significant difference between the normal Stokes solvers on the one side and the Defect Correction (DC) and Newton Stokes solvers on the other side. We managed to constrain the issue to a difference in the non-initial strain field. Looking more closely at the DC code, I realized that maybe the compostion advection function (assemble_and_solve_composition) is using the solution instead of the current linearization point. This means that the solution needs to be updated every nonlinear iteration when using an iterated advection scheme.

For now I changed it to just always update the solution to the current linearization point, even though for the stokes and for non-iterated advection schemes it might not be needed. I think I prefer to error on the causious side this time, but I am happy to make special if statements that that would be preferred.

There are still some small differences between the Stokes and DC stokes, but I would attribute that to the fact that the normal stokes doesn't always converge.

@naliboff can you test is this solve the issue for you?

gassmoeller commented 3 weeks ago

What a subtle bug. Thanks for figuring this out! I can see some other places that would assume that the solution vector actually contains the solution and not the update to the solution. I would suggest a more radical fix: Put the following at the end of do_one_defect_correction_Stokes_step():

// The rest of ASPECT assumes 'solution' contains the solution values, not the update
// to the solution. Make sure the outside never sees the solution update we computed.
            solution.block(introspection.block_indices.pressure) = current_linearization_point.block(introspection.block_indices.pressure);
            solution.block(introspection.block_indices.velocities) = current_linearization_point.block(introspection.block_indices.velocities);

This has several benefits:

You do not need to search for all different solver schemes that use the defect correction or Newton solver and fix them individually (there are several in solver_schemes.cc which were not fixed in this PR)
You can remove the line you moved in your original commit and all the other similar lines for other solver schemes. Now the solution vector always contains exactly what it is supposed to - the solution - except for inside do_one_defect_correction_Stokes_step() where the function is well aware of what it is using the vector for.

Also this PR warrants a changelog entry.

Do you have an idea why this affects so many tests?

MFraters commented 3 weeks ago

I agree that that is a better solution. I change the code, and I checked that it still solves the issue. I will have to look in a bit more detail at the test results. I didn't necessarily expect that. Although that might be the iteration between the composition not being update with the converged velocity and pressure. I do this this change makes it better. I will add a changelog entry after I looked in more detail at the tests.

MFraters commented 3 weeks ago

Thanks for the review @tjhei, I addressed your comments. Will look tomorrow in more detail at the test results.

tjhei commented 3 weeks ago

20 Tests and probably all minor differences.

gassmoeller commented 3 weeks ago

Some of the test results changed quite a bit. A number of DC or Newton tests now need 50-100% more nonlinear iterations. This looks to me as if something else outside that function assumes the solution vector contains the update and not the solution variables, but I didnt see anything while looking around. It seems to affect both GMG and AMG solver.

MFraters commented 3 weeks ago

Yes, I agree. I have been looking into what is causing this. It seems that the convergence is completely destroyed when updating the solution vector every iteratation. I tried several ways of doing this, and they don't work. I am now testing whether backing up the "internal" solution at the end of do_one_defect_correction_Stokes_step, overwriting the solution with the current linearization point and then restoring the "internal" solution at the beginning of the next "do_one_defect_correction_Stokes_step" works, and this seems to restore the convergence. I am now retesting @naliboff setup, whether that still works, and (if that still works) will then investigate further if this dependency can be untangled somehow.

MFraters commented 3 weeks ago

So I think the issue is that the solution is being used as an initial guess somewhere. The defect correction type solvers should normally, after the first nonlinear iteration, have an initial guess close to zero or zero, because you expect the update to be small. By setting the solution to the full solution, our initial guess was way off. My first approach of just storing the DC solution works, but it requires storing more data. This solution just sets it to zero. In my testing there is only a small difference in the screen-output, where it requires one or two less or more iterations in the nonlinear channel flow. I am now running Johns setup to see if it still works and if it changes anything related to performance.

Anyway, lets see what the tester thinks about this ;)

MFraters commented 3 weeks ago

hmm, although it performs a lot better now then before, looking at the tester results the number of linear iterations are usually a lot higher. Although sometimes it requires a few less nonliner itertations than before, it seems like it usually takes a few more nonlinear iterations (sometimes 7 instead of 1). So it seems that assuming a zero is worse than assuming the previous update to be applied again to start the iteration. So maybe copying information into a temp variable and restoring it, might not be a bad idea. (We could maybe even scale that new vector with the previous reduction in nonlinear residual to get a better guess for the next update.)

tjhei commented 3 weeks ago

So it seems that assuming a zero is worse than assuming the previous update to be applied again to start the iteration.

That is surprising because why would the next update be going in the same direction? Is this maybe related to the adiabatic pressure that we normally put into the initial guess?

So, nonlinear iterations are higher. What about linear iterations? Maybe you need to adjust tolerances.

MFraters commented 3 weeks ago

So it seems that assuming a zero is worse than assuming the previous update to be applied again to start the iteration.

That is surprising because why would the next update be going in the same direction? Is this maybe related to the adiabatic pressure that we normally put into the initial guess?

Yes, I find it also a bit surprising. I see no reason why the previous update would be a good indicator for the next one, besides that the norm will probably be the same order of magnitude. But for some reason it seems to be. I am not sure I understand your second point. Are you saying that even if I set the pressure DC guess to zero, it will change it to the adiabatic pressure?

So, nonlinear iterations are higher. What about linear iterations? Maybe you need to adjust tolerances.

The linear iterations al all higher as far as I could see. We could adjust the tolerances, but then you are not really comparing like for like anymore...

MFraters commented 2 weeks ago

Doing a diff between the tester diffs between one where I set the solution to 0 (https://github.com/geodynamics/aspect/actions/runs/11613448580?pr=6116) and the solution to the previous solution (https://github.com/geodynamics/aspect/actions/runs/11653429467?pr=6116) shows that there is no difference in the results. So unless I made a mistake when comparing, the difference is fully caused by setting the correct solution at the and of the nonlinear iteration. So I have gone back to setting the solution to zero.

I am not sure why the nonlinear channel flow is so affected, since there is no composition and no real temperature. Any ideas?

MFraters commented 2 weeks ago

hmm, two tests completely failed...

MFraters commented 2 weeks ago

I can't reproduce these issues locally with dealii 9.6.0. For me they just pass without issue. I could try dealii 9.5.1 later to see if that is doing something different.

MFraters commented 2 weeks ago

After the meeting today, I did a few more test. If I put the solution = 0 statements after the assemble_stokes_system function, I get the same behavior as if I just keep the full solution around, while if I put it before, I get the same behavior as if I put it at the beginning. I think it is an issue that the DC and NS solvers require a solution to be an update, while some material models using the solution vectors expect it to be the full solution.

The strange thing is that the nonlinear channel flow uses a simple nonlinear material model (https://github.com/geodynamics/aspect/blob/main/benchmarks/newton_solver_benchmark_set/nonlinear_channel_flow/simple_nonlinear.cc), which is not dependent on the solution vector, and the benchmark has no composition and a uniform zero temperature. So I still find it surprising that setting the solution to the full solution at every nonlinear iteration is changes the convergence behavior so much in this case.

anne-glerum commented 2 weeks ago

Do you mean that although the simple_nonlinear.cc Material Model you linked to above has a viscosity that depends on the strain rate, the capping by the min/max viscosity remove this dependence on the solution?

gassmoeller commented 2 weeks ago

I think I found the main culprit for the confusing results, although finding the correct fix needs a bit more thought. Inside assemble_stokes_system we call compute_pressure_scaling_factor, which uses the solution vector to compute the current viscosity, which is used for the pressure scaling factor. If solution is set to 0 we get the wrong pressure scaling factor. However, simply changing the use to the current_linearization_point while better is not a complete solution either. It fixes most calls of assemble_stokes_system, but not the one from compute_initial_newton_residual where we on purpose set the current linearization point velocity to 0.

But @MFraters I think the first step is to remove the additional lines you added to set the solution vector to 0. They are not needed, because the solution vector is actually not used as the first guess for the solver (the first guess is taken from linearized_stokes_initial_guess search for that in solve_stokes() and you will see it is already initialized to 0 for the Newton solver).

MFraters commented 2 weeks ago

I checked the solve_stokes() function in solver.cc (https://github.com/geodynamics/aspect/blob/4d576b284ee7ad72e629b83076f975235d3ddf6f/source/simulator/solver.cc#L978-L989) and I found the lines you were talking about, where it sets the guess to zero.

I have tested it your branch fix_unnecessary_pressure_normalization with and without setting the values to zero, and there is still a significant difference (the same as before). But the the reported Initial Newton Stokes residual = 1.85171e+12, v = 1.85171e+12, p = 0 is exactly the same for both and as before removing the pressure scaling. So I guess there must still be another place the solution is used.

MFraters commented 2 weeks ago

superseded by #6135

geodynamics / aspect

Always update the solution in the DC and NS solvers after every nonlinear iteration. #6116