Saturation adjustment fails with reasonable values

CliMA / ClimateMachine.jl

Climate Machine: an Earth System Model that automatically learns from data

https://clima.github.io/ClimateMachine.jl/latest/

Saturation adjustment fails with reasonable values #1989

Closed: szy21 closed this issue 3 years ago

szy21 commented 3 years ago

Description

When running the moist baroclinic wave, saturation adjustment (SA) fails to converge even though the input values seem reasonable. For example:

Setting tolerance to 0.1:

maxiter reached in saturation_adjustment:
Method=NewtonsMethod, e_int=11594.663171, ρ=0.931742, q_tot=0.006174, T=272.908738, maxiter=3, tol=71.750602

Setting tolerance to 1:

maxiter reached in saturation_adjustment:
Method=NewtonsMethod, e_int=10011.473560, ρ=0.929731, q_tot=0.006852, T=272.130054, maxiter=3, tol=717.506023

tapios commented 3 years ago

This is near the freezing point, where Newton iterations have trouble because of the discontinuity. We should simply exit without error when maxiter is reached. We can also smooth the phase transition, and perhaps should do so one day.

But for now simply exiting at maxiter would be preferable. Ideally we would do so with a warning. We have discussed this earlier, but then it was difficult to write out warnings. If that is still the case, simply ignore the lack of convergence at maxiter. This issue should not continue to hold us up. If we cannot have warnings, a flag SA_debug that controls the error behavior would be a fine solution now, and it should default to no error at maxiter. My preference in any case is to be able to set maxiter = 1 and run with this without bailing out.
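To illustrate the behavior being requested, here is a minimal Python sketch, not ClimateMachine.jl code. It assumes a toy internal energy with a jump at the freezing point (the constants `CV` and `JUMP` and the `on_fail` switch are hypothetical) and a Newton loop that returns its last iterate when maxiter is reached instead of always raising an error.

```python
import warnings

T_FREEZE = 273.16  # freezing point (K)
CV = 718.0         # toy heat capacity (J/kg/K), assumption for illustration
JUMP = 2000.0      # toy energy jump at the freezing point (J/kg), assumption


def toy_energy(T):
    """Toy internal energy that is discontinuous at T_FREEZE."""
    return CV * (T - T_FREEZE) + (JUMP if T >= T_FREEZE else 0.0)


def newton_solve(f, dfdx, x0, tol, maxiter, on_fail="ignore"):
    """Newton iteration that keeps its last iterate at maxiter.

    on_fail is a hypothetical switch: "error", "warn", or "ignore".
    Returns (x, converged).
    """
    x = x0
    for _ in range(maxiter):
        fx = f(x)
        if abs(fx) < tol:
            return x, True
        x = x - fx / dfdx(x)
    converged = abs(f(x)) < tol
    if not converged:
        if on_fail == "error":
            raise RuntimeError(f"maxiter={maxiter} reached, x={x}")
        if on_fail == "warn":
            warnings.warn(f"maxiter={maxiter} reached, x={x}")
    return x, converged


if __name__ == "__main__":
    # A target energy inside the jump has no exact root, so the residual
    # never drops below tol, yet the iterate stays close to T_FREEZE.
    target = 1000.0
    T, ok = newton_solve(lambda T: toy_energy(T) - target,
                         lambda T: CV, 280.0, tol=0.1, maxiter=3)
    print(T, ok)
```

In this toy case the residual stays large because the target energy falls inside the jump, while the temperature iterate remains within a few kelvin of the freezing point. The default `on_fail="ignore"` mirrors the preference stated above for no error at maxiter.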

szy21 commented 3 years ago

We have hacked into SA a little bit, and it seems that at one point during the iteration the temperature goes to NaN. But I'm not sure; the output is hard to understand.

@charleskawczynski Could we have an option that allows SA to exit at the maximum iteration (without warnings, if warnings are hard to implement)? I know it is not ideal, but for now we want to test what really breaks the moist runs, and this would be very helpful. Thanks!

charleskawczynski commented 3 years ago

I spoke with @bischtob about this. The "quick" fix to improve robustness is to use this commit from this branch, which makes the Regula Falsi method the default numerical method (it has a much more robust convergence region) and increases the maximum number of iterations to 10. These changes may make the simulation a bit slower, but SA should converge for all reasonable input values.
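For readers unfamiliar with the method, the following is a generic Python sketch of regula falsi, not the ClimateMachine.jl implementation. It assumes the caller supplies a bracket `[a, b]` on which the residual changes sign; the toy energy function and its constants are hypothetical. Because every iterate stays inside the bracket, the estimate cannot run away even when the residual is discontinuous.

```python
T_FREEZE = 273.16  # freezing point (K)
CV = 718.0         # toy heat capacity (J/kg/K), assumption for illustration
JUMP = 2000.0      # toy energy jump at the freezing point (J/kg), assumption


def toy_energy(T):
    """Toy internal energy that is discontinuous at T_FREEZE."""
    return CV * (T - T_FREEZE) + (JUMP if T >= T_FREEZE else 0.0)


def regula_falsi(f, a, b, tol, maxiter):
    """Bracketing root finder; returns (x, converged)."""
    fa, fb = f(a), f(b)
    if fa == 0.0:
        return a, True
    if fb == 0.0:
        return b, True
    if fa * fb > 0.0:
        raise ValueError("root is not bracketed by [a, b]")
    x = a
    for _ in range(maxiter):
        # Intersection of the secant through (a, fa) and (b, fb) with zero.
        x = (a * fb - b * fa) / (fb - fa)
        fx = f(x)
        if abs(fx) < tol:
            return x, True
        # Keep the sub-interval on which the sign change persists.
        if fa * fx < 0.0:
            b, fb = x, fx
        else:
            a, fa = x, fx
    return x, False


if __name__ == "__main__":
    target = 1000.0  # inside the energy jump, so no exact root exists
    T, ok = regula_falsi(lambda T: toy_energy(T) - target,
                         250.0, 300.0, tol=0.1, maxiter=10)
    print(T, ok)
```

The cost of this robustness is an extra bracket to maintain and, typically, more residual evaluations than a Newton step, which is consistent with the note above that the simulation may become a bit slower.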

I'm hoping to merge #1885 and investigate the performance-robustness tradeoff of different numerical methods so that we can get the best of both worlds.

Furthermore, I would recommend not modifying error_on_non_convergence(), for two reasons:

It does not suppress the print statements, which will also severely slow down the simulation.
There will be no guarantees on the resulting temperature, which I have seen reach NaN for certain combinations of inputs.

tapios commented 3 years ago

Why are we continuing to make this complicated, @charleskawczynski? Other models use 1 fixed Newton iteration (e.g., NCAR). As you can see from @szy21's values, temperatures are essentially converged, despite the energy discontinuity. I see no point in going to regula falsi and many iterations to deal with the phase transition.

Please make an option for no-fail exit at maxiter.

tapios commented 3 years ago

If you are concerned about NaNs, you can put a separate catch in there.
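One way to picture such a separate catch is the Python sketch below. It is an illustration only: `checked_temperature` is a hypothetical helper, not part of the Thermodynamics module. The idea is that the finite-value check runs independently of whatever happens at maxiter, so a non-converged but finite temperature passes through while NaN or infinity still stops the run.

```python
import math


def checked_temperature(T):
    """Return T unchanged, or raise if it is NaN or infinite.

    Hypothetical helper: it is independent of any convergence flag, so it
    still fires when non-convergence errors and warnings are switched off.
    """
    if not math.isfinite(T):
        raise FloatingPointError(f"non-finite temperature from SA: {T}")
    return T


if __name__ == "__main__":
    # A finite, non-converged iterate is accepted as-is.
    print(checked_temperature(272.908738))
```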

charleskawczynski commented 3 years ago

I agree with your suggestion; my recommendation was just for debugging. We can use the existing Thermodynamics.error_on_non_convergence() = false and Thermodynamics.print_warning() = false options. This will also remove the prints, so the simulation will not slow down significantly.

szy21 commented 3 years ago

Setting both error_on_non_convergence and print_warning to false works, thanks! Is this the final solution? If so, I will close this issue.

charleskawczynski commented 3 years ago

Yes, I think we can close if that works.