geodynamics / aspect

A parallel, extensible finite element code to simulate convection in both 2D and 3D models.
https://aspect.geodynamics.org/
Other
227 stars 237 forks source link

Convergence issue: 3D convection with an Earth-like initial condition (S40RTS) #4919

Open fadhli-atarita opened 2 years ago

fadhli-atarita commented 2 years ago

Hello,

I've been trying to run this cookbook for a while but I've always encountered convergence issues. I've been using the exact same parameters as in the S40RTS.prm and have tried to increase the refinements and maximum stokes solver steps and decrease the linear solver tolerance. However, these changes also produced convergence issues.

Here are the output, error, and solver history files for your reference

solver_history.txt error_5214159.log output_5214159.log

Thank you,

Fadhli

tjhei commented 2 years ago

Your ASPECT version seems to be quite old. Can you retry with the current development version, please? Is this the output without any changes to the .prm file?

fadhli-atarita commented 2 years ago

There was a segmentation fault issue on the clusters with v2.3.0 so the admin suggested that I use v2.2.0 for now. This output was produced with changes on the 3 parameters that I mentioned earlier. Without any changes it would still have similar issues but at a different timestep.

tjhei commented 2 years ago

@zjiaqi2018 Can you try cookbooks/initial-condition-S20RTS/S20RTS.prm with the current ASPECT master and run it for a while to check if it crashes for you as well?

zjiaqi2018 commented 2 years ago

Seems to be fine:


*** Timestep 121:  t=2.36743e+07 years, dt=155599 years
   Solving temperature system... 23 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 92+0 iterations.

   Postprocessing:
     RMS, max velocity:                  0.261 m/year, 1.15 m/year
     Writing heat flux map:              output-S20RTS/heat_flux.00121
     Heat fluxes through boundary parts: -1.219e+14 W, 7.079e+14 W
     Density at top/bottom of domain:    3359 kg/m^3, 3260 kg/m^3
     Pressure at top/bottom of domain:   8.67e-07 Pa, 9.576e+10 Pa
     Computing dynamic topography        
     Writing geoid anomaly:              output-S20RTS/geoid_anomaly.00121
     Writing graphical output:           output-S20RTS/solution/solution-00121
zjiaqi2018 commented 2 years ago

update: It does fail if it's long enough.


*** Timestep 1217:  t=1.19037e+08 years, dt=52.2907 years
   Solving temperature system... retrying linear solve with different preconditioner...
---------------------------------------------------------
TimerOutput objects finalize timed values printed to the
screen by communicating over MPI in their destructors.
Since an exception is currently uncaught, this
synchronization (and subsequent output) will be skipped
to avoid a possible deadlock.
---------------------------------------------------------

----------------------------------------------------
Exception 'ExcMessage (exception_message.str())' on rank 0 on processing: 

--------------------------------------------------------
An error occurred in line <2778> of file </home/jiaqi2/aspect/aspect/source/utilities.cc> in function
    void aspect::Utilities::linear_solver_failed(const string&, const string&, const std::vector<dealii::SolverControl>&, const std::exception&, const MPI_Comm&, const string&)
The violated condition was: 
    false
Additional information: 
    The iterative advection solver in Simulator::solve_advection did not
    converge.

    The initial residual was: 3.385886e+30
    The final residual is: 3.385886e+30
    The required residual for convergence is: 3.153455e+20
    See output-S20RTS/solver_history.txt for the full convergence history.

    The solver reported the following error:

    --------------------------------------------------------
    An error occurred in line <1075> of file
    </home/jiaqi2/dealii/installed/include/deal.II/lac/solver_gmres.h> in
    function
    void dealii::SolverGMRES<VectorType>::solve(const MatrixType&,
    VectorType&, const VectorType&, const PreconditionerType&) [with
    MatrixType = dealii::TrilinosWrappers::SparseMatrix;
    PreconditionerType = dealii::TrilinosWrappers::PreconditionILU;
    VectorType = dealii::TrilinosWrappers::MPI::Vector]
    The violated condition was:
    iteration_state == SolverControl::success
    Additional information:
    Iterative method reported convergence failure in step 1000. The
    residual in the last step was 3.38589e+30.

    This error message can indicate that you have simply not allowed a
    sufficiently large number of iterations for your iterative solver to
    converge. This often happens when you increase the size of your
    problem. In such cases, the last residual will likely still be very
    small, and you can make the error go away by increasing the allowed
    number of iterations when setting up the SolverControl object that
    determines the maximal number of iterations you allow.

    The other situation where this error may occur is when your matrix is
    not invertible (e.g., your matrix has a null-space), or if you try to
    apply the wrong solver to a matrix (e.g., using CG for a matrix that
    is not symmetric or not positive definite). In these cases, the
    residual in the last iteration is likely going to be large.

    Stacktrace:
    -----------
    #0  ./../../build/aspect: void
    dealii::SolverGMRES<dealii::TrilinosWrappers::MPI::Vector>::solve<dealii::TrilinosWrappers::SparseMatrix,
    dealii::TrilinosWrappers::PreconditionILU>(dealii::TrilinosWrappers::SparseMatrix
    const&, dealii::TrilinosWrappers::MPI::Vector&,
    dealii::TrilinosWrappers::MPI::Vector const&,
    dealii::TrilinosWrappers::PreconditionILU const&)
    #1  ./../../build/aspect:
    aspect::Simulator<3>::solve_advection(aspect::Simulator<3>::AdvectionField
    const&)
    #2  ./../../build/aspect:
    aspect::Simulator<3>::assemble_and_solve_temperature(bool, double*)
    #3  ./../../build/aspect:
    aspect::Simulator<3>::solve_single_advection_single_stokes()
    #4  ./../../build/aspect: aspect::Simulator<3>::solve_timestep()
    #5  ./../../build/aspect: aspect::Simulator<3>::run()
    #6  ./../../build/aspect: void
    run_simulator<3>(std::__cxx11::basic_string<char,
    std::char_traits<char>, std::allocator<char> > const&,
    std::__cxx11::basic_string<char, std::char_traits<char>,
    std::allocator<char> > const&, bool, bool, bool)
    #7  ./../../build/aspect: main
    --------------------------------------------------------

Stacktrace:
-----------
#0  ./../../build/aspect: aspect::Utilities::linear_solver_failed(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<dealii::SolverControl, std::allocator<dealii::SolverControl> > const&, std::exception const&, int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
#1  ./../../build/aspect: aspect::Simulator<3>::solve_advection(aspect::Simulator<3>::AdvectionField const&)
#2  ./../../build/aspect: aspect::Simulator<3>::assemble_and_solve_temperature(bool, double*)
#3  ./../../build/aspect: aspect::Simulator<3>::solve_single_advection_single_stokes()
#4  ./../../build/aspect: aspect::Simulator<3>::solve_timestep()
#5  ./../../build/aspect: aspect::Simulator<3>::run()
#6  ./../../build/aspect: void run_simulator<3>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool)
#7  ./../../build/aspect: main
--------------------------------------------------------

Aborting!
----------------------------------------------------``
tjhei commented 2 years ago

Can you share log from the last time step? We might need to look at the last graphical output as well.

zjiaqi2018 commented 2 years ago

*** Timestep 1216:  t=1.19037e+08 years, dt=282.348 years
   Solving temperature system... 48 iterations.
   Solving Stokes system... 53+0 iterations.

   Postprocessing:

     RMS, max velocity:                  36.3 m/year, 2.71e+03 m/year
     Writing heat flux map:              output-S20RTS/heat_flux.01216
     Heat fluxes through boundary parts: -1.342e+22 W, 1.859e+17 W
     Density at top/bottom of domain:    3359 kg/m^3, 3260 kg/m^3
     Pressure at top/bottom of domain:   2.138e-07 Pa, 1.029e+11 Pa
     Computing dynamic topography        
     Writing geoid anomaly:              output-S20RTS/geoid_anomaly.01216
     Writing graphical output:           output-S20RTS/solution/solution-01216

step-01216-P step-01216-T step-01216-V

zjiaqi2018 commented 2 years ago

a few more steps before crashing, see the change of iteration numbers for both temperature and Stokes


*** Timestep 1211:  t=1.19007e+08 years, dt=31307.7 years
   Solving temperature system... 25 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 200+15 iterations.

   Postprocessing:
     RMS, max velocity:                  0.763 m/year, 9.01 m/year
     Writing heat flux map:              output-S20RTS/heat_flux.01211
     Heat fluxes through boundary parts: -2.601e+17 W, 2.448e+14 W
     Density at top/bottom of domain:    3359 kg/m^3, 3260 kg/m^3
     Pressure at top/bottom of domain:   1.986e-06 Pa, 1.042e+11 Pa
     Computing dynamic topography        
     Writing geoid anomaly:              output-S20RTS/geoid_anomaly.01211
     Writing graphical output:           output-S20RTS/solution/solution-01211

*** Timestep 1212:  t=1.19023e+08 years, dt=16513.3 years
   Solving temperature system... 25 iterations.
   Solving Stokes system... 200+15 iterations.

   Postprocessing:
     RMS, max velocity:                  0.729 m/year, 13.7 m/year
     Writing heat flux map:              output-S20RTS/heat_flux.01212
     Heat fluxes through boundary parts: -2.784e+17 W, -1.715e+14 W
     Density at top/bottom of domain:    3359 kg/m^3, 3260 kg/m^3
     Pressure at top/bottom of domain:   -1.501e-06 Pa, 1.042e+11 Pa
     Computing dynamic topography        
     Writing geoid anomaly:              output-S20RTS/geoid_anomaly.01212
     Writing graphical output:           output-S20RTS/solution/solution-01212

*** Timestep 1213:  t=1.19032e+08 years, dt=8431.86 years
   Solving temperature system... 24 iterations.
   Solving Stokes system... 200+14 iterations.

   Postprocessing:
     RMS, max velocity:                  0.759 m/year, 29.9 m/year
     Writing heat flux map:              output-S20RTS/heat_flux.01213
     Heat fluxes through boundary parts: -2.961e+17 W, -3.893e+13 W
     Density at top/bottom of domain:    3359 kg/m^3, 3260 kg/m^3
     Pressure at top/bottom of domain:   3.231e-06 Pa, 1.042e+11 Pa
     Computing dynamic topography        
     Writing geoid anomaly:              output-S20RTS/geoid_anomaly.01213
     Writing graphical output:           output-S20RTS/solution/solution-01213

*** Timestep 1214:  t=1.19036e+08 years, dt=4260.65 years
   Solving temperature system... 27 iterations.
   Solving Stokes system... 150+0 iterations.

   Postprocessing:
     RMS, max velocity:                  4.92 m/year, 441 m/year
     Writing heat flux map:              output-S20RTS/heat_flux.01214
     Heat fluxes through boundary parts: 5.12e+19 W, -1.92e+13 W
     Density at top/bottom of domain:    3359 kg/m^3, 3260 kg/m^3
     Pressure at top/bottom of domain:   3.711e-07 Pa, 1.038e+11 Pa
     Computing dynamic topography        
     Writing geoid anomaly:              output-S20RTS/geoid_anomaly.01214
     Writing graphical output:           output-S20RTS/solution/solution-01214

*** Timestep 1215:  t=1.19036e+08 years, dt=355.013 years
   Solving temperature system... 33 iterations.
   Solving Stokes system... 146+0 iterations.

   Postprocessing:
     RMS, max velocity:                  7.29 m/year, 462 m/year
     Writing heat flux map:              output-S20RTS/heat_flux.01215
     Heat fluxes through boundary parts: -8.097e+18 W, 3.922e+16 W
     Density at top/bottom of domain:    3359 kg/m^3, 3260 kg/m^3
     Pressure at top/bottom of domain:   3.694e-06 Pa, 1.038e+11 Pa
     Computing dynamic topography        
     Writing geoid anomaly:              output-S20RTS/geoid_anomaly.01215
     Writing graphical output:           output-S20RTS/solution/solution-01215
tjhei commented 2 years ago

Thank you. Yes, it looks like the solution it blowing up.

tjhei commented 2 years ago

@zjiaqi2018 Can you try to add "Time steps between mesh refinement = 0" in the mesh refinement section? It seems that the mesh gets adapted/coarsened over time (is that true?). If that helps, please open a PR to add it to the prm.

zjiaqi2018 commented 2 years ago

It seems that the mesh gets adapted/coarsened over time (is that true?).

True, a few steps before diverging, the dofs is only Number of degrees of freedom: 80,741 (58,485+2,761+19,495) compared to the initial dofs: Number of degrees of freedom: 215,962 (156,774+6,930+52,258)

zjiaqi2018 commented 2 years ago

still diverges, but a later time:

*** Timestep 1585:  t=1.97014e+08 years, dt=126.669 years
   Solving temperature system... 96 iterations.
   Solving Stokes system... 28+0 iterations.

   Postprocessing:
     RMS, max velocity:                  355 m/year, 1.47e+04 m/year
     Writing heat flux map:              output-S20RTS/heat_flux.01585
     Heat fluxes through boundary parts: -3.484e+17 W, 6.25e+21 W
     Density at top/bottom of domain:    3359 kg/m^3, 3260 kg/m^3
     Pressure at top/bottom of domain:   -7.894e-07 Pa, 9.271e+10 Pa
     Computing dynamic topography        
     Writing geoid anomaly:              output-S20RTS/geoid_anomaly.01585

*** Timestep 1586:  t=1.97014e+08 years, dt=7.30192 years
   Solving temperature system... retrying linear solve with different preconditioner...
tjhei commented 2 years ago

@jdannberg Should we try one more refinement next?

zjiaqi2018 commented 2 years ago

one more refinement level seems to resolve the issue. Now dt is much larger, which means the solution is not blowing up, and the numbers of iterations are good. Is t=2e+08 years long enough?


*** Timestep 887:  t=2e+08 years, dt=172157 years
   Solving temperature system... 18 iterations.
   Solving Stokes system... 13+0 iterations.

   Postprocessing:
     RMS, max velocity:                  0.0672 m/year, 0.34 m/year
     Writing heat flux map:              output-S20RTS/heat_flux.00887
     Heat fluxes through boundary parts: -3.919e+13 W, 6.326e+13 W
     Density at top/bottom of domain:    3359 kg/m^3, 3260 kg/m^3
     Pressure at top/bottom of domain:   1.068e-09 Pa, 9.614e+10 Pa
     Computing dynamic topography
     Writing geoid anomaly:              output-S20RTS/geoid_anomaly.00887
     Writing graphical output:           output-S20RTS/solution/solution-00881

image image

jdannberg commented 2 years ago

I think increasing the resolution seems like a good solution for @fadhli-atarita's original issue. And it's great to see that it works! Since the cookbook itself is not intended to be time-dependent, I would only change the resolution there if it can still be run on a laptop in a reasonable time with the higher resolution.

tjhei commented 2 years ago

We would be at 1.6 million DoFs, right? That might be too much for this cookbook.

I would suggest we only make the former change.

zjiaqi2018 commented 2 years ago

We would be at 1.6 million DoFs, right?

Right. I will open a PR.

fadhli-atarita commented 2 years ago

Thank you all. I'll try this and let you know if I still encounter similar problems.

Also just to be sure, there is no version issue for this particular problem, right? v2.4.0 is currently being installed on the clusters that I use and we don't know how that will go. This means I still have to use v.2.20 for now.