geodynamics / aspect

A parallel, extensible finite element code to simulate convection in both 2D and 3D models.
https://aspect.geodynamics.org/

inconsistent constraints crash #3248

Closed: tjhei closed this issue 5 years ago

tjhei commented 5 years ago

I ran into this scary bug when testing a Q3 discretization of the Stokes system on the hollow_sphere benchmark. To reproduce, put the attached .prm into benchmarks/hollow_sphere/ and run with 3 MPI processes.

inconsistent_hollow_sphere.prm.txt
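
The report is about running the hollow_sphere benchmark with a Q3 velocity space; a minimal sketch of that change is shown below (the attached .prm is the authoritative input, this only shows the gist):

```
# Sketch only -- see the attached inconsistent_hollow_sphere.prm for the full input.
# The key change relative to the stock hollow_sphere benchmark is the Q3 velocity space.
subsection Discretization
  set Stokes velocity polynomial degree = 3
end
```

With the plugin library built in benchmarks/hollow_sphere/, running something like `mpirun -np 3 ./aspect inconsistent_hollow_sphere.prm` from that directory (with the appropriate path to the aspect executable) should trigger the crash.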

gassmoeller commented 5 years ago

I can confirm this is an issue. Here is my screen output:

-----------------------------------------------------------------------------
-- This is ASPECT, the Advanced Solver for Problems in Earth's ConvecTion.
--     . version 2.2.0-pre (fix_latent_heat_viscosity, f2f88270b)
--     . using deal.II 9.2.0-pre (master, 24fc4f42ed)
--     .       with 32 bit indices and vectorization level 2 (256 bits)
--     . using Trilinos 12.10.1
--     . using p4est 2.0.0
--     . running in DEBUG mode
--     . running with 3 MPI processes
-----------------------------------------------------------------------------

Loading shared library <./libhollow_sphere.so>

-----------------------------------------------------------------------------
-- For information on how to cite ASPECT, see:
--   https://aspect.geodynamics.org/citing.html?ver=2.2.0-pre&sha=f2f88270b&src=code
-----------------------------------------------------------------------------
Number of active cells: 768 (on 3 levels)
Number of degrees of freedom: 81,330 (67,470+6,930+6,930)

*** Timestep 0:  t=0 seconds
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 0+20 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 1: 1

   Rebuilding Stokes preconditioner...
   Solving Stokes system... 0+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 2: 8.87017e-07

   Postprocessing:
     RMS, max velocity:                     3.67 m/s, 16.6 m/s
     System matrix memory consumption:      101.72 MB
     Pressure at top/bottom of domain:      2.64e-13 Pa, 500 Pa
     Computing dynamic topography           
     Writing graphical output:              output/solution/solution-00000
     Errors u_L1, p_L1, u_L2, p_L2 topo_L2: 1.215723e+00, 6.541060e+00, 7.145619e-01, 4.057621e+00, 1.056476e-02

Number of active cells: 999 (on 4 levels)
Number of degrees of freedom: 114,930 (94,890+10,020+10,020)

*** Timestep 0:  t=0 seconds
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 0+22 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 1: 1

   Rebuilding Stokes preconditioner...
   Solving Stokes system... 0+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 2: 7.23772e-07

   Postprocessing:
     RMS, max velocity:                     3.67 m/s, 16.6 m/s
     System matrix memory consumption:      123.40 MB
     Pressure at top/bottom of domain:      -1.109e-11 Pa, 500 Pa
     Computing dynamic topography           
     Writing graphical output:              output/solution/solution-00001
     Errors u_L1, p_L1, u_L2, p_L2 topo_L2: 1.207741e+00, 6.464146e+00, 7.171475e-01, 4.010137e+00, 1.048245e-02

Number of active cells: 1,272 (on 4 levels)
Number of degrees of freedom: 148,456 (122,514+12,971+12,971)

--------------------------------------------------------
An error occurred in line <853> of file </home/rene/software/aspect/source/simulator/core.cc> in function
    void aspect::Simulator<dim>::compute_current_constraints() [with int dim = 3]
The violated condition was: 
    current_constraints.is_consistent_in_parallel( dof_handler.locally_owned_dofs_per_processor(), locally_active_dofs, mpi_communicator, false)
Additional information: 
    Inconsistent Constraints detected!

Stacktrace:
-----------
#0  aspect: aspect::Simulator<3>::compute_current_constraints()
#1  aspect: aspect::Simulator<3>::set_initial_temperature_and_compositional_fields()
#2  aspect: aspect::Simulator<3>::run()
#3  aspect: void run_simulator<3>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool)
#4  aspect: main
--------------------------------------------------------

Calling MPI_Abort now.
To break execution in a GDB session, execute 'break MPI_Abort' before running. You can also put the following into your ~/.gdbinit:
  set breakpoint pending on
  break MPI_Abort
  set breakpoint pending auto

--------------------------------------------------------
An error occurred in line <853> of file </home/rene/software/aspect/source/simulator/core.cc> in function
    void aspect::Simulator<dim>::compute_current_constraints() [with int dim = 3]
The violated condition was: 
    current_constraints.is_consistent_in_parallel( dof_handler.locally_owned_dofs_per_processor(), locally_active_dofs, mpi_communicator, false)
Additional information: 
    Inconsistent Constraints detected!

Stacktrace:
-----------
#0  aspect: aspect::Simulator<3>::compute_current_constraints()
#1  aspect: aspect::Simulator<3>::set_initial_temperature_and_compositional_fields()
#2  aspect: aspect::Simulator<3>::run()
#3  aspect: void run_simulator<3>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool)
#4  aspect: main
--------------------------------------------------------

Calling MPI_Abort now.
To break execution in a GDB session, execute 'break MPI_Abort' before running. You can also put the following into your ~/.gdbinit:
  set breakpoint pending on
  break MPI_Abort
  set breakpoint pending auto
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 255.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

--------------------------------------------------------
An error occurred in line <853> of file </home/rene/software/aspect/source/simulator/core.cc> in function
    void aspect::Simulator<dim>::compute_current_constraints() [with int dim = 3]
The violated condition was: 
    current_constraints.is_consistent_in_parallel( dof_handler.locally_owned_dofs_per_processor(), locally_active_dofs, mpi_communicator, false)
Additional information: 
    Inconsistent Constraints detected!

Stacktrace:
-----------
#0  aspect: aspect::Simulator<3>::compute_current_constraints()
#1  aspect: aspect::Simulator<3>::set_initial_temperature_and_compositional_fields()
#2  aspect: aspect::Simulator<3>::run()
#3  aspect: void run_simulator<3>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool)
#4  aspect: main
--------------------------------------------------------

Calling MPI_Abort now.
To break execution in a GDB session, execute 'break MPI_Abort' before running. You can also put the following into your ~/.gdbinit:
  set breakpoint pending on
  break MPI_Abort
  set breakpoint pending auto
[rene-laptop:11193] 2 more processes have sent help message help-mpi-api.txt / mpi-abort
[rene-laptop:11193] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
anne-glerum commented 5 years ago

I just ran into the same error with a production model that did not give problems before. For this model it seems the error is not hit when the Initial global refinement or the number of processes is lowered.

tjhei commented 5 years ago

> I just ran into the same error with a production model that did not give problems before. For this model it seems the error is not hit when the Initial global refinement or the number of processes is lowered.

What is your "Stokes velocity polynomial degree"? Do you have a .prm I can look at?

anne-glerum commented 5 years ago

> What is your "Stokes velocity polynomial degree"?

That's default, so 2.

> Do you have a .prm I can look at?

The plugins I use for initial temperature, initial composition, initial topography and the velocity boundary conditions are not in mainline. Do you want to just look at the prm or run it as well?

tjhei commented 5 years ago

Ideally, I would have a small and simple setup that I can run to debug this. If that is too complicated, I will look at my test problem instead. Please keep your setup around so you can check whether a future PR helps.

tjhei commented 5 years ago

@anne-glerum: 2d, 3d, or both? What boundary conditions do you have?

anne-glerum commented 5 years ago

@tjhei The setup is 3D and requires additional data files to set up, definitely not a small and simple setup.

Velocities are prescribed on the sides, with a free-slip bottom and a free surface at the top, in a chunk geometry. Composition and temperature are fixed on all boundaries. Hope that helps.

I’ll keep the setup to test the future PR.
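
For the record, a rough .prm outline of the setup described above, in case it helps narrow things down. The plugin name `my_side_velocity` is a placeholder for the non-mainline plugins, the boundary indicator names should be checked against the chunk geometry model, and the free-surface settings (which live in the version-appropriate free surface / mesh deformation section) are omitted:

```
# Rough outline only; plugin and boundary names are illustrative placeholders.
subsection Geometry model
  set Model name = chunk
end

subsection Boundary velocity model
  # velocities prescribed on the four side walls via a custom plugin
  set Prescribed velocity boundary indicators = west: my_side_velocity, \
                                                east: my_side_velocity, \
                                                south: my_side_velocity, \
                                                north: my_side_velocity
  # free-slip bottom
  set Tangential velocity boundary indicators = bottom
end

subsection Boundary temperature model
  set Fixed temperature boundary indicators = west, east, south, north, bottom, top
end

subsection Boundary composition model
  set Fixed composition boundary indicators = west, east, south, north, bottom, top
end
```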

anne-glerum commented 5 years ago

Hey @tjhei, any updates on this problem? I can't run my models at the required resolution at the moment. Do you suggest reverting to an older ASPECT? Any idea which commit?

tjhei commented 5 years ago

see my progress here: https://github.com/dealii/dealii/issues/8995

tjhei commented 5 years ago

> I can't run my models at the required resolution at the moment. Do you suggest reverting to an older ASPECT? Any idea which commit?

We need to disable this function call for now (it is a false alarm). I will prepare a PR.
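
For context, the call in question is the consistency assertion shown in the stack traces above. Below is a minimal sketch of what it checks (my own illustration, not the actual ASPECT code; it assumes the deal.II 9.x API, and the helper name is hypothetical):

```cpp
#include <deal.II/base/exceptions.h>
#include <deal.II/base/index_set.h>
#include <deal.II/base/mpi.h>
#include <deal.II/dofs/dof_handler.h>
#include <deal.II/dofs/dof_tools.h>
#include <deal.II/lac/affine_constraints.h>

// Check that every MPI rank stores the same constraints for the DoFs it can
// see ("locally active" DoFs). This is the check that currently raises a
// false alarm (see dealii/dealii#8995); the temporary workaround in ASPECT
// is simply to skip this call until the deal.II bug is fixed.
template <int dim>
void check_constraint_consistency(const dealii::DoFHandler<dim>           &dof_handler,
                                  const dealii::AffineConstraints<double> &constraints,
                                  const MPI_Comm                           mpi_communicator)
{
  dealii::IndexSet locally_active_dofs;
  dealii::DoFTools::extract_locally_active_dofs(dof_handler, locally_active_dofs);

  AssertThrow(constraints.is_consistent_in_parallel(
                dof_handler.locally_owned_dofs_per_processor(),
                locally_active_dofs,
                mpi_communicator,
                /*verbose =*/ false),
              dealii::ExcMessage("Inconsistent Constraints detected!"));
}
```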

gassmoeller commented 5 years ago

There is a workaround in place and the bug itself is tracked in dealii/dealii#8995, so let's close this for now; otherwise it looks like an open bug in ASPECT. #3282 added a TODO that reminds us to revert the change once the problem in deal.II is fixed.