geodynamics / aspect

A parallel, extensible finite element code to simulate convection in both 2D and 3D models.
https://aspect.geodynamics.org/
Other
218 stars 233 forks source link

Initial global refinement and maximum/minimum refinement function varies with number of cores #3262

Open Djneu opened 4 years ago

Djneu commented 4 years ago

I ran into the following error while trying to run periodic boundary conditions with a maximum/minimum refinement function.

An error occurred in line <405> of file </home/bbpdneu1/software/deal_libraries/tmp/unpack/deal.II-v9.1.1/include/deal.II/lac/affine_constraints.templates.h> in function
    void dealii::AffineConstraints<number>::close() [with number = double]
The violated condition was: 
    dof_index != line.index
Additional information: 
    Cycle in constraints detected!

Stacktrace:
-----------
#0  /home/bbpdneu1/software/deal_libraries/deal.II-v9.1.1/lib/libdeal_II.g.so.9.1.1: dealii::AffineConstraints<double>::close()
#1  /home/bbpdneu1/software/builds/debug_fast/aspect: aspect::Simulator<3>::setup_dofs()
#2  /home/bbpdneu1/software/builds/debug_fast/aspect: aspect::Simulator<3>::run()
#3  /home/bbpdneu1/software/builds/debug_fast/aspect: void run_simulator<3>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool)
#4  /home/bbpdneu1/software/builds/debug_fast/aspect: main

@anne-glerum and I looked into it briefly and found that the maximum and minimum refinement function works on the initial global refinement even without any adaptive refinements. We also noticed that, depending on the number of cores, this can lead to variable meshes on each side that don't always fit the given constraints. We ran a simple model with all tangential boundaries using ASPECT 2.2.0-pre (a856e796c) and deal.II 9.1.1 with the following mesh constraints at 1, 2, 4, and 8 mpi processes.

subsection Mesh refinement
  set Initial global refinement                = 3
  set Initial adaptive refinement              = 0
  set Time steps between mesh refinement       = 0
  set Refinement fraction = 0.5
  set Strategy                                 =  minimum refinement function, maximum refinement function

  subsection Maximum refinement function
      set Coordinate system   = cartesian
      set Variable names      = x,y,z
      set Function expression = if(z>=150e3, if(z>=160e3, if(z>=170e3, 3, 2), 1), 0)
  end

   subsection Minimum refinement function
      set Coordinate system   = cartesian
      set Variable names      = x,y,z
      set Function expression = if(z>=150e3, if(z>=160e3, if(z>=170e3, 3, 2), 1), 0)
  end
end

cores_vs_meshrefinement The attached image, colored by partition, shows what the mesh ended up looking like on each side for these cases. It seems that with 1 core, the mesh matches what we prescribed in the prm file, but at more cores there starts to appear areas above the maximum. Any ideas where this issue may be coming from? Thank you in advance!

global_mesh_refinement.prm.txt

gassmoeller commented 4 years ago

Hmm, weird. Can you describe how you got the error message at the beginning? If I run your parameter file I do not see a crash even with X or Y periodic boundaries (using ASPECT 2.1.0-pre: 133ac9d585de19d3b7dace32a7a2bdea57b2ee49, and deal.II 9.2.0-pre: 24fc4f42ed). Isnt it in general fine to have different cell levels across periodic boundaries as long as the levels of adjacent cells differ by at most 1? This should be exactly like having a hanging node inside the mesh somewhere.

Djneu commented 4 years ago

I've been running into the error using that prm file with periodic y boundaries, although only at 5+ cores. I switched to the ASPECT version you tested it on, and then I don't get the error, even using up to 40 cores. With that version I still get the non-symmetric mesh on each side, so maybe the issue is something else?

anne-glerum commented 4 years ago

Isnt it in general fine to have different cell levels across periodic boundaries as long as the levels of adjacent cells differ by at most 1? This should be exactly like having a hanging node inside the mesh somewhere.

According to the dealii manual, even more than 1 level difference is fine: Otherwise, if face_1 and face_2 are not active faces, this function loops recursively over the children of face_1 and face_2. If only one of the two faces is active, then we recursively iterate over the children of the non-active ones and make sure that the solution function on the refined side equals that on the non-refined face in much the same way as we enforce hanging node constraints at places where differently refined cells come together. (However, unlike hanging nodes, we do not enforce the requirement that there be only a difference of one refinement level between the two sides of the domain you would like to be periodic).

For our build there seemed to be a correlation however between the unexpected refinement of the grid and the periodic boundary conditions failing on cyclicity of the constraints. Seeing that with a different ASPECT version this Assert is not hit, but the mesh is still not symmetric, these are probably different issues. So:

  1. Is it OK that a different domain composition leads to a different mesh upon Initial global refinement with the minimum and/or maximum refinement function?
  2. Why does a more recent ASPECT (but older dealII) crash for periodic boundary conditions and a larger number of cores?
Djneu commented 4 years ago

@gassmoeller any news?

gassmoeller commented 4 years ago

No I did not get back to this issue. How much of an issue is it for you at the moment? Could you update to a newer ASPECT version and just use that one? To Anne's questions:

Is it OK that a different domain composition leads to a different mesh upon Initial global refinement with the minimum and/or maximum refinement function?

As far as I understand the mesh refinement is not completely deterministic if the mesh is distributed differently, however the minimum and maximum refinement functions should always be correctly considered.

Why does a more recent ASPECT (but older dealII) crash for periodic boundary conditions and a larger number of cores?

I do not know. Does a current ASPECT and current deal.II work? In that case maybe it was a bug that was fixed as part of another fix? If you want to look more into it you could try make a little table of working / non-working combinations to figure out when the bug appeared, and if it is still there. This feels a bit like a complicated process though, so if recent version work for you that might be the easiest path forward.

Djneu commented 4 years ago

The periodic boundaries are still working fine for ASPECT 2.1, so the issue isn't too pressing at the moment. I tried making a table as you suggested with two dealii versions (9.1.1 and 9.2.0-pre), and the attached updated prm file which seems to always hit the issue using 10 cores. From this I've found that, regardless of the dealii version, the error is related to the reordering of DoFTools::make_hanging_node_constraints and DoFTools::make_periodicity_constraints in #2780, any versions after this change hit the cyclic constraints error. Reverting this change in the most recent ASPECT fixes the problem for the small test case I've ran so far. Could this possibly still be related to the variability in meshes on each side with increasing number of cores, and the issue wasn't picked up with the old ordering?

test_periodic.prm.txt

gassmoeller commented 4 years ago

Oh, so you are saying #2780 introduced this problem?

@tjhei: Could you take a look at this, you now more about the DoF reordering process.

anne-glerum commented 4 years ago

@tjhei any insight?

tjhei commented 4 years ago

@tjhei any insight?

sorry, Anne. I haven't had time to look into it. It is on my list!

gassmoeller commented 4 years ago

@djneu: Please check if #3595 solved the problem and close this issue if it did.

Djneu commented 4 years ago

From a quick test with the prm file included in this discussion and a larger model I've had the same problem with, the issue seems to be resolved. Thank you!

anne-glerum commented 2 years ago

Reopened, because we ran into this problem again with a different setup, as discussed in the user meeting. The error goes away when we undo #2780 .

tjhei commented 2 years ago

Is there a simple setup you can share that shows the issue? Please report back if you see other issues when switching the order.

bangerth commented 1 year ago

@anne-glerum Do you happen to know whether this is still an issue? It's been two years, perhaps it was fixed somewhere along the way?