DLR-AMR / t8code

Parallel algorithms and data structures for tree-based adaptive mesh refinement (AMR) with arbitrary element shapes.
https://dlr-amr.github.io/t8code/
GNU General Public License v2.0
147 stars 52 forks source link

Bug: Empty process offset computation in t8_cmesh_gather_treecount_ext is wrong in some cases #1082

Open holke opened 5 months ago

holke commented 5 months ago

When constructing a cmesh. If t8_cmesh_load is used and the current process is empty, then the cmesh sets its num_local_tree to 0, its first_tree to -1 and first_tree_shared to 0.

This can break the offset computation for empty processes in t8_cmesh_gather_treecount_ext, because t8_offset_next_nonempty_rank may identify an empty rank as nonempty in the case that multiple empty are in a row.

Note, that this behaviour may also happen in potential new cmesh construction methods (that do not exist yet, but use a similar logic to set the empty processes).

Here is how we can fix the behaviour:

  1. Write a test case that covers this beahviour and currently fails (i.e. load (or simulatie a load) a cmesh such that multiple empty processes are in a row).
  2. Do not use an offset value to indicate empty processes in t8_cmesh_gather_treecount_ext. Instead use a second array of bool/int with values 0 and 1 for empty or non-empty processes and Allgather that array as well.
  3. In the empty correction starting from https://github.com/DLR-AMR/t8code/blob/e9a123b4109cb09be3fe00499b2622920a73396e/src/t8_cmesh/t8_cmesh_partition.cxx#L214 use the new array to determine the next non-empty process instead of the function t8_offset_next_nonempty_rank
  4. Make the execution of this empty correction optional controlled by a function parameter, which set true by cmesh_load but set false by other methods (i.e. uniform_bounds)
  5. Outsource the code starting from https://github.com/DLR-AMR/t8code/blob/e9a123b4109cb09be3fe00499b2622920a73396e/src/t8_cmesh/t8_cmesh_partition.cxx#L214 in an own function t8_cmesh_offset_empty_correction
holke commented 5 months ago

@Davknapp @lukasdreyer ping

lukasdreyer commented 4 months ago

@Davknapp: I think there is a similar problem, for t8_forest_partition_create_tree_offsets after calling t8_forest_partition_given, when you have 2 empty procs next to each other.

In t8_forest_partition_given, the following lines set first and last local tree in a way, that does not fit the offset rule needed to compute num_local_trees correctly at a different place. Therefore, offset is computed to be 0, and not the treeid of the next tree.

This bug can be seen by executing mpirun -n 8 ./test/t8_forest/t8_gtest_forest_commit --gtest_filter=*prism_cake*