cburstedde / p4est

The "p4est" forest-of-octrees library
www.p4est.org/
GNU General Public License v2.0
251 stars 114 forks source link

Possible corner balance bug #59

Closed data-panda closed 5 years ago

data-panda commented 5 years ago

I an writing a finite volume application over p4est where during my simulation(after few thousand timesteps) I am encountering a segmentation fault. The seg fault arises because of accessing the mesh structure which is encoding the neighborhood as -1 and -25 (qtq and qtf respectively). Looking further at it, it is occuring for a specific configuration of cell as shown in the figure below.

image

So the seg fault occurs while accessing the neighbor across the highlited face for the problem quadrant.

Before the problem occurs I make sure that I refine, coarse and balance using this sequence

p4est_mesh_destroy(mesh); // since the morphology is going to change now
p4est_refine_ext (p4est, 0, 9, refine_condition_voft, NULL,replace_quads);
p4est_coarsen_ext(p4est, 0, 0, coarsen_condition_voft, NULL,replace_quads);
p4est_balance_ext(p4est, P4EST_CONNECT_CORNER, NULL, replace_quads);
mesh = p4est_mesh_new_ext(p4est, ghost, 0, 1, P4EST_CONNECT_FACE); // since the forest has changed now
// just after this the function where the problem occurs

To check this i went in the gdb debug mode and did this and manually called the following

p4est_mesh_destroy(mesh);
p4est_balance_ext(p4est, P4EST_CONNECT_CORNER, NULL, replace_quads);

and then the forest looks like this (problem quadrant is outlined in pink)

image

I also dumped the forest and re opened it in a simple code and just run the balance api, and it does balances.

Could you possibly point as to why the balance algorithm is not able to refine the neighbor across the tree boundary in one pass so that the 2:1 balance condition is met. Do i need to check if the forest is balanced after the balance operation ?

data-panda commented 5 years ago

Do I need to call

ghost = p4est_ghost_new (p4est, P4EST_CONNECT_FULL);

after the refine, coarsen and balance operation ? but even when I manually did the balancing operation using gdb, I did not recreate the ghost and the forest did balance itself.

I am not recreating ghost at every timestep since I am running the application currently on a single node.

data-panda commented 5 years ago

Just few pointers on the refine, coarsen and balance stats

Into p4est_refine with 56946 total quadrants, allowed level 9
Done p4est_refine with 63825 total quadrants
Into p4est_coarsen with 63825 total quadrants
Done p4est_coarsen with 54594 total quadrants
Into p4est_balance CORNER with 54594 total quadrants
Done p4est_balance with 56949 total quadrants

Thats the buggy forest. The stats below when I manually call p4est_balance_ext from gdb

Into p4est_balance CORNER with 56949 total quadrants
Done p4est_balance with 56952 total quadrants

That's the correct forest.

cburstedde commented 5 years ago

p4est_mesh_destroy(mesh); // since the morphology is going to change now

It is recommended to call ghost_destroy here, since ghost will be invalidated by refine/coarsen/balance.

p4est_refine_ext (p4est, 0, 9, refine_condition_voft, NULL,replace_quads); p4est_coarsen_ext(p4est, 0, 0, coarsen_condition_voft, NULL,replace_quads); p4est_balance_ext(p4est, P4EST_CONNECT_CORNER, NULL, replace_quads);

Here you definitely need to call ghost_new. Both for serial and parallel. It is ok to call it with CONNECT_FACE if that's all you need in your mesh. Still, if you could do us a favor and check CONNECT_FULL as well, for both ghost and mesh.

mesh = p4est_mesh_new_ext(p4est, ghost, 0, 1, P4EST_CONNECT_FACE); // since the forest has changed now // just after this the function where the problem occurs

Please let us know if there is still trouble after doing this.

Thanks a lot, Carsten

data-panda commented 5 years ago

Hi Carsten,

So I tried a few things and here are the observations -

  1. Use of p4est_is_balanced
p4est_mesh_destroy(mesh);

p4est_refine_ext (p4est, 0, MaxL, refine_condition_voft, NULL,replace_quads);
p4est_coarsen_ext(p4est, 0, 0, coarsen_condition_voft, NULL,replace_quads);
p4est_balance_ext(p4est, P4EST_CONNECT_CORNER, NULL, replace_quads);

int check = 999; 
check = p4est_is_balanced(p4est, P4EST_CONNECT_CORNER);
if(check) p4est_balance_ext(p4est, P4EST_CONNECT_CORNER, NULL, replace_quads);
// do one more balance cycle if the forest is not balanced

mesh = p4est_mesh_new_ext(p4est, ghost, 0, 1, P4EST_CONNECT_FACE);
// PROBLEM OCCURS AFTER THIS WHEN I ACCESS THE MESH

This also ended up with the seg fault although I don't have a picture to show the state of the forest when it happened. But, since the seg fault happened in the same function call, i am sure that the forest wasn't balanced.

  1. Destroy the ghost and recreate th ghost
    
    p4est_mesh_destroy(mesh); // since the morphology is going to change now
    p4est_ghost_destroy (ghost);

p4est_refine_ext (p4est, 0, MaxL, refine_condition_voft, NULL,replace_quads); p4est_coarsen_ext(p4est, 0, 0, coarsen_condition_voft, NULL,replace_quads); p4est_balance_ext(p4est, P4EST_CONNECT_CORNER, NULL, replace_quads); ghost = p4est_ghost_new(p4est, P4EST_CONNECT_FULL); ghost_data = P4EST_ALLOC(data_t, ghost->ghosts.elem_count); p4est_ghost_exchange_data(p4est, ghost, ghost_data);

mesh = p4est_mesh_new_ext(p4est, ghost, 0, 1, P4EST_CONNECT_FACE);


Still the segfault occurs at somehere around the same time steps as it used to occur. This time, I have the state of the forest with me as I was running it in gdb

![image](https://user-images.githubusercontent.com/9045578/53910344-e500aa00-4053-11e9-82d9-b7e9fbedbcb9.png)

Remaining in the gdb and calling the balance routine manually I am getting this state of the forest.
![image](https://user-images.githubusercontent.com/9045578/53910536-56405d00-4054-11e9-9ffc-1eb49e5d5d4c.png)

which is surprisingly balanced. 

P.S> I have used the aforementioned sequence of grid adaptivity and mesh creation in numerous different types of grids succefully ( like multiple trees lined in x direction) but surprisingly I have come across this bug with this specific connectivity (multiple trees in x and y direction). 
data-panda commented 5 years ago

Ok, So got to workaround this with this construct

/* Destroy Mesh and Ghost */
p4est_mesh_destroy(mesh); // since the morphology is going to change now
p4est_ghost_destroy (ghost);

/* Adapt the Grid */
p4est_refine_ext (p4est, 0, MaxL, refine_condition_voft, NULL,replace_quads);
p4est_coarsen_ext(p4est, 0, 0, coarsen_condition_voft, NULL,replace_quads);
p4est_balance_ext(p4est, P4EST_CONNECT_CORNER, NULL, replace_quads);

/* Check if the Forest is Balanced, if not then redo Balance */
int check = 999;
int balance_count = 0; 
check = p4est_is_balanced(p4est, P4EST_CONNECT_CORNER);
while(!check)
{
     p4est_balance_ext(p4est, P4EST_CONNECT_CORNER, NULL, replace_quads);
     check = p4est_is_balanced(p4est, P4EST_CONNECT_CORNER);
     balance_count++;
}
if(balance_count) P4EST_GLOBAL_PRODUCTIONF("Balance Operations are %4d", (balance_count));

/* Recreate the Ghost */
ghost = p4est_ghost_new(p4est, P4EST_CONNECT_FULL);

/* Recreate the Mesh */
mesh = p4est_mesh_new_ext(p4est, ghost, 0, 1, P4EST_CONNECT_FACE);

Doing this, I have few concerns

  1. since there is a chance of doing the balance operation multiple times does the replace quads function should be such that it is safe for a recursive operation.

  2. As the adaptive front aproached the tree intersection there were TEN consecutive timesteps in which the balance api could not generate a 2:1 balanced tree.

image

tisaac commented 5 years ago

Hi @data-panda:

Thanks for bringing this to our attention.

  1. What version of p4est are you using?

  2. Which connectivity is this? One in the library or one you wrote yourself?

  3. Since you've got access to pre- and post-balance trees in gdb, could you save the pre-balance tree to a .p4p file so that we could test for ourselves?

data-panda commented 5 years ago

Hi @tisaac

  1. for the testing on my own workstation, its the develop branch which was updated and installed some few weeks ago (not older than 1 month). For one supercomputer/cluster run its the release 2.0, and for another, it is 2.2. I could reproduce this in every installation.
  2. It's my own connectivity. To start with it is a simple 2 by 3 tree connectivity (2 in the x-direction and 3 in the y-direction) with no periodicity.
  3. I will try to fetch you the forest dump file through which the problem can be reproduced on your side.

Thanks, Aniruddha

data-panda commented 5 years ago

Hi Tobin and Carsten,

I am back with the files that could easily allow you to reproduce the error. The files can be found in the shared folder link mentioned below.

https://my.pcloud.com/publink/show?code=kZK7fq7ZLgAPXgsRtgfFJ2YpqqYoXb035oPV

It consists of 4 files

  1. bug_testing.c -> adapted from p4est_step1.c allows you to load a dump/p4p forest file through command line. You could compile it as mpicc -g3 -O0 bug_testing.c -lm -lsc -lp4est and run it as ./a.out <p4p file name>

  2. forest-after-refinement.p4p this file should be used as input since this is the forest emanating after the refinement and coarsening step. As you can see in the figure, it is unbalanced image

  3. forest-after-first-balance.p4p this is the forest after the first BALANCE operation still resulting into an UNBALANCED forest. The above code(mentioned in step1) will write the vtk's correspoding to bullet 2 and bullet 3 for you to see and inspect where the bug lies. image

  4. forest-after-second-balance.p4p this is the forest if you call the BALANCE operation once again finally resulting into a BALANCED forest. This state of the forest will not be outputed by the code but you can just manually call the balance api and dump the resulting forest vtk to see for yourself.

P.S -> The action happens at the tree boundary between 4 and 5.

Let me know if you are able to reproduce it on your side. I have few more snapshots of p4p files for all the 10 consecutive time steps where the problem occurs.

With thanks Aniruddha

tisaac commented 5 years ago

Thanks, Amiruddha. I can look when, I get to a computer, but before then, could you share the snippet of code where you fine the connectivity?

On March 7, 2019 5:50:06 AM EST, Amiruddha Panda notifications@github.com wrote:

Hi Tobin and Carsten,

I am back with the files that could easily allow you to reproduce the error. The files can be found in the shared folder link mentioned below.

https://my.pcloud.com/publink/show?code=kZK7fq7ZLgAPXgsRtgfFJ2YpqqYoXb035oPV

It consists of 4 files

  1. bug_testing.c -> adapted from p4est_step1.c allows you to load a dump/p4p forest file through command line. You could compile it as mpicc -g3 -O0 bug_testing.c -lm -lsc -lp4est and run it as ./a.out

  2. forest-after-refinement.p4p this file should be used as input since this is the forest emanating after the refinement and coarsening step. As you can see in the figure, it is unbalanced image

  3. forest-after-first-balance.p4p this is the forest after the first BALANCE operation still resulting into an UNBALANCED forest. The above code(mentioned in step1) will write the vtk's correspoding to bullet 2 and bullet 3 for you to see and inspect where the bug lies. image

  4. forest-after-second-balance.p4p this is the forest if you call the BALANCE operation once again finally resulting into a BALANCED forest. This state of the forest will not be outputed by the code but you can just manually call the balance api and dump the resulting forest vtk to see for yourself.

With thanks Aniruddha

data-panda commented 5 years ago

Well, here it is. It has been written to create a domain of custom root cells in x and y direction. So if it is of help to others, they can use it as well.

#define cube_x 2
#define cube_y 3

p4est_connectivity_t *
connectivity_custom_squares(void)
{
  // cube-x is the number of root cells in x direction
  // cube_y is the number of root cells in y direction
  lr a = 0.058010186501486592; //! one root cell box length

  const p4est_topidx_t num_vertices = (cube_x + 1) * (cube_y + 1);
  const p4est_topidx_t num_trees = cube_x * cube_y;
  const p4est_topidx_t num_ett = 0;
  const p4est_topidx_t num_ctt = 0;

  double vertices[num_vertices * 3];

  int vt = 0;
  for(int j = 0; j<(cube_y+1); j++)
  {
    for(int i = 0; i<(cube_x+1); i++)
    {
      vertices[(vt*3)+0] = i * a;
      vertices[(vt*3)+1] = j * a;
      vertices[(vt*3)+2] = 0.; // 2d connectivity
      vt++;
    }
  }

  p4est_topidx_t tree_to_vertex[num_trees * P4EST_CHILDREN];
  for(int i = 0; i < num_trees; i++)
  {
    tree_to_vertex[(i*4)+0]   = (i + (i/cube_x));
    tree_to_vertex[(i*4)+1] = tree_to_vertex[(i*4)+0] + 1;
    tree_to_vertex[(i*4)+2] = (i%cube_x) + (cube_x * (i/cube_x + 1)) + ((i/cube_x) + 1);
    tree_to_vertex[(i*4)+3] = tree_to_vertex[(i*4)+2] + 1;
  }

  p4est_topidx_t tree_to_tree[num_trees * P4EST_FACES];
  for(int i = 0; i < num_trees; i++)
  {
    tree_to_tree[(i*4)+0] = (i % cube_x) ? (i-1) : i;
    tree_to_tree[(i*4)+1] = ((i % cube_x) == (cube_x-1)) ? i : (i+1);
    tree_to_tree[(i*4)+2] = (i/cube_x) ? (i - cube_x) : i;
    tree_to_tree[(i*4)+3] = ((i/cube_x) == (cube_y -1)) ? i : (i+cube_x);
  }

  int8_t tree_to_face[num_trees * P4EST_FACES];
  for(int i = 0; i < num_trees; i++)
  {
    tree_to_face[(i*4)+0] = (i % cube_x) ? 1 : 0;
    tree_to_face[(i*4)+1] = ((i % cube_x) == (cube_x-1)) ? 1 : 0;
    tree_to_face[(i*4)+2] = (i/cube_x) ? 3 : 2;
    tree_to_face[(i*4)+3] = ((i/cube_x) == (cube_y -1)) ? 3 : 2;
  }

  return p4est_connectivity_new_copy (num_vertices, num_trees, 0,
                                      vertices, tree_to_vertex,
                                      tree_to_tree, tree_to_face,
                                      NULL, &num_ctt, NULL, NULL);
}
tisaac commented 5 years ago

I strongly suspect the problem is that there is no tree_to_corner information in your connectivity. Try running p4est_connectivity_complete() on it to add that information and using the resulting connectivity in your forest.

data-panda commented 5 years ago

Hi Tobin,

I can confirm that indeed that seems to solve the problem. Calling p4est_connectivity_complete(conn) was all I was missing.

Thanks !