Gridding Algorithm error with refine boxes

rw-anderson commented 7 years ago

User submitted:

I am trying to use class algs::TimeRefinementIntegrator but I am facing an error which may be a bug or a misconfiguration.

I tried to use 4 levels with "REFINE_BOXES" tagging method and SAMRAI returns an error in the gridding algorithm class:

P=0000000:Failed assertion: !d_hierarchy->levelExists(new_ln + 1) || tag_to_finer

But if I set 3 levels, the simulation runs smoothly. What is more, if I use 4 levels with both "REFINE_BOXES" and "GRADIENT_DETECTOR" (without tagging any cell), it also works.

From what I debugged, the problem comes from a check on the tagging method when using exclusively fixed refinement. On a specific level, the tag_to_finer variable is not set while the finer level already exists, so the assertion fails.

I imagine this is a bug in SAMRAI, since it makes no sense that the time refinement integrator works with fewer than 4 levels but not with 4 when using exclusively FMR.

The error is easily reproducible using applications/LinAdv example. The sphere_4levels.2d.input parameter file may be changed to use the following tagging:

StandardTagAndInitialize {
   tagging_method = "REFINE_BOXES"
   level_0 {
      boxes = [ (5, 0) , (24, 19) ]
   }
   level_1 {
      boxes = [ (15, 5) , (44, 34) ]
   }
   level_2 {
      boxes = [ (35, 15) , (84, 64) ]
   }
   level_3 {
      boxes = [ (75, 35) , (164, 104) ]
   }
}

If I am wrong and this is a misconfiguration on my side, please point out the problem. Otherwise I hope this bug can be fixed in a proper release.

If you need more information, do not hesitate to ask me.

rw-anderson commented 7 years ago

User submitted, apparently related:

I have a problem when using SAMRAI. I was trying to apply different refinement boxes at different simulation steps. So I set my database for StandardTagAndInitialize as the following:

StandardTagAndInitialize {
   at_0 {
      cycle = 10
      tag_0 {
         tagging_method = "REFINE_BOXES"
         level_0 {
            boxes = [(21,21,21),(42,42,42)],[(53,53,53),(63,63,63)],[(0,0,0),(10,10,10)],[(0,0,53),(10,10,63)],[(0,53,0),(10,63,10)],[(53,0,0),(63,10,10)],[(53,53,0),(63,63,10)],[(0,53,53),(10,63,63)],[(53,0,53),(63,10,63)]
         }
      }
   }
   at_1 {
      cycle = 11
      tag_0 {
         tagging_method = "REFINE_BOXES"
         level_0 {
            boxes = [(21,21,21),(42,42,42)],[(53,53,53),(63,63,63)],[(0,0,0),(10,10,10)],[(0,0,53),(10,10,63)],[(0,53,0),(10,63,10)],[(53,0,0),(63,10,10)],[(53,53,0),(63,63,10)],[(0,53,53),(10,63,63)],[(53,0,53),(63,10,63)]
         }
         level_1 {
            boxes = [(53,53,53),(74,74,74)]
         }
      }
   }
}

So I expect to get two levels at the 10th step and three levels at the 11th step. I tried to call gridding_algorithm->regridAllFinerLevels() twice, at the 10th and 11th steps. The calling code is as follows:

std::vector<int> tag_buffer(patch_hierarchy->getMaxNumberOfLevels());
for (idx_t ln = 0; ln < static_cast<int>(tag_buffer.size()); ++ln) {
  tag_buffer[ln] = 1;
}
gridding_algorithm->regridAllFinerLevels(
  0,
  tag_buffer,
  10,
  0);
tbox::plog << "Newly adapted hierarchy\n";
patch_hierarchy->recursivePrint(tbox::plog, "    ", 1);

gridding_algorithm->regridAllFinerLevels( 0, tag_buffer, 11, 0);

The first call of regridAllFinerLevels works fine and gives me the right mesh configuration, but the second one fails and outputs the following error message:

P=0000000:Program abort called in file ``../../../../SAMRAI/./source/SAMRAI/mesh/GriddingAlgorithm.C'' at line 1802
P=0000000:ERROR MESSAGE:
P=0000000:Failed assertion: !d_hierarchy->levelExists(new_ln + 1) || tag_to_finer

It works normally when I use makeFinerLevel to create level_1 instead, but I'm afraid I would lose the ability to revise the existing mesh that way. All my code works well when using the adaptive feature with the "GRADIENT_DETECTOR" regridding method; it only breaks down when trying to set the mesh manually like this.

nicolasaunai commented 4 years ago

Hi, any ideas on this issue? I'm hitting the same assertion failure with revision https://github.com/LLNL/SAMRAI/commit/02109e017a0934dbc473a7a8029dad741db0825f. It seems to be the exact same issue: a hierarchy with 3 prescribed refinement levels works OK, but I hit the assertion once I use 4 levels.

It seems to do several regriddings before failing, though.

My case is 1D, the refinement ratio is fixed and equal to 2. The StandardTagAndInitialize database I create has the following structure :

StandardTagAndInitialize {
   tagging_method = "REFINE_BOXES"
   level_0 {
      boxes = [ (10, ) , (40, ) ]
   }
   level_1 {
      boxes = [ (30, ) , (60,) ]
   }
   level_2 {
      boxes = [ (80,) , (100,) ]
   }
}

I do 10 substeps per "next coarse time step". I am asking to run for 2 coarse steps and I get :

1 calls to advanceLevel with levelNumber==0
2 calls to advanceLevel with levelNumber==1
20 calls to advanceLevel with levelNumber==2
200 calls to advanceLevel with levelNumber==3

I also have 10 calls to regrid(), which seem to occur every 2 level-2 steps, when level 3 reaches synchronisation time with level 2. (Not sure why 10 regrids? That's a tangential question: I'm not sure I understand how I'm supposed to specify the regrid_interval in refined time stepping mode; SAMRAI seems to choose it on its own based on the refinement ratio...)

From the numbers above, I gather that the next thing it should do is 1 call to advanceLevel on level 1. So the failure would happen when levels 2 and 3 reach the sync point with level 1 and level 3 is re-created.
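As a cross-check of the call counts reported above, the arithmetic can be sketched as follows. This is a standalone illustration, not SAMRAI code: it assumes every level takes exactly 10 substeps per step of the next coarser level (as stated above) and that level 0 is counted once because its step has started. `callsPerLevel` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of advanceLevel calls per level after
// `level1_steps` steps of level 1, assuming each finer level subcycles
// `substeps` times per step of the next coarser level.
// Level 0 is counted once (its coarse step has started).
std::vector<long> callsPerLevel(int num_levels, int substeps, long level1_steps)
{
    std::vector<long> calls(static_cast<std::size_t>(num_levels), 0L);
    if (num_levels > 0) {
        calls[0] = 1;
    }
    long n = level1_steps;
    for (int ln = 1; ln < num_levels; ++ln) {
        calls[static_cast<std::size_t>(ln)] = n;
        n *= substeps;  // each step of level ln drives `substeps` steps of level ln + 1
    }
    return calls;
}
```

Under these assumptions, `callsPerLevel(4, 10, 2)` gives 1, 2, 20, 200, which matches the reported counts, whereas a complete coarse step would give 1, 10, 100, 1000. That is consistent with the run stopping after two level-1 steps.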

Thanks in advance for any update.

nselliott commented 4 years ago

@nicolasaunai One thing that needs to be addressed is that there really is no reason to call regrid when you have "REFINE_BOXES" turned on for the entire problem, but SAMRAI does do so when the refined timestepping is used. I need to look into this and figure out how to stop it from happening. I think that would eliminate the error you are seeing if we can fix this.

The complicating factor is that regridding does need to be enabled for cases described in the earlier comments when the tagging method changes during the course of the problem, so this might not be a quick fix.

Even so, the regrids that are happening should only be causing meaningless redundant work and not fatal errors, so there is probably a more fundamental bug and we will look into that.

nselliott commented 4 years ago

Update: The assertion is failing on a check on a pointer tag_to_finer that is unused in the method where it occurs. The pointer is null as it is never allocated in cases where "REFINE_BOXES" is used, but this should not be a failure if it is never used. A fix will be coming soon--the unnecessary regrids that I mention above will still happen, but they shouldn't fail with this fix.
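The failure condition can be illustrated in isolation. The following is a minimal standalone sketch, not SAMRAI code: `ToyHierarchy`, `TagLevel`, and `assertionHolds` are hypothetical names, and the only thing taken from this thread is the asserted expression itself. It assumes, as described above, that the pointer stays null whenever only "REFINE_BOXES" is used.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a patch hierarchy; it only tracks how many levels exist.
struct ToyHierarchy {
    int num_levels;
    bool levelExists(int ln) const { return ln >= 0 && ln < num_levels; }
};

// Hypothetical stand-in for whatever tag_to_finer points to.
struct TagLevel {};

// Evaluates the expression from the failed assertion:
//   !d_hierarchy->levelExists(new_ln + 1) || tag_to_finer
bool assertionHolds(const ToyHierarchy& hierarchy, int new_ln,
                    const std::shared_ptr<TagLevel>& tag_to_finer)
{
    return !hierarchy.levelExists(new_ln + 1) || static_cast<bool>(tag_to_finer);
}
```

In this toy model the expression is false only when a finer level already exists and the pointer is null, for example `assertionHolds(ToyHierarchy{4}, 2, nullptr)`. The sketch does not model which values of `new_ln` SAMRAI actually visits, so it does not by itself explain why 3 levels work and 4 do not.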

nicolasaunai commented 4 years ago

OK, I pulled the latest revision and tried with 4 levels. Now I don't get that assertion anymore, but I get another one that I can't understand. This happens after a regrid, which makes me think it's still possibly related (and 3 levels works OK).

The situation is the following:

My level 0 time step is 0.025, level 1 is 2.5e-3, level 2 is 2.5e-4 and level 3 is 2.5e-5. The hierarchy arrives at the point where level 3 reaches t=0.005, where level 2 is, and so is level 1. There is a regridding, and I see level 3 is initialized, then level 2 is. Then:

Level 1 advances one step, so it reaches t=0.005+2.5e-3.
Level 2 advances one step, so it reaches t=0.005+2.5e-4.
Level 3 advances one step... and the assertion is raised when applying the time/space interpolation to get ghosts from level 2.

Level 3 wants its ghosts filled at t=0.005025. Level 2 is supposed to be at t=0.00525 (and it is: when I go back in the stack trace to a point where I can inspect the patch data on that level in the debugger, I see their timestamps at that time)... so you'd think no problem!

But here in RefineTimeTransaction.C:

      TBOX_ASSERT(pd_new->getTime() >= s_time);

I can see that while s_time is indeed at t=0.005025, pd_new has its time stamp stuck at t=0.005... instead of being at t=0.00525.
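The numbers in this comment can be checked with a small standalone sketch. It is not SAMRAI code: `timeAfterSteps` and `newTimeCoversFillTime` are hypothetical helpers, and the second one only mirrors the comparison in the quoted assertion.

```cpp
#include <cassert>

// Time a level reaches after `steps` steps of size `dt`, starting from `t_sync`.
double timeAfterSteps(double t_sync, double dt, int steps)
{
    return t_sync + steps * dt;
}

// Mirrors the comparison in the quoted assertion:
//   pd_new->getTime() >= s_time
bool newTimeCoversFillTime(double pd_new_time, double fill_time)
{
    return pd_new_time >= fill_time;
}
```

With a sync time of 0.005, one level-3 step of 2.5e-5 gives a fill time of 0.005025 and one level-2 step of 2.5e-4 gives 0.00525. The comparison holds for 0.00525 but fails for a stale time stamp of 0.005, which matches the behaviour described here.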

One possibility is that the patch data pd_new somehow points to level 2 as it was before the regrid, when that level was indeed at t=0.005 :/
