SCOREC / core

parallel finite element unstructured meshes

MA: assert fails when snapping is on with matched faces #73

Open mrasquin opened 7 years ago

mrasquin commented 7 years ago

Hello,

I tried to apply a uniform refinement with snapping on a pseudo 3 way periodic subchannel with Chef and the code stops in the following assertion:

...
MeshAdapt: after refinement: checked layer quality in 0.138162 seconds: 0 unsafe elements
chef: /home/mrasquin/develop-scorec/core/src/ma/maAdapt.cc:261: long int ma::markEntities(ma::Adapt*, int, ma::Predicate&, int, int): Assertion `! getFlag(a,e,trueFlag)' failed.
signal 6 caught by pcu

A uniform refinement runs fine when snapping is set to 0. I am using core version d0ad7a30ca602c093f7b1c787f20cb2ab6d0b6fc, but this case used to work fine with core-sim some time ago. Please also note that I have been able to reproduce the issue with both a parasolid and a geomsim model. I am not sure it is related, but the associated mesh is periodic in all three directions (once fixed, this case would be a good candidate for the regression test base for that reason).

I have uploaded an archive at both CU and RPI: see /users/mrasquin/subchannel_3way.tgz. You can reproduce the issue by running:

tar -xzf subchannel_3way.tgz
cd subchannel_3way/2-A0-geomsim/chef/8-1-Chef-UR/
mpirun -np 8 path/to/chef

Note also that the same assertion is triggered when Chef runs with a single MPI process (numSplit set to 1 in adapt.inp in this case).

Thanks,

Michel

mrasquin commented 7 years ago

I should probably clarify what I mean by "periodic subchannel". My mesh includes three pairs of matched faces, and my model attribute file also includes "periodic slave" boundary conditions applied on these faces.

Note that I could confirm that the assertion failure mentioned above occurs only when snapping is on AND the mesh includes matched faces. If I rebuild a mesh without matched faces, snapping does not trigger the assertion. So the combination of matched faces and snapping is what triggers the assertion in maAdapt.cc:261.

bgranzow commented 7 years ago

Perhaps @mortezah would be the correct person to investigate this issue. @mrasquin Do you have a SHA for core-sim where this feature worked as you expected?

mrasquin commented 7 years ago

Hi Brian,

The SHA of the core-sim version which still runs through this assert without stopping is: cd0377ca741357acdc0c49b319a21b4fc210d552. Note that the instruction "assert( ! getFlag(a,e,trueFlag));" in ma/maAdapt.cc:261 was already present in core-sim, so the difference, and therefore the issue, lies upstream of this line in the code.

I also diffed ma/maAdapt.cc between my recent version of core and my older version of core-sim, and two additional routines have been implemented since then in core/ma/maAdapt.cc. These are named setFlagMatched() and clearFlagMatched(). See the attached screenshot (diff-core-coresim).

However, adding a return at the top of these two new routines, just for the sake of testing, does not prevent the assert from stopping the code. That said, there have been many other changes in ma/, so it is hard to tell where I am "losing a flag" caught by this assert (the "why" is clearly related to matched faces).

I also commented out the assert in ma/maAdapt.cc:261, and Chef could then produce an apparently valid mesh. I have not run phasta on this mesh yet, but this solution does not look safe, since it may open the door to other issues which would no longer be caught by this assert. Could someone explain what the idea behind this assert is, and whether I can safely comment it out for a matched mesh?

Thanks for your help

PS1: note that if you plan to run the subchannel_3way test case I pointed you to in my first email with core-sim, you will have to convert the Simmetrix mesh to mds again, since the current mds version used by recent core does not match the older version required by core-sim.

PS2: I think the case provided in subchannel_3way would be a great addition to the nightly regression test base. I could make this test even more complete by testing additional functionalities (tetrahedronization, bubble initialization, solution migration, etc).

bgranzow commented 7 years ago

I agree with the apparent need for regression tests for features that appear to be slipping through the cracks. I would just stipulate that the proposed regression tests should take less than a second to run (for most cases). I haven't actually had a chance to run the subchannel_3way test. My knowledge of how mesh adapt works is pretty limited and my knowledge of how matching works is non-existent. If anyone else at SCOREC knows anything about these it would be real neat-o if you could help out.

One (potentially painful...) way to try to debug this would be to use git bisect to find the offending commit.

ibaned commented 7 years ago

You're not losing a flag, you have too many flags. That check is failing because a flag exists on an entity that should not have it. However, it's a very generic check and gets used in several places, so it would help if you posted a full stack trace.

bgranzow commented 7 years ago

Here is the stack trace I get when running the example @mrasquin pointed me to in serial. (i.e. modifying the parameter splitFactor to 1 in adapt.inp)

#0  0x00007fffe2edff15 in raise () from /lib/libc.so.6
#1  0x00007fffe2ee2d20 in abort () from /lib/libc.so.6
#2  0x00007fffe2ed90a1 in __assert_fail () from /lib/libc.so.6
#3  0x00007fffef80d4d2 in ma::markEntities(ma::Adapt*, int, ma::Predicate&, int, int) () at /lore/granzb/core/ma/maAdapt.cc:261
#4  0x00007fffef820e07 in ma::snapOneRound(ma::Adapt*, apf::MeshTag*, bool, long&) () at /lore/granzb/core/ma/maSnap.cc:234
#5  0x00007fffef820fb3 in ma::snapTaggedVerts(ma::Adapt*, apf::MeshTag*) () at /lore/granzb/core/ma/maSnap.cc:253
#6  0x00007fffef82103b in ma::snap(ma::Adapt*) () at /lore/granzb/core/ma/maSnap.cc:274
#7  0x00007fffef80c23a in ma::adapt(ma::Input*) () from /lore/granzb/core/install/lib/libma.so
#8  0x00007ffff750bd12 in ph::adapt(ph::Input&, apf::Mesh2*) () at /lore/granzb/core/phasta/phAdapt.cc:134
#9  0x00007ffff74fc45c in chef::bake(gmi_model*&, apf::Mesh2*&, ph::Input&, ph::Output&) () at /lore/granzb/core/phasta/phCook.cc:117
#10 0x00007ffff74fc6d1 in chef::cook(gmi_model*&, apf::Mesh2*&) () at /lore/granzb/core/phasta/phCook.cc:253
#11 0x0000000000412694 in main () at /lore/granzb/core/phasta/chef.cc:34

mrasquin commented 7 years ago

Is there any update on this bug? We are currently forced to use an old version of core-sim which does not run into this issue with matched faces, but this old version has other limitations, including a crash during adaptation of a specific configuration (probably more to come on this later). We would like to avoid debugging an old version of MeshAdapt and switch back to MeshAdapt trunk as soon as possible. Please let me know if I can provide any other information to help further with this ticket.

Thank you for your help.

Best,

Michel

mortezah commented 7 years ago

I spent some (limited) time on this issue. Snapping for matched meshes causes a lot of inconsistencies in the SNAP and DONT_SNAP flags. More specifically, before any snapping is done, the vertices on the matched boundary and their matches have the same snap-related flags. After only one round of snapping, that condition is violated. That is, you can find a vertex whose snap-related flags are no longer the same as the snap-related flags of its matched vertex.

That being said, I don't think commenting out the assert, as mentioned by @mrasquin, is safe.

I will continue to work on this as my time permits.
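The consistency condition described in this comment (a vertex and its matched vertex carry the same snap-related flags) can be sketched as a small diagnostic. This is a self-contained toy model under stated assumptions: the namespace `toy_match`, the `Vertex` struct, the index-pair representation of matches, and the helper `findInconsistentMatches` are hypothetical and do not use the real apf/ma data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

namespace toy_match {  // hypothetical model, not the real apf/ma API

enum SnapFlag : std::uint32_t { SNAP = 1u << 0, DONT_SNAP = 1u << 1 };
constexpr std::uint32_t SNAP_MASK = SNAP | DONT_SNAP;

struct Vertex {
  std::uint32_t flags = 0;  // may also hold unrelated bits
};

// A match is stored as a pair of indices into the vertex array.
using MatchPair = std::pair<std::size_t, std::size_t>;

// Returns every matched pair whose snap-related bits differ.
// Bits outside SNAP_MASK are ignored on purpose.
inline std::vector<MatchPair> findInconsistentMatches(
    const std::vector<Vertex>& verts,
    const std::vector<MatchPair>& matches) {
  std::vector<MatchPair> bad;
  for (const MatchPair& m : matches) {
    const std::uint32_t a = verts[m.first].flags & SNAP_MASK;
    const std::uint32_t b = verts[m.second].flags & SNAP_MASK;
    if (a != b) bad.push_back(m);
  }
  return bad;
}

}  // namespace toy_match
```

Running a check of this kind before and after each snapping round would show which round first breaks the condition; an empty result means every matched pair still agrees.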