
Testing adaptivity is consistent in serial, parallel and with load balancing #263

Closed: gnikit closed this 4 years ago

gnikit commented 4 years ago

Hi, I recently came across some strange behaviour with our load balancer in FETCH, which is identical to Fluidity's plus a bit of code for radiation. I am pretty confident that there is nothing wrong with our/your sends and receives, but it did raise the question of how one would go about testing in Fluidity that the load balancer produces consistent results for a serial, a parallel, and a load-balanced run, i.e. runs with similar numbers of degrees of freedom.

I am not very familiar with Fluidity in general (setting up all the fields, solvers, detectors, etc.) or with which quantities change when running in parallel or with load balancing. I just grabbed one of the existing tests with load balancing and tweaked it a bit, so I am not really surprised that my attempt was not fully successful.

Having said that, the serial, parallel, and load-balanced runs adapted quite differently, which is not what I would expect. Does anyone have any ideas why that is, or a more suitable way to test adaptivity across serial and parallel runs with load balancing on and off?

I am attaching the test case that I made. FYI, it is based on detectors_parallel_adapt.flml: load_balancer_consistent.zip

stephankramer commented 4 years ago

A couple of remarks; hopefully these help you further.

You compare between serial, parallel and "load balanced" runs. That confused me a little, but diffing the .flml files of the parallel and "load balanced" setups, I suspect that you are expecting the "parallel" setup not to use a load balancer. This is not the case: parallel adaptivity in Fluidity is always load balanced. The reason for that has to do with how the parallel adaptivity algorithm works, which might also explain your observation:

Having said that, the serial, parallel, and load-balanced runs adapted quite differently, which is not what I would expect.

First I should mention that mesh adaptivity isn't particularly deterministic even in serial, in the sense that it's not robust to floating-point round-off or to reordering. It's an iterative process that improves the mesh until it meets the specified quality criteria everywhere, and there are many possible meshes that will meet the same criteria. Also, the criteria, or desired edge lengths, will not be met precisely everywhere, in particular when additional constraints are imposed, and the algorithm will simply stop when it sees no way to further improve the mesh (without making it worse in other places). So when you say "consistent", you have to be a bit more precise about what you mean.
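
For instance, "consistent" is probably best expressed as agreement within a tolerance rather than exact equality. A minimal sketch of such a check (the 10% tolerance is an arbitrary placeholder, not a recommended value):

```python
def consistent(n_dofs_a, n_dofs_b, rel_tol=0.1):
    """Treat two adapted meshes as "consistent" if their DOF counts
    agree within a relative tolerance. The 10% default is arbitrary."""
    return abs(n_dofs_a - n_dofs_b) <= rel_tol * max(n_dofs_a, n_dofs_b)

# e.g. consistent(10000, 10800) -> True; consistent(10000, 15000) -> False
```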

The parallel algorithm is unfortunately even less predictable, and when configured incorrectly it can produce adapted meshes that locally do not meet the quality criteria at all. The way the algorithm works is that each process adapts the local part of the mesh independently. However, if a process were to adapt the mesh right at the local domain boundary, it would also change the mesh in the neighbouring domain. For this reason a strip of elements (roughly corresponding to the halos) is locked during this adaptivity process, which means that elements near the local domain boundaries do not get adapted at all.

To remedy this, the adaptivity process is followed by a redistribution stage in which elements of bad quality (which will be the ones that were locked) get assigned a high edge-cut value, so that in the new partitioning the load balancer tries to avoid placing these near the new local domain boundaries. Thus, in a new stage of adaptivity, the previously unadapted elements have hopefully been moved into the interior of a local domain so that they can now be freely adapted. In general, however, it's not possible for the repartitioner to move all the locked elements into the interior at once, so typically we need another iteration of this process: again assigning a high edge cut to bad-quality elements, followed by another stage of local adaptivity, and so on. Finally, when adaptivity thinks it's done, the last round of adaptivity is followed by a final redistribution stage that now actually takes load balancing into account (which didn't have much priority in the previous redistributions in between the adaptivity stages). A sketch of this loop is given below.
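
In pseudocode, the loop looks roughly like the following (a sketch only; every name here is a hypothetical stand-in, not Fluidity's actual internals):

```python
# Illustrative pseudocode of the parallel adapt/redistribute loop described
# above. All functions and constants are hypothetical stand-ins; this is
# not Fluidity's API.
HIGH_EDGE_CUT = 1000  # arbitrary large weight meaning "keep off the boundary"
MIN_QUALITY = 0.5     # arbitrary placeholder quality threshold

def parallel_adapt(mesh, metric, n_iterations=3):  # default of 3, see below
    for _ in range(n_iterations):
        # Each process adapts its local partition; a strip of elements
        # around the halos stays locked so the neighbouring domains are
        # unaffected -- those elements are not adapted at all.
        mesh = adapt_locally(mesh, metric, frozen=halo_strip(mesh))

        # Badly shaped elements (typically the previously locked ones) get
        # a high edge-cut weight, so the repartitioner tries to move them
        # into the interior of some domain for the next iteration.
        weights = {e: HIGH_EDGE_CUT if quality(e) < MIN_QUALITY else 1
                   for e in mesh.elements}
        mesh = redistribute(mesh, edge_weights=weights)

    # Only the final redistribution prioritises an even load balance.
    return redistribute(mesh, balance_load=True)
```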

Off the top of my head, the default number of these stages (iterations) is three, which might be enough in a trivial domain but often isn't. So there is a whole host of extra options with which you can configure it to do extra iterations and/or to only stop once it reaches a minimum element quality everywhere. Unfortunately this means that configuring adaptivity to perform well can be a bit of an art form, and it may depend a lot on the type of problem you are solving. In particular, if there are additional constraints, in the form of boundaries that need to be preserved, region ids, internal boundaries, etc., it can be hard for the algorithm to produce a good-quality mesh at all. It will also depend on how much your dynamics changes in between adapts. To debug this it's worth taking a look at the minimum element quality that's printed in the fluidity log, and seeing whether it indeed gets stuck. There are also diagnostic fields you can output to view the mesh quality everywhere.
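
As a quick way to automate that check, you could grep the log for quality-related lines. A rough sketch (the search pattern and the log file name are assumptions: the exact message wording, and whether your run writes per-rank logs such as fluidity.log-0, depend on your setup):

```python
# Rough sketch: pull quality-related lines out of a Fluidity log to see
# whether the minimum element quality stalls across adapt iterations.
# The log file name and the search pattern are assumptions -- adjust both
# to match the messages your run actually produces.
import re

pattern = re.compile(r"quality", re.IGNORECASE)

with open("fluidity.log-0") as log:  # per-rank log name; may differ
    for line in log:
        if pattern.search(line):
            print(line.rstrip())
```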

One final comment: if you just want to look at the total number of nodes in the produced mesh, you should look at the .stat file output, which can be read with the Python stat_parser.
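
For example, something along these lines (a minimal sketch: it assumes fluidity_tools is importable and that node counts appear under a "CoordinateMesh"/"nodes" entry in the .stat file, which depends on your output options; the file names are placeholders):

```python
# Minimal sketch: compare final node counts of two runs via their .stat
# files. Assumes fluidity_tools is on the PYTHONPATH and that the stat
# files record node counts under "CoordinateMesh"/"nodes" (this depends
# on the .flml output options). File names are placeholders.
from fluidity_tools import stat_parser

serial = stat_parser("run_serial.stat")
parallel = stat_parser("run_parallel.stat")

n_serial = serial["CoordinateMesh"]["nodes"][-1]      # last dump
n_parallel = parallel["CoordinateMesh"]["nodes"][-1]

rel_diff = abs(n_serial - n_parallel) / max(n_serial, n_parallel)
print(f"nodes: serial={n_serial}, parallel={n_parallel}, "
      f"relative difference = {rel_diff:.1%}")
```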

gnikit commented 4 years ago

Thank you @stephankramer for your detailed response; I appreciate you shedding some light on the internals of the load balancing and adaptivity algorithms. Things are much clearer now. Given that what I was trying to do 1) does not make much sense and 2) would be very fiddly to accomplish, I think I will close this issue.