Fix hybrid OpenMP/MPI parallelisation

gandalfcode / gandalf

GANDALF (Graphical Astrophysics code for N-body Dynamics And Lagrangian Fluids)

GNU General Public License v2.0

44 stars 12 forks source link

Fix hybrid OpenMP/MPI parallelisation #31

Closed dhubber closed 7 years ago

dhubber commented 8 years ago

Track down and debug any errors relating to using hybrid MPI and OpenMP parallelisation

giovanni-rosotti commented 8 years ago

See recent pull request #38 .

I also get a very weird behaviour with the freefall test when compiling in hybrid mode. With the default values (Nhydro=8000,cubic lattice), normally the actual number of particle is around 7800. But if compiling in hybrid mode, it is much smaller (less than 2000), at least on my machine. However the initial condition generation does not have any hybrid OpenMP/MPI code. Therefore I suspect it is actually a bug in the routine itself. If I use random positions then everything is fine.

giovanni-rosotti commented 8 years ago

I can now update on this bug. It is actually nothing to do with hybrid parallelization, but just with openMP when using the intel compiler. The problem is the shared variable box in the function AddCubicLattice; if I try to otuput the value of boxmin inside the omp loop I get garbage. Apparently the intel compiler doesn't copy properly the arrays inside the structure DomainBox. I'm not sure what the standard says, but to me it seems a genuine bug in the Intel compiler... If this is true, the question is how we get around it (and if the same bug might bite us also elsewhere).

giovanni-rosotti commented 8 years ago

I opened a thread on the Intel forum for this issue: https://software.intel.com/en-us/forums/intel-c-compiler/topic/600458

dhubber commented 8 years ago

Okay, that is a bit worrying. I guess for the small parallel loops (like this one in AddCubicLattice) we can just remove the parallel statement. For larger CPU-intensive loops, it's a bit more worrying yes. What happens when you pass the object as an argument exactly? I'm guessing it makes a copy. In which case, what would happen if we explicitly wrote a copy constructor for it? I noticed on your Intel issue you said it's fine if we pass by reference. Presumably it's better to pass by reference anyway (assuming it's a read-only object and gives no performance penalty).

giovanni-rosotti commented 8 years ago

Passing by reference is fine, yes. We're actually doing it already in many other places in the code, which is the reason why it runs fine if you just disable that loop. As far as I can tell by going through the code (if you could do the same exercise it would be good), the only other place where we are passing directly a DomainBox is in PeriodicGhosts::CheckBoundaries. But if I'm not mistaken that's not used anymore; it has actually been moved to SphIntegration, and that passes by reference. So as long as we remove the parallels in addcubiclattice and addhexlattice/pass by reference, we should be fine. I'd like to know what Intel thinks though...

dhubber commented 7 years ago

Any update on this issue, particularly on whether the latest versions of the Intel compiler have now fixed this bug? And if not, then I guess removing those parallels (if we've not already done it) should still do the trick? I'm happy with removing the parallels if Intel has not yet done anything since we can't wait on them (if they've not done it by now, no reason to believe they will do it any time soon!)

giovanni-rosotti commented 7 years ago

The newest version of the compiler is fine, as it was at the time. Some versions of the compiler are buggy, and they will always remain buggy. That being said, we already modified the code back then to pass by reference and circumvent the problem. Closing the issue as this was solved long ago