gandalfcode / gandalf

GANDALF (Graphical Astrophysics code for N-body Dynamics And Lagrangian Fluids)
GNU General Public License v2.0

Bug with Intel compiler, OpenMP and gravity #85

Open giovanni-rosotti opened 7 years ago

giovanni-rosotti commented 7 years ago

Very serious bug. If I take the master branch (with a few very simple modifications to make it compile, see 005df4c in branch intel_bug), compile with the Intel compiler (v11.1) and run a gravity test (e.g. freefall or bossbodenheimer), I get a crash straight away because this assertion fails. I tried to debug it without success. It looks like a parallelization error, because it doesn't crash if I run with only 1 thread. Thankfully the problem doesn't come up with a newer version of the Intel compiler (v15) or with gcc, but it makes me wonder whether it's a compiler problem or a problem in our code.

dhubber commented 7 years ago

I recently (in the last week or so) had that problem too, but it was while I was debugging the meshless, so I assumed the problem was on that end. But if it happens with gradh or gravity-only, then it's an issue. It's not a critical assertion, in the sense that failing it doesn't mean the particles are wrong. It's just that the tree-walk should calculate smoothed neighbours and direct-sum gravity neighbours differently, so it means there is perhaps something wrong there. However, the fact that there are differences between serial and parallel is definitely a worry (again). We'll have to investigate to see if the assertion is only just failing (due to floating-point round-off) or properly failing due to wrong sorting or even NaNs.
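To make the three cases above concrete (NaN, marginal round-off failure, genuine failure from wrong sorting), here is a minimal C++ sketch of how a failing pair could be classified. It assumes the assertion has the form "squared separation >= squared kernel range", which is a guess, not something confirmed in this thread. The names `ClassifyPair`, `drsqd`, `hrangesqd` and the tolerance `rel_tol` are all hypothetical and are not taken from the GANDALF source.

```cpp
#include <cmath>

// Possible outcomes when checking one particle pair (hypothetical helper).
enum class CheckResult { Ok, NaN, RoundOff, Genuine };

// Classify a pair that is supposed to lie outside the kernel range.
// ASSUMPTION: the check is "drsqd >= hrangesqd"; the real assertion may differ.
// drsqd     : squared separation of the pair
// hrangesqd : squared kernel extent of the particle
// rel_tol   : relative margin below which a failure is treated as round-off
CheckResult ClassifyPair(double drsqd, double hrangesqd, double rel_tol = 1e-12) {
  // Any comparison involving NaN is false, so test for it explicitly first.
  if (std::isnan(drsqd) || std::isnan(hrangesqd)) return CheckResult::NaN;
  // The check the assertion would make.
  if (drsqd >= hrangesqd) return CheckResult::Ok;
  // Failing by a tiny relative margin points to floating-point round-off.
  if (hrangesqd - drsqd <= rel_tol * hrangesqd) return CheckResult::RoundOff;
  // Otherwise the pair is well inside the kernel range, e.g. it was
  // placed in the wrong neighbour list.
  return CheckResult::Genuine;
}
```

Calling something like this just before the assertion and printing the result for each failing pair, once with 1 thread and once with several, would show which of the three cases applies.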

rbooth200 commented 7 years ago

I think it is a critical assertion, because the particles are unsoftened in this function. Giovanni checked yesterday that it's due to neither NaNs nor round-off, so it really is an issue.

dhubber commented 7 years ago

Ahh ok, yes. Sorry, I was thinking of it the other way around (not reading the code properly, oops), where we were in the smoothed gravity function but computing for direct-gravity particles. But it's the opposite, so yes, it is a critical assertion!!

giovanni-rosotti commented 7 years ago

Yes, that's right, it's a serious issue. Which compiler were you using when you found the problem in the meshless? Was it Intel 11.1 or another version?

dhubber commented 7 years ago

No, just g++ (5.4) on my Mac. But like I said, I was debugging the meshless, so I was assuming it was a problem with that part. If I get that error again, I'll post it here with more info in case it helps narrow down the issue.