cmbant / CosmoMC

MCMC parameter sampling code
https://cosmologist.info/cosmomc/

This fixes memory problem on some machines #18

Closed mraveri closed 5 years ago

mraveri commented 5 years ago

I ran into a mysterious segfault on a cluster. Using a debugger, I traced it to these two variables, which were declared as private. Since their shape is resolved at run time, they are not guaranteed to be copied into the OMP section. Declaring them firstprivate forces the initial copy and solves the problem.

cmbant commented 5 years ago

They are defined inside the OpenMP loop, so private should be correct? Seems like a compiler bug; which compiler was it?

mraveri commented 5 years ago

I am getting the segfault with ifort 16 and 18, and the problem is machine dependent (it does not happen on my laptop, but it crashes on the cluster).

I caught this by debugging with DDT, and my interpretation is that making the variables firstprivate solves the problem because they get copied (with their shape) into the OMP section.

On 28 Feb 2019, at 17:40, Antony Lewis notifications@github.com wrote:

> They are defined inside the openmp loop, so private should be correct? Seems like a compiler bug, which was it?


cmbant commented 5 years ago

OK, thanks. 18.0.1 or higher? Ifort does have a general bug with memory allocation inside OpenMP loops, but I've not seen any issue like the one you report for stack-allocated arrays with ifort 18.0.1 or 19. As you say, it may be CPU dependent, and I guess the patch is otherwise harmless. I would, however, be inclined to add a comment that it is a bug workaround, since otherwise it is not obvious why you would have firstprivate there (copying in undefined values).

mraveri commented 5 years ago

On 01 Mar 2019, at 05:23, Antony Lewis notifications@github.com wrote:

> OK thanks. 18.0.1 or higher?

I am using ifort 18.0.2.

> Ifort does have a general bug with memory allocation inside openmp loops, but I've not seen any issue like you report for stack allocated arrays for ifort 18.0.1 or 19. As you say it may be CPU dependent, and I guess the patch is otherwise harmless (I would however be inclined to add a comment that it is a bug workaround, since otherwise unobvious why you would have firstprivate there [copying in undefined values]).

I agree, this is definitely architecture dependent: it does not happen on my laptop (with the same ifort), nor on another cluster, though that one has an older ifort. The workaround is generally harmless. Should I just add a comment before the OMP sections?


cmbant commented 5 years ago

Sounds good, thanks.