Closed jlippuner closed 4 years ago
Thanks for the report. We are aware that ran2()
is not a concurrent PRNG. It can be compatible with MPI e.g. when used as in pgen/turb.cpp
, but in that particular case, each rank iterates over the same global sequence produced by the same rseed
(a global variable defined on each MPI rank).
We should add an explicit trap to prevent this routine from being called inside a threaded region, and document it in the code and Wiki.
And eventually a thread-safe PRNG (Mersenne twister?) should be added to the code.
Std:: already have thread-safe RNGs you can do
std::mt19937 rng;
std::uniform_real_distribution<double> ran2(0.0, 1.0);
rng.seed(something_uint);
and sample numbers as value=ran2(rng)
@jlippuner can you post your modified input file and exact configure/run commands?
In terms of preventing runtime errors for those who call ran2()
within their problem generator, I believe that we could simply make the 3x static variables in the function threadprivate
. Or we could put a #pragma omp critical
around the body of the function. Thoughts @jmstone ?
@felker Thanks for your quick response and I apologize for my delayed one. Here is my config:
./configure.py --prob kh -omp -hdf5 --include /usr/include/hdf5/serial/ --lib_path /usr/lib/x86_64-linux-gnu/hdf5/serial/
Your Athena++ distribution has now been configured with the following options:
Problem generator: kh
Coordinate system: cartesian
Equation of state: adiabatic
Riemann solver: hllc
Magnetic fields: OFF
Number of scalars: 0
Special relativity: OFF
General relativity: OFF
Frame transformations: OFF
Self-Gravity: OFF
Super-Time-Stepping: OFF
Shearing Box BCs: OFF
Debug flags: OFF
Code coverage flags: OFF
Linker flags: -L/usr/lib/x86_64-linux-gnu/hdf5/serial/ -lhdf5
Floating-point precision: double
Number of ghost cells: 2
MPI parallelism: OFF
OpenMP parallelism: ON
FFT: OFF
HDF5 output: ON
HDF5 precision: single
Compiler: g++
Compilation command: g++ -O3 -std=c++11 -fopenmp -I/usr/include/hdf5/serial/
My modified athinput.kh-slip
is:
<comment>
problem = Kelvin-Helmholtz instability
reference =
configure = --prob=kh
<job>
problem_id = kh-slip_8 # problem ID: basename of output filenames
<output1>
file_type = hdf5 # Binary data dump
variable = prim # variables to be output
dt = 0.1 # time increment between outputs
<time>
cfl_number = 0.4 # The Courant, Friedrichs, & Lewy (CFL) Number
nlim = 100000 # cycle limit
tlim = 5.0 # time limit
integrator = vl2 # time integration algorithm
xorder = 2 # order of spatial reconstruction
ncycle_out = 1 # interval for stdout summary info
<mesh>
nx1 = 128 # Number of zones in X1-direction
x1min = -1.0 # minimum value of X1
x1max = 1.0 # maximum value of X1
ix1_bc = periodic # inner-X1 boundary flag
ox1_bc = periodic # inner-X1 boundary flag
nx2 = 64 # Number of zones in X2-direction
x2min = -0.5 # minimum value of X2
x2max = 0.5 # maximum value of X2
ix2_bc = periodic # inner-X2 boundary flag
ox2_bc = periodic # inner-X2 boundary flag
nx3 = 8 # Number of zones in X3-direction
x3min = -0.125 # minimum value of X3
x3max = 0.125 # maximum value of X3
ix3_bc = periodic # inner-X3 boundary flag
ox3_bc = periodic # inner-X3 boundary flag
refinement = adaptive
numlevel = 3
deref_count = 5
num_threads = 48
<meshblock>
nx1 = 8
nx2 = 8
nx3 = 8
<hydro>
iso_sound_speed = 1.0
gamma = 1.4 # gamma = C_p/C_v
<problem>
iprob = 1
amp = 0.2
drat = 2.0
vflow = 0.5
thr = 0.5
Here is the output I get:
./athena -i athinput.kh-slip
### Warning in Mesh::Initialize
The number of MeshBlocks increased more than twice during initialization.
More computing power than you expected may be required.
### Warning in Mesh::Initialize
The number of MeshBlocks increased more than twice during initialization.
More computing power than you expected may be required.
### Warning in Mesh::Initialize
The number of MeshBlocks increased more than twice during initialization.
More computing power than you expected may be required.
Setup complete, entering main loop...
cycle=0 time=0.0000000000000000e+00 dt=6.3237907014130560e-04
cycle=1 time=6.3237907014130560e-04 dt=6.3204117213331874e-04
Terminating on Terminate signal
time=6.3237907014130560e-04 cycle=1
tlim=5.0000000000000000e+00 nlim=100000
Number of MeshBlocks = 2368; 2240 created, 0 destroyed during this simulation.
zone-cycles = 1212416
cpu time used = 1.5651121000000000e+01
zone-cycles/cpu_second = 7.7465122146841750e+04
omp wtime used = 3.3088790100009646e-01
zone-cycles/omp_wsecond = 3.6641291396134989e+06
With this small change, it works (this is a naive solution of just turning off OpenMP when calling the problem generator):
diff --git a/src/mesh/mesh.cpp b/src/mesh/mesh.cpp
index c97ce9e..7d0ce0a 100644
--- a/src/mesh/mesh.cpp
+++ b/src/mesh/mesh.cpp
@@ -1339,7 +1339,7 @@ void Mesh::Initialize(int res_flag, ParameterInput *pin) {
}
if (res_flag == 0) {
-#pragma omp parallel for num_threads(nthreads)
+//#pragma omp parallel for num_threads(nthreads)
for (int i=0; i<nmb; ++i) {
MeshBlock *pmb = pmb_array[i];
pmb->ProblemGenerator(pin);
Fixed upstream in the private repository for the next public release. Here is the patch that you can apply locally:
diff --git a/src/utils/ran2.cpp b/src/utils/ran2.cpp
index defedf1..18b99b6 100644
--- a/src/utils/ran2.cpp
+++ b/src/utils/ran2.cpp
@@ -45,6 +45,7 @@ double ran2(std::int64_t *idum) {
static std::int64_t idum2=123456789;
static std::int64_t iy=0;
static std::int64_t iv[NTAB];
+#pragma omp threadprivate(iy,iv,idum2)
double temp;
if (*idum <= 0) { // Initialize
I like this solution. Thanks Kyle!
On Oct 30, 2019, at 11:24 AM, Kyle Gerard Felker notifications@github.com wrote:
Fixed upstream in the private repository for the next public release. Here is the patch that you can apply locally:
diff --git a/src/utils/ran2.cpp b/src/utils/ran2.cpp index defedf1..18b99b6 100644 --- a/src/utils/ran2.cpp +++ b/src/utils/ran2.cpp @@ -45,6 +45,7 @@ double ran2(std::int64_t *idum) { static std::int64_t idum2=123456789; static std::int64_t iy=0; static std::int64_t iv[NTAB]; +#pragma omp threadprivate(iy,iv,idum2) double temp;
if (*idum <= 0) { // Initialize — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/PrincetonUniversity/athena-public-version/issues/28?email_source=notifications&email_token=ABB3XTUMVSSDMG3PT452YOTQRGRMHA5CNFSM4JEUDBWKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECUTYRA#issuecomment-547961924, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABB3XTVHP6XRXSQML5H7FKTQRGRMHANCNFSM4JEUDBWA.
This solution works for me. Thanks!
But it just occurred to me: are the random number generators in each thread now independent of each other or will they all produce the same sequence of numbers?
The PRNG is seeded with std::int64_t iseed = -1 - gid;
based on the global index of each MeshBlock in kh.cpp
, and the Athena++ threading strategy is over MeshBlocks on a single MPI rank, so each thread should have separate sequences of numbers for each of its MeshBlocks.
Great!
Actually, using threadprivate is extremely tricky and is normally avoided. For example, in the manual https://www.openmp.org/wp-content/uploads/OpenMP4.0.0.pdf you can find "The address of a threadprivate variable is not an address constant" - the program can create or destroy these variables depending on the number of currently running threads. These are not static variables in an ordinary sense. And if the threads are dynamic it is explicitly prohibited to use threadprivate (page 151 line 26).
Why use this particular RNG implementation in the problem generator? Just copy one from GSL or use standard library.
For @jlippuner's particular case, all calls to ran2()
occur in the same parallel construct, and the omp parallel for
uses the implementation-defined default scheduling type (probably static
), so I dont think it is an issue here.
I agree with you that a different PRNG should be used , in general. But we leave that to application users with that specific, more complex need. Here, I just wanted to ensure that the simple problem generator compiles and runs correctly with OpenMP.
I was trying to run the Kelvin-Helmholtz problem with OpenMP (
kh.cpp
with a modifiedathinput.kh-slip
). The simulation would immediately terminate (Terminating with Terminate Signal, or something like that). Then I compiled with the-debug
flag and got a segfault before any other output. I traced the segfault to the random number generator (line 72 inran2.cpp
). I suspect that the random number generator is not thread-safe.A quick fix is to remove the
#pragma omp
on line 1342 ofmesh.cpp
, i.e. the following fixes the issue for me: