Closed whannah1 closed 6 years ago
@worleyph , in the past, you provided me with a PRNG for this exact same problem. Is it, by chance, better than the one @whannah1 put in this PR? I think you had grabbed it from a fusion code or something.
@mrnorman , sorry for the delay in responding - I'm on personal travel at the moment.
I'll have to read up on this issue and also remind myself what I did earlier (and why). I know that the atmosphere group preferred their own random generator, which was also already either in the code or in their branch, because they were comfortable with it and it returned a vector of results?
From e-mails (November-December 2014):
For the fusion project I work with (particle in cell code) they have a random number
generator that works well as far as I know, and I modified it to be threadsafe and
also "reproducible" when changing process and thread counts. I can ask the PI
whether it is okay to share this if you are interested in trying something totally
different.
then
I've asked permission from the PI of the EPSi project, but they are using an in-lined
open source code any way:
MODULE Ecuyer_random
! L'Ecuyer's 1996 random number generator.
! Fortran version by Alan.Miller @ vic.cmis.csiro.au
! N.B. This version is compatible with Lahey's ELF90
! http://www.ozemail.com.au/~milleraj
! Latest revision - 30 March 1999
though I have made some minor modifications to it for use in a parallel environment.
then
I did a quick search, and the L'Ecuyer random number generators are considered
at least as good as those of Marsaglia (the basis of the fortran instrinsic), so that
is reassuring. This code is pretty old though, but then we are not doing
cryptography, so our requirements are probably fine for 1990s technology ;-).
then
Hi Matt,
From ~worley/ACPI/SVN/ACME/mrnorman/cam-se/perf-sensitivity/ACME/models/atm/cam/src/dynamics/se grab
inidat.F90
random_xgc.F90
then after you create a new case, edit the Macros file to add '-D USE_RANX' , e.g.
CPPDEFS+= -DFORTRANUNDERSCORE -DNO_SHR_VMATH -DNO_R16 -DLINUX -DCNL -DCPRPGI -DUSE_RANX
and then Matt's PR #56 (https://github.com/ACME-Climate/ACME/pull/56).
Matt's instructions are a little different than mine (new_random = .true. ?, other namelist arguments?) but random_xgc.F90 is still in the model, so is probably available to be tried.
@worleyph, is there an advantage of using Tristen's RNG over the Mersenne Twister algorithm?
Note that random_xgc.F90 supports reproducbile random numbers (if use the same seed) in a parallel code, including threading. In particular:
! if ranx is to be executed in a threaded region,
! then each thread must set its own seed, otherwise
! each thread generates the identical sequence
Matt implemented this correctly (at the time), but, if this part of the code has changed over time, you should be aware of this.
My version seeds based on the global column number, so it should be reproducible across different PE layouts.
@whannah1 ,
is there an advantage of using Tristen's RNG over the Mersenne Twister algorithm?
No idea. I was just responding to @mrnorman 's request. When we last ran into the problems with the Fortran intrinsic RNG (PGI in particular), I had access to this code, used in a production plasma microturbulence code. I was familiar with it as I had modified it to make it threadsafe. Other than that, I have no special knowledge. It is already in E3SM however, for whatever that is worth.
It being threadsafe is pretty nice. I dunno if we'll need that, but you never know.
If it doesn't matter to you guys I'd prefer to use this new one. I like how simple the interface is for getting small sets of numbers and I've already been using it for another part of the CRM.
What's involved in making it "thread safe" aside from making sure the seed is the same for each column?
@whannah1 ,
is there an advantage of using Tristen's RNG over the Mersenne Twister algorithm?
No idea. I was just responding to @mrnorman 's request. When we last ran into the problems with the Fortran intrinsic RNG (PGI in particular), I had access to this code, used in a production plasma microturbulence code. I was familiar with it as I had modified it to make it threadsafe. Other than that, I have no special knowledge. It is already in E3SM however, for whatever that is worth.
It shouldn't be too hard to do if we need to. We just need to put locks (i.e., "!$omp critical
")around certain parts of the code to make sure two threads don't grab the same PRN or seed, etc.
My version seeds based on the global column number, so it should be reproducible across different PE layouts.
My connection is going in and out - second time trying to send this.
Just curious, but not enough to read the code :-), how do you generate separate sequences for each column when multiple columns are assigned to a thread (even thread 0)?
It shouldn't be too hard to do if we need to. We just need to put locks (i.e., "!omp critical")around certain parts of the code to make sure two threads don't grab the same PRN or seed, etc.
I provided thread-specific random number sequences, but this also required making sure that the same thread handles a column (for the fusion code, particles) during each timestep. Not hard to do, but eliminates any dynamics OpenMP scheduling algorithms.
Oh I think I get it. So when I set the seed, if there are multiple threads then there's no guarantee that another thread will re-set the seed before I actually get the random sequence I need.
For now that doesn't matter because the CRM has to be run without threading. So I think we're ok.
Would dynamic OMP schedules work efficiently for us? We'd have to have significant load imbalance and probably significantly more work for that to become practical, I think. We should be safe either way.
ERP tests were failing for SP compsets at the COMPARE_base_rest stage. Adding a new Mersenne Twister RNG module to set the CRM perturbations alleviated the issue.
Some rambling details:
In an unrelated effort for implementing a random walk to make the CRM orientation evolve over time I stumbled onto a concerning discrepancy between the behavior of the fortran intrinsic random number generator (RNG) when compiling with gfortran and pgi. Off-line results from the pgi compiler showed that the random numbers were scaled by the seed value, and so small seed values resulted in near zero random values. The solution is to build in a RNG so that the results are consistent across compilers. Since I was already doing this, I figured I'd see whether the test results would be affected by using this new RNG to set the CRM perturbations in setperturb(). I also changed the condition that decides whether to call setperturb() and based it on the logical "is_first_step" instead of the old method that compared a few arbitrary CRM cells to see if they were equal. These changes resulted in a passing test of COMPARE_base_rest, however I'm lacking a complete explanation of why this worked. In theory, neither change chould have affected this comparison as long as the intrinsic RNG got the same seed and gave the same results, even if those values were near zero. The difference may lie in how the intrinsic RNG was called such that the string of random numbers was "remembered" from the previous seed, in which case changing the processor count may result in a different sequence. Either way, the new RNG is much more robust and reliable. The new RNG uses the Mersenne Twister algorithm. There is already a Mersenne Twister RNG implemented elsewhere in the model, but it is cumbersome to get single values, so I think the current approach is preferable for the CRM.