Closed mjs2369 closed 9 months ago
@mjs2369 I was just chatting with Jeff about this. A better test of what is going on is to generate the sequence of random numbers from a given seed. The value we are interested in is k, and this should be the same across compilers.
For the curious: gfortran, intel on my mac and derecho give this for k. 11 numbers seeded with 13:
k= 3340206418
k= 2608511152
k= 1020231754
k= 3691240976
k= 3540249318
k= 3835331426
k= 4147861236
k= 769458329
k= 4177289964
k= 3258093498
k= 1947549667
cce on derecho gives:
k= -5939786187531372199
k= -7603175559541411156
k= 2499092022097743661
k= -6392185873013553955
k= 1418358412448069790
k= 1601992904522816967
k= 4918056359950545492
k= -7859870468495140367
k= -5366954201424499693
k= 4633693547982415675
k= -5357398119243707470
Chatting to Marlee, we think this might be a compiler bug:
hkershaw@derecho6:/glade/derecho/scratch/hkershaw/test_code$ module load intel
hkershaw@derecho6:/glade/derecho/scratch/hkershaw/test_code$ cat boz_dart.f90
program boz_dart
implicit none
integer, parameter :: i8 = SELECTED_INT_KIND(13)
! hexadecimal constants
integer(i8), parameter :: UPPER_MASK = int(z'0000000080000000', i8)
integer(i8), parameter :: LOWER_MASK = int(z'000000007FFFFFFF', i8)
integer(i8), parameter :: FULL32_MASK = int(z'00000000FFFFFFFF', i8)
integer(i8), parameter :: magic = int(z'000000009908B0DF', i8)
integer(i8), parameter :: C1 = int(z'000000009D2C5680', i8)
integer(i8), parameter :: C2 = int(z'00000000EFC60000', i8)
write(*, '(a, i20, 1x, z16)') "UPPER_MASK =", UPPER_MASK, UPPER_MASK
write(*, '(a, i20, 1x, z16)') "LOWER_MASK =", LOWER_MASK, LOWER_MASK
write(*, '(a, i20, 1x, z16)') "FULL32_MASK =", FULL32_MASK, FULL32_MASK
write(*, '(a, i20, 1x, z16)') "magic =", magic, magic
write(*, '(a, i20, 1x, z16)') "C1 =", C1, C1
write(*, '(a, i20, 1x, z16)') "C2 =", C2, C2
end program boz_dart
hkershaw@derecho6:/glade/derecho/scratch/hkershaw/test_code$ module load intel
hkershaw@derecho6:/glade/derecho/scratch/hkershaw/test_code$ ftn boz_dart.f90
hkershaw@derecho6:/glade/derecho/scratch/hkershaw/test_code$ ./a.out
UPPER_MASK = 2147483648 80000000
LOWER_MASK = 2147483647 7FFFFFFF
FULL32_MASK = 4294967295 FFFFFFFF
magic = 2567483615 9908B0DF
C1 = 2636928640 9D2C5680
C2 = 4022730752 EFC60000
hkershaw@derecho6:/glade/derecho/scratch/hkershaw/test_code$ module load cce
Lmod is automatically replacing "intel/2023.0.0" with "cce/15.0.1".
Due to MODULEPATH changes, the following have been reloaded:
1) cray-mpich/8.1.25 2) hdf5/1.12.2 3) ncarcompilers/1.0.0 4) netcdf/4.9.2
hkershaw@derecho6:/glade/derecho/scratch/hkershaw/test_code$ ftn boz_dart.f90
hkershaw@derecho6:/glade/derecho/scratch/hkershaw/test_code$ ./a.out
UPPER_MASK = -2147483648 FFFFFFFF80000000
LOWER_MASK = 2147483647 7FFFFFFF
FULL32_MASK = -1 FFFFFFFFFFFFFFFF
magic = -1727483681 FFFFFFFF9908B0DF
C1 = -1658038656 FFFFFFFF9D2C5680
C2 = -272236544 FFFFFFFFEFC60000
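The cce output above is consistent with each constant being treated as a 32-bit value and then sign-extended to 64 bits: every constant with bit 31 set comes back with its upper 32 bits filled with ones, while LOWER_MASK is unaffected. The constants are those of the 32-bit Mersenne Twister (MT19937). As a minimal sketch of why this would corrupt k, the program below applies the standard MT19937 tempering step to one word, once with the expected masks and once with the sign-extended values printed by cce/15.0.1. The names temper_demo and temper and the input word are illustrative assumptions, and DART's random_seq_mod may structure this step differently.

```fortran
program temper_demo
implicit none
integer, parameter :: i8 = selected_int_kind(13)

! Expected masks, written as decimal literals so they do not
! depend on how the compiler converts BOZ constants.
integer(i8), parameter :: C1_good = 2636928640_i8   ! z'9D2C5680'
integer(i8), parameter :: C2_good = 4022730752_i8   ! z'EFC60000'

! Sign-extended values as printed by cce/15.0.1 in the transcript.
integer(i8), parameter :: C1_bad = -1658038656_i8   ! z'FFFFFFFF9D2C5680'
integer(i8), parameter :: C2_bad = -272236544_i8    ! z'FFFFFFFFEFC60000'

integer(i8) :: y

y = 3000000000_i8   ! arbitrary 32-bit word (assumption, not a DART value)

write(*, '(a, i20)') 'correct masks:       ', temper(y, C1_good, C2_good)
write(*, '(a, i20)') 'sign-extended masks: ', temper(y, C1_bad, C2_bad)

contains

! Standard MT19937 tempering carried out in 64-bit integers.
! The AND with c1 and c2 is what keeps the result within 32 bits,
! so masks with ones in the upper half let high bits leak through.
function temper(k_in, c1, c2) result(k)
   integer(i8), intent(in) :: k_in, c1, c2
   integer(i8) :: k

   k = k_in
   k = ieor(k, ishft(k, -11))
   k = ieor(k, iand(ishft(k, 7), c1))
   k = ieor(k, iand(ishft(k, 15), c2))
   k = ieor(k, ishft(k, -18))
end function temper

end program temper_demo
```

With the expected masks the result stays below 2**32, as in the gfortran and intel sequences; with the sign-extended masks the shifted bits above bit 31 survive the AND, giving large 64-bit values of the kind cce reported for k.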
@mjs2369 Hi Marlee, did a bug report for this get sent to cray (by you or CISL help)?
@hkershaw-brown I don't believe so. I have a request open with CISL help, currently under "Support wait", where they said they were going to reach out to their contact for any input/fixes, but I haven't heard back from them. I just added another comment to the request to see if there are any updates.
@hkershaw-brown
Update on this issue - CISL Support responded to my request after contacting HPE/Cray.
This bug was patched in the latest release of CCE. CISL IT is working to get this installed once HPE has fixed more bugs that others have reported as well. Once a 16.x.x version has been added to the stack on Derecho, I will revisit this pull request to test and hopefully close it.
In the meantime, we will need to keep using Intel on Derecho for anything that uses the random number generator code, which includes perturb_from_single_instance.
No new version of CCE on Derecho as of Jan 2023. Closing as this is a CCE bug rather than a DART bug.
@c-merchant A new 🎉 cce compiler version, cce/16.0.1, is now available on Derecho. Can you give your ran_unif test a spin on Derecho with this new compiler?
For reference, here's your pull request with the random number test: https://github.com/NCAR/DART/pull/549 Let's see if cce/16 has the bug fixed.
This bug is fixed in cce/16.0.1, now available on Derecho.
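For anyone who wants to check a given compiler version quickly, a minimal sketch of a self-check follows. It compares two of the BOZ constants from boz_dart.f90 against the same values written as decimal literals, on the assumption that decimal literals are not affected by the BOZ conversion problem. The program name boz_selfcheck and the messages are illustrative, not part of DART.

```fortran
program boz_selfcheck
implicit none
integer, parameter :: i8 = selected_int_kind(13)

! Same constants written two ways: as BOZ and as decimal literals.
integer(i8), parameter :: FULL32_BOZ = int(z'00000000FFFFFFFF', i8)
integer(i8), parameter :: FULL32_DEC = 4294967295_i8
integer(i8), parameter :: UPPER_BOZ  = int(z'0000000080000000', i8)
integer(i8), parameter :: UPPER_DEC  = 2147483648_i8

if (FULL32_BOZ == FULL32_DEC .and. UPPER_BOZ == UPPER_DEC) then
   write(*, '(a)') 'BOZ constants OK'
else
   ! A mismatch means the BOZ constants were not converted as expected.
   write(*, '(a)') 'BOZ constants differ from decimal values'
   write(*, '(a, i20, 1x, z16)') 'FULL32_MASK =', FULL32_BOZ, FULL32_BOZ
   write(*, '(a, i20, 1x, z16)') 'UPPER_MASK  =', UPPER_BOZ, UPPER_BOZ
   error stop 1
end if

end program boz_selfcheck
```

Built with `ftn boz_selfcheck.f90` under each loaded compiler module, this should print the OK message for intel, and a mismatch would be expected under cce/15.0.1 given the transcript earlier in the thread.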
Describe the bug
The random number generator code in DART will not compile with cce on Derecho. Edit: the code compiles, but gives incorrect results. More specifically, the subroutines init_ran, ran_unif, ran_gauss, and ran_gamma are all incompatible with cce. I believe this is because they all make use of code from the GNU Scientific Library.
List the steps someone needs to take to reproduce the bug.
Run ./filter with any model with "perturb_from_single_instance = .true." in the namelist OR run ./test_gaussian or ./test_gamma in DART/developer_tests/random_seq/work.
What was the expected outcome? The executables run successfully.
What actually happened?
A run-time error halts the execution.
Error Message
Please provide any error messages.
ERROR FROM:
source : random_seq_mod.f90
routine: ran_gauss
message: if both x and y are -1, random number generator probably not initialized
message: ... x, y = -3510081565.7593699, -295496494.86667526
actual mean should be close to .50
Which model(s) are you working with?
All models, also the test_gaussian, test_random, and test_gamma developer tests in DART/developer_tests/random_seq/work.
Version of DART
Which version of DART are you using? You can find the version using
git describe --tags
v10.7.3
Have you modified the DART code?
No
Build information
Please describe:
Derecho, cce