Open lizaclark opened 7 years ago
Hi Liz,
This is useful analysis -- based on this and our prior experience with the -r8/autodouble flags, let's go back to NOT using those flags (which we apparently did inadvertently after the PGF compiler variable picked up a path, breaking the if statement).
And let's remove tiny() and go back to using f77 to compile the snow routines, as was the original implementation. I'll go back to using gfortran for the calibrations on hydro-c1 unless run times are substantially slower than ifort.
While you're in this mode (recompiling/run/plot), can you check whether using f77 instead of f90 to compile snow17 (without tiny) makes much difference?
-Andy
On Fri, Feb 17, 2017 at 11:08 PM, Elizabeth Clark notifications@github.com wrote:
This issue addresses differences between compilations (without explicit double precision flags) of the SHARP branch https://github.com/NCAR/NWS_hydro_models/tree/28fb8a7f16a48f6b8694c7a4124d2ad5d31071a8. The case with double precision flags will be addressed in a separate issue. I used three compilers on hydro-c1 at UCAR with the following flags:
- ifort (IFORT) 15.0.2 20150121 • sac_77: ifort -O3 -f77rtl • snow19: ifort –O3 • UTIL: ifort -O3 -warn all -check all
- GNU Fortran (Debian 4.9.2-10) 4.9.2 • sac_77: gfortran -O3 -fno-align-commons -ffree-line-length-none flags77 • snow19: gfortran -O3 -fno-align-commons -ffree-line-length-none flags2 • UTIL: gfortran -O3 -fno-align-commons -ffree-line-length-none
- PGI 15.7-0 64-bit target on x86-64 Linux -tp sandybridge • sac_77: pgf77 -O3 • snow19: pgf90 –O3 • UTIL: pgf90 -O3 -Kieee
Note that the Makefile https://github.com/NCAR/NWS_hydro_models/blob/28fb8a7f16a48f6b8694c7a4124d2ad5d31071a8/Snow17SacUH/src_bin/driver/Makefile in this commit does not use any flags for PGI because the FC and FC77 variables do not match pgf90 or pgf77 as expected in the if statements that turn on these flags.
These compilers produced different results, particularly in the wintertime. [image: swe_orig_diff] https://cloud.githubusercontent.com/assets/5132040/23090619/a4360f78-f556-11e6-9d82-e7316d5b0dc0.png Fig. 1. SWE timeseries: The lower set of lines shows swe in mm, plotted on the left axis. The upper set of lines shows the difference in swe in mm between the simulation using PGI and the other two compilers.
@anewman89 https://github.com/anewman89 suggested checking a change he had made to include a tiny offset in aesc19.f https://github.com/NCAR/NWS_hydro_models/blob/SHARP/Snow17SacUH/src_bin/snow19/aesc19.f#L25 for when the precision of an ascii restart file caused problems in the comparison. tiny is an intrinsic function with tiny(x) - Returns the smallest positive number that can be represented on the current computer for real argument x http://www.personal.psu.edu/jhm/f90/lectures/7.html, but in the current commit, tiny does not call an argument x; it is treated as an uninitialized variable.
When I tried to print out the values of tiny to the screen, the PGI compiled code produced different results than when PGI compiled the same code without a print statement. This suggests a memory error. http://stackoverflow.com/questions/33127940/result-changes-when-a-variable-is-used-in-print-statement-before-the-computation. I ran the program with valgrind, which identified a leak at the line where tiny is used:
==40656== Conditional jump or move depends on uninitialised value(s) ==40656== at 0x40421E: aesc19 (aesc19.f:25) ==40656== by 0x402857: pack19 (PACK19.f:218) ==40656== by 0x40C252: exsnow19 (exsnow19.f:142) ==40656== by 0x40F2ED: MAIN (multi_driver.f90:282) ==40656== by 0x401CC3: main (in ~/NWS_hydro_models/Snow17SacUH/src_bin/Snow17SacUH.exe)
The values printed for tiny were inconsistent. Here is a subset of the values printed to the screen in valgrind:
tiny= -1.7005811E+38 tiny= 0.000000 tiny= -1.7005811E+38 tiny= 8.5123166E-02 tiny= -1.7005811E+38 tiny= 8.3787210E-02 tiny= -1.7005811E+38 tiny= 0.1242791 tiny= -1.7005811E+38 tiny= 0.000000 tiny= -1.7005811E+38 tiny= 0.1852178 tiny= -1.7005811E+38 tiny= 0.000000
Three equivalent independent solutions that successful matched the PGI compiled output to that of the other compilers (bringing the black line down to match the red and blue lines in Fig. 1, and reproducing simulated streamflow to within +/-0.01 cfs) were:
- Use the -Msave flag on f90 and f77: This initializes all undefined values at zero http://www.pgroup.com/userforum/viewtopic.php?t=408&sid=87dcadeff6a563829ad8e4182ab36655 .
- Set tiny = 0.0 before line 15 https://github.com/NCAR/NWS_hydro_models/blob/28fb8a7f16a48f6b8694c7a4124d2ad5d31071a8/Snow17SacUH/src_bin/snow19/aesc19.f#L15 in aesc.f.
- Remove tiny completely.
The consistency of these options with ifort and gfortran implies that ifort and gfortran initialize tiny to 0.0 when compiled.
@andywood https://github.com/andywood noted that the precision of the ascii state files has been increased since @anewman89 https://github.com/anewman89 ran into the issue that required tiny in the first place. I am doing frequent restarts in my application, so I tested that a version of the code without tiny still has exact restarts. I did this by running two simulations from 2005-10-01 to 2006-09-30:
- a continuous run for the entire period
- a run that stops and restarts from a state file every day.
I compared the output files from the two scenarios, and they are equivalent. This suggests that for better reproducibility, tiny can and should be removed. Doing so is conveniently also truer to the original code.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NCAR/NWS_hydro_models/issues/12, or mute the thread https://github.com/notifications/unsubscribe-auth/AFCgReQBhl2NB8fOPDeLVoHST8MWcvvYks5rdorigaJpZM4MFA2r .
I agree with Andy's comments, with this and issue #13, going back to the original code and no double precision flags is probably the best route to go.
@andywood:
I attempted to compile and run the model using FC77EXE
in place of FCEXE
and F77FLAGS
in place of FLAGS
and FLAGS2
to compile with each of PGI, ifort
, and gfortran
for the version without tiny
. The runs use the same namelist and input files as the previous runs described in issues #12 and #13.
The PGI compiler returns this error:
/home/eclark/NWS_hydro_models/Snow17SacUH/src_bin//snow19/exsnow19.f:
PGFTN-S-0083-Vector expression used where scalar expression required (/home/eclark/NWS_hydro_models/Snow17SacUH/src_bin//snow19/exsnow19.f: 130)
PGFTN-S-0083-Vector expression used where scalar expression required (/home/eclark/NWS_hydro_models/Snow17SacUH/src_bin//snow19/exsnow19.f: 134)
0 inform, 0 warnings, 2 severes, 0 fatal for exsnow19
Makefile:181: recipe for target 'compile_calib' failed
make: *** [compile_calib] Error 2
ifort
compiles but fails to run with:
***** ERROR: Problem reading namelist INIT_CONTROL
forrtl: severe (24): end-of-file during read, unit 30, file /home/eclark/NWS_hydro_models/Snow17SacUH/src_bin/driver/namelist.test, line 41, position 1
Image PC Routine Line Source
Snow17SacUH.exe 000000000041BFD4 Unknown Unknown Unknown
Snow17SacUH.exe 000000000043D005 Unknown Unknown Unknown
Snow17SacUH.exe 0000000000417C17 Unknown Unknown Unknown
Snow17SacUH.exe 000000000040C57D Unknown Unknown Unknown
Snow17SacUH.exe 000000000040263E Unknown Unknown Unknown
libc.so.6 00002AFD1BD1EB45 Unknown Unknown Unknown
Snow17SacUH.exe 0000000000402539 Unknown Unknown Unknown
3. `gfortran` compiles and runs to completion, but this is not surprising because the flags for `F77FLAGS` = `FLAGS` = `FLAGS2` in the `Makefile` for this compiler option.
It might be worth checking if there are any other 'AJN' modifications to that code that trigger these errors, but also this looks like variable definition mismatch, probably not something difficult to fix. Also we can clean up the makefile to apply the proper flags. Thanks for checking on this. -Andy
On Sat, Feb 18, 2017 at 2:50 PM, Elizabeth Clark notifications@github.com wrote:
@andywood https://github.com/andywood: I attempted to compile and run the model using FC77EXE in place of FCEXE and F77FLAGS in place of FLAGS and FLAGS2 to compile with each of PGI, ifort, and gfortranfor the version without tiny. The runs use the same namelist and input files as the previous runs described in issues #12 https://github.com/NCAR/NWS_hydro_models/issues/12 and #13 https://github.com/NCAR/NWS_hydro_models/issues/13.
- The PGI compiler returns this error:
/home/eclark/NWS_hydro_models/Snow17SacUH/src_bin//snow19/exsnow19.f: PGFTN-S-0083-Vector expression used where scalar expression required (/home/eclark/NWS_hydro_models/Snow17SacUH/src_bin//snow19/exsnow19.f: 130) PGFTN-S-0083-Vector expression used where scalar expression required (/home/eclark/NWS_hydro_models/Snow17SacUH/src_bin//snow19/exsnow19.f: 134) 0 inform, 0 warnings, 2 severes, 0 fatal for exsnow19 Makefile:181: recipe for target 'compile_calib' failed make: *** [compile_calib] Error 2
ifort compiles but fails to run with:
***** ERROR: Problem reading namelist INIT_CONTROL
forrtl: severe (24): end-of-file during read, unit 30, file /home/eclark/NWS_hydro_models/Snow17SacUH/src_bin/driver/namelist.test, line 41, position 1 Image PC Routine Line Source Snow17SacUH.exe 000000000041BFD4 Unknown Unknown Unknown Snow17SacUH.exe 000000000043D005 Unknown Unknown Unknown Snow17SacUH.exe 0000000000417C17 Unknown Unknown Unknown Snow17SacUH.exe 000000000040C57D Unknown Unknown Unknown Snow17SacUH.exe 000000000040263E Unknown Unknown Unknown libc.so.6 00002AFD1BD1EB45 Unknown Unknown Unknown Snow17SacUH.exe 0000000000402539 Unknown Unknown Unknown
- gfortran compiles and runs to completion, but this is not surprising because the flags for F77FLAGS = FLAGS = FLAGS2 in the Makefile for this compiler option.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NCAR/NWS_hydro_models/issues/12#issuecomment-280877683, or mute the thread https://github.com/notifications/unsubscribe-auth/AFCgRZNpkE0h6xcKtcQzL8bH74sdQg-Lks5rd2ecgaJpZM4MFA2r .
Looks like the compile error is in the original code. For example, it allocates SMFV as an array on line 18, then on line 130 it initializes it to zero: SMFV=0
This is allowed in F90+, must not be in F77. Interesting that ifort allows it, but that may be related to why it generates a runtime error?
Perhaps there's some default behavior like initializing the very first array member. I'll check the code I got from OHD in 2012 -- perhaps it's an update on this version, which likely came from an older source (ie something Yun took out a ways back). -Andy
On Sun, Feb 19, 2017 at 7:10 AM, Andy Newman notifications@github.com wrote:
Looks like the compile error is in the original code. For example, it allocates SMFV as an array on line 18, then on line 130 it initializes it to zero: SMFV=0 This is allowed in F90+, must not be in F77. Interesting that ifort allows it, but that may be related to why it generates a runtime error?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NCAR/NWS_hydro_models/issues/12#issuecomment-280921477, or mute the thread https://github.com/notifications/unsubscribe-auth/AFCgRSKE8QjCrBZ93U0YIFaHhZHxMLurks5reE1JgaJpZM4MFA2r .
This issue addresses differences between compilations (without explicit double precision flags) of the
SHARP
branch. The case with double precision flags will be addressed in a separate issue. I used three compilers on hydro-c1 at UCAR with the following flags: 1) ifort (IFORT) 15.0.2 20150121 •sac_77
:ifort -O3 -f77rtl
•snow19
:ifort –O3
•UTIL
:ifort -O3 -warn all -check all
2) GNU Fortran (Debian 4.9.2-10) 4.9.2 •sac_77
:gfortran -O3 -fno-align-commons -ffree-line-length-none
flags77 •snow19
:gfortran -O3 -fno-align-commons -ffree-line-length-none
flags2 •UTIL
:gfortran -O3 -fno-align-commons -ffree-line-length-none
3) PGI 15.7-0 64-bit target on x86-64 Linux -tp sandybridge •sac_77
:pgf77 -O3
•snow19
:pgf90 –O3
•UTIL
:pgf90 -O3 -Kieee
Note that the
Makefile
in this commit does not use any flags for PGI because theFC
andFC77
variables do not matchpgf90
orpgf77
as expected in theif
statements that turn on these flags.These compilers produced different results, particularly in the wintertime. Fig. 1. SWE timeseries: The lower set of lines shows swe in mm, plotted on the left axis. The upper set of lines shows the difference in swe in mm between the simulation using
PGI
and the other two compilers.@anewman89 suggested checking a change he had made to include a
tiny
offset inaesc19.f
for when the precision of an ascii restart file caused problems in the comparison.tiny
is an intrinsic function with tiny(x) - Returns the smallest positive number that can be represented on the current computer for real argument x, but in the current commit,tiny
does not call an argumentx
; it is treated as an uninitialized variable.When I tried to print out the values of
tiny
to the screen, thePGI
compiled code produced different results than whenPGI
compiled the same code without a print statement. This suggests a memory error.. I ran the program withvalgrind
, which identified a leak at the line wheretiny
is used:The values printed for tiny were inconsistent. Here is a subset of the values printed to the screen in
valgrind
:Three equivalent independent solutions that successful matched the
PGI
compiled output to that of the other compilers (bringing the black line down to match the red and blue lines in Fig. 1, and reproducing simulated streamflow to within +/-0.01 cfs) were: 1) Use the-Msave
flag onf90
andf77
: This initializes all undefined values at zero. 2) Settiny = 0.0
before line 15 inaesc.f
. 3) Removetiny
completely.The consistency of these options with
ifort
andgfortran
implies thatifort
andgfortran
initializetiny
to 0.0 when compiled.@andywood noted that the precision of the ascii state files has been increased since @anewman89 ran into the issue that required
tiny
in the first place. I am doing frequent restarts in my application, so I tested that a version of the code withouttiny
still has exact restarts. I did this by running two simulations from 2005-10-01 to 2006-09-30: 1) a continuous run for the entire period 2) a run that stops and restarts from a state file every day.I compared the output files from the two scenarios, and they are equivalent. This suggests that for better reproducibility,
tiny
can and should be removed. Doing so is conveniently also truer to the original code.