NCAR / NWS_hydro_models

Hydrology models adapted from NWS conceptual modeling approaches and code.
GNU General Public License v3.0
7 stars 17 forks source link

Differences between model versions compiled with `ifort`, `gfortran`, and `pgi` #12

Open lizaclark opened 7 years ago

lizaclark commented 7 years ago

This issue addresses differences between compilations (without explicit double precision flags) of the SHARP branch. The case with double precision flags will be addressed in a separate issue. I used three compilers on hydro-c1 at UCAR with the following flags: 1) ifort (IFORT) 15.0.2 20150121 • sac_77: ifort -O3 -f77rtlsnow19: ifort –O3UTIL: ifort -O3 -warn all -check all 2) GNU Fortran (Debian 4.9.2-10) 4.9.2 • sac_77: gfortran -O3 -fno-align-commons -ffree-line-length-none flags77 • snow19: gfortran -O3 -fno-align-commons -ffree-line-length-none flags2 • UTIL: gfortran -O3 -fno-align-commons -ffree-line-length-none 3) PGI 15.7-0 64-bit target on x86-64 Linux -tp sandybridge • sac_77: pgf77 -O3snow19: pgf90 –O3UTIL: pgf90 -O3 -Kieee

Note that the Makefile in this commit does not use any flags for PGI because the FC and FC77 variables do not match pgf90 or pgf77 as expected in the if statements that turn on these flags.

These compilers produced different results, particularly in the wintertime. swe_orig_diff Fig. 1. SWE timeseries: The lower set of lines shows swe in mm, plotted on the left axis. The upper set of lines shows the difference in swe in mm between the simulation using PGI and the other two compilers.

@anewman89 suggested checking a change he had made to include a tiny offset in aesc19.f for when the precision of an ascii restart file caused problems in the comparison. tiny is an intrinsic function with tiny(x) - Returns the smallest positive number that can be represented on the current computer for real argument x, but in the current commit, tiny does not call an argument x; it is treated as an uninitialized variable.

When I tried to print out the values of tiny to the screen, the PGI compiled code produced different results than when PGI compiled the same code without a print statement. This suggests a memory error.. I ran the program with valgrind, which identified a leak at the line where tiny is used:

==40656== Conditional jump or move depends on uninitialised value(s)
==40656==    at 0x40421E: aesc19_ (aesc19.f:25)
==40656==    by 0x402857: pack19_ (PACK19.f:218)
==40656==    by 0x40C252: exsnow19_ (exsnow19.f:142)
==40656==    by 0x40F2ED: MAIN_ (multi_driver.f90:282)
==40656==    by 0x401CC3: main (in ~/NWS_hydro_models/Snow17SacUH/src_bin/Snow17SacUH.exe)

The values printed for tiny were inconsistent. Here is a subset of the values printed to the screen in valgrind:

 tiny=  -1.7005811E+38
 tiny=    0.000000    
 tiny=  -1.7005811E+38
 tiny=   8.5123166E-02
 tiny=  -1.7005811E+38
 tiny=   8.3787210E-02
 tiny=  -1.7005811E+38
 tiny=   0.1242791    
 tiny=  -1.7005811E+38
 tiny=    0.000000    
 tiny=  -1.7005811E+38
 tiny=   0.1852178    
 tiny=  -1.7005811E+38
 tiny=    0.000000 

Three equivalent independent solutions that successful matched the PGI compiled output to that of the other compilers (bringing the black line down to match the red and blue lines in Fig. 1, and reproducing simulated streamflow to within +/-0.01 cfs) were: 1) Use the -Msave flag on f90 and f77: This initializes all undefined values at zero. 2) Set tiny = 0.0 before line 15 in aesc.f. 3) Remove tiny completely.

The consistency of these options with ifort and gfortran implies that ifort and gfortran initialize tiny to 0.0 when compiled.

@andywood noted that the precision of the ascii state files has been increased since @anewman89 ran into the issue that required tiny in the first place. I am doing frequent restarts in my application, so I tested that a version of the code without tiny still has exact restarts. I did this by running two simulations from 2005-10-01 to 2006-09-30: 1) a continuous run for the entire period 2) a run that stops and restarts from a state file every day.

I compared the output files from the two scenarios, and they are equivalent. This suggests that for better reproducibility, tiny can and should be removed. Doing so is conveniently also truer to the original code.

andywood commented 7 years ago

Hi Liz,

This is useful analysis -- based on this and our prior experience with the -r8/autodouble flags, let's go back to NOT using those flags (which we apparently did inadvertently after the PGF compiler variable picked up a path, breaking the if statement).

And let's remove tiny() and go back to using f77 to compile the snow routines, as was the original implementation. I'll go back to using gfortran for the calibrations on hydro-c1 unless run times are substantially slower than ifort.

While you're in this mode (recompiling/run/plot), can you check whether using f77 instead of f90 to compile snow17 (without tiny) makes much difference?

-Andy

On Fri, Feb 17, 2017 at 11:08 PM, Elizabeth Clark notifications@github.com wrote:

This issue addresses differences between compilations (without explicit double precision flags) of the SHARP branch https://github.com/NCAR/NWS_hydro_models/tree/28fb8a7f16a48f6b8694c7a4124d2ad5d31071a8. The case with double precision flags will be addressed in a separate issue. I used three compilers on hydro-c1 at UCAR with the following flags:

  1. ifort (IFORT) 15.0.2 20150121 • sac_77: ifort -O3 -f77rtl • snow19: ifort –O3 • UTIL: ifort -O3 -warn all -check all
  2. GNU Fortran (Debian 4.9.2-10) 4.9.2 • sac_77: gfortran -O3 -fno-align-commons -ffree-line-length-none flags77 • snow19: gfortran -O3 -fno-align-commons -ffree-line-length-none flags2 • UTIL: gfortran -O3 -fno-align-commons -ffree-line-length-none
  3. PGI 15.7-0 64-bit target on x86-64 Linux -tp sandybridge • sac_77: pgf77 -O3 • snow19: pgf90 –O3 • UTIL: pgf90 -O3 -Kieee

Note that the Makefile https://github.com/NCAR/NWS_hydro_models/blob/28fb8a7f16a48f6b8694c7a4124d2ad5d31071a8/Snow17SacUH/src_bin/driver/Makefile in this commit does not use any flags for PGI because the FC and FC77 variables do not match pgf90 or pgf77 as expected in the if statements that turn on these flags.

These compilers produced different results, particularly in the wintertime. [image: swe_orig_diff] https://cloud.githubusercontent.com/assets/5132040/23090619/a4360f78-f556-11e6-9d82-e7316d5b0dc0.png Fig. 1. SWE timeseries: The lower set of lines shows swe in mm, plotted on the left axis. The upper set of lines shows the difference in swe in mm between the simulation using PGI and the other two compilers.

@anewman89 https://github.com/anewman89 suggested checking a change he had made to include a tiny offset in aesc19.f https://github.com/NCAR/NWS_hydro_models/blob/SHARP/Snow17SacUH/src_bin/snow19/aesc19.f#L25 for when the precision of an ascii restart file caused problems in the comparison. tiny is an intrinsic function with tiny(x) - Returns the smallest positive number that can be represented on the current computer for real argument x http://www.personal.psu.edu/jhm/f90/lectures/7.html, but in the current commit, tiny does not call an argument x; it is treated as an uninitialized variable.

When I tried to print out the values of tiny to the screen, the PGI compiled code produced different results than when PGI compiled the same code without a print statement. This suggests a memory error. http://stackoverflow.com/questions/33127940/result-changes-when-a-variable-is-used-in-print-statement-before-the-computation. I ran the program with valgrind, which identified a leak at the line where tiny is used:

==40656== Conditional jump or move depends on uninitialised value(s) ==40656== at 0x40421E: aesc19 (aesc19.f:25) ==40656== by 0x402857: pack19 (PACK19.f:218) ==40656== by 0x40C252: exsnow19 (exsnow19.f:142) ==40656== by 0x40F2ED: MAIN (multi_driver.f90:282) ==40656== by 0x401CC3: main (in ~/NWS_hydro_models/Snow17SacUH/src_bin/Snow17SacUH.exe)

The values printed for tiny were inconsistent. Here is a subset of the values printed to the screen in valgrind:

tiny= -1.7005811E+38 tiny= 0.000000 tiny= -1.7005811E+38 tiny= 8.5123166E-02 tiny= -1.7005811E+38 tiny= 8.3787210E-02 tiny= -1.7005811E+38 tiny= 0.1242791 tiny= -1.7005811E+38 tiny= 0.000000 tiny= -1.7005811E+38 tiny= 0.1852178 tiny= -1.7005811E+38 tiny= 0.000000

Three equivalent independent solutions that successful matched the PGI compiled output to that of the other compilers (bringing the black line down to match the red and blue lines in Fig. 1, and reproducing simulated streamflow to within +/-0.01 cfs) were:

  1. Use the -Msave flag on f90 and f77: This initializes all undefined values at zero http://www.pgroup.com/userforum/viewtopic.php?t=408&sid=87dcadeff6a563829ad8e4182ab36655 .
  2. Set tiny = 0.0 before line 15 https://github.com/NCAR/NWS_hydro_models/blob/28fb8a7f16a48f6b8694c7a4124d2ad5d31071a8/Snow17SacUH/src_bin/snow19/aesc19.f#L15 in aesc.f.
  3. Remove tiny completely.

The consistency of these options with ifort and gfortran implies that ifort and gfortran initialize tiny to 0.0 when compiled.

@andywood https://github.com/andywood noted that the precision of the ascii state files has been increased since @anewman89 https://github.com/anewman89 ran into the issue that required tiny in the first place. I am doing frequent restarts in my application, so I tested that a version of the code without tiny still has exact restarts. I did this by running two simulations from 2005-10-01 to 2006-09-30:

  1. a continuous run for the entire period
  2. a run that stops and restarts from a state file every day.

I compared the output files from the two scenarios, and they are equivalent. This suggests that for better reproducibility, tiny can and should be removed. Doing so is conveniently also truer to the original code.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NCAR/NWS_hydro_models/issues/12, or mute the thread https://github.com/notifications/unsubscribe-auth/AFCgReQBhl2NB8fOPDeLVoHST8MWcvvYks5rdorigaJpZM4MFA2r .

anewman89 commented 7 years ago

I agree with Andy's comments, with this and issue #13, going back to the original code and no double precision flags is probably the best route to go.

lizaclark commented 7 years ago

@andywood: I attempted to compile and run the model using FC77EXE in place of FCEXE and F77FLAGS in place of FLAGS and FLAGS2 to compile with each of PGI, ifort, and gfortranfor the version without tiny. The runs use the same namelist and input files as the previous runs described in issues #12 and #13.

  1. The PGI compiler returns this error:

    /home/eclark/NWS_hydro_models/Snow17SacUH/src_bin//snow19/exsnow19.f:
    PGFTN-S-0083-Vector expression used where scalar expression required (/home/eclark/NWS_hydro_models/Snow17SacUH/src_bin//snow19/exsnow19.f: 130)
    PGFTN-S-0083-Vector expression used where scalar expression required (/home/eclark/NWS_hydro_models/Snow17SacUH/src_bin//snow19/exsnow19.f: 134)
    0 inform,   0 warnings,   2 severes, 0 fatal for exsnow19
    Makefile:181: recipe for target 'compile_calib' failed
    make: *** [compile_calib] Error 2
  2. ifort compiles but fails to run with:

    
    ***** ERROR: Problem reading namelist INIT_CONTROL

forrtl: severe (24): end-of-file during read, unit 30, file /home/eclark/NWS_hydro_models/Snow17SacUH/src_bin/driver/namelist.test, line 41, position 1 Image PC Routine Line Source
Snow17SacUH.exe 000000000041BFD4 Unknown Unknown Unknown Snow17SacUH.exe 000000000043D005 Unknown Unknown Unknown Snow17SacUH.exe 0000000000417C17 Unknown Unknown Unknown Snow17SacUH.exe 000000000040C57D Unknown Unknown Unknown Snow17SacUH.exe 000000000040263E Unknown Unknown Unknown libc.so.6 00002AFD1BD1EB45 Unknown Unknown Unknown Snow17SacUH.exe 0000000000402539 Unknown Unknown Unknown


3. `gfortran` compiles and runs to completion, but this is not surprising because the flags for `F77FLAGS` = `FLAGS` = `FLAGS2` in the `Makefile` for this compiler option.
andywood commented 7 years ago

It might be worth checking if there are any other 'AJN' modifications to that code that trigger these errors, but also this looks like variable definition mismatch, probably not something difficult to fix. Also we can clean up the makefile to apply the proper flags. Thanks for checking on this. -Andy

On Sat, Feb 18, 2017 at 2:50 PM, Elizabeth Clark notifications@github.com wrote:

@andywood https://github.com/andywood: I attempted to compile and run the model using FC77EXE in place of FCEXE and F77FLAGS in place of FLAGS and FLAGS2 to compile with each of PGI, ifort, and gfortranfor the version without tiny. The runs use the same namelist and input files as the previous runs described in issues #12 https://github.com/NCAR/NWS_hydro_models/issues/12 and #13 https://github.com/NCAR/NWS_hydro_models/issues/13.

  1. The PGI compiler returns this error:

/home/eclark/NWS_hydro_models/Snow17SacUH/src_bin//snow19/exsnow19.f: PGFTN-S-0083-Vector expression used where scalar expression required (/home/eclark/NWS_hydro_models/Snow17SacUH/src_bin//snow19/exsnow19.f: 130) PGFTN-S-0083-Vector expression used where scalar expression required (/home/eclark/NWS_hydro_models/Snow17SacUH/src_bin//snow19/exsnow19.f: 134) 0 inform, 0 warnings, 2 severes, 0 fatal for exsnow19 Makefile:181: recipe for target 'compile_calib' failed make: *** [compile_calib] Error 2

  1. ifort compiles but fails to run with:

    ***** ERROR: Problem reading namelist INIT_CONTROL

forrtl: severe (24): end-of-file during read, unit 30, file /home/eclark/NWS_hydro_models/Snow17SacUH/src_bin/driver/namelist.test, line 41, position 1 Image PC Routine Line Source Snow17SacUH.exe 000000000041BFD4 Unknown Unknown Unknown Snow17SacUH.exe 000000000043D005 Unknown Unknown Unknown Snow17SacUH.exe 0000000000417C17 Unknown Unknown Unknown Snow17SacUH.exe 000000000040C57D Unknown Unknown Unknown Snow17SacUH.exe 000000000040263E Unknown Unknown Unknown libc.so.6 00002AFD1BD1EB45 Unknown Unknown Unknown Snow17SacUH.exe 0000000000402539 Unknown Unknown Unknown

  1. gfortran compiles and runs to completion, but this is not surprising because the flags for F77FLAGS = FLAGS = FLAGS2 in the Makefile for this compiler option.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NCAR/NWS_hydro_models/issues/12#issuecomment-280877683, or mute the thread https://github.com/notifications/unsubscribe-auth/AFCgRZNpkE0h6xcKtcQzL8bH74sdQg-Lks5rd2ecgaJpZM4MFA2r .

anewman89 commented 7 years ago

Looks like the compile error is in the original code. For example, it allocates SMFV as an array on line 18, then on line 130 it initializes it to zero: SMFV=0 This is allowed in F90+, must not be in F77. Interesting that ifort allows it, but that may be related to why it generates a runtime error?

andywood commented 7 years ago

Perhaps there's some default behavior like initializing the very first array member. I'll check the code I got from OHD in 2012 -- perhaps it's an update on this version, which likely came from an older source (ie something Yun took out a ways back). -Andy

On Sun, Feb 19, 2017 at 7:10 AM, Andy Newman notifications@github.com wrote:

Looks like the compile error is in the original code. For example, it allocates SMFV as an array on line 18, then on line 130 it initializes it to zero: SMFV=0 This is allowed in F90+, must not be in F77. Interesting that ifort allows it, but that may be related to why it generates a runtime error?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NCAR/NWS_hydro_models/issues/12#issuecomment-280921477, or mute the thread https://github.com/notifications/unsubscribe-auth/AFCgRSKE8QjCrBZ93U0YIFaHhZHxMLurks5reE1JgaJpZM4MFA2r .