JCSDA / CRTMv3

CRTMv3 repository for coordinated development and releases. Code history is not carried in this repository prior to v3, to reduce the cloning overhead. For v2.x history leading up to v3, see JCSDA/crtm repository.

numerical differences in gfortran vs. ifort and release vs. debug #154

Open BenjaminTJohnson opened 1 month ago

BenjaminTJohnson commented 1 month ago

This issue captures a longstanding (and generally ignored) problem with CRTM: some ctest results differ between release and debug builds. I don't know that there's a clear solution, but the fact that it only affects certain ctests suggests that there might be a fix.

The largest difference is on the order of 1e-11, so in no way would this impact anything useful.

ifort, Release:

-- Project version : 3.1.0
-- Fortran compiler : /opt/intel/oneapi/2022.1/compiler/2022.0.1/linux/bin/intel64/ifort
-- Fortran compiler flags :  -assume byterecl -fPIC
-- Build type : Release
-- Fortran compiler flags for release : -O3 -ip -unroll -inline -no-heap-arrays

ifort, Debug:

-- Project version : 3.1.0
-- Fortran compiler : /opt/intel/oneapi/2022.1/compiler/2022.0.1/linux/bin/intel64/ifort
-- Fortran compiler flags :  -assume byterecl -fPIC
-- Build type : DEBUG
-- Fortran compiler flags for debug : -O0 -g -check bounds -traceback -warn -heap-arrays -fpe-all=0 -fpe:0 -ftz -check all

gfortran, Release:

-- Project version : 3.1.0
-- Fortran compiler : /home/bjohnson/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/gcc-14.1.0-lg64yhqjdx56qc37ds2rnvguco7tkyug/bin/gfortran
-- Fortran compiler flags : -I/home/bjohnson/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/netcdf-fortran-4.6.1-l5onh6o5qivl4qkq7thsiwyn3pge3k62/include -D_REAL8_ -ffree-line-length-none
-- Build type : RELEASE
-- Fortran compiler flags for release : -O3 -funroll-all-loops -fopenmp -finline-functions 

gfortran, Debug:

-- Project version : 3.1.0
-- Fortran compiler : /home/bjohnson/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/gcc-14.1.0-lg64yhqjdx56qc37ds2rnvguco7tkyug/bin/gfortran
-- Fortran compiler flags : -I/home/bjohnson/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/netcdf-fortran-4.6.1-l5onh6o5qivl4qkq7thsiwyn3pge3k62/include -D_REAL8_ -ffree-line-length-none
-- Build type : DEBUG
-- Fortran compiler flags for debug : -O0 -g -fcheck=bounds -ffpe-trap=invalid,zero,overflow -fbacktrace
BenjaminTJohnson commented 1 month ago

gfortran debug vs ifort release (reference)

     13 - test_forward_Simple_atms_n21 (NUMERICAL)
     14 - test_forward_Simple_cris-fsr_n21 (NUMERICAL)
     15 - test_forward_Simple_v.abi_g18 (NUMERICAL)
     16 - test_forward_Simple_atms_npp (NUMERICAL)
     17 - test_forward_Simple_cris399_npp (NUMERICAL)
     18 - test_forward_Simple_v.abi_gr (NUMERICAL)
     19 - test_forward_Simple_abi_g18 (NUMERICAL)
     20 - test_forward_Simple_modis_aqua (NUMERICAL)
     34 - test_forward_ClearSky_cris-fsr_n21 (Failed)
     41 - test_forward_Aircraft_cris-fsr_n21 (Failed)
     44 - test_forward_ScatteringSwitch_cris-fsr_n21 (Failed)
     53 - test_forward_SOI_v.abi_g18 (Failed)
     56 - test_forward_SOI_v.abi_gr (Failed)
    130 - test_adjoint_Simple_modis_aqua (Failed)
    140 - test_tangent_linear_Simple_cris-fsr_n21 (Failed)
    141 - test_tangent_linear_Simple_v.abi_g18 (Failed)
    143 - test_tangent_linear_Simple_cris399_npp (Failed)
    144 - test_tangent_linear_Simple_v.abi_gr (Failed)
    145 - test_tangent_linear_Simple_abi_g18 (Failed)
    146 - test_tangent_linear_Simple_modis_aqua (Failed)
    149 - test_tangent_linear_ClearSky_v.abi_g18 (Failed)
    152 - test_tangent_linear_ClearSky_v.abi_gr (Failed)
1/22 Test  #13: test_forward_Simple_atms_n21 .................***Exception: Numerical  0.14 sec
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7f3f1d4253ff in ???
#1  0x70f9ca in __compare_float_numbers_MOD_cwt_real_double
  at /data/users/bjohnson/CRTM/CRTMv3/src/Utility/Compare_Float_Numbers.f90:697
#2  0x5db77a in __crtm_rtsolution_define_MOD_crtm_rtsolution_compare
  at /data/users/bjohnson/CRTM/CRTMv3/src/RTSolution/CRTM_RTSolution_Define.f90:672
#3  0x40c037 in test_simple
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:280
#4  0x413a31 in main
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:14
 2/22 Test  #14: test_forward_Simple_cris-fsr_n21 .............***Exception: Numerical  5.41 sec
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7ffbb32d03ff in ???
#1  0x70f9ca in __compare_float_numbers_MOD_cwt_real_double
  at /data/users/bjohnson/CRTM/CRTMv3/src/Utility/Compare_Float_Numbers.f90:697
#2  0x5db77a in __crtm_rtsolution_define_MOD_crtm_rtsolution_compare
  at /data/users/bjohnson/CRTM/CRTMv3/src/RTSolution/CRTM_RTSolution_Define.f90:672
#3  0x40c037 in test_simple
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:280
#4  0x413a31 in main
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:14
3/22 Test  #15: test_forward_Simple_v.abi_g18 ................***Exception: Numerical  0.16 sec
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7fd60cd223ff in ???
#1  0x70f9ca in __compare_float_numbers_MOD_cwt_real_double
  at /data/users/bjohnson/CRTM/CRTMv3/src/Utility/Compare_Float_Numbers.f90:697
#2  0x5db77a in __crtm_rtsolution_define_MOD_crtm_rtsolution_compare
  at /data/users/bjohnson/CRTM/CRTMv3/src/RTSolution/CRTM_RTSolution_Define.f90:672
#3  0x40c037 in test_simple
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:280
#4  0x413a31 in main
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:14
4/22 Test  #16: test_forward_Simple_atms_npp .................***Exception: Numerical  0.15 sec
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7f7439a223ff in ???
#1  0x70f9ca in __compare_float_numbers_MOD_cwt_real_double
  at /data/users/bjohnson/CRTM/CRTMv3/src/Utility/Compare_Float_Numbers.f90:697
#2  0x5db77a in __crtm_rtsolution_define_MOD_crtm_rtsolution_compare
  at /data/users/bjohnson/CRTM/CRTMv3/src/RTSolution/CRTM_RTSolution_Define.f90:672
#3  0x40c037 in test_simple
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:280
#4  0x413a31 in main
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:14
5/22 Test  #17: test_forward_Simple_cris399_npp ..............***Exception: Numerical  1.10 sec
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7fa2828683ff in ???
#1  0x70f9ca in __compare_float_numbers_MOD_cwt_real_double
  at /data/users/bjohnson/CRTM/CRTMv3/src/Utility/Compare_Float_Numbers.f90:697
#2  0x5db77a in __crtm_rtsolution_define_MOD_crtm_rtsolution_compare
  at /data/users/bjohnson/CRTM/CRTMv3/src/RTSolution/CRTM_RTSolution_Define.f90:672
#3  0x40c037 in test_simple
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:280
#4  0x413a31 in main
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:14
 6/22 Test  #18: test_forward_Simple_v.abi_gr .................***Exception: Numerical  0.16 sec
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7f9e21c933ff in ???
#1  0x70f9ca in __compare_float_numbers_MOD_cwt_real_double
  at /data/users/bjohnson/CRTM/CRTMv3/src/Utility/Compare_Float_Numbers.f90:697
#2  0x5db77a in __crtm_rtsolution_define_MOD_crtm_rtsolution_compare
  at /data/users/bjohnson/CRTM/CRTMv3/src/RTSolution/CRTM_RTSolution_Define.f90:672
#3  0x40c037 in test_simple
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:280
#4  0x413a31 in main
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:14
 7/22 Test  #19: test_forward_Simple_abi_g18 ..................***Exception: Numerical  0.19 sec
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7fae6852c3ff in ???
#1  0x70f9ca in __compare_float_numbers_MOD_cwt_real_double
  at /data/users/bjohnson/CRTM/CRTMv3/src/Utility/Compare_Float_Numbers.f90:697
#2  0x5db77a in __crtm_rtsolution_define_MOD_crtm_rtsolution_compare
  at /data/users/bjohnson/CRTM/CRTMv3/src/RTSolution/CRTM_RTSolution_Define.f90:672
#3  0x40c037 in test_simple
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:280
#4  0x413a31 in main
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:14
 8/22 Test  #20: test_forward_Simple_modis_aqua ...............***Exception: Numerical  0.20 sec
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7f54a91c23ff in ???
#1  0x70f9ca in __compare_float_numbers_MOD_cwt_real_double
  at /data/users/bjohnson/CRTM/CRTMv3/src/Utility/Compare_Float_Numbers.f90:697
#2  0x5db77a in __crtm_rtsolution_define_MOD_crtm_rtsolution_compare
  at /data/users/bjohnson/CRTM/CRTMv3/src/RTSolution/CRTM_RTSolution_Define.f90:672
#3  0x40c037 in test_simple
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:280
#4  0x413a31 in main
  at /data/users/bjohnson/CRTM/CRTMv3/test/mains/regression/forward/test_Simple/test_Simple.f90:14

End of exception errors

9/22 Test  #34: test_forward_ClearSky_cris-fsr_n21 ...........***Failed    0.96 sec
> diff -y test_forward_ClearSky_cris-fsr_n21_gfortran_debug.txt test_forward_ClearSky_cris-fsr_n21_gfortran_release.txt | grep "|"
1/1 Test #34: test_forward_ClearSky_cris-fsr_n21 ...***Failed    2.07 sec             | 1/1 Test #34: test_forward_ClearSky_cris-fsr_n21 ...***Failed    1.76 sec
CRTM_Tests    =   2.07 sec*proc (1 test)                              | CRTM_Tests    =   1.76 sec*proc (1 test)
Total Test time (real) =   2.12 sec                               | Total Test time (real) =   1.85 sec
----

So, apart from the timing lines, there is no difference between debug and release using gfortran.

Here's a "summary" of the differences observed for this specific test:

668K -rw-r--r--  1 bjohnson domain users 3.5M Jul 26 19:01 diff_gd_ir.txt
 512 -rw-r--r--  1 bjohnson domain users 269K Jul 26 19:01 diff_gd_id.txt
 512 -rw-r--r--  1 bjohnson domain users 3.5M Jul 26 19:01 diff_id_ir.txt
 512 -rw-r--r--  1 bjohnson domain users 269K Jul 26 19:02 diff_id_gr.txt
 512 -rw-r--r--  1 bjohnson domain users 3.5M Jul 26 19:02 diff_ir_gr.txt
 512 -rw-r--r--  1 bjohnson domain users  338 Jul 26 19:03 diff_gd_gr.txt

where gd = gfortran_debug, ir = ifort_release, etc.

The most differences occur when anything is compared to ifort release. Fewer differences occur when comparing gfortran to ifort debug. The only pair with almost no difference is gfortran debug vs. gfortran release.

Here's an example of the differences between gfortran release and ifort release:
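For reference, the six diff files listed above are the pairwise comparisons of the four builds. A minimal Python sketch of that naming scheme follows; the expansions of `gr` and `id` are inferred from the "etc." above, and the ordering of the tags is chosen only so that the generated names match the listing.

```python
from itertools import combinations

# Build tags used in the diff file names. "gd" and "ir" are defined in the
# issue text; "id" and "gr" are inferred by analogy (assumption).
builds = {
    "gd": "gfortran_debug",
    "id": "ifort_debug",
    "ir": "ifort_release",
    "gr": "gfortran_release",
}

# Every unordered pair of builds gives one diff file: 4 choose 2 = 6 files.
diff_files = [f"diff_{a}_{b}.txt" for a, b in combinations(builds, 2)]

for name in diff_files:
    print(name)
```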

<...>
Radiance: num1 = 1.74107149954568E+00, num2 = 1.74107149954568E+00, percent_difference = 1.14779975713045E-13%
Brightness Temperature: num1 = 3.15166486936910E+02, num2 = 3.15166486936910E+02, percent_difference = 1.80359972322143E-14%
Stokes: num1 = 1.74107149954568E+00, num2 = 1.74107149954568E+00, percent_difference = 1.14779975713045E-13%
Up Radiance: num1 = 1.67948500636161E-01, num2 = 1.67948500636161E-01, percent_difference = 4.95787259377050E-14%
Down Radiance: num1 = 1.95273381650301E-01, num2 = 1.95273381650301E-01, percent_difference = 4.26411045597613E-14%
Down Solar Radiance: num1 = 3.68903722986171E+00, num2 = 3.68903722986171E+00, percent_difference = 2.40761576627791E-14%
Radiance: num1 = 1.73341571892138E+00, num2 = 1.73341571892138E+00, percent_difference = 5.12386272955224E-14%
Brightness Temperature: num1 = 3.15104527619051E+02, num2 = 3.15104527619050E+02, percent_difference = 3.60790873367136E-14%
Stokes: num1 = 1.73341571892138E+00, num2 = 1.73341571892138E+00, percent_difference = 5.12386272955224E-14%

The values that produced the largest percent difference:
Down Solar Radiance: num1 = 3.27014581763374E-71, num2 = 3.27014568271136E-71, percent_difference = 4.12588283764277E-06%

An example of the differences between gfortran debug and ifort debug:

<...>
Radiance: num1 = 1.93928626122432E+00, num2 = 1.93928626122432E+00, percent_difference = 4.57992426110106E-14%
Stokes: num1 = 1.93928626122432E+00, num2 = 1.93928626122432E+00, percent_difference = 4.57992426110106E-14%
Up Radiance: num1 = 1.34696913862698E-01, num2 = 1.34696913862698E-01, percent_difference = 8.24237907749579E-14%
Down Radiance: num1 = 1.49189426105494E-01, num2 = 1.49189426105494E-01, percent_difference = 5.58127536384566E-14%
Radiance: num1 = 1.90834969337093E+00, num2 = 1.90834969337093E+00, percent_difference = 5.81771269952125E-14%
Stokes: num1 = 1.90834969337093E+00, num2 = 1.90834969337093E+00, percent_difference = 5.81771269952125E-14%
Up Radiance: num1 = 1.78588101289685E-01, num2 = 1.78588101289685E-01, percent_difference = 6.21666850483101E-14%
Up Radiance: num1 = 1.93723696734729E-01, num2 = 1.93723696734729E-01, percent_difference = 5.73096138127808E-14%
Up Radiance: num1 = 1.95642859357333E-01, num2 = 1.95642859357333E-01, percent_difference = 5.67474339861994E-14%
Down Radiance: num1 = 2.29191736945382E-01, num2 = 2.29191736945382E-01, percent_difference = 4.84407963141242E-14%
Up Radiance: num1 = 1.67019298349406E-01, num2 = 1.67019298349406E-01, percent_difference = 6.64727391144080E-14%

The values that produced the largest percent difference:
Down Solar Radiance: num1 = 1.98443571457145E-11, num2 = 1.98443571457185E-11, percent_difference = 1.98973185487457E-11%
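To put the two "largest percent difference" lines in context, here is a small Python sketch that recomputes them from the printed values. The formula `100 * |num1 - num2| / |num1|` is an assumption about how `percent_difference` is defined (it reproduces the reported numbers to within the precision of the printed digits); it is not taken from the CRTM source.

```python
def percent_difference(num1: float, num2: float) -> float:
    """Assumed definition: 100 * |num1 - num2| / |num1|."""
    return 100.0 * abs(num1 - num2) / abs(num1)

# Values copied from the "largest percent difference" lines above.
cases = {
    "gfortran release vs ifort release": (3.27014581763374e-71, 3.27014568271136e-71),
    "gfortran debug vs ifort debug": (1.98443571457145e-11, 1.98443571457185e-11),
}

results = {}
for label, (num1, num2) in cases.items():
    abs_diff = abs(num1 - num2)
    pct = percent_difference(num1, num2)
    results[label] = (abs_diff, pct)
    print(f"{label}: abs diff = {abs_diff:.3e}, percent diff = {pct:.3e}%")
```

In both cases the absolute difference is far below anything physically meaningful for a radiance; the larger percent difference comes from a value that is itself around 1e-71.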

Overall these differences are tiny, but I wanted to document them. The numerical exception ("Floating-point exception - erroneous arithmetic operation.") appears to be a "bug" in the float comparison routine, and is likely related to underflow.
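As an illustration of the kind of guard that could avoid this, here is a hypothetical Python sketch of a percent-difference helper that never divides by a zero or subnormal reference value. It is not the code at Compare_Float_Numbers.f90:697, and the name `safe_percent_difference` and the `floor` threshold are made up for this example. Note that the gfortran debug flags listed above trap invalid, zero, and overflow but not underflow, so one possibility (unconfirmed) is that a value underflows to zero and is then used as a divisor.

```python
import sys

def safe_percent_difference(num1: float, num2: float,
                            floor: float = sys.float_info.min) -> float:
    """Percent difference relative to the larger magnitude.

    If both inputs are smaller in magnitude than `floor` (default: the
    smallest normal double), they are treated as equal and 0.0 is returned
    instead of dividing by a zero or subnormal value.
    """
    scale = max(abs(num1), abs(num2))
    if scale < floor:
        # Both values are zero or subnormal: skip the division entirely.
        return 0.0
    return 100.0 * abs(num1 - num2) / scale

print(safe_percent_difference(0.0, 0.0))      # no division by zero
print(safe_percent_difference(1e-320, 0.0))   # subnormal input, treated as equal
print(safe_percent_difference(2.0, 1.0))      # ordinary case
```

Whether treating sub-threshold values as equal is acceptable for the regression tests is a design decision for the CRTM comparison routine; the sketch only shows the shape of the guard.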