E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.
https://docs.e3sm.org/E3SM

PEM ELM FATES tests failing on Anvil with maint-2.0 #5048

Open jayeshkrishna opened 2 years ago

jayeshkrishna commented 2 years ago

While testing different PE layouts on maint-2.0 (PR #5037), we found that the ELM+FATES tests are not bit-for-bit (BFB) on Anvil with the Intel compiler.

The following tests are non-BFB on Anvil with maint-2.0 (v2.0.0-930-g6275091262) and the branch for PR #5037. This PR only changed the PE layouts for the tests, so the issue can be reproduced by using the corresponding PEM tests:

    FAIL SMS.f09_g16_a.IGELM_MLI.anvil_intel BASELINE maint-2.0: DIFF
    FAIL SMS_Ld20.f45_f45.IELMFATES.anvil_intel.elm-fates_eca BASELINE maint-2.0: DIFF
    FAIL SMS_Ld20.f45_f45.IELMFATES.anvil_intel.elm-fates_rd BASELINE maint-2.0: DIFF

The PEM test (PEM_Ld20.f45_f45.IELMFATES.anvil_intel.elm-fates_eca) fails on Anvil with the maint-2.0 branch (v2.0.0-930-g6275091262):

    PEM_Ld20.f45_f45.IELMFATES.anvil_intel.elm-fates_eca.20220624_134046_6t6cmv.cpl.hi.0001-01-21-00000.nc.base did NOT match PEM_Ld20.f45_f45.IELMFATES.anvil_intel.elm-fates_eca.20220624_134046_6t6cmv.cpl.hi.0001-01-21-00000.nc.modpes
    cat /lcrc/group/e3sm/jayesh/scratch/anvil/PEM_Ld20.f45_f45.IELMFATES.anvil_intel.elm-fates_eca.20220624_134046_6t6cmv/run/PEM_Ld20.f45_f45.IELMFATES.anvil_intel.elm-fates_eca.20220624_134046_6t6cmv.cpl.hi.0001-01-21-00000.nc.base.cprnc.out
    FAIL
     ---------------------------------------------------
    2022-06-24 14:06:41: compared suffixes suffix1 'base' suffix2 'modpes'

    tail -n20 /lcrc/group/e3sm/jayesh/scratch/anvil/PEM_Ld20.f45_f45.IELMFATES.anvil_intel.elm-fates_eca.20220624_134046_6t6cmv/run/PEM_Ld20.f45_f45.IELMFATES.anvil_intel.elm-fates_eca.20220624_134046_6t6cmv.cpl.hi.0001-01-21-00000.nc.base.cprnc.out

                                       1.197024637758951E+02                        avg decimal digits(ndif): 13.6 worst:  9.9
     RMS x2r_Flrl_Tqsub                   6.5074E-10            NORMALIZED  5.4363E-12

     x2r_coszen_str   (x2r_nx,x2r_ny,time)  t_index =      1     1
                  259200  (     1,     1,     1) (     1,     1,     1)
                  259200   0.000000000000000E+00   0.000000000000000E+00
                  259200   0.000000000000000E+00   0.000000000000000E+00
                  259200  (     1,     1,     1) (     1,     1,     1)
              avg abs field values:    0.000000000000000E+00
                                       0.000000000000000E+00
    ************************************************************************************************************************************

    SUMMARY of cprnc:
     A total number of    193 fields were compared
              of which     28 had non-zero differences
                   and      0 had differences in fill patterns
     A total number of      0 fields could not be analyzed
     A total number of      0 fields on file 1 were not found on file2.
      diff_test: the two files seem to be DIFFERENT
     ---------------------------------------------------
    2022-06-24 14:06:41: compared suffixes suffix1 'base' suffix2 'modpes'

    tail -n20 /lcrc/group/e3sm/jayesh/scratch/anvil/PEM_Ld20.f45_f45.IELMFATES.anvil_intel.elm-fates_eca.20220624_134046_6t6cmv/run/PEM_Ld20.f45_f45.IELMFATES.anvil_intel.elm-fates_eca.20220624_134046_6t6cmv.elm.h0.0001-01-01-00000.nc.base.cprnc.out

                                       2.963745966553688E-02                        avg decimal digits(ndif):  3.3 worst:  0.8
     RMS TOTSOMP                          5.7771E-04            NORMALIZED  1.9468E-02

     WIND   (lon,lat,time)  t_index =     21    21
                    3312  (    61,    38,     1) (    63,    22,     1)
                    1466   0.196313457489014E+02   0.346849679946899E+00
                    1466   0.196313457489014E+02   0.346849679946899E+00
                    3312  (    61,    38,     1) (    63,    22,     1)
              avg abs field values:    4.043912887573242E+00
                                       4.043912887573242E+00
    ************************************************************************************************************************************

    SUMMARY of cprnc:
     A total number of   1143 fields were compared
              of which    445 had non-zero differences
                   and      0 had differences in fill patterns
     A total number of     42 fields could not be analyzed
     A total number of      0 fields on file 1 were not found on file2.
      diff_test: the two files seem to be DIFFERENT

     ---------------------------------------------------

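
For anyone triaging similar `cprnc.out` files, here is a rough Python sketch that pulls the field counts and the `RMS ... NORMALIZED` lines out of the text. It is not part of CIME or cprnc: the function name `summarize_cprnc` and the `SAMPLE` string are made up for illustration, and the regular expressions assume the output layout matches the excerpts in this comment.

```python
import re

# Patterns assume the cprnc text layout seen in the excerpts in this issue.
_COMPARED = re.compile(r"A total number of\s+(\d+)\s+fields were compared")
_DIFFERING = re.compile(r"of which\s+(\d+)\s+had non-zero differences")
# e.g. " RMS TOTSOMP      5.7771E-04      NORMALIZED  1.9468E-02"
_RMS = re.compile(r"^\s*RMS\s+(\S+)\s+(\S+)\s+NORMALIZED\s+(\S+)")


def summarize_cprnc(text):
    """Hypothetical helper: collect summary counts and per-field RMS values."""
    summary = {"compared": None, "differing": None, "rms": {}}
    for line in text.splitlines():
        match = _COMPARED.search(line)
        if match:
            summary["compared"] = int(match.group(1))
            continue
        match = _DIFFERING.search(line)
        if match:
            summary["differing"] = int(match.group(1))
            continue
        match = _RMS.match(line)
        if match:
            field, rms, normalized = match.groups()
            # Store (RMS, normalized RMS) keyed by field name.
            summary["rms"][field] = (float(rms), float(normalized))
    return summary


# Lines copied from the elm.h0 excerpt above, used as illustrative input.
SAMPLE = """\
 RMS TOTSOMP                          5.7771E-04            NORMALIZED  1.9468E-02

SUMMARY of cprnc:
 A total number of   1143 fields were compared
          of which    445 had non-zero differences
               and      0 had differences in fill patterns
 A total number of     42 fields could not be analyzed
  diff_test: the two files seem to be DIFFERENT
"""

result = summarize_cprnc(SAMPLE)
print(result["compared"], result["differing"], result["rms"])
```

Sorting `result["rms"]` by the normalized value is one way to see which fields diverge most between the `base` and `modpes` runs, but note that the `tail -n20` excerpts above only show the last field or two of each file.
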
rgknox commented 2 years ago

ok, thanks for bringing this up @jayeshkrishna

rljacob commented 2 years ago

@rgknox see if this test is failing on master. If it isn't, you don't necessarily have to fix it on maint-2.0 since watercycle cases didn't use fates. If it is, fix it on master first.