NOAA-GFDL / GFDL_atmos_cubed_sphere

The GFDL atmos_cubed_sphere dynamical core code

Support IntelLLVM compiler #353

Closed DusanJovic-NOAA closed 1 month ago

DusanJovic-NOAA commented 3 months ago

Description

This PR adds support for the Intel LLVM compiler (ifx).

Changes:

Fixes # (issue)

How Has This Been Tested?

ufs-weather-model regression test has been run on Hera. See https://github.com/ufs-community/ufs-weather-model/pull/2224

Checklist:

Please check all items, whether they apply or not

DusanJovic-NOAA commented 3 months ago

One of the ufs-weather-model debug regression tests is failing with a floating-point invalid exception. Here is the backtrace:

 96: ==== backtrace (tid:1684736) ====
 96:  0 0x0000000000054d90 __GI___sigaction()  :0
 96:  1 0x0000000002f10983 nh_utils_mod_mp_riem_solver_c_.DIR.OMP.PARALLEL.LOOP.23721.split3724.split3747()  /work/noaa/fv3-cam/djovic/ufs/support_intelllvm/ufs-weather-model/FV3/atmos_cubed_sphere/model/nh_utils.F90:433
 96:  2 0x000000000015ff13 __kmp_invoke_microtask()  ???:0
 96:  3 0x00000000000cf664 _INTERNALdc23a976::__kmp_serial_fork_call()  /nfs/site/proj/openmp/promo/20230118/tmp/lin_32e-rtl_int_5_nor_dyn.rel.c0.s0.t1..h1.w1-fxilab153/../../src/kmp_runtime.cpp:1927
 96:  4 0x00000000000cf664 __kmp_fork_call()  /nfs/site/proj/openmp/promo/20230118/tmp/lin_32e-rtl_int_5_nor_dyn.rel.c0.s0.t1..h1.w1-fxilab153/../../src/kmp_runtime.cpp:2188
 96:  5 0x0000000000088d23 __kmpc_fork_call()  /nfs/site/proj/openmp/promo/20230118/tmp/lin_32e-rtl_int_5_nor_dyn.rel.c0.s0.t1..h1.w1-fxilab153/../../src/kmp_csupport.cpp:350
 96:  6 0x0000000002e499d8 nh_utils_mod_mp_riem_solver_c_()  /work/noaa/fv3-cam/djovic/ufs/support_intelllvm/ufs-weather-model/FV3/atmos_cubed_sphere/model/nh_utils.F90:404
 96:  7 0x00000000024645d3 dyn_core_mod_mp_dyn_core_()  /work/noaa/fv3-cam/djovic/ufs/support_intelllvm/ufs-weather-model/FV3/atmos_cubed_sphere/model/dyn_core.F90:628
 96:  8 0x000000000256c864 fv_dynamics_mod_mp_fv_dynamics_()  /work/noaa/fv3-cam/djovic/ufs/support_intelllvm/ufs-weather-model/FV3/atmos_cubed_sphere/model/fv_dynamics.F90:683
 96:  9 0x000000000235d824 atmosphere_mod_mp_adiabatic_init_()  /work/noaa/fv3-cam/djovic/ufs/support_intelllvm/ufs-weather-model/FV3/atmos_cubed_sphere/driver/fvGFS/atmosphere.F90:1889
 96: 10 0x0000000002309421 atmosphere_mod_mp_atmosphere_init_()  /work/noaa/fv3-cam/djovic/ufs/support_intelllvm/ufs-weather-model/FV3/atmos_cubed_sphere/driver/fvGFS/atmosphere.F90:574
 96: 11 0x0000000001e82c2b atmos_model_mod_mp_atmos_model_init_()  /work/noaa/fv3-cam/djovic/ufs/support_intelllvm/ufs-weather-model/FV3/atmos_model.F90:565
 96: 12 0x0000000001b726f6 module_fcst_grid_comp_mp_fcst_initialize_()  /work/noaa/fv3-cam/djovic/ufs/support_intelllvm/ufs-weather-model/FV3/module_fcst_grid_comp.F90:811
 96: 13 0x0000000000ad58c4 ESMCI::FTable::callVFuncPtr()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMCI_FTable.C:2167
 96: 14 0x0000000000ad983f ESMCI_FTableCallEntryPointVMHop()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMCI_FTable.C:824
 96: 15 0x0000000001149e8a ESMCI::VMK::enter()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Infrastructure/VM/src/ESMCI_VMKernel.C:1247
 96: 16 0x00000000012dea88 ESMCI::VM::enter()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Infrastructure/VM/src/ESMCI_VM.C:1216
 96: 17 0x0000000000ad6d1a c_esmc_ftablecallentrypointvm_()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMCI_FTable.C:981
 96: 18 0x00000000009e8e60 esmf_compmod_mp_esmf_compexecute_()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMF_Comp.F90:1252
 96: 19 0x0000000000dc2db1 esmf_gridcompmod_mp_esmf_gridcompinitialize_()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMF_GridComp.F90:1419
 96: 20 0x0000000001b470f5 fv3atm_cap_mod_mp_initializeadvertise_()  /work/noaa/fv3-cam/djovic/ufs/support_intelllvm/ufs-weather-model/FV3/fv3_cap.F90:444
 96: 21 0x000000000110af78 ESMCI::MethodElement::execute()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMCI_MethodTable.C:377
 96: 22 0x000000000110aee7 ESMCI::MethodTable::execute()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMCI_MethodTable.C:563
 96: 23 0x000000000110b574 c_esmc_methodtableexecuteef_()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMCI_MethodTable.C:347
 96: 24 0x0000000000ac12bb esmf_attachmethodsmod_mp_esmf_methodgridcompexecute_()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/AttachMethods/src/ESMF_AttachMethods.F90:1280
 96: 25 0x0000000003a137e6 nuopc_modelbase_mp_initializeipdvxp01_()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/addon/NUOPC/src/NUOPC_ModelBase.F90:687
 96: 26 0x0000000000ad58c4 ESMCI::FTable::callVFuncPtr()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMCI_FTable.C:2167
 96: 27 0x0000000000ad983f ESMCI_FTableCallEntryPointVMHop()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMCI_FTable.C:824
 96: 28 0x0000000001149c7a ESMCI::VMK::enter()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Infrastructure/VM/src/ESMCI_VMKernel.C:2501
 96: 29 0x00000000012dea88 ESMCI::VM::enter()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Infrastructure/VM/src/ESMCI_VM.C:1216
 96: 30 0x0000000000ad6d1a c_esmc_ftablecallentrypointvm_()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMCI_FTable.C:981
 96: 31 0x00000000009e8e60 esmf_compmod_mp_esmf_compexecute_()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMF_Comp.F90:1252
 96: 32 0x0000000000dc2db1 esmf_gridcompmod_mp_esmf_gridcompinitialize_()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMF_GridComp.F90:1419
 96: 33 0x000000000098b020 nuopc_driver_mp_loopmodelcompss_()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/addon/NUOPC/src/NUOPC_Driver.F90:2889
 96: 34 0x00000000009b3948 nuopc_driver_mp_initializeipdv02p1_()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/addon/NUOPC/src/NUOPC_Driver.F90:1346
 96: 35 0x00000000009bd13b nuopc_driver_mp_initializegeneric_()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/addon/NUOPC/src/NUOPC_Driver.F90:484
 96: 36 0x0000000000ad58c4 ESMCI::FTable::callVFuncPtr()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMCI_FTable.C:2167
 96: 37 0x0000000000ad983f ESMCI_FTableCallEntryPointVMHop()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMCI_FTable.C:824
 96: 38 0x0000000001149c7a ESMCI::VMK::enter()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Infrastructure/VM/src/ESMCI_VMKernel.C:2501
 96: 39 0x00000000012dea88 ESMCI::VM::enter()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Infrastructure/VM/src/ESMCI_VM.C:1216
 96: 40 0x0000000000ad6d1a c_esmc_ftablecallentrypointvm_()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMCI_FTable.C:981
 96: 41 0x00000000009e8e60 esmf_compmod_mp_esmf_compexecute_()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMF_Comp.F90:1252
 96: 42 0x0000000000dc2db1 esmf_gridcompmod_mp_esmf_gridcompinitialize_()  /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/cache/build_stage/spack-stage-esmf-8.6.0-rqrapepmgfb7kpri3ynqlxusquf6npfq/spack-src/src/Superstructure/Component/src/ESMF_GridComp.F90:1419
 96: 43 0x000000000042e373 MAIN__()  /work/noaa/fv3-cam/djovic/ufs/support_intelllvm/ufs-weather-model/driver/UFS.F90:389
 96: 44 0x000000000042c83d main()  ???:0
 96: 45 0x000000000003feb0 __libc_start_call_main()  ???:0
 96: 46 0x000000000003ff60 __libc_start_main_alias_2()  :0
 96: 47 0x000000000042c755 _start()  ???:0
 96: =================================
 96: forrtl: error (75): floating point exception
 96: Image              PC                Routine            Line        Source
 96: libc.so.6          00001547B04AED90  Unknown               Unknown  Unknown
 96: ufs_model          0000000002F10983  nh_utils_mod_mp_r         433  nh_utils.F90
 96: libiomp5.so        00001547B2BD7F13  __kmp_invoke_micr     Unknown  Unknown
 96: libiomp5.so        00001547B2B47664  __kmp_fork_call       Unknown  Unknown
 96: libiomp5.so        00001547B2B00D23  __kmpc_fork_call      Unknown  Unknown
 96: ufs_model          0000000002E499D8  riem_solver_c             404  nh_utils.F90
 96: ufs_model          00000000024645D3  dyn_core                  628  dyn_core.F90
 96: ufs_model          000000000256C864  fv_dynamics               683  fv_dynamics.F90
 96: ufs_model          000000000235D824  adiabatic_init           1889  atmosphere.F90
 96: ufs_model          0000000002309421  atmosphere_init           574  atmosphere.F90
 96: ufs_model          0000000001E82C2B  atmos_model_init          565  atmos_model.F90
 96: ufs_model          0000000001B726F6  fcst_initialize           811  module_fcst_grid_comp.F90
 96: ufs_model          0000000000AD58C4  Unknown               Unknown  Unknown
 96: ufs_model          0000000000AD983F  Unknown               Unknown  Unknown
 96: ufs_model          0000000001149E8A  Unknown               Unknown  Unknown
 96: ufs_model          00000000012DEA88  Unknown               Unknown  Unknown
 96: ufs_model          0000000000AD6D1A  Unknown               Unknown  Unknown
 96: ufs_model          00000000009E8E60  Unknown               Unknown  Unknown
 96: ufs_model          0000000000DC2DB1  Unknown               Unknown  Unknown
 96: ufs_model          0000000001B470F5  initializeadverti         444  fv3_cap.F90
 96: ufs_model          000000000110AF78  Unknown               Unknown  Unknown
 96: ufs_model          000000000110AEE7  Unknown               Unknown  Unknown
 96: ufs_model          000000000110B574  Unknown               Unknown  Unknown
 96: ufs_model          0000000000AC12BB  Unknown               Unknown  Unknown
 96: ufs_model          0000000003A137E6  Unknown               Unknown  Unknown
 96: ufs_model          0000000000AD58C4  Unknown               Unknown  Unknown
 96: ufs_model          0000000000AD983F  Unknown               Unknown  Unknown
 96: ufs_model          0000000001149C7A  Unknown               Unknown  Unknown
 96: ufs_model          00000000012DEA88  Unknown               Unknown  Unknown
 96: ufs_model          0000000000AD6D1A  Unknown               Unknown  Unknown
 96: ufs_model          00000000009E8E60  Unknown               Unknown  Unknown
 96: ufs_model          0000000000DC2DB1  Unknown               Unknown  Unknown
 96: ufs_model          000000000098B020  Unknown               Unknown  Unknown
 96: ufs_model          00000000009B3948  Unknown               Unknown  Unknown
 96: ufs_model          00000000009BD13B  Unknown               Unknown  Unknown
 96: ufs_model          0000000000AD58C4  Unknown               Unknown  Unknown
 96: ufs_model          0000000000AD983F  Unknown               Unknown  Unknown
 96: ufs_model          0000000001149C7A  Unknown               Unknown  Unknown
 96: ufs_model          00000000012DEA88  Unknown               Unknown  Unknown
 96: ufs_model          0000000000AD6D1A  Unknown               Unknown  Unknown
 96: ufs_model          00000000009E8E60  Unknown               Unknown  Unknown
 96: ufs_model          0000000000DC2DB1  Unknown               Unknown  Unknown
 96: ufs_model          000000000042E373  ufs                       389  UFS.F90
 96: ufs_model          000000000042C83D  Unknown               Unknown  Unknown
 96: libc.so.6          00001547B0499EB0  Unknown               Unknown  Unknown
 96: libc.so.6          00001547B0499F60  __libc_start_main     Unknown  Unknown
 96: ufs_model          000000000042C755  Unknown               Unknown  Unknown

Line 433:

 428       do k=2,km+1
 429          do i=is1, ie1
 430             pem(i,k) = pem(i,k-1) + dm(i,k-1)
 431 #ifdef USE_COND
 432 ! Excluding contribution from condensates:
 433             peg(i,k) = peg(i,k-1) + dm(i,k-1)*(1.-q_con(i,j,k-1))
 434 #endif
 435          enddo
 436       enddo

I found that the q_con array has some uninitialized (NaN) elements. If I explicitly initialize the q_con array in fv_arrays.F90:

diff --git a/model/fv_arrays.F90 b/model/fv_arrays.F90
index e6e65a6..d18ca37 100644
--- a/model/fv_arrays.F90
+++ b/model/fv_arrays.F90
@@ -1554,6 +1554,7 @@ contains
 #else
       allocate ( Atm%q_con(isd:isd,jsd:jsd,1) )
 #endif
+      Atm%q_con = 0.0

 ! Notes by SJL
 ! Place the memory in the optimal shared mem space

the regression test finishes successfully. Should I add this initialization of q_con to zero?
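
As an aside, a quick run-time way to confirm suspected uninitialized elements (a hypothetical debug check, not part of this PR) is ieee_is_nan from the intrinsic ieee_arithmetic module; here a NaN is planted by hand to stand in for an uninitialized element:

program check_nans
  ! Hypothetical debug check, not part of this PR: count NaN elements in an array.
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_value, ieee_quiet_nan
  implicit none
  real, allocatable :: q_con(:,:,:)

  allocate(q_con(0:9, 0:9, 3))
  q_con = 0.0
  q_con(0,0,1) = ieee_value(q_con(0,0,1), ieee_quiet_nan)  ! stand-in for an uninitialized element

  if (any(ieee_is_nan(q_con))) then
     write(0,*) 'q_con contains', count(ieee_is_nan(q_con)), 'NaN element(s)'
  end if
end program check_nans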

lharris4 commented 3 months ago

Hi @DusanJovic-NOAA, which specific test is failing? This may indicate that q_con is not correctly filled with the proper data in some situations.

You are right that it would be best to explicitly initialize q_con, although it should be done in the loops below to ensure the shared memory for OpenMP is set up correctly. It is also best to set it to real_big; if q_con is set to 0, a physically valid value, then errors would not be detected.

DusanJovic-NOAA commented 3 months ago

> Hi @DusanJovic-NOAA, which specific test is failing? This may indicate that q_con is not correctly filled with the proper data in some situations.
>
> You are right that it would be best to explicitly initialize q_con, although it should be done in the loops below to ensure the shared memory for OpenMP is set up correctly. It is also best to set it to real_big; if q_con is set to 0, a physically valid value, then errors would not be detected.

The test that fails is 'control_debug_p8'.

DusanJovic-NOAA commented 3 months ago

When I initialize q_con to real_big, like this:

diff --git a/model/fv_arrays.F90 b/model/fv_arrays.F90
index e6e65a6..b2c1fc9 100644
--- a/model/fv_arrays.F90
+++ b/model/fv_arrays.F90
@@ -1566,6 +1566,9 @@ contains
                 Atm%va(i,j,k) = real_big
                 Atm%pt(i,j,k) = real_big
               Atm%delp(i,j,k) = real_big
+#ifdef USE_COND
+             Atm%q_con(i,j,k) = real_big
+#endif
            enddo
         enddo
         do j=jsd, jed+1
@@ -1616,6 +1619,9 @@ contains

            Atm%ts(i,j) = 300.
            Atm%phis(i,j) = real_big
+#ifndef USE_COND
+           Atm%q_con(i,j,1) = real_big
+#endif
         enddo
      enddo

the model still fails with a floating-point invalid exception. If I add debug prints like these:

diff --git a/model/nh_utils.F90 b/model/nh_utils.F90
index 2b89aea..2a56f7e 100644
--- a/model/nh_utils.F90
+++ b/model/nh_utils.F90
@@ -430,6 +430,9 @@ CONTAINS
             pem(i,k) = pem(i,k-1) + dm(i,k-1)
 #ifdef USE_COND
 ! Excluding contribution from condensates:
+            if(i==is1 .and. j==(js-1) .and. k==2) then
+               write(0,*)i,j,k-1, q_con(i,j,k-1)
+            endif
             peg(i,k) = peg(i,k-1) + dm(i,k-1)*(1.-q_con(i,j,k-1))
 #endif
          enddo
@@ -439,6 +442,9 @@ CONTAINS
          do i=is1, ie1
             dz2(i,k) = gz(i,j,k+1) - gz(i,j,k)
 #ifdef USE_COND
+            if(i==is1 .and. j==(js-1) .and. k==1) then
+               write(0,*)i,j,k, peg(i,k),peg(i,k+1)
+            endif
             pm2(i,k) = (peg(i,k+1)-peg(i,k))/log(peg(i,k+1)/peg(i,k))

 #ifdef MOIST_CAPPA

I see that q_con(is-1,js-1,:) is not properly initialized (it still has the real_big value), and the model then fails in the next loop, where a negative value of peg is passed to the log function.

backtrace:

 96:            0           0           1  1.0000000E+08
 96:            0           0           1  0.9990000     -6.0600000E+07
 96: [hercules-01-40:434242:0:434242] Caught signal 8 (Floating point exception: floating-point invalid operation)
 96: ==== backtrace (tid: 434242) ====
 96:  0 0x0000000000054d90 __GI___sigaction()  :0
 96:  1 0x00000000002a08d6 __libm_logf_e7()  logf_ct.c:0
 96:  2 0x0000000002f131d8 nh_utils_mod_mp_riem_solver_c_.DIR.OMP.PARALLEL.LOOP.24066.split4069.split4093()  /work/noaa/fv3-cam/djovic/ufs/support_intelllvm/ufs-weather-model/FV3/atmos_cube
d_sphere/model/nh_utils.F90:448
 96:  3 0x000000000015ff13 __kmp_invoke_microtask()  ???:0
 96:  4 0x00000000000cf664 _INTERNALdc23a976::__kmp_serial_fork_call()  /nfs/site/proj/openmp/promo/20230118/tmp/lin_32e-rtl_int_5_nor_dyn.rel.c0.s0.t1..h1.w1-fxilab153/../../src/kmp_runtim
e.cpp:1927
 96:  5 0x00000000000cf664 __kmp_fork_call()  /nfs/site/proj/openmp/promo/20230118/tmp/lin_32e-rtl_int_5_nor_dyn.rel.c0.s0.t1..h1.w1-fxilab153/../../src/kmp_runtime.cpp:2188
 96:  6 0x0000000000088d23 __kmpc_fork_call()  /nfs/site/proj/openmp/promo/20230118/tmp/lin_32e-rtl_int_5_nor_dyn.rel.c0.s0.t1..h1.w1-fxilab153/../../src/kmp_csupport.cpp:350
 96:  7 0x0000000002e499e4 nh_utils_mod_mp_riem_solver_c_()  /work/noaa/fv3-cam/djovic/ufs/support_intelllvm/ufs-weather-model/FV3/atmos_cubed_sphere/model/nh_utils.F90:404
 96:  8 0x00000000024645d3 dyn_core_mod_mp_dyn_core_()  /work/noaa/fv3-cam/djovic/ufs/support_intelllvm/ufs-weather-model/FV3/atmos_cubed_sphere/model/dyn_core.F90:628
 96:  9 0x000000000256c864 fv_dynamics_mod_mp_fv_dynamics_()  /work/noaa/fv3-cam/djovic/ufs/support_intelllvm/ufs-weather-model/FV3/atmos_cubed_sphere/model/fv_dynamics.F90:683
 . . .
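
The two debug prints above (q_con = 1.0E+08, then peg = 0.999 and -6.06E+07) pin down the failure mode. A small standalone sketch that reproduces it, with illustrative values only, assuming FPE trapping is enabled as in the debug builds:

program log_fpe_demo
  ! Sketch only, with illustrative values: a bogus q_con of real_big makes the
  ! next pressure level hugely negative, and the log of a negative ratio then
  ! raises an invalid-operation FPE once trapping is enabled.
  use, intrinsic :: ieee_exceptions, only: ieee_set_halting_mode, ieee_invalid
  implicit none
  real, parameter :: real_big = 1.e8
  real :: peg(2), dm, q_con, pm2

  call ieee_set_halting_mode(ieee_invalid, .true.)  ! trap invalid operations, as the debug builds do

  peg(1) = 0.999
  dm     = 0.606
  q_con  = real_big                              ! the bad (uninitialized) corner value
  peg(2) = peg(1) + dm*(1. - q_con)              ! ~ -6.06e7, clearly unphysical
  pm2    = (peg(2) - peg(1))/log(peg(2)/peg(1))  ! log of a negative number -> FPE
  print *, pm2                                   ! never reached with trapping enabled
end program log_fpe_demo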

Isn't it better to initialize all arrays to NaN? That way the model will fail as soon as a NaN value is used, instead of initializing them to real_big and having the model fail later in the code where the real_big value is used.

lharris4 commented 3 months ago

@DusanJovic-NOAA I think you are right; it is better to initialize to NaN so that invalid uses are caught immediately. Thank you for your suggestion.
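
For reference, a portable alternative to hard-coding NaN bit patterns (as in the diff below) is ieee_value from the intrinsic ieee_arithmetic module; a minimal sketch, not part of this PR:

program snan_init
  ! Sketch only, not part of this PR: build a signaling NaN portably.
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_signaling_nan, ieee_support_nan
  implicit none
  real :: real_snan = 0.0
  real, allocatable :: q_con(:,:,:)

  if (ieee_support_nan(real_snan)) then
     real_snan = ieee_value(real_snan, ieee_signaling_nan)
  end if

  allocate(q_con(0:9, 0:9, 3))
  q_con = real_snan  ! any arithmetic use then raises an invalid-operation FPE
                     ! (once FPE trapping is enabled, as in the debug builds)
end program snan_init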

DusanJovic-NOAA commented 3 months ago

Ok. Initializing q_con to NaN:

diff --git a/model/fv_arrays.F90 b/model/fv_arrays.F90
index e6e65a6..c239083 100644
--- a/model/fv_arrays.F90
+++ b/model/fv_arrays.F90
@@ -33,6 +33,12 @@ module fv_arrays_mod

   public

+#ifdef OVERLOAD_R4
+      real, parameter:: real_snan=real(Z'FFBFFFFF')
+#else
+      real, parameter:: real_snan=real(Z'FFF7FFFFFFFFFFFF')
+#endif
+
   integer, public, parameter :: R_GRID = r8_kind

   !Several 'auxiliary' structures are introduced here. These are for
@@ -1566,6 +1572,10 @@ contains
                 Atm%va(i,j,k) = real_big
                 Atm%pt(i,j,k) = real_big
               Atm%delp(i,j,k) = real_big
+#ifdef USE_COND
+             ! Atm%q_con(i,j,k) = real_big
+             Atm%q_con(i,j,k) = real_snan
+#endif
            enddo
         enddo
         do j=jsd, jed+1
@@ -1616,6 +1626,10 @@ contains

            Atm%ts(i,j) = 300.
            Atm%phis(i,j) = real_big
+#ifndef USE_COND
+           ! Atm%q_con(i,j,1) = real_big
+           Atm%q_con(i,j,1) = real_snan
+#endif
         enddo
      enddo

and then printing all values of q_con(is1:ie1,js1:je1,1) in nh_utils before it is used:

diff --git a/model/nh_utils.F90 b/model/nh_utils.F90
index 2b89aea..2502e94 100644
--- a/model/nh_utils.F90
+++ b/model/nh_utils.F90
@@ -400,6 +400,12 @@ CONTAINS
       enddo
    endif

+   do j=js-1, je+1
+   do i=is-1, ie+1
+      write(0,*)i,j,q_con(i,j,1)
+   end do
+   end do
+   call flush(0)

 !$OMP parallel do default(none) shared(js,je,is1,ie1,km,delp,pef,ptop,gz,rgrav,w3,pt, &
 #ifdef MULTI_GASES
@@ -430,6 +436,9 @@ CONTAINS
             pem(i,k) = pem(i,k-1) + dm(i,k-1)
 #ifdef USE_COND
 ! Excluding contribution from condensates:
+            ! if(i==is1 .and. j==(js-1) .and. k==2) then
+            !    write(0,*)i,j,k-1, q_con(i,j,k-1)
+            ! endif
             peg(i,k) = peg(i,k-1) + dm(i,k-1)*(1.-q_con(i,j,k-1))
 #endif
          enddo
@@ -439,6 +448,9 @@ CONTAINS
          do i=is1, ie1
             dz2(i,k) = gz(i,j,k+1) - gz(i,j,k)
 #ifdef USE_COND
+            ! if(i==is1 .and. j==(js-1) .and. k==1) then
+            !    write(0,*)i,j,k, peg(i,k),peg(i,k+1)
+            ! endif
             pm2(i,k) = (peg(i,k+1)-peg(i,k))/log(peg(i,k+1)/peg(i,k))

 #ifdef MOIST_CAPPA

I see:

$ grep 'NaN'  err | sort -n
  0:            0           0            NaN
  2:           97           0            NaN
 21:            0          97            NaN
 23:           97          97            NaN
 24:            0           0            NaN
 26:           97           0            NaN
 45:            0          97            NaN
 47:           97          97            NaN
 48:            0           0            NaN
 50:           97           0            NaN
 69:            0          97            NaN
 71:           97          97            NaN
 72:            0           0            NaN
 74:           97           0            NaN
 93:            0          97            NaN
 95:           97          97            NaN
 96:            0           0            NaN
 98:           97           0            NaN
117:            0          97            NaN
119:           97          97            NaN
120:            0           0            NaN
122:           97           0            NaN
141:            0          97            NaN
143:           97          97            NaN

There are exactly 24 NaN values, 4 per tile, and the 4 values in each tile are at its corners. For example:

  0:            0           0            NaN
  2:           97           0            NaN
 21:            0          97            NaN
 23:           97          97            NaN

are the lower-left, lower-right, upper-left, and upper-right tile corners. The layout is (3,8), i.e. 24 PEs per tile, and ranks 0, 2, 21, and 23 are the corner PEs of tile 1. Shouldn't these corner (halo) points be filled from the neighboring tiles?
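
For illustration, the corner ranks follow directly from the layout; a small sketch, assuming row-major rank ordering within a tile:

program corner_ranks
  ! Sketch only; row-major rank ordering within the tile is an assumption.
  implicit none
  integer, parameter :: nx = 3, ny = 8  ! layout (3,8), 24 PEs per tile
  print *, 'SW corner rank:', 0
  print *, 'SE corner rank:', nx - 1        ! 2
  print *, 'NW corner rank:', nx*(ny - 1)   ! 21
  print *, 'NE corner rank:', nx*ny - 1     ! 23
end program corner_ranks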

lharris4 commented 3 months ago

Now I think I understand what is happening. On the cubed sphere the values in the outermost corners are meaningless, since that is where three faces come to a point. So the column-wise calculations on these corner cells have no effect, but since they are filled with NaN an FPE is raised. To satisfy the compiler, we could initialize the corner values to 0 while the others are set to NaN; or alternatively, your original idea to just initialize q_con everywhere to 0 would also work. Good catch!

Thanks, Lucas
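
A minimal sketch of the first option (corner halo cells set to 0, everything else to NaN), with toy bounds and a single-tile decomposition assumed; this is not the committed fix:

program corner_aware_init
  ! Hypothetical sketch, not the committed fix: fill only the meaningless
  ! cube-corner halo cells with 0. and every other point with NaN, so a
  ! genuinely missing halo update is still detectable.
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  integer, parameter :: isd = 0, ied = 7, jsd = 0, jed = 7, npz = 3  ! toy halo bounds
  integer, parameter :: is = 1, ie = 6, js = 1, je = 6               ! toy compute bounds
  real :: q_con(isd:ied, jsd:jed, npz), nan
  integer :: i, j, k

  nan = ieee_value(nan, ieee_quiet_nan)

  do k = 1, npz
     do j = jsd, jed
        do i = isd, ied
           ! In this single-tile toy, a halo point outside the compute domain in
           ! both i and j lies in one of the four corner regions where three cube
           ! faces meet; those values are never used by the column solver.
           if ((i < is .or. i > ie) .and. (j < js .or. j > je)) then
              q_con(i,j,k) = 0.0
           else
              q_con(i,j,k) = nan
           end if
        end do
     end do
  end do

  write(*,*) 'corner value:', q_con(isd,jsd,1), ' interior value:', q_con(is,js,1)
end program corner_aware_init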

DusanJovic-NOAA commented 3 months ago

Thank you, Lucas. Please see 1a8820f

DusanJovic-NOAA commented 2 months ago

I am testing this code on my laptop with Intel 2024.2.0 and I'm getting these compile errors in fv_mapz.F90:

[ 81%] Building Fortran object FV3/atmos_cubed_sphere/CMakeFiles/fv3.dir/model/fv_mapz.F90.o
/home/dusan/sufs/simple-ufs/src/model/FV3/atmos_cubed_sphere/model/fv_mapz.F90(683): error #9184: A list item in an OpenMP* data sharing attribute clause must not be an associate name.   [KMP]
!$OMP parallel default(none) shared(is,ie,js,je,km,kmp,ptop,u,v,pe,ua,va,isd,ied,jsd,jed,kord_mt, &
---------------------------------------------------^
/home/dusan/sufs/simple-ufs/src/model/FV3/atmos_cubed_sphere/model/fv_mapz.F90(690): error #9184: A list item in an OpenMP* data sharing attribute clause must not be an associate name.   [FAST_MP_CONSV]
!$OMP                               fast_mp_consv,kord_tm, pe4,npx,npy, ccn_cm3,               &
------------------------------------^
compilation aborted for /home/dusan/sufs/simple-ufs/src/model/FV3/atmos_cubed_sphere/model/fv_mapz.F90 (code 1)
make[2]: *** [FV3/atmos_cubed_sphere/CMakeFiles/fv3.dir/build.make:205: FV3/atmos_cubed_sphere/CMakeFiles/fv3.dir/model/fv_mapz.F90.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:492: FV3/atmos_cubed_sphere/CMakeFiles/fv3.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

and in fv_dynamics.F90:

/home/dusan/sufs/simple-ufs/src/model/FV3/atmos_cubed_sphere/model/fv_dynamics.F90(434): error #9184: A list item in an OpenMP* data sharing attribute clause must not be an associate name. 
  [DP1]
!$OMP parallel do default(none) shared(is,ie,js,je,isd,ied,jsd,jed,npz,dp1,zvir,nwat,q,q_con,sphum,liq_wat, &
-----------------------------------------------------------------------^
/home/dusan/sufs/simple-ufs/src/model/FV3/atmos_cubed_sphere/model/fv_dynamics.F90(452): error #9184: A list item in an OpenMP* data sharing attribute clause must not be an associate name. 
  [DP1]
!$OMP parallel do default(none) shared(is,ie,js,je,isd,ied,jsd,jed,npz,dp1,zvir,q,q_con,sphum,liq_wat, &
-----------------------------------------------------------------------^
/home/dusan/sufs/simple-ufs/src/model/FV3/atmos_cubed_sphere/model/fv_dynamics.F90(461): error #9184: A list item in an OpenMP* data sharing attribute clause must not be an associate name. 
  [CAPPA]
!$OMP                                  cappa,kappa,rdg,delp,pt,delz,nwat)              &
---------------------------------------^
/home/dusan/sufs/simple-ufs/src/model/FV3/atmos_cubed_sphere/model/fv_dynamics.F90(593): error #9184: A list item in an OpenMP* data sharing attribute clause must not be an associate name. 
  [DP1]
!$OMP parallel do default(none) shared(is,ie,js,je,npz,pt,dp1,pkz,q_con)
----------------------------------------------------------^
/home/dusan/sufs/simple-ufs/src/model/FV3/atmos_cubed_sphere/model/fv_dynamics.F90(621): error #9184: A list item in an OpenMP* data sharing attribute clause must not be an associate name. 
  [DTDT_M]
!$OMP parallel do default(none) shared(is,ie,js,je,npz,dtdt_m)
-------------------------------------------------------^
/home/dusan/sufs/simple-ufs/src/model/FV3/atmos_cubed_sphere/model/fv_dynamics.F90(655): error #9184: A list item in an OpenMP* data sharing attribute clause must not be an associate name. 
  [DP1]
!$OMP parallel do default(none) shared(isd,ied,jsd,jed,npz,dp1,delp)
-----------------------------------------------------------^
/home/dusan/sufs/simple-ufs/src/model/FV3/atmos_cubed_sphere/model/fv_dynamics.F90(846): error #9184: A list item in an OpenMP* data sharing attribute clause must not be an associate name. 
  [DTDT_M]
!$OMP parallel do default(none) shared(is,ie,js,je,npz,dtdt_m,bdt)
-------------------------------------------------------^
/home/dusan/sufs/simple-ufs/src/model/FV3/atmos_cubed_sphere/model/fv_dynamics.F90(980): error #9184: A list item in an OpenMP* data sharing attribute clause must not be an associate name. 
  [TE_2D]
!$OMP parallel do default(none) shared(is,ie,js,je,te_2d,teq,dt2,ps2,ps,idiag)
---------------------------------------------------^
compilation aborted for /home/dusan/sufs/simple-ufs/src/model/FV3/atmos_cubed_sphere/model/fv_dynamics.F90 (code 1)
make[2]: *** [FV3/atmos_cubed_sphere/CMakeFiles/fv3.dir/build.make:166: FV3/atmos_cubed_sphere/CMakeFiles/fv3.dir/model/fv_dynamics.F90.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:492: FV3/atmos_cubed_sphere/CMakeFiles/fv3.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

Previously I was testing this on a few of the RDHPCS systems, such as Hercules, Hera, and Gaea, with slightly older compilers, and I did not see these errors. I do not know whether this is a regression in the latest compiler or whether the older compilers were wrongly accepting this code and the new one is correct.

FYI @climbfuji

climbfuji commented 2 months ago

@DusanJovic-NOAA Can you check if the error is within the "if intel" CPP branch?

For background: Older Intel compilers up to 2021.1.0 had a bug that forced us to add all these #ifdef statements for Intel. GNU was following the Fortran standard for specifying or not specifying pointers in OMP directives, Intel didn't (the Intel compiler developers confirmed that). They fixed it in 2021.2.0, but kept their compilers backward compatible. We left the workaround for Intel in place, because we had (and still have ... wcoss2 ...) systems with compilers older than 2021.2.0.

If you are seeing the errors in the if intel branch of the CPP statements, then that means that Intel removed the backward compatibility and now strictly follows the standard (and what GNU does).

Is that the case?

DusanJovic-NOAA commented 2 months ago

If I simply remove all variables that are 'an associate name' from all OMP directives, the code compiles without errors (I have not run the code yet!). Effectively, the OMP directives for the gnu and non-gnu compilers then become identical.

bensonr commented 2 months ago

@DusanJovic-NOAA @climbfuji - I can't believe Intel would change anything in ifort as it's basically in maintenance mode at this point.

DusanJovic-NOAA commented 2 months ago

> @DusanJovic-NOAA @climbfuji - I can't believe Intel would change anything in ifort as it's basically in maintenance mode at this point.

@bensonr The errors I mentioned above are with ifx, not ifort.

climbfuji commented 2 months ago

> @DusanJovic-NOAA @climbfuji - I can't believe Intel would change anything in ifort as it's basically in maintenance mode at this point.
>
> @bensonr The errors I mentioned above are with ifx, not ifort.

That would make sense to me! Time to add more logic to those ifdefs until WCOSS2 gets an updated software stack ;-)

bensonr commented 2 months ago

> @DusanJovic-NOAA @climbfuji - I can't believe Intel would change anything in ifort as it's basically in maintenance mode at this point.
>
> @bensonr The errors I mentioned above are with ifx, not ifort.

I expected those errors with ifx and wanted to be sure that's what you were testing.

DusanJovic-NOAA commented 2 months ago

So, if I understand correctly, we need two different OMP directives: one without the variables that are associate names listed as shared, for GNU and IntelLLVM >= 2024.2.0, and one with those variables listed as shared, for Intel Classic and IntelLLVM < 2024.2.0.

Is my understanding correct?

climbfuji commented 2 months ago

> So, if I understand correctly, we need two different OMP directives: one without the variables that are associate names listed as shared, for GNU and IntelLLVM >= 2024.2.0, and one with those variables listed as shared, for Intel Classic and IntelLLVM < 2024.2.0.
>
> Is my understanding correct?

I think what you want to distinguish is ifx versus ifort. I think 2024.2.0 still ships ifort, and that compiler will very likely behave like the older compiler versions. Either there is a CPP macro that Intel sets for us that lets us distinguish between the two, or we need to do that in CMake.

DusanJovic-NOAA commented 2 months ago

How about this:

     if ( hydrostatic ) then
-#ifdef __GFORTRAN__
+#if defined(__GFORTRAN__) || __INTEL_LLVM_COMPILER >= 20240200
 !$OMP parallel do default(none) shared(is,ie,js,je,isd,ied,jsd,jed,npz,zvir,nwat,q,q_con,sphum,liq_wat, &
 #else
 !$OMP parallel do default(none) shared(is,ie,js,je,isd,ied,jsd,jed,npz,dp1,zvir,nwat,q,q_con,sphum,liq_wat, &

climbfuji commented 2 months ago

Sorry for playing devil's advocate here. So you are saying the earlier versions of ifx (not ifort) behaved like ifort, or at least accepted the logic we put in for ifort?

DusanJovic-NOAA commented 2 months ago

> Sorry for playing devil's advocate here. So you are saying the earlier versions of ifx (not ifort) behaved like ifort, or at least accepted the logic we put in for ifort?

Yes. I started seeing these compile errors only when I tried ifx 2024.2.0, which I have on my laptop. The Ursa test system (the replacement for Hera) has 2024.1.0, and I also did not see these errors on any of the other RDHPCS machines I tried: Hera (2021.5.1), Orion (2021.9.0), Hercules (2021.9.0), and Gaea (2023.1.0).

Based on this, I guess the behavior changed in ifx between 2024.1.0 and 2024.2.0. It would be nice to confirm this with Intel.
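
For what it's worth, a tiny preprocessed (.F90) file can report which Intel front end and version macro is in effect; a sketch, where compiler_version() is the standard iso_fortran_env function, and treating __INTEL_COMPILER as the classic ifort macro is an assumption to double-check against Intel's documentation:

program which_compiler
  ! Sketch only: print the compiler banner and the Intel version macros.
  use, intrinsic :: iso_fortran_env, only: compiler_version
  implicit none
  print *, compiler_version()
#if defined(__INTEL_LLVM_COMPILER)
  print *, 'ifx, __INTEL_LLVM_COMPILER =', __INTEL_LLVM_COMPILER
#elif defined(__INTEL_COMPILER)
  print *, 'ifort, __INTEL_COMPILER =', __INTEL_COMPILER
#endif
end program which_compiler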

DusanJovic-NOAA commented 2 months ago

Do we need these associate names? Can't we declare local pointers, point them at the corresponding GFDL_interstitial arrays:

      real, dimension(:,:,:), pointer :: cappa
      real, dimension(:,:,:), pointer :: dp1
      real, dimension(:,:,:), pointer :: dtdt_m
      real, dimension(:,:), pointer :: te_2d

      ! ccpp_associate: associate( cappa     => GFDL_interstitial%cappa,     &
      !                            dp1       => GFDL_interstitial%te0,       &
      !                            dtdt_m    => GFDL_interstitial%dtdt,      &
      !                            last_step => GFDL_interstitial%last_step, &
      !                            te_2d     => GFDL_interstitial%te0_2d     )

      cappa => GFDL_interstitial%cappa
      dp1 => GFDL_interstitial%te0
      dtdt_m => GFDL_interstitial%dtdt
      te_2d => GFDL_interstitial%te0_2d

and just use one set of OMP directives? Do 'associate names' behave differently from pointers?

climbfuji commented 2 months ago

> Do we need these associate names? Can't we declare local pointers, point them at the corresponding GFDL_interstitial arrays:
>
>       real, dimension(:,:,:), pointer :: cappa
>       real, dimension(:,:,:), pointer :: dp1
>       real, dimension(:,:,:), pointer :: dtdt_m
>       real, dimension(:,:), pointer :: te_2d
>
>       ! ccpp_associate: associate( cappa     => GFDL_interstitial%cappa,     &
>       !                            dp1       => GFDL_interstitial%te0,       &
>       !                            dtdt_m    => GFDL_interstitial%dtdt,      &
>       !                            last_step => GFDL_interstitial%last_step, &
>       !                            te_2d     => GFDL_interstitial%te0_2d     )
>
>       cappa => GFDL_interstitial%cappa
>       dp1 => GFDL_interstitial%te0
>       dtdt_m => GFDL_interstitial%dtdt
>       te_2d => GFDL_interstitial%te0_2d
>
> and just use one set of OMP directives? Do 'associate names' behave differently from pointers?

I think they both do the same thing. But whether it makes a difference for how ifort and ifx treat them in OMP directives, I don't know.
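
A small standalone illustration of the difference as it shows up in a directive (a sketch, not FV3 code): a pointer is an ordinary variable and may be listed in a data-sharing clause, whereas listing an associate name is exactly what the newer compilers reject.

program assoc_vs_pointer
  ! Sketch only, not FV3 code.
  implicit none
  type :: box_t
     real :: a(4) = 0.0
  end type box_t
  type(box_t), target :: box
  real, pointer :: p(:)
  integer :: i

  p => box%a
!$omp parallel do default(none) shared(p)
  do i = 1, size(p)
     p(i) = real(i)  ! a pointer may appear in shared(); it is an ordinary variable
  end do
!$omp end parallel do

  ! With an associate name the equivalent directive is rejected by gfortran and
  ! by ifx >= 2024.2.0 (see the error #9184 messages above), e.g.:
  ! associate (a => box%a)
  ! !$omp parallel do default(none) shared(a)   ! invalid: 'a' is an associate name
  ! ...
  ! end associate

  print *, box%a
end program assoc_vs_pointer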

DusanJovic-NOAA commented 2 months ago

In commit https://github.com/NOAA-GFDL/GFDL_atmos_cubed_sphere/pull/353/commits/b947391ed4d23ffa15f5c854e7d5c4849abefc40 I made changes to fv_mapz and fv_dynamics in an attempt to simplify the OMP directives and make them work with the GNU, Intel Classic, and Intel LLVM compilers.

In fv_mapz: 1) remove the associate statement, remove the kmp variable (it is not used), and replace fast_mp_consv with GFDL_interstitial%fast_mp_consv; 2) remove the GFORTRAN ifdef block; the OMP directives previously used only for Intel now also work with GNU.

In fv_dynamics: 1) replace the associate statement with local pointer arrays that point to the corresponding arrays in the GFDL_interstitial DDT; 2) remove the GFORTRAN ifdef block; the OMP directives previously used only for Intel now also work with GNU; 3) replace last_step with GFDL_interstitial%last_step.

I ran a few tests that use threading on WCOSS2 and they passed, and the full test suite passed on Hera (GNU and Intel).

bensonr commented 2 months ago

@DusanJovic-NOAA - can you check the performance? Some of the things I found online seemed to indicate pointers might be slower than an associate construct. I don't understand why that would be, but...

DusanJovic-NOAA commented 2 months ago

These are some of the regression tests on Hera. Lines starting with '-' are from the current develop branch, lines starting with '+' are from this branch. The second number in square brackets is the wall-clock time:

$ git diff logs/RegressionTests_hera.log | grep cpld_2threads_p8_intel
-PASS -- TEST 'cpld_2threads_p8_intel' [06:18, 05:34](3629 MB)
+PASS -- TEST 'cpld_2threads_p8_intel' [08:11, 05:34](3636 MB)

$ git diff logs/RegressionTests_hera.log | grep cpld_control_p8_intel
-PASS -- TEST 'cpld_control_p8_intel' [06:58, 06:07](3338 MB)
+PASS -- TEST 'cpld_control_p8_intel' [09:12, 06:09](3359 MB)

$ git diff logs/RegressionTests_hera.log | grep hrrr_control_2threads_gnu
-PASS -- TEST 'hrrr_control_2threads_gnu' [05:24, 05:01](916 MB)
+PASS -- TEST 'hrrr_control_2threads_gnu' [06:39, 05:04](908 MB)

$ git diff logs/RegressionTests_hera.log | grep rap_2threads_dyn32_phy32_gnu
-PASS -- TEST 'rap_2threads_dyn32_phy32_gnu' [09:05, 08:36](756 MB)
+PASS -- TEST 'rap_2threads_dyn32_phy32_gnu' [10:34, 08:35](750 MB)

bensonr commented 2 months ago

Looks good and thanks for confirming.

jkbk2004 commented 1 month ago

All tests are done at https://github.com/ufs-community/ufs-weather-model/pull/2224. @laurenchilutti @bensonr @lharris4 @XiaqiongZhou-NOAA, this PR can be merged.