ESMCI / ccs_config_cesm

CESM CIME Case Control System configuration files

Update module versions on Casper and add settings for the RRTMGP GPU code #88

Closed sjsprecious closed 1 year ago

sjsprecious commented 1 year ago

This PR will:

fischer-ncar commented 1 year ago

Do I need to build updated ESMF libraries on casper?

jedwards4b commented 1 year ago

@fischer-ncar I think there was a bug in ESMF but it only affects high task count jobs, so it should not be a problem on casper.

fischer-ncar commented 1 year ago

I've built the 8.4.1 release version of esmf on casper for nvhpc and intel. So you can change esmf-8.4.1b01-ncdfio to esmf-8.4.1-ncdfio. If you prefer, I could also build an 8.5.0 beta snapshot.

sjsprecious commented 1 year ago

Thanks @fischer-ncar. I just updated the ESMF version on Casper to v8.4.1 for the nvhpc compiler.

sjsprecious commented 1 year ago

Hi @fischer-ncar, I just did a CAM run on Casper with the nvhpc compiler and esmf8.4.1, but it failed at runtime.

The error messages are listed below:

20230404 234000.569 ERROR            PET00 ESMF_Mesh.F90:1980 ESMF_MeshCreateFromFile() Operation not yet supported  - Internal subroutine call returned Error
20230404 234000.569 ERROR            PET00 /glade/scratch/sunjian/COURTNEY_RRTMGP/components/cice/src/cicecore/drivers/nuopc/cmeps/ice_comp_nuopc.F90:678 Operation not yet supported  - Passing error in return code
20230404 234000.569 ERROR            PET00 /glade/p/cesmdata/cseg/PROGS/build/85175/esmf-8.4.1_casper/src/Infrastructure/Trace/src/ESMCI_Trace.C:1527 ESMCI:TraceEventPhaseExit() Wrong argument specified  - Trace regions not properly nested exiting from region: [ICE] IPDv01p1 Expected exit from: cice_init_total
20230404 234000.569 ERROR            PET00 /glade/p/cesmdata/cseg/PROGS/build/85175/esmf-8.4.1_casper/src/Infrastructure/Trace/src/ESMCI_Trace.C:1485 ESMCI::TraceEventCompPhaseExit() Wrong argument specified  - Internal subroutine call returned Error
20230404 234000.569 ERROR            PET00 ESMCI_FTable.C:832 ESMCI_FTableCallEntryPointVMHop Wrong argument specified  - Internal subroutine call returned Error
20230404 234000.569 ERROR            PET00 ESMCI_FTable.C:1100 c_esmc_compwait Wrong argument specified  - Internal subroutine call returned Error
20230404 234000.569 ERROR            PET00 ESMF_Comp.F90:1256 ESMF_CompExecute Wrong argument specified  - Internal subroutine call returned Error
20230404 234000.569 ERROR            PET00 ESMF_GridComp.F90:1426 ESMF_GridCompInitialize Wrong argument specified  - Internal subroutine call returned Error
20230404 234000.569 ERROR            PET00 ESM0001:src/addon/NUOPC/src/NUOPC_Driver.F90:2832 Wrong argument specified  - Failed calling phase 'IPDv01p1' Initialize for modelComp 4: ICE
20230404 234000.569 ERROR            PET00 ESM0001:src/addon/NUOPC/src/NUOPC_Driver.F90:1313 Wrong argument specified  - Passing error in return code
20230404 234000.569 ERROR            PET00 /glade/p/cesmdata/cseg/PROGS/build/85175/esmf-8.4.1_casper/src/Infrastructure/Trace/src/ESMCI_Trace.C:1527 ESMCI:TraceEventPhaseExit() Wrong argument specified  - Trace regions not properly nested exiting from region: [ESM0001] IPDv02p1 Expected exit from: cice_init_total
20230404 234000.569 ERROR            PET00 /glade/p/cesmdata/cseg/PROGS/build/85175/esmf-8.4.1_casper/src/Infrastructure/Trace/src/ESMCI_Trace.C:1485 ESMCI::TraceEventCompPhaseExit() Wrong argument specified  - Internal subroutine call returned Error
20230404 234000.569 ERROR            PET00 ESMCI_FTable.C:832 ESMCI_FTableCallEntryPointVMHop Wrong argument specified  - Internal subroutine call returned Error
20230404 234000.569 ERROR            PET00 ESMCI_FTable.C:1100 c_esmc_compwait Wrong argument specified  - Internal subroutine call returned Error
20230404 234000.569 ERROR            PET00 ESMF_Comp.F90:1256 ESMF_CompExecute Wrong argument specified  - Internal subroutine call returned Error
20230404 234000.569 ERROR            PET00 ESMF_GridComp.F90:1426 ESMF_GridCompInitialize Wrong argument specified  - Internal subroutine call returned Error
20230404 234000.569 ERROR            PET00 ensemble:src/addon/NUOPC/src/NUOPC_Driver.F90:2832 Wrong argument specified  - Failed calling phase 'IPDv02p1' Initialize for modelComp 1: ESM0001
20230404 234000.569 ERROR            PET00 ensemble:src/addon/NUOPC/src/NUOPC_Driver.F90:1318 Wrong argument specified  - Passing error in return code
20230404 234000.569 ERROR            PET00 ensemble:src/addon/NUOPC/src/NUOPC_Driver.F90:478 Wrong argument specified  - Passing error in return code
20230404 234000.569 ERROR            PET00 /glade/p/cesmdata/cseg/PROGS/build/85175/esmf-8.4.1_casper/src/Infrastructure/Trace/src/ESMCI_Trace.C:1527 ESMCI:TraceEventPhaseExit() Wrong argument specified  - Trace regions not properly nested exiting from region: [ensemble] Init 1 Expected exit from: cice_init_total
20230404 234000.569 ERROR            PET00 /glade/p/cesmdata/cseg/PROGS/build/85175/esmf-8.4.1_casper/src/Infrastructure/Trace/src/ESMCI_Trace.C:1485 ESMCI::TraceEventCompPhaseExit() Wrong argument specified  - Internal subroutine call returned Error
20230404 234000.571 ERROR            PET00 ESMCI_FTable.C:832 ESMCI_FTableCallEntryPointVMHop Wrong argument specified  - Internal subroutine call returned Error
20230404 234000.571 ERROR            PET00 ESMCI_FTable.C:1100 c_esmc_compwait Wrong argument specified  - Internal subroutine call returned Error
20230404 234000.571 ERROR            PET00 ESMF_Comp.F90:1256 ESMF_CompExecute Wrong argument specified  - Internal subroutine call returned Error
20230404 234000.571 ERROR            PET00 ESMF_GridComp.F90:1426 ESMF_GridCompInitialize Wrong argument specified  - Internal subroutine call returned Error
20230404 234000.571 ERROR            PET00 /glade/scratch/sunjian/COURTNEY_RRTMGP/components/cmeps/cime_config/../cesm/driver/esmApp.F90:130 Wrong argument specified  - Passing error in return code
20230404 234000.571 INFO             PET00 Finalizing ESMF
20230404 234000.571 ERROR            PET00 /glade/p/cesmdata/cseg/PROGS/build/85175/esmf-8.4.1_casper/src/Infrastructure/Trace/src/ESMCI_Trace.C:1636 ESMCI::TraceEventRegionExit() Wrong argument specified  - Trace regions not properly nested exiting from region: [ESMF] Expected exit from: cice_init_total
20230404 234000.571 ERROR            PET00 /glade/p/cesmdata/cseg/PROGS/build/85175/esmf-8.4.1_casper/src/Infrastructure/Trace/src/ESMCI_Trace.C:1077 ESMCI::TraceClose() Wrong argument specified  - Internal subroutine call returned Error

After switching back to esmf8.4.1b01, my CAM run worked just fine.
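
Most of the entries in the log above are the same failure being passed up the call stack, so the earliest ERROR entry (the `ESMF_MeshCreateFromFile()` line) is probably the most useful one to look at. A minimal Python sketch for pulling that entry out of a PET log is shown here. The line layout (date, time, level, PET id, message) is inferred from the excerpt above, and the function names `parse_pet_log` and `first_error` are hypothetical helpers, not part of ESMF or CIME.

```python
import re

# Layout inferred from the log excerpt in this thread (an assumption, not an
# official ESMF format description): <date> <time> <LEVEL> <PETnn> <message>
LINE_RE = re.compile(
    r"^(?P<date>\d+)\s+(?P<time>\d{6}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<pet>PET\d+)\s+(?P<msg>.*)$"
)


def parse_pet_log(lines):
    """Return one dict per line that matches the assumed layout."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def first_error(entries):
    """Return the earliest ERROR entry, or None if there is none."""
    for entry in entries:
        if entry["level"] == "ERROR":
            return entry
    return None


# Three lines copied from the log in this thread.
SAMPLE = """\
20230404 234000.569 ERROR            PET00 ESMF_Mesh.F90:1980 ESMF_MeshCreateFromFile() Operation not yet supported  - Internal subroutine call returned Error
20230404 234000.569 ERROR            PET00 ESMCI_FTable.C:832 ESMCI_FTableCallEntryPointVMHop Wrong argument specified  - Internal subroutine call returned Error
20230404 234000.571 INFO             PET00 Finalizing ESMF
"""

if __name__ == "__main__":
    parsed = parse_pet_log(SAMPLE.splitlines())
    root = first_error(parsed)
    if root is not None:
        print(root["pet"], root["msg"])
```

To use it on a real run, read the PET log file from the run directory and pass its lines to `parse_pet_log` instead of `SAMPLE`.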

sjsprecious commented 1 year ago

Hi @jedwards4b, I just made some new changes to this PR. Could you please review it again and let me know whether it can be merged into the main branch? Thanks.