NOAA-EMC / WCOSS2-requests

Repository to handle WCOSS2 installation requests for software
Creative Commons Zero v1.0 Universal

Install ESMF 8.6.1 and MAPL 2.46.2 -> 2.46.3 #5

Open junwang-noaa opened 2 months ago

junwang-noaa commented 2 months ago

Install ESMF 8.6.1 and MAPL 2.46.2 after the netCDF build with zstd is available on WCOSS2.

MAPL 2.46.2 has issues when running with ufs-weather-model. MAPL 2.46.3 has the fix, so please install 2.46.3 with ESMF 8.6.1.

8/30/2024: To clarify, MAPL 2.46.3 needs to be installed with ESMF 8.6.1 in both spack-stack 1.6.0 and HPC-stack on Acorn for UFS weather model testing.

junwang-noaa commented 2 months ago

The corresponding UFS weather model issues are:

https://github.com/ufs-community/ufs-weather-model/issues/2345

https://github.com/ufs-community/ufs-weather-model/issues/2346

edwardhartnett commented 2 months ago

There is a build problem that needs to be resolved by the teams:

MAPL 2.46.2/ESMF 8.6.1 (Hang)

This happens on all machines, not just WCOSS2.

edwardhartnett commented 2 months ago

Is there a new release of MAPL now? @Hang-Lei-NOAA can you try installing it?

Hang-Lei-NOAA commented 2 months ago

I installed them and tested them with UFS last Thursday. The problem persists unless ESMF is manually linked to UFS.

Last Friday I emailed Alex to ask him to add this to spack-stack 1.6.0, which he maintains. I added extra temporary installations to his spack-stack installations, but I cannot modify some existing files. Although Brian's tests of other temporary installations passed, my tests with the new ESMF and mapl/2.46.3 still hit the old problem. My additions are fully removable (chmod 777).

edwardhartnett commented 2 months ago

@DusanJovic-NOAA will test installation provided by @AlexanderRichert-NOAA on acorn.

DusanJovic-NOAA commented 2 months ago

I do not have any information about Alex's installation on Acorn. I looked at linked ufs-weather-model issues. Where is it?

Hang-Lei-NOAA commented 2 months ago

@DusanJovic-NOAA That is the email I forwarded to you during my vacation. Chained env with ESMF/MAPL updates: /lfs/h1/emc/nceplibs/noscrub/spack-stack/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.2/install/modulefiles/Core

junwang-noaa commented 2 months ago

@edwardhartnett @AlexanderRichert-NOAA Can you provide details on the installation, either in this issue or in ufs-weather-model issue #2345? When was it installed, and how do we load the module? Without this information, we can't test ufs-weather-model.

junwang-noaa commented 2 months ago

@Hang-Lei-NOAA I think Ed said we need to test the library Alex installed. Also, the MAPL version should be 2.46.3.

Hang-Lei-NOAA commented 2 months ago

@junwang-noaa That is it. As Alex mentioned in the email, these libraries have been added to spack-stack-1.6.0 on Acorn, as Dusan originally requested. Spack-stack: /lfs/h1/emc/nceplibs/noscrub/spack-stack/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.2/install/modulefiles/Core

I will also test my new installations using Dusan's branch under /lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/modulefiles/mpi/intel/19.1.3.304/cray-mpich/8.1.9

junwang-noaa commented 2 months ago

@edwardhartnett As discussed in ufs-weather-model issue #2345, MAPL 2.46.2 does not work. But the library /lfs/h1/emc/nceplibs/noscrub/spack-stack/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.2/install/modulefiles/Core is still using MAPL 2.46.2. Are you going to ask Alex to install a new spack-stack version?

@Hang-Lei-NOAA I assume your installation is for WCOSS2 testing since it is using hpc-stack. Is that correct? Also, is your test working? Can you list the module file location and the test log? Thanks

Hang-Lei-NOAA commented 2 months ago

@junwang-noaa I will ask Alex to check.

My test with Dusan's branch still has the issue with GOCART: CMake Error at CMakeLists.txt:156 (find_package): No "FindESMF.cmake" found in CMAKE_MODULE_PATH.

Module file: /lfs/h1/emc/nceplibs/noscrub/Hang.Lei/works/dusanufs/modulefiles/ufs_acorn.intel.lua

Hang-Lei-NOAA commented 2 months ago

@DusanJovic-NOAA Alex updated spack-stack/1.6.0 on Acorn: /lfs/h1/emc/nceplibs/noscrub/spack-stack/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.3/install/modulefiles/Core

DusanJovic-NOAA commented 2 months ago

Compilation fails with this error:

Force 32-bit build for GOCART
CMake Error at GOCART/CMakeLists.txt:63 (find_package):
  By not providing "FindGFTL_SHARED.cmake" in CMAKE_MODULE_PATH this project
  has asked CMake to find a package configuration file provided by
  "GFTL_SHARED", but CMake did not find one.

  Could not find a package configuration file provided by "GFTL_SHARED" with
  any of the following names:

    GFTL_SHAREDConfig.cmake
    gftl_shared-config.cmake

  Add the installation prefix of "GFTL_SHARED" to CMAKE_PREFIX_PATH or set
  "GFTL_SHARED_DIR" to a directory containing one of the above files.  If
  "GFTL_SHARED" provides a separate development package or SDK, be sure it
  has been installed.

-- Configuring incomplete, errors occurred!

In the current spack-stack, the gftl-shared module is:

$ ll /lfs/h1/emc/nceplibs/noscrub/spack-stack/spack-stack-1.6.0/envs/unified-env/install/modulefiles/intel/2022.0.2.262/gftl-shared/
total 4
-rw-r--r-- 1 alexander.richert nceplibs 1182 Jan  6  2024 1.6.1.lua

In the ue-esmf-8.6.1-mapl-2.46.3 stack it is:

$ ll /lfs/h1/emc/nceplibs/noscrub/spack-stack/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.3/install/modulefiles/intel/2022.0.2.262/gftl-shared/
total 4
-rw-r--r-- 1 alexander.richert nceplibs 1219 Aug 30 19:40 main.lua

AlexanderRichert-NOAA commented 2 months ago

According to the MAPL Spack recipe, versions 2.45.x and up require gftl-shared v1.8.0 and up. I can use v1.8.0 or v1.9.0, or I can chance it with 1.6.1 but no promises it wouldn't break anything.
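
The constraint described above can be written down as a small check. This is a minimal Python sketch, not Spack's resolver: the function names are hypothetical, and the only rule it encodes is the one quoted in this comment (MAPL 2.45.x and up needs gftl-shared 1.8.0 and up). It assumes purely numeric dotted versions, so a module named `main` would need separate handling.

```python
def parse_version(text):
    """Turn a dotted version such as '2.46.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def gftl_shared_ok(mapl_version, gftl_shared_version):
    """Return True if the pair satisfies the constraint quoted above.

    Assumption: MAPL >= 2.45 requires gftl-shared >= 1.8.0. Older MAPL
    versions are left unconstrained by this sketch.
    """
    if parse_version(mapl_version) >= (2, 45):
        return parse_version(gftl_shared_version) >= (1, 8, 0)
    return True


# Check the three gftl-shared versions mentioned in the thread against MAPL 2.46.3.
for gftl in ("1.6.1", "1.8.0", "1.9.0"):
    print(gftl, gftl_shared_ok("2.46.3", gftl))
```

Under that rule, 1.6.1 is rejected for MAPL 2.46.3 while 1.8.0 and 1.9.0 are accepted.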

DusanJovic-NOAA commented 2 months ago

Either is fine. We just need exactly the same module versions and module names on all RDHPCS platforms and Acorn, because we use ufs_common.lua on all of them.

DusanJovic-NOAA commented 2 months ago

I also see that the current name of the mapl module is mapl/2.46.2-esmf-8.6.1 while the new one is just mapl/2.46.3. If we are changing the naming on Acorn, the new name must be used on all other machines.
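
A consistency check of this kind could look like the following. It is a sketch under stated assumptions: the pattern is inferred from the two module names mentioned in this comment, the function name is made up, and the inventory is hypothetical (real names would come from `module avail` on each machine).

```python
import re

# Pattern inferred from names in this thread, e.g. mapl/2.46.2-esmf-8.6.1.
MAPL_MODULE = re.compile(r"^mapl/(\d+\.\d+\.\d+)-esmf-(\d+\.\d+\.\d+)$")


def inconsistent_platforms(modules_by_platform):
    """Return platforms whose mapl module name breaks the pattern or
    differs from the name used by the first platform listed."""
    entries = list(modules_by_platform.items())
    reference = entries[0][1]
    bad = []
    for platform, name in entries:
        if not MAPL_MODULE.match(name) or name != reference:
            bad.append(platform)
    return bad


# Hypothetical inventory, for illustration only.
inventory = {
    "hera": "mapl/2.46.3-esmf-8.6.1",
    "orion": "mapl/2.46.3-esmf-8.6.1",
    "acorn": "mapl/2.46.3",
}
print(inconsistent_platforms(inventory))
```

With this made-up inventory only `acorn` is flagged, which mirrors the naming mismatch described above.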

AlexanderRichert-NOAA commented 2 months ago

Okay, I installed with gftl-shared@1.9.0, and I updated the module file to follow the mapl/xxx-esmf-xxx pattern.

DusanJovic-NOAA commented 2 months ago

Thanks.

I ran the cpld_control_p8 test and it failed. I see these messages in the stderr file:

pe=00000 FAIL at line=01088    MAPL_CapGridComp.F90                     <status=41>
pe=00000 FAIL at line=01088    MAPL_CapGridComp.F90                     <status=41>
pe=00000 FAIL at line=01560    MAPL_EsmfRegridder.F90                   <destination masking with this regrid type is unsupported>
pe=00000 FAIL at line=01382    MAPL_EsmfRegridder.F90                   <status=1>
pe=00000 FAIL at line=00977    MAPL_AbstractRegridder.F90               <status=1>
pe=00000 FAIL at line=00097    NewRegridderManager.F90                  <status=1>
pe=00000 FAIL at line=01101    GriddedIO.F90                            <status=1>
pe=00000 FAIL at line=04539    ExtDataGridCompMod.F90                   <status=1>
pe=00000 FAIL at line=01468    ExtDataGridCompMod.F90                   <status=1>
pe=00000 FAIL at line=01838    MAPL_Generic.F90                         <status=1>
pe=00000 FAIL at line=01241    MAPL_CapGridComp.F90                     <status=1>
pe=00000 FAIL at line=01204    MAPL_CapGridComp.F90                     <status=1>
pe=00000 FAIL at line=01164    MAPL_CapGridComp.F90                     <status=1>
pe=00000 FAIL at line=00832    MAPL_CapGridComp.F90                     <status=1>
pe=00000 FAIL at line=00972    MAPL_CapGridComp.F90                     <status=1>

DusanJovic-NOAA commented 2 months ago

With updated GOCART (head of the current develop branch), ufs-weather-model is still failing, this time with an error in SU2G_GridCompMod.F90:

pe=00136 FAIL at line=00193    SU2G_GridCompMod.F90                     <status=41>
pe=00136 FAIL at line=04713    MAPL_Generic.F90                         <status=41>
pe=00136 FAIL at line=04900    MAPL_Generic.F90                         <status=41>
pe=00136 FAIL at line=01338    GOCART2G_GridCompMod.F90                 <status=41>
pe=00136 FAIL at line=01316    GOCART2G_GridCompMod.F90                 <status=41>
pe=00136 FAIL at line=00188    GOCART2G_GridCompMod.F90                 <status=41>

This is probably due to how GOCART is configured in our regression tests.
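
When comparing traces like these across runs, it can help to pull out the few entries that carry a message rather than a bare status code. The following is a minimal Python sketch. The regular expression is written against the line layout shown in this thread only, the function names are hypothetical, and the sample is a three-line excerpt of the stderr output quoted earlier.

```python
import re

# Matches lines such as:
# pe=00000 FAIL at line=01382    MAPL_EsmfRegridder.F90    <status=1>
FAIL_LINE = re.compile(
    r"pe=(?P<pe>\d+) FAIL at line=(?P<line>\d+)\s+(?P<file>\S+)\s+<(?P<detail>[^>]*)>"
)


def parse_failures(stderr_text):
    """Return one dict per FAIL line, in the order they were printed."""
    failures = []
    for match in FAIL_LINE.finditer(stderr_text):
        failures.append(
            {
                "pe": int(match.group("pe")),
                "line": int(match.group("line")),
                "file": match.group("file"),
                "detail": match.group("detail"),
            }
        )
    return failures


def descriptive_failures(failures):
    """Keep only entries whose detail is text rather than 'status=N'."""
    return [f for f in failures if not f["detail"].startswith("status=")]


sample = """\
pe=00000 FAIL at line=01088    MAPL_CapGridComp.F90                     <status=41>
pe=00000 FAIL at line=01560    MAPL_EsmfRegridder.F90                   <destination masking with this regrid type is unsupported>
pe=00000 FAIL at line=01382    MAPL_EsmfRegridder.F90                   <status=1>
"""

for failure in descriptive_failures(parse_failures(sample)):
    print(failure["file"], failure["line"], failure["detail"])
```

For the excerpt, the only descriptive entry is the regridder message at line 1560 of MAPL_EsmfRegridder.F90. The SU2G trace above contains status codes only, so this filter would return nothing for it.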

Hang-Lei-NOAA commented 2 months ago

I had the same error with non-spack-stack installations.

edwardhartnett commented 1 month ago

@DusanJovic-NOAA and @Hang-Lei-NOAA is there an install of these versions that is working anywhere? That is, is there a successful case of these software packages working together?

edwardhartnett commented 1 month ago

OK, as a data point, I installed spack-stack-1.8.0 and it installs the correct versions of netCDF (4.9.2), MAPL (2.46.3), and ESMF (8.6.1). netcdf-c is built with zstd, only one copy of the netCDF library is installed, and all other applications use that copy. So all of that is good.

edwardhartnett commented 1 month ago

Email from Ed:

All,

There is a current issue on WCOSS2-requests: Install ESMF 8.6.1 and MAPL 2.46.2 -> 2.46.3.

Hang has installed the requested versions of ESMF and MAPL. Everything built, and ESMF passed its unit tests (MAPL has no tests). All are using netcdf-c-4.9.2.

When the UFS regression tests are run, there are failures with GOCART cases. See the issue for the exact description. This does not seem to be an installation issue, but a software issue. ESMF-8.6.1 and MAPL-2.46.3 are installed correctly. We have tested with both hpc-stack and spack-stack installs, with the same results. On Orion, Brian has apparently encountered the same problems with this combination of software versions.

Hang has experimented and found that when the older MAPL version is used, the regression tests pass.

I'm not sure there is anything further our group can do on this issue. We have installed the software as requested, but cannot fix it, unfortunately. We understand that Brian and Dusan are following up with the MAPL team.

Please let us know if there is anything else we can do to help move this forward.

Thanks, Ed & Hang

Reply from Jun:

We need a bug fix that is in MAPL 2.46.3. At Monday's model infrastructure meeting, Barry agreed to take a look at the GOCART failure. Dusan transferred the test case to Hera, and I just tagged Barry.

DusanJovic-NOAA commented 1 month ago

@AlexanderRichert-NOAA The ue-esmf-8.6.1-mapl-2.46.3 environment on Acorn does not have g2/3.5.1 and g2tmpl/1.13.0. Can you please add them?

AlexanderRichert-NOAA commented 1 month ago

Will do

AlexanderRichert-NOAA commented 1 month ago

@DusanJovic-NOAA, please try /lfs/h1/emc/nceplibs/noscrub/spack-stack/spack-stack-1.6.0/envs/upp-esmf-8.6.1-mapl-2.46.3/install/modulefiles/Core

edwardhartnett commented 1 month ago

@DusanJovic-NOAA did you find the versions you need?

RatkoVasic-NOAA commented 1 month ago

For your tests, you can find new installations (esmf-8.6.1-mapl-2.46.3) on Orion and Hercules:

Hercules: /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.3/install/modulefiles/Core
Orion: /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.3/install/modulefiles/Core

Included are new g2, g2tmpl and fms.
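
To point a shell at one of these installations, the usual first step with Lmod or Environment Modules is `module use <path>`. The sketch below only builds that command string from the two paths listed above. The dictionary and function name are hypothetical, and which modules to load afterwards is not stated in this thread, so that step is left out.

```python
# Paths copied from this comment; the keys are just labels for this sketch.
MODULEFILES = {
    "hercules": "/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.3/install/modulefiles/Core",
    "orion": "/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.3/install/modulefiles/Core",
}


def module_use_command(platform):
    """Return the shell command that adds the platform's modulefiles to MODULEPATH."""
    return f"module use {MODULEFILES[platform]}"


for name in MODULEFILES:
    print(module_use_command(name))
```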

edwardhartnett commented 4 weeks ago

One way I think we went astray here is biting off too much at once.

Can we update ESMF to 8.6.1 and get that all resolved before we upgrade MAPL?