GEOS-ESM / GEOSldas

Repository for the GEOS Land Data Assimilation Fixture
Apache License 2.0

WIP: Test using Intel MPI 2021.13 #780

Closed · mathomp4 closed this 1 month ago

mathomp4 commented 1 month ago

This is a test PR for GEOSldas testing the Intel MPI functionality in https://github.com/GEOS-ESM/GEOSldas_GridComp/pull/57

It updates ESMA_env to use Intel MPI on Discover under SLES15 (see https://github.com/GEOS-ESM/ESMA_env/compare/v4.29.0...feature/mathomp4/add-impi-support-ldas) but not the compiler, so this should be zero-diff.
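
For anyone who wants to confirm locally which MPI stack the env pulls in, a minimal sketch (assuming a csh-family shell and that you are in the directory holding ESMA_env's g5_modules script, e.g. ./@env in the source tree):

        # Sourcing the env puts the selected MPI's tools on PATH; the location
        # of mpirun then shows whether Intel MPI or Open MPI was picked up.
        source g5_modules
        which mpirun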

biljanaorescanin commented 1 month ago

All regression tests passed, but all runs took longer than before. I will try to run this a few more times to see whether it was just a Discover fluke or a real change. @gmao-rreichle , @mathomp4 maybe we should keep this PR as a draft until I have at least one more run of this?

gmao-rreichle commented 1 month ago

all runs took longer to run than before

Thanks, @biljanaorescanin. As discussed on Teams, please go ahead and run this at least a couple more times in the coming days to see if the ~25% increase in runtime for the GLOBAL/assim test is a Discover fluke or a systematic difference between IntelMPI and OpenMPI.

If IntelMPI is indeed slower by 25%, we cannot make IntelMPI the default as is. Perhaps things will change again when we go to the Intel 2021.13 compiler? But let's see first if we can get a better handle on the runtimes.

cc: @weiyuan-jiang @mathomp4

mathomp4 commented 1 month ago

Huh. With GEOS, Intel MPI has been as performant as or better than Open MPI on the Milans.

@biljanaorescanin or @weiyuan-jiang Can you take a look at the resulting experiments and make sure the Intel MPI flags were properly added to lenkf.j? At least for GEOSgcm, the settings

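        # shm:ofi uses shared memory within a node and libfabric (OFI) between
        # nodes; psm3 selects the PSM3 libfabric provider.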
        setenv I_MPI_FABRICS shm:ofi
        setenv I_MPI_OFI_PROVIDER psm3

were found to be good. (The psm3 one is actually required on the Milans.)
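
As a quick check on a finished experiment, something like the following works (a sketch assuming a csh-family shell; the EXPDIR path is a placeholder, and lenkf.j is assumed to sit in the experiment's run/ subdirectory):

        # Placeholder path: point EXPDIR at the actual experiment directory.
        setenv EXPDIR /discover/nobackup/$USER/MY_LDAS_EXPT
        # List the Intel MPI settings that made it into the job script.
        grep -n 'I_MPI_' $EXPDIR/run/lenkf.j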

If we find GEOSldas prefers Open MPI, then I'll need to maintain a couple of different ESMA_env versions (one for Intel MPI, one for Open MPI). Not hard, but I'm making a note of that for myself.

biljanaorescanin commented 1 month ago

@mathomp4 and @gmao-rreichle I've run this over the past several days, and run times are nearly neutral before and after.

The range for the global assim test was 1063-1292 seconds with this branch, versus 1034-1284 seconds for runs without this switch. On the day I initially compared run times, the run with this branch happened to take the longest, while the develop run was one of the shortest.

All regression tests are zero diff.

gmao-rreichle commented 1 month ago

@mathomp4 : We're ready to use the new env (with IntelMPI) for the LDAS. Do you have a release of the env? The present PR still uses a branch, and we don't want to merge with an env branch. Or do you want to hold off on the new env until you can also update the compiler? Your call, just let us know, thanks.

mathomp4 commented 1 month ago

@mathomp4 : We're ready to use the new env (with IntelMPI) for the LDAS. Do you have a release of the env? The present PR still uses a branch, and we don't want to merge with an env branch. Or do you want to hold off on the new env until you can also update the compiler? Your call, just let us know, thanks.

@gmao-rreichle I think for now we wait. "Soon" I hope to move to Intel ifort 2021.13 and Intel MPI 2021.13 for the GCM and I figure that's when we can move the LDAS. There is no crucial need now unless people are having issues with Open MPI. If they do, let me know and then I'll move it.

For now I'll close this until we move.