ACCESS-NRI / ACCESS-OM2

ACCESS-OM2: ACCESS Ocean-Sea Ice Model
Apache License 2.0

Build ACCESS-OM2 with Spack openmpi and run with system openmpi #13

Closed: harshula closed this issue 3 weeks ago

harshula commented 1 year ago

[Updated: 10/08/2023]

This issue is focused on building ACCESS-OM2 with a Spack-built openmpi (which works) and running it with the system openmpi (which fails).

harshula commented 1 year ago

I have built ACCESS-OM2 with Spack using a Spack-built openmpi. Running it with the Spack-built openmpi results in a 5x slowdown compared to the COSIMA ACCESS-OM2 (built without Spack and run using the system openmpi). That's documented in https://github.com/ACCESS-NRI/ACCESS-OM/issues/6.

harshula commented 1 year ago
  1. Use an external precompiled perl binary on gadi:

    --- a/etc/spack/defaults/packages.yaml
    +++ b/etc/spack/defaults/packages.yaml
    @@ -62,3 +62,8 @@ packages:
     permissions:
       read: world
       write: user
    +  perl:
    +    externals:
    +    - spec: perl@5.26.3~cpanm+shared+threads
    +      prefix: /usr
    +    buildable: false
  2. Insert Spack build instructions here. Spack v0.20.1 built openmpi 4.0.2 appears to trip over https://github.com/spack/spack/issues/30906 when testing Spack v0.20.1 built access-om2. The workaround was to use a newer version of openmpi, e.g. 4.1.5, which avoided the problem.

  3. Convert the ACCESS-OM2 binaries to use RUNPATH instead of RPATH. For testing purposes, I simply ran: $ chrpath -c <binary> (see the sketch after this list).

  4. Insert replication instructions here.
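
For item 3, a minimal sketch of how the RPATH check and conversion might look on one of the binaries (the binary path is just a placeholder):

    # Show whether the binary carries DT_RPATH or DT_RUNPATH entries
    $ readelf -d ./fms_ACCESS-OM.x | grep -E 'RPATH|RUNPATH'

    # Convert DT_RPATH to DT_RUNPATH in place; with RUNPATH the loader
    # consults LD_LIBRARY_PATH before the embedded path
    $ chrpath -c ./fms_ACCESS-OM.x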

harshula commented 1 year ago

Notes

fms_ACCESS-OM.x: symbol lookup error: /apps/openmpi-mofed5.8-pbs2021.1/4.1.5/lib/libmpi.so.40: undefined symbol: opal_common_ucx

$ nm /apps/openmpi-mofed5.8-pbs2021.1/4.1.5/lib/libmpi.so.40 | grep opal_common
                 U opal_common_ucx
                 U opal_common_ucx_del_procs
                 U opal_common_ucx_empty_complete_cb
                 U opal_common_ucx_mca_deregister
                 U opal_common_ucx_mca_proc_added
                 U opal_common_ucx_mca_register
                 U opal_common_ucx_mca_var_register
                 U opal_common_ucx_support_level
00000000000d9e20 t opal_common_ucx_wait_request.part.2
$ nm /apps/openmpi-mofed5.8-pbs2021.1/4.1.5/lib/libopen-pal.so | grep opal_common_ucx
00000000004720a0 D opal_common_ucx
00000000001ad400 T opal_common_ucx_del_procs
00000000001ad1e0 T opal_common_ucx_del_procs_nofence
00000000001a0e10 T opal_common_ucx_empty_complete_cb
00000000001ad970 T opal_common_ucx_mca_deregister
000000000019fb20 t opal_common_ucx_mca_fence_complete_cb
00000000001ad360 T opal_common_ucx_mca_pmix_fence
00000000001a0df0 T opal_common_ucx_mca_pmix_fence_nb
00000000001aabc0 T opal_common_ucx_mca_proc_added
00000000001ad9a0 T opal_common_ucx_mca_register
00000000001ada60 T opal_common_ucx_mca_var_register
00000000001a0220 t opal_common_ucx_mem_release_cb
0000000000472240 d opal_common_ucx_mutex
00000000001ad420 T opal_common_ucx_support_level
00000000001ad070 t opal_common_ucx_wait_all_requests
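
Since the undefined opal_common_ucx symbols are defined in libopen-pal.so, one way to check which libopen-pal the dynamic linker actually resolves for the binary is shown below (a generic diagnostic sketch, not from the thread; the binary path is a placeholder):

# Which libopen-pal.so the loader picks for this binary
$ ldd ./fms_ACCESS-OM.x | grep libopen-pal
# Trace the loader's search order at start-up
$ LD_DEBUG=libs ./fms_ACCESS-OM.x 2>&1 | grep open-pal
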
harshula commented 1 year ago

Notes

$ spack install access-om2 ^netcdf-c@4.7.4 ^netcdf-fortran@4.5.2 ^openmpi@4.1.5 fabrics=ucx

ucx@1.14.0 was installed by Spack.

[gadi] /jobfs/78105093.gadi-pbs/0/openmpi/4.1.5/source/openmpi-4.1.5/ompi/mca/pml/ucx/pml_ucx.c:178  Error: Failed to receive UCX worker address: Not found (-13)
[gadi] /jobfs/78105093.gadi-pbs/0/openmpi/4.1.5/source/openmpi-4.1.5/ompi/mca/pml/ucx/pml_ucx.c:178  Error: Failed to receive UCX worker address: Not found (-13)
[gadi] /jobfs/78105093.gadi-pbs/0/openmpi/4.1.5/source/openmpi-4.1.5/ompi/mca/pml/ucx/pml_ucx.c:178  Error: Failed to receive UCX worker address: Not found (-13)
[gadi] /jobfs/78105093.gadi-pbs/0/openmpi/4.1.5/source/openmpi-4.1.5/ompi/mca/pml/ucx/pml_ucx.c:178  Error: Failed to receive UCX worker address: Not found (-13)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
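
A generic way to narrow down a UCX PML failure like this (not from the thread, just standard Open MPI/UCX debugging options) is to turn up verbosity or temporarily exclude the UCX PML:

# Verbose output from the UCX PML and from UCX itself
$ mpirun --mca pml_ucx_verbose 100 -x UCX_LOG_LEVEL=debug ./fms_ACCESS-OM.x

# Exclude the UCX PML to check whether the rest of the stack initialises
$ mpirun --mca pml ^ucx ./fms_ACCESS-OM.x
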
vsoch commented 1 year ago

@harshula I ran into this error too for a container build (outside of spack) https://github.com/spack/spack/issues/30906

However I'm using the head branch of openmpi, which is greater than 4.1.5. Do you have a config / log somewhere that you kept the set of working / compatible versions of things?

harshula commented 1 year ago

Hi @vsoch , I've updated the issue description. I can build ACCESS-OM2 with Spack openmpi 4.1.5, but I can't run it with gadi's system openmpi. Is that what you are experiencing?

vsoch commented 1 year ago

I finally got it working after much pain - I've pinned all versions except for prrte, so I should do that too. It's very janky but at least it seems to work? https://github.com/researchapps/pmix-compose/blob/611b0e13e381bba1e61f4d2c73ea67d2f9ba5046/Dockerfile

harshula commented 1 year ago

Hi @vsoch, have you tried using Spack and Spack's container support?

vsoch commented 1 year ago

lol No I've never heard of spack, what's that? Just kidding :) Yes and yes, and I'm not interested, thanks!

harshula commented 1 year ago

Hi @vsoch, We would benefit from knowing more about the pros and cons of Spack's container support. Can you please elaborate on your experience with it?

vsoch commented 1 year ago

I regard the spack team very highly, so I don't want to bias you on it, but I'm happy to share my experiences. My general sentiment is that if a piece of software builds well with spack and you can build it into a container with a spack.yaml, that's a reasonable approach (and indeed we have many lammps bases that do this, here is an example). That particular container started as a spack.yaml and was ported to the container, and we've been able to update it once (with a different spack version) with some difficulty. But now that it's built and provides what we need, we're good. So if/when you get something working, save absolutely everything, lock file and version wise, so you can build it again. Also, the first time you try, do it dynamically, so if/when it fails you don't need to start from scratch. You will very likely run into issues if you try to update a version of a dependency of spack itself (this has been my experience).

As for using spack containerize, the extent to which something builds depends on the extent to which it would reliably build with spack outside of the container, and that's a mixed bag. I helped maintain a build service called autamus for a bit that exclusively used spack containerize, and it was overwhelming to debug and keep up with. This is no fault of spack, I think - building is really hard. Dependencies changing break the things that depend on them. So in my experience, what is reasonable is maintaining a small family of builds that I care about. For example, on the Flux team I maintain our flux spack packages, and we have a repository https://github.com/flux-framework/spack that does the package build every night, and syncs changes (and helps us open PRs with releases) for Flux. It's made the process of being a maintainer immensely easier, because I mostly just watch for an occasional failure, and then click a link in an issue to open a PR for a new release.
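
For reference, a minimal sketch of what a spack containerize input could look like for this kind of build; the spec, base image, and Spack version are illustrative only and untested for ACCESS-OM2 (supported base images depend on the Spack release):

$ cat > spack.yaml <<'EOF'
spack:
  specs:
  - access-om2 ^openmpi@4.1.5 fabrics=ucx
  container:
    format: docker
    images:
      os: rockylinux:8
      spack: v0.20.1
EOF
# Generate a multi-stage Dockerfile from the environment above
$ spack containerize > Dockerfile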

For software that I want to build into containers, my general preference is to choose the container build design that is optimal for the software. For production containers (e.g., Kubernetes operators or small developer tools) that are go or rust single binaries, this usually means multi-stage builds where I can get rid of everything aside from the basic runtime dependencies / binary. For most of my containers that are more development environments or similar, I like choosing a common OS base (rocky or debian or ubuntu these days) and then adding the minimal system-level packages that I need. Of course this isn't optimized for niche HPC architectures, but that doesn't tend to be my use case. I think what is most important for me is reproducibility of the build and container over time, and I find system package managers and "the most basic" installs most reliable. It's really satisfying, for example, to update the OS of a container and have most of the packages still build and install. The issue with spack in a container is that you can't easily abstract away the spack opt install directory, and if you create a view (as autamus did) it can still be challenging if, for example, you have two versions of a dependency that provide the same file, which the view isn't allowed to create. I might not have good perspective because I have a lot of experience making containers, but I can whip them up fairly quickly, even from scratch. E.g., I started this repository of automated builds recently.

That's my high level 0.02 - it really depends. Do you have specific questions or use cases that can help to guide my answer or advice?

harshula commented 3 weeks ago

By default, these Spack binaries are built using RPATH to find the MPI libraries.

Docker

# ldd /opt/release/linux-rocky8-x86_64/intel-2021.2.0/mom5-master-xrrpp5buib3cwbqjs3ozleuh55uhng2f/bin/fms_ACCESS-OM.x | grep libmpi.so
    libmpi.so.40 => /opt/release/linux-rocky8-x86_64/intel-2021.2.0/openmpi-4.0.2-x6n5edkq5s4kvrrzwcfooy6ah6r7pjul/lib/libmpi.so.40 (0x00007ff43192d000)

Gadi

$ ldd /[...]/test-v0.22-ldd-openmpi/release/linux-rocky8-x86_64/intel-19.0.5.281/mom5-master-qchmggxzhnptrymkgjd633rp7nuppiuu/bin/fms_ACCESS-OM.x | grep libmpi.so
        libmpi.so.40 => /apps/openmpi/4.0.2/lib/libmpi.so.40 (0x00007f5a4b5b7000)

Even if we convert the binaries to use RUNPATH instead, we'd still have to add Gadi's MPI library path to $LD_LIBRARY_PATH. Then we have to ask ourselves: what was the point? We are not seeking portability. We simply need to be able to run on Gadi, and that is achieved easily by building the binaries via Spack on Gadi.
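
For completeness, this is roughly what running a RUNPATH-converted binary against Gadi's system openmpi would involve (paths illustrative; with DT_RUNPATH the loader searches LD_LIBRARY_PATH before the embedded path):

$ export LD_LIBRARY_PATH=/apps/openmpi/4.0.2/lib:$LD_LIBRARY_PATH   # point at Gadi's system openmpi
$ ldd ./fms_ACCESS-OM.x | grep libmpi.so                            # check which libmpi.so now wins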