JCSDA / spack-stack


Overhaul packages.yaml #1138

Closed. AlexanderRichert-NOAA closed this PR 9 hours ago.

AlexanderRichert-NOAA commented 2 weeks ago

Summary

This PR updates common/packages.yaml to remove unused entries, unpin versions that don't need to be pinned, and apply hard requirements as much as possible.

Status:

99% of the entries now use require:. For example,

bacio:
  version: ['2.4.1']

becomes

bacio:
  require: '@2.4.1'

Then if someone wants to use a different version in their own environment (or site config), they have to override it (note the ::):

bacio::
  require: '@2.5.0'
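
Note that require is not limited to version pins; it accepts full spec clauses (variants, compilers, and so on), so hard requirements can constrain more than the version. An illustrative, hypothetical entry (not taken from this PR):

esmf:
  require: '@8.6.1 ~debug'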

Testing

Make sure to check for duplicates!
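
A minimal way to run that check, using the repository's show_duplicate_packages.py helper that appears later in this thread (the log filename is just an example):

spack concretize --force --fresh 2>&1 | tee log.concretize
./util/show_duplicate_packages.py -d log.concretize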

Applications affected

All

Systems affected

All

Dependencies

none

Issue(s) addressed

Fixes #1115


climbfuji commented 2 weeks ago

I am wondering if it makes sense to wait until we've relaxed all versions that we don't really care about?

AlexanderRichert-NOAA commented 2 weeks ago

That's fine with me... Would it make sense to go through it in its current state and see what we actually need?

climbfuji commented 2 weeks ago

> That's fine with me... Would it make sense to go through it in its current state and see what we actually need?

Yes, absolutely - if you don't mind, please go ahead! Thanks very much.

climbfuji commented 2 weeks ago

@AlexanderRichert-NOAA You may want to hold back until we've repaired the Ubuntu CI runner

climbfuji commented 2 weeks ago

@AlexanderRichert-NOAA Ubuntu CI is good again. macOS CI may be shaky now, but please ignore it.

mathomp4 commented 1 week ago

> @srherbener @RatkoVasic-NOAA @mathomp4 FYI

I'm willing to try it out...if I can learn how. 😄 I'm currently at the "I know how to load and build with spack-stack" stage. I guess I need to advance to the "test a PR" stage...

AlexanderRichert-NOAA commented 1 week ago

@climbfuji this seems to be in pretty good shape. Ideally it would be good to run at least a concretization on each platform just to make sure it doesn't fail and to check for duplicates.

climbfuji commented 1 week ago

> @climbfuji this seems to be in pretty good shape. Ideally it would be good to run at least a concretization on each platform just to make sure it doesn't fail and to check for duplicates.

Thanks @AlexanderRichert-NOAA. I'll create a test environment for NEPTUNE on one of the NRL machines to run some tests; I encourage others (@RatkoVasic-NOAA @srherbener @mathomp4) to do the same with their applications.

climbfuji commented 1 week ago

FYI I'm having issues on Nautilus, which uses mkl as the blas, fftw-api, and lapack provider. Something insists on netlib-lapack and doesn't want to use mkl (that's not the case for develop, so it must be a new/different version of one of the packages).
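
For context, provider selection like this is normally expressed in the site packages.yaml, roughly as follows (a sketch: the actual Nautilus config may differ, and the assumption that MKL comes from the intel-oneapi-mkl package is ours):

packages:
  all:
    providers:
      blas: [intel-oneapi-mkl]
      fftw-api: [intel-oneapi-mkl]
      lapack: [intel-oneapi-mkl]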

climbfuji commented 4 days ago

> FYI I'm having issues on Nautilus, which uses mkl as the blas, fftw-api, and lapack provider. Something insists on netlib-lapack and doesn't want to use mkl (that's not the case for develop, so it must be a new/different version of one of the packages).

I resolved those issues for the unified environment on Nautilus with Intel (classic). There is no clean solution that works for all compilers. Now I'm running into issues with the NEPTUNE standalone environment :-(

srherbener commented 2 days ago

I tried building on my Mac - I have Sonoma 14.4.1 with apple-clang@15.0.0 and I'm running into the musl build issue again. Concretize completed successfully and I see crtm, esmf, and fms show up in the duplicates list:

MacBook-Pro-5:unified-env.mymacos steveherbener$ ../../util/show_duplicate_packages.py -d log.concretize 
z35pjfx  crtm@2.4.0.1%apple-clang@15.0.0  ldflags=-Wl,-ld_classic  +fix~ipo  build_system=cmake  build_type=Release  generator=make  arch=darwin-sonoma-m1
p563nq7  crtm@v2.4.1-jedi%apple-clang@15.0.0  ldflags=-Wl,-ld_classic  +fix~ipo  build_system=cmake  build_type=Release  generator=make  arch=darwin-sonoma-m1
frvnpsl  crtm@v2.4-jedi.2%apple-clang@15.0.0  ldflags=-Wl,-ld_classic  +fix~ipo  build_system=cmake  build_type=Release  generator=make  arch=darwin-sonoma-m1
bzhk57c  esmf@8.6.1%apple-clang@15.0.0  ldflags=-Wl,-ld_classic  ~debug~external-lapack+external-parallelio+mpi+netcdf~pnetcdf+shared~xerces  build_system=makefile  esmf_comm=auto  esmf_os=auto  esmf_pio=auto  patches=f63d405  snapshot=none  arch=darwin-sonoma-m1
omhbmud  esmf@8.7.0b04%apple-clang@15.0.0  ldflags=-Wl,-ld_classic  ~debug~external-lapack+external-parallelio+mpi+netcdf~pnetcdf+shared~xerces  build_system=makefile  esmf_comm=auto  esmf_os=auto  esmf_pio=auto  patches=f63d405  snapshot=b04  arch=darwin-sonoma-m1
gvbdrwc  fms@release-jcsda%apple-clang@15.0.0  ldflags=-Wl,-ld_classic  +gfs_phys+internal_file_nml~ipo~large_file+openmp+quad_precision  build_system=cmake  build_type=Release  generator=make  precision=32  arch=darwin-sonoma-m1
h5vv6ws  fms@2023.04%apple-clang@15.0.0  ldflags=-Wl,-ld_classic  +deprecated_io+gfs_phys+internal_file_nml~ipo~large_file+openmp+pic+quad_precision~yaml  build_system=cmake  build_type=Release  constants=GFS  generator=make  precision=32,64  arch=darwin-sonoma-m1
===
Duplicates found!

Are these the expected duplicates?

Also, I can try testing on S4 for JCSDA. @climbfuji, what is the method for doing this? Do I become the jedipara user and then build in a directory off to the side, test the installation out, then remove the build? Thanks!

climbfuji commented 2 days ago

> I tried building on my Mac [...] and I'm running into the musl build issue again. [...] Are these the expected duplicates?
>
> Also, I can try testing on S4 for JCSDA. [...] Do I become the jedipara user and then build in a directory off to the side, test the installation out, then remove the build?

The musl issue is related to your libiconv not being found/accepted as an external package and as the only provider for iconv, if I recall correctly. It is not related to this PR. For S4, why jedipara? You can test with your own user as well, no?

climbfuji commented 2 days ago

Also, @srherbener please note that there is a follow-up PR #1159 that you should use for testing; it will eventually be merged into this PR.

climbfuji commented 2 days ago

> Also, @srherbener please note that there is a follow-up PR #1159 that you should use for testing; it will eventually be merged into this PR.

#1159 was merged into this PR, thus #1138 is the best version to test (if you already started with #1159, no worries; they are identical).

AlexanderRichert-NOAA commented 2 days ago

@climbfuji using packages:all:prefer:['%oneapi'] took care of the oneapi issues, in that gcc-runtime and bison use %gcc but everything else uses %oneapi.
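
Spelled out as YAML, the setting described above looks roughly like this (a sketch; note that prefer is only honored by recent Spack versions):

packages:
  all:
    prefer: ['%oneapi']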

climbfuji commented 2 days ago

> @climbfuji using packages:all:prefer:['%oneapi'] took care of the oneapi issues, in that gcc-runtime and bison use %gcc but everything else uses %oneapi.

If the compiler you want to build with is oneapi, I assume?

AlexanderRichert-NOAA commented 2 days ago

Yes

climbfuji commented 2 days ago

> Yes

Thanks. I'll add this to spack stack create when I finally get to work on the single-compiler logic.

srherbener commented 19 hours ago

@AlexanderRichert-NOAA, @climbfuji I am building spack-stack on S4 now. I had the --source option on the install command (as written in the documentation) and rapidly used up my quota of inodes. I'm trying now without the --source option.

The concretize step worked just fine. Install is running.

jedi-bundle is currently broken so I'm not sure when I can test that and skylab. Perhaps it's best to call it good with the successful concretize, and see if the install works. I don't want to unnecessarily hold up this PR.

What do you think?

climbfuji commented 19 hours ago

> @AlexanderRichert-NOAA, @climbfuji I am building spack-stack on S4 now. I had the --source option on the install command (as written in the documentation) and rapidly used up my quota of inodes. [...] Perhaps it's best to call it good with the successful concretize, and see if the install works. [...] What do you think?

We should update the instructions to not use --source by default. I am ok with moving ahead once your install finishes.
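
For reference, the flag in question: spack install --source copies each package's source tree into its installation prefix, which is what consumed the inodes, while a plain spack install skips the copies.

spack install --source    # keeps a copy of the sources in each install prefix
spack install             # default; no source copies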

srherbener commented 14 hours ago

I am testing on Discover too. The concretize step fails:

sherbene@discover12:/discover/nobackup/sherbene/projects/spack-stack/envs/unified-dev.discover-scu16> spack concretize --force --fresh 2>&1 | tee log.concretize
==> Error: concretization failed for the following reasons:

   1. cannot satisfy a requirement for package 'flex'.
==> Fetching https://mirror.spack.io/bootstrap/github-actions/v0.5/build_cache/linux-centos7-x86_64-gcc-10.2.1-clingo-bootstrap-spack-bhqgwuvef354fwuxq7heeighavunpber.spec.json
==> Fetching https://mirror.spack.io/bootstrap/github-actions/v0.5/build_cache/linux-centos7-x86_64/gcc-10.2.1/clingo-bootstrap-spack/linux-centos7-x86_64-gcc-10.2.1-clingo-bootstrap-spack-bhqgwuvef354fwuxq7heeighavunpber.spack
==> Installing "clingo-bootstrap@=spack%gcc@=10.2.1~docs+ipo+optimized+python+static_libstdcpp build_system=cmake build_type=Release generator=make patches=bebb819,ec99431 arch=linux-centos7-x86_64" from a buildcache

I think the issue is that common/packages.yaml has:

    flex:
      # Pin version to avoid duplicates
      require: '@2.6.4'

while site/packages.yaml has:

  flex:
    # Must set buildable: false to avoid duplicate packages
    buildable: false
    externals:
    - spec: flex@2.5.37+lex
      prefix: /usr

For now I can make the flex version match in common/packages.yaml in my environment area. But does that mean that we need to change something in this PR?
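
The override syntax from the PR summary would be another way to reconcile this from the site side (a sketch, not the fix that was ultimately applied): the :: makes the site entry replace the common one instead of merging with it, so the @2.6.4 requirement no longer applies.

flex::
  buildable: false
  externals:
  - spec: flex@2.5.37+lex
    prefix: /usr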

climbfuji commented 14 hours ago

> I am testing on Discover too. The concretize step fails: [...] cannot satisfy a requirement for package 'flex'. [...] For now I can make the flex version match in common/packages.yaml in my environment area. But does that mean that we need to change something in this PR?

You should submit a site config update as a follow-up PR, yes. I expect we need a few more of those.

srherbener commented 13 hours ago

Turns out that /usr/bin/flex on Discover is already version 2.6.4, so I updated the site/packages.yaml file (to version 2.6.4) and concretize successfully finished. The install is running now on Discover (and the one on S4 is still running).

Do we still want to handle this update in a subsequent PR, or change this one? Here's the change I made in site/packages.yaml:

flex:
    # Must set buildable: false to avoid duplicate packages
    buildable: false
    externals:
    - spec: flex@2.6.4          <---- updated version
      prefix: /usr

climbfuji commented 13 hours ago

@AlexanderRichert-NOAA Would you mind changing the flex version in the discover site config, as noted by @srherbener, before merging?

srherbener commented 12 hours ago

I noticed that discover-scu17 already has this:

flex:
    # Must set buildable: false to avoid duplicate packages
    buildable: false
    externals:
    - spec: flex@2.6.4+lex
      prefix: /usr

Does the scu16 config also need the "+lex" suffix? I had started a feature branch, but I think it would be simpler to make the change in this PR and be done with it. Thanks!

climbfuji commented 12 hours ago

I don't know how it was built on scu16.

srherbener commented 12 hours ago

Well, with my testing we know that it works without the suffix, so we should probably leave it that way. I'll make a PR out of my feature branch, and if @AlexanderRichert-NOAA decides to update this one, we can close the one I make. Thanks!

AlexanderRichert-NOAA commented 11 hours ago

I see flex@2.6.4 in both discover configs. Is there a further change that needs to get made?

climbfuji commented 11 hours ago

No, I merged the update from Steve in the meantime. I also updated your branch from develop and enabled automerge; just waiting for CI to run.

AlexanderRichert-NOAA commented 10 hours ago

🤠