JCSDA / spack-stack

Incompatible Package Error when executing setup-meta-modules #1360

Open rjdave opened 1 day ago

rjdave commented 1 day ago

Describe the bug
Building the devel branch of the jedi-ufs-env stack plus nco (as of 10/28/2024) with Intel results in a "Package with matching name is incompatible" error when setup-meta-modules tries to create the meta modules. This bug is very similar to #1126.

To Reproduce
Note: I have forced the use of Open MPI 4.1.6 by requesting it in site/packages.yaml, to facilitate debugging with TotalView (the Open MPI 5.x line has removed the MPIR process acquisition interface that TotalView requires).
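
For reference, a pin along these lines in site/packages.yaml is enough to force the version (a minimal sketch; the real entry may carry additional settings):

packages:
  openmpi:
    # stay on the 4.1.x line so TotalView's MPIR interface keeps working
    require: "@4.1.6"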

Ask spack to build jedi-ufs-env and nco, either with the command spack add jedi-ufs-env%intel nco%intel or by editing your spack.yaml. The concretized environment lists openmpi under both jedi-ufs-env and nco, but only one openmpi is built. I can provide the concretize output if needed. The stack builds to completion and spack module lmod refresh succeeds (and creates only one openmpi module), but spack stack setup-meta-modules errors:

Configuring basic directory information ...
  ... script directory: /opt/src/spack-stack/spack-ext/lib/jcsda-emc/spack-stack/stack
  ... base directory: /opt/src/spack-stack/spack-ext/lib/jcsda-emc/spack-stack
  ... spack directory: /opt/src/spack-stack/spack
Configuring active spack environment ...
  ... environment directory: /opt/src/spack-stack/envs/devel2024-10-28_intel
Parsing spack environment main config ...
  ... install directory: /opt/spack-stack
Parsing spack environment modules config ...
  ... configured to use lmod modules
  ... module directory: /opt/spack-stack/modulefiles
Parsing spack environment package config ...
  ... list of possible compilers: '['intel@2021.8.0', 'gcc', 'clang', 'oneapi', 'xl', 'nag', 'fj', 'aocc']'
  ... list of possible mpi providers: '['openmpi@4.1.6', 'openmpi', 'mpich']'
['intel', 'Core', 'openmpi', 'module-index.yaml']
 ... stack compilers: '{'intel': ['2021.8.0']}'
 ... stack mpi providers: '{'openmpi': {'4.1.6-26xih74': {'intel': ['2021.8.0']}}}'
  ... core compilers: ['gcc@=11.4.1']
Preparing meta module directory ...
  ... meta module directory : /opt/spack-stack/modulefiles/Core
  ... preferred compiler: intel
Creating compiler modules ...
  ... ... appending /opt/spack-stack/modulefiles/intel/2021.8.0 to MODULEPATHS_SAVE
  ... configuring stack compiler intel@2021.8.0
  ... ... CC  : /opt/sw/apps/intel/oneapi/compiler/2023.0.0/linux/bin/intel64/icc
  ... ... CXX : /opt/sw/apps/intel/oneapi/compiler/2023.0.0/linux/bin/intel64/icpc
  ... ... F77 : /opt/sw/apps/intel/oneapi/compiler/2023.0.0/linux/bin/intel64/ifort
  ... ... FC  : /opt/sw/apps/intel/oneapi/compiler/2023.0.0/linux/bin/intel64/ifort
  ... ... COMPFLAGS: setenv("CFLAGS", "-diag-disable=10441")
setenv("CXXFLAGS", "-diag-disable=10441")
setenv("FFLAGS", "-diag-disable=10441")
  ... ... MODULELOADS: load("intel/2021.8.0")
  ... ... MODULEPREREQS: prereq("intel/2021.8.0")
  ... ... ENVVARS  : prepend_path("LD_LIBRARY_PATH", "/opt/sw/apps/intel/oneapi/compiler/2023.0.0/linux/compiler/lib/intel64_lin")
  ... ... MODULEPATHS  : prepend_path("MODULEPATH", "/opt/spack-stack/modulefiles/intel/2021.8.0")
  ... writing /opt/spack-stack/modulefiles/Core/stack-intel/2021.8.0.lua
==> Error: Package with matching name is incompatible: ordereddict([('version', ['4.1.6']), ('require', '~internal-hwloc+two_level_namespace')])

The stack is still usable, but no stack-openmpi or stack-python modules are created. I suspect this is actually why I got the same error in #1126 after I moved aside one of the openmpi builds; the stack in that issue was also jedi-ufs-env plus nco.

Expected behavior
spack stack setup-meta-modules should complete without errors.

System: Linux workstation running Rocky 9. The Intel compiler is oneAPI 2023.0.0, which provides the 2021.8.0 legacy compilers (icc, icpc, ifort); GCC is 11.4.1, installed from the system package manager.

Additional context
I suspect I might be able to work around this by adding a depends_on("nco", type="run") to the spack-ext/repos/spack-stack/packages/jedi-ufs-env/package.py file (see the sketch below), but I have not tested this yet. A quick look at spack-ext/lib/jcsda-emc/spack-stack/stack/meta_modules.py makes me think that this error will occur any time an MPI package is referenced more than once, even if the references are, in fact, the exact same package.
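
A minimal sketch of that untested workaround, following standard spack package conventions (the class body here is an illustrative stub, not the actual contents of the file):

# spack-ext/repos/spack-stack/packages/jedi-ufs-env/package.py (sketch)
from spack.package import *


class JediUfsEnv(BundlePackage):
    """JEDI UFS environment bundle (illustrative stub)."""

    # Untested workaround: pull nco into the bundle so that openmpi is
    # only ever reached through a single root spec in the environment.
    depends_on("nco", type="run")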

climbfuji commented 17 hours ago

Can you add ~internal-hwloc+two_level_namespace to the external openmpi package that you used?
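
(For an external openmpi, those variants would go on the external spec in packages.yaml, roughly like this sketch; the prefix is a placeholder path:)

packages:
  openmpi:
    externals:
    - spec: openmpi@4.1.6~internal-hwloc+two_level_namespace
      prefix: /path/to/openmpi
    buildable: false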

rjdave commented 17 hours ago

I'm not using an external openmpi; it is being built by spack-stack. Both instances in the concretize output (under jedi-ufs-env and nco) have exactly the same spec, including the hash, and only one openmpi is actually built.

climbfuji commented 16 hours ago

The error message above suggests that the specs don't match. What does spack spec openmpi tell you in terms of the variants being selected?

rjdave commented 16 hours ago

> spack spec openmpi
Input spec
--------------------------------
 -   openmpi

Concretized
--------------------------------
[+]  openmpi@4.1.6%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441' ~atomics~cuda~cxx~cxx_exceptions~gpfs~internal-hwloc~internal-libevent~internal-pmix~java~legacylaunchers~lustre~memchecker~openshmem~orterunprefix~romio+rsh~singularity~static+two_level_namespace+vt+wrapper-rpath build_system=autotools fabrics=none romio-filesystem=none schedulers=none arch=linux-rocky9-sandybridge
[e]      ^glibc@2.34%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441'  build_system=autotools arch=linux-rocky9-sandybridge
[e]      ^gmake@4.3%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441' ~guile build_system=generic patches=599f134 arch=linux-rocky9-sandybridge
[+]      ^hwloc@2.9.1%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441' ~cairo~cuda~gl~libudev+libxml2~netloc~nvml~oneapi-level-zero~opencl+pci~rocm build_system=autotools libs=shared,static arch=linux-rocky9-sandybridge
[+]          ^libpciaccess@0.17%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441'  build_system=autotools arch=linux-rocky9-sandybridge
[+]              ^util-macros@1.19.3%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441'  build_system=autotools arch=linux-rocky9-sandybridge
[+]          ^libxml2@2.10.3%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441' +pic~python+shared build_system=autotools arch=linux-rocky9-sandybridge
[e]              ^xz@5.2.5%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441' ~pic build_system=autotools libs=shared,static arch=linux-rocky9-sandybridge
[+]          ^ncurses@6.5%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441' ~symlinks+termlib abi=none build_system=autotools patches=7a351bc arch=linux-rocky9-sandybridge
[+]      ^libevent@2.1.12%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441' +openssl build_system=autotools arch=linux-rocky9-sandybridge
[e]          ^gmake@4.3%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441' ~guile build_system=generic patches=599f134 arch=linux-rocky9-sandybridge
[+]          ^openssl@3.3.0%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441' ~docs+shared build_system=generic certs=mozilla arch=linux-rocky9-sandybridge
[+]              ^ca-certificates-mozilla@2023-05-30%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441'  build_system=generic arch=linux-rocky9-sandybridge
[e]              ^perl@5.32.1%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441' ~cpanm~opcode+open+shared+threads build_system=generic arch=linux-rocky9-sandybridge
[+]      ^numactl@2.0.14%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441'  build_system=autotools patches=4e1d78c,62fc8a8,ff37630 arch=linux-rocky9-sandybridge
[e]          ^autoconf@2.69%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441'  build_system=autotools patches=7793209 arch=linux-rocky9-sandybridge
[e]          ^automake@1.16.2%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441'  build_system=autotools arch=linux-rocky9-sandybridge
[e]          ^libtool@2.4.6%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441'  build_system=autotools arch=linux-rocky9-sandybridge
[e]          ^m4@1.4.19%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441' +sigsegv build_system=autotools patches=9dc5fbd,bfdffa7 arch=linux-rocky9-sandybridge
[e]      ^openssh@8.7p1%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441' +gssapi build_system=autotools arch=linux-rocky9-sandybridge
[e]      ^perl@5.32.1%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441' ~cpanm~opcode+open+shared+threads build_system=generic arch=linux-rocky9-sandybridge
[+]      ^pkg-config@0.29.2%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441' +internal_glib build_system=autotools arch=linux-rocky9-sandybridge
[+]      ^pmix@5.0.1%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441' ~docs+pmi_backwards_compatibility~python~restful build_system=autotools arch=linux-rocky9-sandybridge
[+]      ^zlib-ng@2.1.6%intel@2021.8.0 cflags='-diag-disable=10441' cxxflags='-diag-disable=10441' fflags='-diag-disable=10441' +compat+new_strategies+opt+pic+shared build_system=autotools arch=linux-rocky9-sandybridge

climbfuji commented 16 hours ago

I am not positive that this is the problem, but I see that the core compiler above is listed as gcc@11 - why? I think we hardcode the core compiler to gcc@4.6 because we don't want any of the compilers we actually use to be considered the "core" compiler (spack doesn't generate certain modules for that compiler).

rjdave commented 16 hours ago

I didn't do anything differently than in past stack builds, but I can tell you that the gcc compiler was used in this Intel stack to build the packages gcc-runtime and boost:

ls spack-stack/gcc/11.4.1/
boost-1.84.0-iaytf7h/       gcc-runtime-11.4.1-tj6vdlm/

Looking at the old ticket referenced above, the core compiler there was indeed set to gcc@4.6.

climbfuji commented 16 hours ago

I'm wondering, then, where this comes from (from your output above):

['intel', 'Core', 'openmpi', 'module-index.yaml']
 ... stack compilers: '{'intel': ['2021.8.0']}'
 ... stack mpi providers: '{'openmpi': {'4.1.6-26xih74': {'intel': ['2021.8.0']}}}'
  ... core compilers: ['gcc@=11.4.1']

rjdave commented 15 hours ago

I'm not sure, because /path/to/env/common/modules.yaml lists gcc@4.6:

      core_compilers::
      - gcc@4.6

But then it is overridden in /path/to/env/spack.yaml:

      lmod:
        core_compilers:
        - gcc@=11.4.1

That didn't happen in previous builds: I checked old environments and there is no mention of core_compilers in their spack.yaml. I followed the same recipe notes for this latest version, except that I had to manually edit /path/to/env/site/modules.yaml to switch it from tcl to lmod. In the past I used the --modulesys flag when creating the environment, but that option has been removed. Maybe a change in spack, or in whatever generates or modifies spack.yaml, is to blame.
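
If the spack.yaml override is indeed to blame, one manual workaround (a sketch, assuming the env-level value should simply match the common config) would be to edit the override back to:

      lmod:
        core_compilers:
        - gcc@4.6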

However, I got this same error with a stack that had core_compilers set as expected (after moving aside one of the openmpi packages in #1126).

climbfuji commented 15 hours ago

Is this on a machine that I can access? I am having a hard time seeing what is wrong.

rjdave commented 15 hours ago

Unfortunately not, but it's not a special machine, and I can send you my exact procedure by email if that will help. Also, this is not a tagged release of spack-stack; it is the develop branch as of 10/28/24, which appears not to have changed since the 21st.

I'm guessing it's one of the spack commands I use during my setup, because I just initialized a bogus new environment and core_compilers is not yet in its spack.yaml file.
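
To narrow down which setup command introduces the override, I can repeat the setup one step at a time and check after each command (using the environment path from the log above):

grep -n core_compilers /opt/src/spack-stack/envs/devel2024-10-28_intel/spack.yaml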