UCL-ARC / hpc-spack

Solutions - HPC's Spack config
MIT License

[Owain testing] Standing up the stack on Kathleen #61

Open owainkenwayucl opened 2 weeks ago

owainkenwayucl commented 2 weeks ago

As a result of trying to help with CMake in #58, I've been trying to stand everything up on Kathleen as ccspapp.

Because it started there, the site name is called "owain-issue58". Sorry.

This led to #60.

I'm now at the stage on Kathleen where most of the packages in a base install (spack -e base install) install, but with some failures:

==> Error: beast2-2.7.4-rikpxdve6hqrxxlip26e3n2decji6eix: Package was not installed
==> Error: castep-23.1-ucwspmytbk3c4uu7qampn4ebpuhfdhpz: Package was not installed
==> Error: castep-24.1-lrbxb3yrppp5jonztcawb3mkydt2gt22: Package was not installed
==> Error: gromacs-2023.5-u62yewfylhpwfkxzr5mvnu5ck2l5fipu: Package was not installed
==> Error: gromacs-2024.3-ogai2ly4strsjcp4og4hqeu4n4baokvo: Package was not installed
==> Error: hdf5-1.14.3-bpqekariph2hh7ha3ygnjuelvtgrg37i: Package was not installed
==> Error: lammps-20240829-fhg2sunktv5m7qdgqx732yeew5kn4467: Package was not installed
==> Error: namd-2.14-lfzr5jvycp2ssvwbz7pxhmhqmabnkdmp: Package was not installed
==> Error: namd-3.0-lpx5kxpvuzcpkpr6chatj7byxdtcupxy: Package was not installed
==> Error: netcdf-c-4.9.2-pgnfkfvli7xnfdm37vzu5eqsmb3b5lkq: Package was not installed
==> Error: netcdf-fortran-4.6.1-n3yhrus2xxrohntuu6dcj7gntfgnk5s6: Package was not installed
==> Error: openmpi-4.1.6-xtzhq2emwlwbqflxuyhnhwomd44qxw3s: Package was not installed
==> Error: Installation request failed.  Refer to reported errors for failing package(s).

These failures seem to stem from various dependencies not being able to find libcrypt.so.2:

==> Error: ProcessError: Command exited with status 127:
    './autogen.sh'

1 error found in build log:
     1    ==> numactl: Executing phase: 'autoreconf'
     2    ==> [2024-11-13-09:52:11.076603] './autogen.sh'
  >> 3    /lustre/shared/ucl/apps/spack/0.22/owain-issue58/spack/opt/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__
          /__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__/linux-centos7-x86_64_v3/gcc-10.2.
          1/perl-5.40.0-cznthvrecpjllyyx3dagchkcqpwooivw/bin/perl: error while loading shared libraries: libcrypt.so.2: cannot open shared object file: No 
          such file or directory

This issue is just to capture my notes as I work on this!

owainkenwayucl commented 2 weeks ago

That library isn't on Myriad in the usual locations, so...

giordano commented 2 weeks ago

error while loading shared libraries: libcrypt.so.2: cannot open shared object file: No such file or directory

libcrypt.so.2 should be coming from glibc 🤔 Spack relies on glibc coming from the system, so it's a bit worrying that this library can't be found (which raises the question of how it ended up being linked against it in the first place), unless on Kathleen we have an old glibc which only has the old libcrypt.so.1.

owainkenwayucl commented 2 weeks ago

Myriad also only has libcrypt.so.1. This is pulling things from the general Spack build cache, which is presumably why it's linked against libcrypt.so.2.

giordano commented 2 weeks ago

Ah, then that's problematic 😕

haampie commented 2 weeks ago

libcrypt.so is generally not problematic whether glibc provides it or not, because recipes should force it to come from libxcrypt via depends_on("libxcrypt"). If not, that's a packaging bug.

However, with Perl it's tricky because libxcrypt requires Perl and Perl requires libxcrypt. It's a known issue:

https://github.com/besser82/libxcrypt/issues/187

So in Spack libxcrypt is not a dependency of perl, but perl's configure script will enable it anyhow if it can find it as part of glibc; I don't think their configure script allows one to disable it explicitly.

owainkenwayucl commented 2 weeks ago

Thanks!

heatherkellyucl commented 2 weeks ago

Ah right, this is one where we need a perl built and added to our own build cache first so that we don't hit this problem (which has been the case for all the previous work I was doing). Then it won't try to get it from manylinux.

heatherkellyucl commented 2 weeks ago

I was thinking that if we added system perl as an external this might be resolvable, but from that issue: "But to configure libxcrypt, you need perl 5.14 with open.pm, which is not available on RHEL and derivatives by default, and you may not have access to yum install perl-open." So that probably isn't going to help us.

giordano commented 2 weeks ago

How about installing libxcrypt from the RHEL repos and using that as an external? 😅 Still, avoiding externals would be better to keep things self-consistent.
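A minimal packages.yaml sketch of that suggestion, assuming libxcrypt has been installed from the RHEL repos into /usr. The version below is a placeholder, not something we've checked, and would need to match whatever the installed RPM actually provides:

```yaml
packages:
  libxcrypt:
    externals:
    # Placeholder version and prefix: set these to match the installed RPM
    - spec: libxcrypt@4.1.1
      prefix: /usr
    # Always use the system copy rather than building or reusing another one
    buildable: false
```

Dropping buildable: false would leave Spack free to pick a different libxcrypt instead. It would also be worth checking which soname (libcrypt.so.1 or libcrypt.so.2) the RPM actually ships, since libcrypt.so.2 is what the buildcache perl is looking for.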

heatherkellyucl commented 2 weeks ago

A discussion in the Spack Slack about never reusing Perl from a buildcache while reusing everything else pointed to the examples in https://github.com/spack/spack/pull/42782 of excluding packages from reuse by the concretizer; not sure if that's useful here.
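Going by the examples in that PR, a spack.yaml sketch that keeps reusing local installs and buildcache binaries but never takes perl from a buildcache might look like this. It assumes the Spack version in use has the concretizer:reuse:from syntax with include/exclude lists; worth checking against the docs for 0.22 before relying on it:

```yaml
spack:
  concretizer:
    reuse:
      roots: true
      from:
      # Reuse anything already installed in this Spack instance
      - type: local
      # Reuse binaries from configured buildcaches, except any perl
      - type: buildcache
        exclude:
        - perl
```

The PR also describes an external source type; whether it needs listing here so that our externals (e.g. the gcc 11.2.1 compiler) stay usable once from is set explicitly is something to check.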

This is what the perl we get from first_compiler.yaml looked like when the binary cache was not added (the gcc 11.2.1 here is our starting external compiler):

==> Installed packages
-- linux-rhel7-cascadelake / gcc@11.2.1 -------------------------
perl@5.38.0+cpanm+opcode+open+shared+threads build_system=generic
    berkeley-db@18.1.40+cxx~docs+stl build_system=autotools patches=26090f4,b231fcc
    bzip2@1.0.8~debug~pic+shared build_system=generic
        diffutils@3.9 build_system=autotools
            libiconv@1.17 build_system=autotools libs=shared,static
    gcc-runtime@11.2.1 build_system=generic
    gdbm@1.23 build_system=autotools
        readline@8.2 build_system=autotools patches=bbf97f1
            ncurses@6.4~symlinks+termlib abi=none build_system=autotools patches=7a351bc
                pkgconf@1.9.5 build_system=autotools
    glibc@2.17 build_system=autotools patches=be65fec,e179c43
    gmake@4.4.1~guile build_system=generic
    zlib@1.3.1+optimize+pic+shared build_system=makefile

owainkenwayucl commented 2 weeks ago

(Adding system perl only removed netcdf from the list of things that fail to install.)

owainkenwayucl commented 2 weeks ago

The perl from the first_compiler run is in the build cache, but it doesn't help.

owainkenwayucl commented 2 weeks ago

We've tried various things.

Some things that definitely break badly:

If you

  include_concrete:
  - /path/that/doesnt/exist

it will break spack env for all environments, e.g. you can't spack env remove <other environment name>.

Tied in to that, include_concrete doesn't expand environment variables.
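Given that behaviour, a sketch of an include_concrete entry that should avoid both problems: a literal absolute path, with no variables, pointing at an environment that already exists. The environment name first_compiler and its location under var/spack/environments are hypothetical, for illustration only:

```yaml
spack:
  include_concrete:
  # Hypothetical path to an existing (and presumably already concretized)
  # environment; written out in full because variables are not expanded here
  - /lustre/shared/ucl/apps/spack/0.22/owain-issue58/spack/var/spack/environments/first_compiler
```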

heatherkellyucl commented 2 weeks ago

So having the binary cache available caused much chaos and several hours of going round in circles.

One of the environments Owain created ended up like this:

-- linux-rhel7-cascadelake / gcc@12.3.0 -------------------------
bcftools@1.16     bwa@0.7.17   gatk@4.4.0.0    gzip@1.13    htslib@1.17      namd@3.0              openmpi@4.1.6  samtools@1.17
beast2@2.7.4      castep@23.1  gromacs@2023.5  hdf5@1.14.3  lammps@20240829  netcdf-c@4.9.2        picard@2.26.2
bedtools2@2.31.0  castep@24.1  gromacs@2024.3  hdf5@1.14.3  namd@2.14        netcdf-fortran@4.6.1  python@3.11.6

==> Installed packages
-- linux-centos7-x86_64_v3 / gcc@10.2.1 -------------------------
autoconf@2.72        gdbm@1.23             libjpeg-turbo@3.0.3  nasm@2.16.03    pigz@2.8                     re2c@3.1
automake@1.16.5      gettext@0.22.5        libmd@1.0.4          ncurses@6.5     pkgconf@2.2.0                re2c@3.1
berkeley-db@18.1.40  gettext@0.22.5        libpciaccess@0.17    nghttp2@1.63.0  py-cython@3.0.11             readline@8.2
bison@3.8.2          git@2.47.0            libpng@1.6.39        ninja@1.12.1    py-flit-core@3.9.0           sqlite@3.46.0
bzip2@1.0.8          glibc@2.17            libsigsegv@2.14      ninja@1.12.1    py-packaging@24.1            sqlite@3.46.0
cmake@3.30.5         gmake@4.4.1           libtool@2.4.7        openssh@9.8p1   py-pip@23.1.2                tar@1.34
cmake@3.30.5         hwloc@2.11.1          libunistring@1.2     openssh@9.9p1   py-setuptools@69.2.0         tar@1.34
curl@8.10.1          krb5@1.21.3           libxcrypt@4.4.35     openssl@3.3.1   py-setuptools-scm@8.0.4      util-linux-uuid@2.40.2
curl@8.10.1          libbsd@0.12.2         libxcrypt@4.4.35     openssl@3.3.1   py-tomli@2.0.1               util-macros@1.20.1
diffutils@3.10       libedit@3.1-20240808  libxml2@2.10.3       openssl@3.4.0   py-typing-extensions@4.12.2  xz@5.4.6
expat@2.6.3          libevent@2.1.12       libxml2@2.13.4       pcre2@10.44     py-wheel@0.41.2              zlib-ng@2.2.1
findutils@4.9.0      libffi@3.4.6          lz4@1.10.0           perl@5.40.0     python@3.11.9                zlib-ng@2.2.1
freetype@2.13.2      libiconv@1.17         m4@1.4.19            perl@5.40.0     python@3.11.9                zstd@1.5.6
gcc-runtime@10.2.1   libidn2@2.3.7         meson@1.5.1          pigz@2.8        python-venv@1.0

-- linux-rhel7-cascadelake / gcc@11.2.1 -------------------------
diffutils@3.9    gcc-runtime@11.2.1  gmake@4.4.1    libsigsegv@2.14  m4@1.4.19
findutils@4.9.0  glibc@2.17          libiconv@1.17  libtool@2.4.7    pkgconf@1.9.5

-- linux-rhel7-cascadelake / gcc@12.3.0 -------------------------
apr@1.7.4         htslib@1.16           picard@2.26.2         py-kiwisolver@1.4.5     py-pyproject-hooks@1.0.0     samtools@1.17
apr-util@1.6.3    htslib@1.17           pmix@5.0.1            py-matplotlib@3.8.4     py-pyproject-metadata@0.7.1  scons@4.5.2
bcftools@1.16     javafx@20.0.1         popt@1.16             py-meson-python@0.15.0  py-pyproject-metadata@0.7.1  serf@1.3.10
bedtools2@2.31.0  libaec@1.0.6          py-build@1.0.3        py-meson-python@0.15.0  py-python-dateutil@2.8.2     snappy@1.1.10
bwa@0.7.17        libdeflate@1.18       py-certifi@2023.7.22  py-numpy@1.26.4         py-setuptools@69.2.0         subversion@1.14.2
c-blosc@1.21.5    libfabric@1.21.0      py-contourpy@1.0.7    py-numpy@1.26.4         py-six@1.16.0                tcl@8.6.12
eigen@3.4.0       meson@1.3.2           py-cppy@1.2.1         py-packaging@23.1       py-wheel@0.41.2              ucx@1.16.0
gatk@4.4.0.0      netcdf-c@4.9.2        py-cycler@0.11.0      py-pillow@10.3.0        python@3.11.6                utf8proc@2.8.0
gsl@2.7.1         netcdf-fortran@4.6.1  py-cython@0.29.36     py-pip@23.1.2           python-venv@1.0              voropp@0.4.6
gzip@1.13         openblas@0.3.26       py-flit-core@3.9.0    py-pybind11@2.12.0      qhull@2020.2                 xxhash@0.8.1
hdf5@1.14.3       openjdk@17.0.8.1_1    py-fonttools@4.39.4   py-pyparsing@3.1.2      rsync@3.2.7

-- linux-rhel7-x86_64_v3 / gcc@12.3.0 ---------------------------
gcc-runtime@12.3.0  glibc@2.17
==> 160 installed packages

The linux-centos7-x86_64_v3 / gcc@10.2.1 chunk all came from the binary cache, but we shouldn't be getting anything that depends on gcc 10, so we don't want it to do that (and it had the same perl issue anyway).

I do need to look at the target arch, since we aren't specifying anything for it right now and we've ended up with a mix that is likely not helping: linux-rhel7-x86_64_v3, linux-rhel7-cascadelake and linux-centos7-x86_64_v3. (I don't know whether it was the gcc version or the cascadelake target that stopped more of our linux-rhel7-cascadelake / gcc@11.2.1 set from being used.)
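A sketch of pinning a single target for every package in the environment, assuming x86_64_v3 is the one we settle on (an assumption here, not a decision):

```yaml
spack:
  packages:
    all:
      # Hard requirement on one generic target for every package
      # (x86_64_v3 is assumed, not agreed)
      require: "target=x86_64_v3"
```

A require is a hard constraint, so buildcache binaries built for any other target would no longer be reusable; packages:all:target: [x86_64_v3] would express a softer preference instead. Neither does anything about the rhel7 vs centos7 split in the listing above.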

I don't know the minimum we need to install locally so that things in the binary cache are still usable and perl isn't broken. Perhaps a libxcrypt built at the same time as our first perl, with that chunk targeting x86_64_v3 and the binary cache disallowed?
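A sketch of that minimal first stage as its own environment, under the assumptions in the question: perl and libxcrypt concretized together, targeting x86_64_v3, with reuse limited to what is already installed locally. The spec list is illustrative only:

```yaml
spack:
  specs:
  - perl
  - libxcrypt
  packages:
    all:
      # x86_64_v3 as proposed above; an assumption, not a settled choice
      require: "target=x86_64_v3"
  concretizer:
    unify: true
    reuse:
      from:
      # Only consider specs already installed locally, not the buildcache
      - type: local
```

Restricting reuse only affects what the concretizer picks; spack install has a --no-cache option to stop it fetching matching binaries from mirrors. Also, since perl doesn't declare libxcrypt as a dependency in Spack, concretizing the two together doesn't by itself make perl link against our libxcrypt.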

giordano commented 2 weeks ago

@haampie would you be able to rebuild perl in the buildcache to use libxcrypt, itself built with the first perl in the buildcache (basically do the bootstrap process)? Feels like the current situation isn't ideal 🥲