PawseySC / pawsey-spack-config

Automated deployment system for the scientific software stack in use at Pawsey
BSD 3-Clause "New" or "Revised" License

add hyperdrive recipe #280

Closed d3v-null closed 3 weeks ago

d3v-null commented 4 months ago

Here's a cpu-only hyperdrive recipe. I'd love some help converting this to use HIP. hyperdrive main (not 0.3.0) has just been fixed, and its tests pass on bare metal with HIP 5.7.3.

salloc --nodes=1 --partition=gpu-highmem --account=pawsey0875-gpu -t 08:00:00 --gres=gpu:8 --exclusive
git clone https://github.com/MWATelescope/mwa_hyperdrive.git $MYSOFTWARE/hyperdrive; cd $_
module use /software/setonix/2023.08/modules/zen3/gcc/12.2.0/dependencies/
module load .fontconfig/2.13.94-rykl4uf .libpng/1.6.37-5rosdzt
module use /software/setonix/unsupported
export ROCM_VER=5.7.3
module load rocm/${ROCM_VER}
HYPERBEAM_HIP_ARCH=gfx90a HIP_PATH=/software/setonix/rocm/${ROCM_VER}/ ROCM_PATH=/software/setonix/rocm/${ROCM_VER}/ cargo test --features=hip,cfitsio-static,hdf5-static

hyperbeam also passes the tests and can be hipified.

git clone https://github.com/MWATelescope/mwa_hyperbeam.git $MYSOFTWARE/hyperbeam; cd $_
module use /software/setonix/unsupported
export ROCM_VER=5.7.3
module load rocm/${ROCM_VER}
HYPERBEAM_HIP_ARCH=gfx90a HIP_PATH=/software/setonix/rocm/${ROCM_VER}/ ROCM_PATH=/software/setonix/rocm/${ROCM_VER}/ cargo test --features=hip,hdf5-static
d3v-null commented 4 months ago

Hey, I added hip to hyperbeam, but I'm encountering an issue.

module load spack/default
spack env activate --temp -p
spack add hyperbeam amdgpu_target=gfx90a +rocm +python
spack install
wget https://raw.githubusercontent.com/MWATelescope/mwa_hyperbeam/main/examples/analytic_gpu.py
python analytic_gpu.py
Traceback (most recent call last):
  File "/software/projects/mwaeor/dev/hyperbeam/hyperbeam-git/examples/analytic_gpu.py", line 14, in <module>
    import mwa_hyperbeam
  File "/tmp/spack-0we9_87c/.spack-env/._view/fdhkusyqukq5b7oyhq6hdozyfrr35v3i/lib/python3.11/site-packages/mwa_hyperbeam/__init__.py", line 1, in <module>
    from .mwa_hyperbeam import *
ImportError: libnuma-eed8cdad.so.1: ELF load command address/offset not properly aligned

Also, I'm not sure whether I need to add anything to this YAML file to force amdgpu_target=gfx90a: https://github.com/PawseySC/pawsey-spack-config/blob/main/systems/setonix/environments/rocm/spack.yaml#L12C5-L12C49

dipietrantonio commented 3 months ago

Hi Dev,

I took a look and the problem seems to be with the generation of the .so file.

(within my local installation /software/projects/pawsey0001/cdipietrantonio/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/hyperbeam-0.9.2-qnwoto6ptovjai3ptplaugohvwhhy5jr/lib/python3.11/site-packages/mwa_hyperbeam)

cdipietrantonio@nid002944 $ ldd mwa_hyperbeam.cpython-311-x86_64-linux-gnu.so 
./mwa_hyperbeam.cpython-311-x86_64-linux-gnu.so: error while loading shared libraries: libnuma-eed8cdad.so.1: ELF load command address/offset not properly aligned

I think this is due to a compilation error of some sort, and similar issues I found online point to outdated versions of binutils being used. I will keep looking into this.
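One way to test the "corrupted .so" hypothesis is to read the library's program headers directly. The following is a minimal sketch, not something that was run in this thread: it assumes a 64-bit little-endian ELF file, and it assumes that the loader message above corresponds to a PT_LOAD segment whose virtual address and file offset are not congruent modulo its alignment. The function name is hypothetical.

```python
import struct

PT_LOAD = 1
# ELF64 little-endian layouts: file header (64 bytes) and program header (56 bytes).
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
PHDR = struct.Struct("<IIQQQQQQ")


def misaligned_load_segments(data: bytes):
    """Return (index, p_offset, p_vaddr, p_align) for every PT_LOAD segment
    whose virtual address and file offset differ modulo p_align."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    header = EHDR.unpack_from(data, 0)
    e_phoff, e_phentsize, e_phnum = header[5], header[9], header[10]
    bad = []
    for index in range(e_phnum):
        (p_type, _flags, p_offset, p_vaddr,
         _paddr, _filesz, _memsz, p_align) = PHDR.unpack_from(
            data, e_phoff + index * e_phentsize)
        if p_type != PT_LOAD or p_align <= 1:
            continue  # p_align of 0 or 1 means no alignment is required
        if (p_vaddr - p_offset) % p_align != 0:
            bad.append((index, p_offset, p_vaddr, p_align))
    return bad
```

Passing it the bytes of the bundled library, for example `misaligned_load_segments(open(path, "rb").read())`, should give an empty list for a well-formed file. A non-empty result would suggest the headers were rewritten after linking rather than miscompiled.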

pelahi commented 3 months ago

Hi all, I have some ideas about what might be happening in a Spack installation where an external package is listed, as is the case with rocm@5.x.x. The external package's installation process has likely installed a number of its own dependencies, which are not explicitly listed and so are not known to Spack. Spack then builds its own copy of such a dependency because another package requires it, and this conflicting Spack-built copy gets loaded, possibly resulting in errors. This could be addressed by declaring those dependencies in Spack's packages.yaml, which might require digging into the external package with a recursive ldd. This is a quick brain dump, and I need to look at this specific error in more detail.
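The "dig into the external package with ldd" step could be sketched roughly as below. This is only an illustration under assumptions: the helper names are hypothetical, and it only parses the standard `name => path (address)` and `name => not found` lines that ldd prints. The aim is to list which libraries an external package's binaries pick up from outside its own prefix, since those are the candidates to declare as externals.

```python
import subprocess


def parse_ldd(output: str) -> dict:
    """Map each library name in ldd output to its resolved path,
    or to None if ldd reported it as 'not found'."""
    deps = {}
    for line in output.splitlines():
        name, arrow, target = line.strip().partition("=>")
        if not arrow:
            continue  # vdso, the loader itself, or a plain message
        target = target.strip()
        if target.startswith("not found"):
            deps[name.strip()] = None
        else:
            deps[name.strip()] = target.split(" (")[0]
    return deps


def outside_prefix(deps: dict, prefix: str) -> list:
    """Names of libraries that resolve to a path outside the given prefix."""
    return sorted(n for n, p in deps.items() if p and not p.startswith(prefix))


def ldd(path: str) -> dict:
    """Run ldd on a file and parse the result (requires ldd on PATH)."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return parse_ldd(result.stdout)
```

For instance, `outside_prefix(ldd(lib), "/software/setonix/rocm")` run over each library in the ROCm installation would show which system libraries it relies on.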

dipietrantonio commented 3 months ago

Continuing my investigation, I found the dynamic library location. As you can see from the session, the libnuma-*.so file has been somehow corrupted during generation. The other libraries seem fine (as in I can execute ldd on them). I am not sure why these libraries are copied there. Is it a python installation thing?

In any case, the process is trashing libnuma for some reason. Maybe the system-wide libnuma is too old, and the process corrupts it when it is "copied"?

cdipietrantonio@setonix-01:/software/projects/pawsey0001/cdipietrantonio/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/hyperbeam-0.9.2-qnwoto6ptovjai3ptplaugohvwhhy5jr/lib/python3.11/site-packages/mwa_hyperbeam.libs> ls -lah
total 159M
drwxr-sr-x 2 cdipietrantonio pawsey0001 4.0K Jul 15 09:21 .
drwxr-sr-x 5 cdipietrantonio pawsey0001 4.0K Jul 15 09:21 ..
-rwxr-xr-x 1 cdipietrantonio pawsey0001 132M Jul 15 09:21 libamd_comgr-59cb1038.so.2
-rwxr-xr-x 1 cdipietrantonio pawsey0001  23M Jul 15 09:21 libamdhip64-4fd00713.so.5
-rwxr-xr-x 1 cdipietrantonio pawsey0001  47K Jul 15 09:21 libdrm_amdgpu-7d57d271.so.1
-rwxr-xr-x 1 cdipietrantonio pawsey0001  87K Jul 15 09:21 libdrm-d8701a0c.so.2
-rwxr-xr-x 1 cdipietrantonio pawsey0001 104K Jul 15 09:21 libelf-4b8ce666.so.1
-rwxr-xr-x 1 cdipietrantonio pawsey0001 2.7M Jul 15 09:21 libhsa-runtime64-2bcf2738.so.1
-rwxr-xr-x 1 cdipietrantonio pawsey0001  52K Jul 15 09:21 libnuma-eed8cdad.so.1
-rwxr-xr-x 1 cdipietrantonio pawsey0001 197K Jul 15 09:21 libtinfo-d41ff06c.so.6
-rwxr-xr-x 1 cdipietrantonio pawsey0001  13K Jul 15 09:21 libxpmem-bf2afbfb.so.0
-rwxr-xr-x 1 cdipietrantonio pawsey0001 1.2M Jul 15 09:21 libzstd-1a7670e1.so.1
cdipietrantonio@setonix-01:/software/projects/pawsey0001/cdipietrantonio/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/hyperbeam-0.9.2-qnwoto6ptovjai3ptplaugohvwhhy5jr/lib/python3.11/site-packages/mwa_hyperbeam.libs> ldd libnuma-eed8cdad.so.1 
    not a dynamic executable
cdipietrantonio@setonix-01:/software/projects/pawsey0001/cdipietrantonio/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/hyperbeam-0.9.2-qnwoto6ptovjai3ptplaugohvwhhy5jr/lib/python3.11/site-packages/mwa_hyperbeam.libs> ldd libamdhip64-4fd00713.so.5 
    linux-vdso.so.1 (0x000015525123a000)
    libdl.so.2 => /lib64/libdl.so.2 (0x000015524f731000)
    librt.so.1 => /lib64/librt.so.1 (0x000015524f528000)
    libamd_comgr-59cb1038.so.2 => not found
    libhsa-runtime64-2bcf2738.so.1 => not found
    libnuma-eed8cdad.so.1 => not found
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000015524f103000)
    libm.so.6 => /lib64/libm.so.6 (0x000015524edb8000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000015524eb99000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x000015524e976000)
    libc.so.6 => /lib64/libc.so.6 (0x000015524e581000)
    /lib64/ld-linux-x86-64.so.2 (0x0000155251012000)
dipietrantonio commented 3 months ago

More information found here: https://github.com/beeware/briefcase/pull/761/files

dipietrantonio commented 3 months ago

I tried to reproduce the error outside Spack with the command sequence below, but I came across a link-stage error instead.

git clone https://github.com/MWATelescope/mwa_hyperbeam.git
cd mwa_hyperbeam/
git checkout v0.9.2
module load rocm/5.7.3
module load cfitsio/3.49-kjcswew rust/1.78.0
module load python/3.11.6 py-pip/23.1.2-py3.11.6
export LD_LIBRARY_PATH=/software/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/erfa-2.0.0-t3xwem2wfrqln3l6wln2dhjn2yhcykp3/lib:$LD_LIBRARY_PATH
export CPATH=/software/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/erfa-2.0.0-t3xwem2wfrqln3l6wln2dhjn2yhcykp3/include:$CPATH
export CARGO_HOME=$MYSCRATCH/.cargo
export HYPERBEAM_HIP_ARCH=gfx90a
export HIP_PATH=$ROCM_PATH
cargo build --locked --release --features=hip,hdf5-static
cd /software/projects/pawsey0001/cdipietrantonio/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/py-maturin-1.1.0-vhyhnuk42wfx4h5wyk3l3ckvdthecot6
cd ../../
cd py-maturin-1.1.0-mtv5birsrdlib7jqlkdyozhoffbozyfs/
ls
cd bin/
ls
./maturin
export PATH=`pwd`:$PATH
cd /software/projects/pawsey0001/cdipietrantonio/build-hyperbeam/mwa_hyperbeam
maturin build --release --features=python,hdf5-static,hip --strip

Error

cdipietrantonio@setonix-01:/software/projects/pawsey0001/cdipietrantonio/build-hyperbeam/mwa_hyperbeam> maturin build --release --features=python,hdf5-static,hip --strip
📦 Including license file "/software/projects/pawsey0001/cdipietrantonio/build-hyperbeam/mwa_hyperbeam/LICENSE"
🔗 Found pyo3 bindings
🐍 Found CPython 3.11 at /software/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/python-3.11.6-4ysxrvuaor6iljintmzcazlkfcokwnes/bin/python3
warning: mwa_hyperbeam@0.9.2: HIP_PATH set from env /software/setonix/rocm/5.7.3
   Compiling mwa_hyperbeam v0.9.2 (/software/projects/pawsey0001/cdipietrantonio/build-hyperbeam/mwa_hyperbeam)
error: linking with `cc` failed: signal: 11 (SIGSEGV)
  |
  = note: LC_ALL="C" PATH="/software/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/rust-1.78.0/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin:/software/projects/pawsey0001/cdipietrantonio/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/py-maturin-1.1.0-mtv5birsrdlib7jqlkdyozhoffbozyfs/bin:/software/projects/pawsey0001/cdipietrantonio/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/py-maturin-1.1.0-vhyhnuk42wfx4h5wyk3l3ckvdthecot6/bin:/software/setonix/2024.05/spack/bin:/software/projects/pawsey0001/cdipietrantonio/setonix/2024.05/python/bin:/software/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/python-3.11.6-4ysxrvuaor6iljintmzcazlkfcokwnes/bin:/software/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/py-pip-23.1.2-dwy4vuci5j63diu3hmd2ykzy5soxmxal/bin:/software/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/rust-1.78.0/toolchains/stable-x86_64-unknown-linux-gnu/bin:/software/setonix/rocm/5.7.3/bin:/software/setonix/rocm/5.7.3/rocprofiler/bin:/software/setonix/rocm/5.7.3/hip/bin:/home/cdipietrantonio/bin:/software/projects/pawsey0001/cdipietrantonio/development/utils/apps:/software/pawsey/tools/pawseytools/bin:/opt/cray/pe/mpich/8.1.27/ofi/gnu/9.1/bin:/opt/cray/pe/mpich/8.1.27/bin:/opt/cray/pe/craype/2.7.23/bin:/opt/cray/pe/gcc/12.2.0/snos/bin:/opt/cray/pe/perftools/23.09.0/bin:/opt/cray/pe/papi/7.0.1.1/bin:/opt/cray/libfabric/1
d3v-null commented 3 months ago

Those commands look correct, although spack-installed maturin py-maturin-1.1.0-vhyhnuk42wfx4h5wyk3l3ckvdthecot6 may be built for a different python than python/3.11.6

d3v-null commented 3 months ago

Changing to a different linker worked for me.

salloc --nodes=1 --partition=gpu-highmem --account=pawsey0875-gpu -t 08:00:00 --gres=gpu:8 --exclusive
git clone https://github.com/MWATelescope/mwa_hyperbeam.git $MYSOFTWARE/hyperbeam; cd $_
module load spack/default rocm/5.7.3 rust
spack env activate --temp -p
spack add py-pip
spack install
export CARGO_HOME="$MYSCRATCH/.cargo" HIP_PATH="$ROCM_PATH" HYPERBEAM_HIP_ARCH="gfx90a" 
# this next line is the important bit.
export RUSTFLAGS="-Clinker=/opt/cray/pe/gcc/12.2.0/snos/bin/x86_64-suse-linux-gcc"
pip install maturin[patchelf]
maturin build --release --features=python,hdf5-static,hip --strip
pip install /software/projects/mwaeor/dev/hyperbeam/hyperbeam-git/target/wheels/mwa_hyperbeam-0.9.2-cp311-cp311-manylinux_2_31_x86_64.whl
wget https://raw.githubusercontent.com/MWATelescope/mwa_hyperbeam/main/examples/analytic_gpu.py
python analytic_gpu.py
dipietrantonio commented 3 months ago

I had already noticed a difference in your command sequence, and the error I now get confirms it (this is without Spack).

   Compiling mwa_hyperbeam v0.9.2 (/software/projects/pawsey0001/cdipietrantonio/build-hyperbeam/mwa_hyperbeam)
warning: mwa_hyperbeam@0.9.2: HIP_PATH set from env /software/setonix/rocm/5.7.3
    Finished `release` profile [optimized] target(s) in 43.63s
💥 maturin failed
  Caused by: Failed to execute 'patchelf', did you install it? Hint: Try `pip install maturin[patchelf]` (or just `pip install patchelf`)
  Caused by: No such file or directory (os error 2)

The Spack-installed py-maturin does not come with the patchelf option, and this may be what causes the library corruption I identified earlier. I believe that if I install py-maturin with that requirement (as far as I can tell, no other package has expressed it before), the installation will work.
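A quick pre-flight check can catch this kind of failure before the build starts. The snippet below is a minimal sketch with a hypothetical function name. It only checks that the executables are discoverable on PATH and says nothing about their versions.

```python
import shutil


def missing_tools(names):
    """Return the tools from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Tools the wheel build in this thread is known to invoke.
    missing = missing_tools(["cargo", "maturin", "patchelf"])
    if missing:
        print("not on PATH:", ", ".join(missing))
    else:
        print("all build tools found")
```

Running it in the same environment as the build (after the module loads) reports which of the three tools the build would fail to execute.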

dipietrantonio commented 3 months ago

Here is the change I made to Maturin's recipe

diff --git a/repo/packages/py-maturin/package.py b/repo/packages/py-maturin/package.py
index 8e0e6af..2de662b 100644
--- a/repo/packages/py-maturin/package.py
+++ b/repo/packages/py-maturin/package.py
@@ -25,6 +25,7 @@ class PyMaturin(PythonPackage):
     depends_on("py-setuptools-rust@1.4:", type="build")
     depends_on("py-tomli@1.1:", when="^python@:3.10", type=("build", "run"))
     depends_on("rust", type=("build", "run"))
+    depends_on("patchelf", type=("build", "run"))

     def setup_build_environment(self, env):
         env.set("CARGO_HOME", f"{self.stage.source_path}/.cargo")

This is what I get now

warning: mwa_hyperbeam@0.9.2: HIP_PATH set from env /software/setonix/rocm/5.7.3
   Compiling numpy v0.19.0
   Compiling hdf5 v0.8.1
   Compiling marlu v0.11.0
    Finished `release` profile [optimized] target(s) in 15.62s
🖨  Copied external shared libraries to package mwa_hyperbeam.libs directory:
    /software/setonix/rocm/5.7.3/lib/libhsa-runtime64.so.1.11.50703
    /software/setonix/rocm/5.7.3/lib/libamdhip64.so.5.7.50703
    /usr/lib64/libnuma.so.1.0.0
    /opt/cray/xpmem/2.5.2-2.4_3.47__gd0f7936.shasta/lib64/libxpmem.so.0.0.0
    /usr/lib64/libelf-0.185.so
    /software/setonix/rocm/5.7.3/lib/libamd_comgr.so.2.5.50703
    /usr/lib64/libdrm_amdgpu.so.1.0.0
    /lib64/libtinfo.so.6.1
    /usr/lib64/libdrm.so.2.4.0
    /usr/lib64/libzstd.so.1.5.0
📦 Built wheel for CPython 3.11 to /scratch/pawsey1045/cdipietrantonio/setonix/2024.05/software/cdipietrantonio/build_stage/spack-stage-hyperbeam-0.9.2-hsaecyxatcz64tteyfy5gqzbgwkqzrdu/spack-src/target/wheels/mwa_hyperbeam-0.9.2-cp311-cp311-manylinux_2_31_x86_64.whl
==> [2024-07-30-13:51:36.481050] '/software/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/py-pip-23.1.2-dwy4vuci5j63diu3hmd2ykzy5soxmxal/bin/pip3' 'install' '--prefix=/software/projects/pawsey1045/cdipietrantonio/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/hyperbeam-0.9.2-hsaecyxatcz64tteyfy5gqzbgwkqzrdu' 'target/wheels/mwa_hyperbeam-0.9.2-cp311-cp311-manylinux_2_31_x86_64.whl'
Processing ./target/wheels/mwa_hyperbeam-0.9.2-cp311-cp311-manylinux_2_31_x86_64.whl
Requirement already satisfied: numpy in /software/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/py-numpy-1.26.1-bbgtuwupkcessgguivyrtykcfwo3a5y6/lib/python3.11/site-packages (from mwa-hyperbeam==0.9.2) (1.26.1)
Installing collected packages: mwa-hyperbeam
Successfully installed mwa-hyperbeam-0.9.2

But I still get the same error with the dynamic library.

dipietrantonio commented 3 months ago

By pinning patchelf to a specific version (the same one installed by pip), I made it work. In your pull request, update the py-maturin recipe as follows.

diff --git a/repo/packages/py-maturin/package.py b/repo/packages/py-maturin/package.py
index 8e0e6af..c80cc1f 100644
--- a/repo/packages/py-maturin/package.py
+++ b/repo/packages/py-maturin/package.py
@@ -25,6 +25,7 @@ class PyMaturin(PythonPackage):
     depends_on("py-setuptools-rust@1.4:", type="build")
     depends_on("py-tomli@1.1:", when="^python@:3.10", type=("build", "run"))
     depends_on("rust", type=("build", "run"))
+    depends_on("patchelf@0.17.2", type=("build", "run"))

     def setup_build_environment(self, env):
         env.set("CARGO_HOME", f"{self.stage.source_path}/.cargo")

Here is the test. Unfortunately, for Spack-installed modules with Python dependencies, one needs to load the dependencies manually. This is a result of the current Spack configuration, which does not generate load commands in module files because software is built with RPATHs. Python software, however, does not work that way. We will fix this ASAP.
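Until the module files load their Python dependencies, a small check like the one below can show which modules still need a manual `module load`. It is a sketch with a hypothetical function name, and it only handles top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the current interpreter
    cannot locate on its import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, `missing_modules(["numpy", "mwa_hyperbeam"])` should return an empty list once the hyperbeam, python and py-numpy modules have all been loaded.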

cdipietrantonio@nid002944:/scratch/pawsey0001/cdipietrantonio> module load hyperbeam/0.9.2-xjyeqlb
cdipietrantonio@nid002944:/scratch/pawsey0001/cdipietrantonio> wget https://raw.githubusercontent.com/MWATelescope/mwa_hyperbeam/main/examples/analytic_gpu.py
[...]
Saving to: ‘analytic_gpu.py’

cdipietrantonio@nid002944:/scratch/pawsey0001/cdipietrantonio> module load python/3.11.6
cdipietrantonio@nid002944:/scratch/pawsey0001/cdipietrantonio> module load py-numpy/1.26.1
cdipietrantonio@nid002944:/scratch/pawsey0001/cdipietrantonio> python3 analytic_gpu.py 
Time to calculate 1000000 directions: 0.435s
First Jones matrix:
[ 0.        +0.j  0.906947  +0.j  0.90241604+0.j -0.        +0.j]
dtype of the array: complex128
dipietrantonio commented 2 months ago

Hi Dev,

Can you confirm that these changes now meet your requirements? I think the only missing piece is the beam files. Is the PR ready for review?

d3v-null commented 2 months ago

Ready to go!

d3v-null commented 2 months ago

> Thanks @d3v-null and @dipietrantonio for all this work. I'm happy to approve. The only thing I would like is perhaps some comments in hyperdrive and mwalib on the tests that are run. For instance, mwalib executes a test, but can we describe what it is doing and give a little context on what a failure could mean?

No worries, is it ok if this lives in the source code file for each example? e.g. https://github.com/MWATelescope/mwalib/blob/91700684e631e7bdbe93d94778871747cb008516/examples/mwalib-print-context.c#L2

dipietrantonio commented 2 months ago

I agree with @d3v-null, the best place to describe the tests is in the test suite itself.

I am running a full installation of the MWA packages from scratch to confirm everything works, so that I can use the same process to install them system-wide in September.

dipietrantonio commented 2 months ago

@d3v-null The hyperdrive recipe lacks a depends_on("rust@1.64.0:")

==> Error: ProcessError: cargo: No such file or directory
    Command: 'cargo' 'install' '--path=.' '--root=/software/projects/pawsey0001/cdipietrantonio/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/hyperdrive-0.4.1-oa7t2qlwrrj6u7p3drpfxzzy7idaa24y' '--no-default-features' '--features=cfitsio-static,hdf5-static,hip,plotting' '--locked'
See build log for details:
  /scratch/pawsey0001/cdipietrantonio/setonix/2024.05/software/cdipietrantonio/build_stage/spack-stage-hyperdrive-0.4.1-oa7t2qlwrrj6u7p3drpfxzzy7idaa24y/spack-build-out.txt
dipietrantonio commented 2 months ago

Hyperbeam fails with

     217    
     218      --- stderr
     219      thread 'main' panicked at /scratch/pawsey0001/cdipietrantonio/set
            onix/2024.05/software/cdipietrantonio/build_stage/spack-stage-hyper
            beam-0.9.3-3yuxoncqpxlhwdcsw3otm5u2qg6lea32/spack-src/.cargo/regist
            ry/src/index.crates.io-6f17d22bba15001f/cmake-0.1.50/src/lib.rs:109
            8:5:
     220    
  >> 221    failed to execute command: No such file or directory (os error 2)
     222      is `cmake` not installed?

cmake has probably always been a dependency, but it was present system-wide before the HPCM installation on Setonix and now it is not.

d3v-null commented 2 months ago

> @d3v-null The hyperdrive recipe lacks a depends_on("rust@1.64.0:")
>
> ==> Error: ProcessError: cargo: No such file or directory
>     Command: 'cargo' 'install' '--path=.' '--root=/software/projects/pawsey0001/cdipietrantonio/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/hyperdrive-0.4.1-oa7t2qlwrrj6u7p3drpfxzzy7idaa24y' '--no-default-features' '--features=cfitsio-static,hdf5-static,hip,plotting' '--locked'
> See build log for details:
>   /scratch/pawsey0001/cdipietrantonio/setonix/2024.05/software/cdipietrantonio/build_stage/spack-stage-hyperdrive-0.4.1-oa7t2qlwrrj6u7p3drpfxzzy7idaa24y/spack-build-out.txt

Wow, nice catch. That happened because I edited the file in my browser while I was having some issues with Setonix. https://github.com/PawseySC/pawsey-spack-config/pull/280/commits/cddfd9e0c9b0e40f3ef92e94d5c246c6b3dcdcfc

dipietrantonio commented 2 months ago

Birli, as we already know, fails with the error below. Cause yet to be identified.

     269       Compiling birli v0.12.0 (/scratch/pawsey0001/cdipietrantonio/set
            onix/2024.05/software/cdipietrantonio/build_stage/spack-stage-birli
            -0.12.0-3jpcedubfxcqwersliirzerlxpqniart/spack-src)
     270       Compiling fitsio v0.20.0
  >> 271    error: could not find native static library `cfitsio`, perhaps an -
            L flag is missing?
     272    
dipietrantonio commented 2 months ago

The CFITSIO link error is due to the libc.a static library missing from the system (after the HPCM installation).

cdipietrantonio@setonix-02:/scratch/director2183/cdipietrantonio> gcc test.o -o test -static 
/usr/bin/ld: cannot find -lc: No such file or directory
collect2: error: ld returned 1 exit status
dipietrantonio commented 2 months ago

@d3v-null as discussed in chat, please revert the last commit and create a PR to the spack/spack repo instead.

As for the problems with birli and mwalib, apply the patch below. So many weird errors are happening that I believe the HPCM installation has corrupted the Rust installation on Setonix, but I need to dig further. In the meantime, this patch will make the software work.

Also, consider removing the debug statements such as os.system("env").

diff --git a/repo/packages/birli/package.py b/repo/packages/birli/package.py
index 261c0a8..ae3c5af 100644
--- a/repo/packages/birli/package.py
+++ b/repo/packages/birli/package.py
@@ -28,6 +28,7 @@ class Birli(Package):
     def setup_build_environment(self, env):
         build_dir = self.stage.source_path
         env.set('CARGO_HOME', f"{build_dir}/.cargo")
+        env.set('RUSTFLAGS', f'-L{self.spec["cfitsio"].prefix.lib64} -L/usr/lib64 -lcurl -lbz2')

     def get_features(self):
         features = ["cfitsio-static"]
diff --git a/repo/packages/mwalib/package.py b/repo/packages/mwalib/package.py
index 18615f6..5ea6e86 100644
--- a/repo/packages/mwalib/package.py
+++ b/repo/packages/mwalib/package.py
@@ -50,6 +50,7 @@ class Mwalib(Package):
         build_dir = self.stage.source_path
         env.set('CARGO_HOME', f"{build_dir}/.cargo")
         env.set('RUST_BACKTRACE', 1)
+        env.set('RUSTFLAGS', f'-L{self.spec["cfitsio"].prefix.lib64}')

     def install(self, spec, prefix):
         os.system("env")
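The RUSTFLAGS values in the patch follow a simple pattern: one `-L` per library search directory, then one `-l` per extra library to link. A minimal sketch of composing such a string is shown below. The helper name is hypothetical, and in a recipe the first directory would come from `self.spec["cfitsio"].prefix.lib64` as in the patch.

```python
def rustflags(lib_dirs, libs=()):
    """Build a RUSTFLAGS string: one -L per search directory,
    followed by one -l per library to link."""
    parts = [f"-L{d}" for d in lib_dirs] + [f"-l{name}" for name in libs]
    return " ".join(parts)
```

Keeping the directories and libraries as lists makes it easier to drop the `/usr/lib64`, `curl` and `bz2` entries again once the underlying system issue is understood.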
dipietrantonio commented 1 month ago

The latest changes resulted in a successful installation of all packages. Once @d3v-null confirms everything works as expected, I will merge this branch.

dipietrantonio commented 1 month ago

@d3v-null patchelf is a dependency of py-maturin not hyperdrive. Please see https://github.com/PawseySC/pawsey-spack-config/pull/280#issuecomment-2258095785

dipietrantonio commented 1 month ago

I confirm that all packages now install successfully.

dipietrantonio commented 3 weeks ago

We have tested the installations at the Radio School and they worked nicely.