eth-cscs / stackinator

https://eth-cscs.github.io/stackinator/
BSD 3-Clause "New" or "Revised" License

add unshare workflow proto #118

Closed j-ogas closed 1 year ago

j-ogas commented 1 year ago

This PR adds an unshare(1) workflow prototype. It creates a writable root "sandbox" at /tmp/$USER.newroot and uses unshare(1) to manipulate user and mount namespaces. It is my hope that you folks will adopt and support this.

This is useful for those who cannot use bubblewrap or setuid helpers. FWIW, you can create and mount squashfs files with a setuid-stripped fusermount3 binary using nested namespaces (see: https://github.com/hpc/charliecloud/pull/1415).

There are improvements to be had. For example, I have not tried concurrent builds of different environments. I suspect that won't work, out of the box, with what I've provided here.
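
On the concurrency point: because the sandbox path is fixed at /tmp/$USER.newroot, two builds would share one root. One possible way around that, as a minimal sketch only, is a unique directory per build. `make_newroot` is a hypothetical helper, not something in this PR, and newroot.sh would still have to be taught to use the path it returns.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): create a unique sandbox root per
# build instead of the fixed /tmp/$USER.newroot used by the prototype.
make_newroot() {
    local base="${TMPDIR:-/tmp}"
    local user="${USER:-$(id -un)}"
    # mktemp -d creates the directory atomically with a random suffix,
    # so two concurrent builds never receive the same path.
    mktemp -d "${base}/${user}.newroot.XXXXXX"
}

newroot_a=$(make_newroot)
newroot_b=$(make_newroot)
echo "build A sandbox: ${newroot_a}"
echo "build B sandbox: ${newroot_b}"

# Clean up the demo directories again.
rmdir "${newroot_a}" "${newroot_b}"
```

Each build would then pass its own path through to the sandbox script rather than relying on a shared location.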

$ ./bin/stack-config -s /foo/bar/unshare/system-config/cagnusmarlson -r /foo/bar/unshare/recipe -b /dev/shm/$USER.build --unshare
Stackinator
  recipe path:  /foo/bar/unshare/recipe
  build path :  /dev/shm/jogas.build
  system     :  /foo/bar/unshare/system-config/cagnusmarlson
  mount      :  default
  build cache:  None
  sandbox tool: unshare
spack: checkout branch/commit internal-0.20.0

Configuration finished, run the following to build the environment:

cd /dev/shm/jogas.build
unshare -U -m -r ./newroot.sh env --ignore-environment PATH=/usr/bin:/bin:`pwd`/spack/bin make store.squashfs -j32
$ cd /dev/shm/jogas.build
$ unshare -U -m -r ./newroot.sh env --ignore-environment PATH=/usr/bin:/bin:`pwd`/spack/bin make store.squashfs -j32
creating: newroot: /tmp/jogas.newroot
/tmp/jogas.newroot
[...]
/tmp/jogas.newroot/apps
/tmp/jogas.newroot/data
[...]
/tmp/jogas.newroot/var
/tmp/jogas.newroot/usr
/tmp/jogas.newroot/tmp
/tmp/jogas.newroot/sys
/tmp/jogas.newroot/srv
/tmp/jogas.newroot/selinux
/tmp/jogas.newroot/sbin
/tmp/jogas.newroot/run
/tmp/jogas.newroot/root
/tmp/jogas.newroot/proc
[...]
/tmp/jogas.newroot/net
/tmp/jogas.newroot/mnt
/tmp/jogas.newroot/lib64
/tmp/jogas.newroot/lib
[...]
/tmp/jogas.newroot/home
/tmp/jogas.newroot/etc
/tmp/jogas.newroot/dev
[...]
/tmp/jogas.newroot/boot
/tmp/jogas.newroot/bin
/tmp/jogas.newroot/.profile
/tmp/jogas.newroot/.wificonfig
spack arch... linux-firehose
spack version... 0.21.0.dev0 (2c07392e3087a19735c73f6ee6e9a185e8fd0464)
checking if spack concretizer works... ==> Warning: Failed to initialize repository: '/fun-stack/repo'.
  No repo.yaml found in '/fun-stack/repo'
  To remove the bad repository, run this command:
      spack repo rm /fun-stack/repo
yup
touch mirror-setup
make -C compilers
make[1]: Entering directory '/dev/shm/jogas.build/compilers'
[...]
mkdir -p /fun-stack
touch bootstrap/compilers.yaml
spack -e gcc/ concretize -f
[...]
 -   tivowgn  gcc@11.3.0%gcc@11.4.0~binutils+bootstrap~graphite~nvptx~piclibs~profiled+strip build_system=autotools build_type=Release languages=c,c++,fortran patches=cc6112d arch=linux-firehose
 -   apih3cj      ^diffutils@3.9%gcc@11.4.0 build_system=autotools arch=linux-firehose
 -   pze3z5p          ^libiconv@1.17%gcc@11.4.0 build_system=autotools libs=shared,static arch=linux-firehose
bcumming commented 1 year ago

Thanks for the detailed PR @j-ogas!

This is a really useful addition - the bwrap requirement has held back some potential users from testing the tool.

We will have a look at the PR, and find a way to integrate it.

bcumming commented 1 year ago

I tried it out on a Cray EX system at CSCS:

stack-config --unshare -r ./unittests/recipes/host-recipe/ -b /dev/shm/bc/host -c ./cache.yaml -s ../alps-cluster-config/hohgant/

Followed by

unshare -U -m -r ./newroot.sh env --ignore-environment PATH=/usr/bin:/bin:`pwd`/spack/bin make store.squashfs -j32

and straight up I hit "permission denied" errors:

checking if spack concretizer works... ==> Warning: patchelf --force-rpath --set-rpath /dev/shm/bc/host/cache/bootstrap/store/linux-centos7-x86_64/gcc-10.2.1/patchelf-0.15.0-htk62k7efo2z22kh6kmhaselru7bfkuc/bin/patchelf failed witherror /dev/shm/bc/host/cache/bootstrap/store/linux-centos7-x86_64/gcc-10.2.1/patchelf-0.15.0-htk62k7efo2z22kh6kmhaselru7bfkuc/bin/patchelf.bak: Permission denied
    Command: '/dev/shm/bc/host/cache/bootstrap/store/linux-centos7-x86_64/gcc-10.2.1/patchelf-0.15.0-htk62k7efo2z22kh6kmhaselru7bfkuc/bin/patchelf.bak' '--force-rpath' '--set-rpath' '/dev/shm/bc/host/cache/bootstrap/store/linux-centos7-x86_64/gcc-10.2.1/patchelf-0.15.0-htk62k7efo2z22kh6kmhaselru7bfkuc/lib:/dev/shm/bc/host/cache/bootstrap/store/linux-centos7-x86_64/gcc-10.2.1/patchelf-0.15.0-htk62k7efo2z22kh6kmhaselru7bfkuc/lib64:/opt/rh/devtoolset-10/root/usr/lib/gcc/x86_64-redhat-linux/10' '/dev/shm/bc/host/cache/bootstrap/store/linux-centos7-x86_64/gcc-10.2.1/patchelf-0.15.0-htk62k7efo2z22kh6kmhaselru7bfkuc/bin/patchelf'
==> Warning: Failed to initialize repository: '/user-environment/repo'.
  No repo.yaml found in '/user-environment/repo'
  To remove the bad repository, run this command:
      spack repo rm /user-environment/repo
simonpintarelli commented 1 year ago

If I understand correctly, it fails because /user-environment exists already on our system and is bind mounted in https://github.com/hpc/stackinator/blob/unshare/stackinator/templates/unshare-newroot.sh#L74

j-ogas commented 1 year ago

> If I understand correctly, it fails because /user-environment exists already on our system and is bind mounted in https://github.com/hpc/stackinator/blob/unshare/stackinator/templates/unshare-newroot.sh#L74

Ah, yeah. {{ store_path }} would need to be writable by the user in the newroot. @bcumming I think we can get around this with something like the following, assuming {{ store_path }} is an absolute path at this point?

$ find / -maxdepth 1 ! -wholename '/' -and ! -wholename '{{ store }}'  | while read f; do mount_ "$f"; done
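
The exclusion logic in that one-liner can be tried safely against a throwaway directory instead of the real /. A rough sketch, assuming a hypothetical `list_bind_targets` function and with the `mount_` step replaced by an echo. It uses `-mindepth 1` and the POSIX `-path` test in place of `! -wholename '/'` and `-wholename`:

```shell
#!/bin/bash
# list_bind_targets ROOT STORE (hypothetical name): print every top-level
# entry of ROOT except STORE. -mindepth 1 drops ROOT itself from the output.
list_bind_targets() {
    local root="$1" store="$2"
    find "$root" -mindepth 1 -maxdepth 1 ! -path "$store" | sort
}

# Demo against a temporary tree that stands in for /.
demo=$(mktemp -d)
mkdir "$demo/usr" "$demo/etc" "$demo/user-environment"

list_bind_targets "$demo" "$demo/user-environment" | while IFS= read -r f; do
    # A real script would bind mount here; this sketch only reports.
    echo "would bind: ${f#"$demo"}"
done

rm -rf "$demo"
```

The store path is skipped, so it can be created fresh and writable inside the newroot rather than inherited from the host.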
j-ogas commented 1 year ago

@bcumming, see 0d6c6fd when you have a moment. I believe it may fix the permission issue you encountered by addressing two oversights: (1) the use case where {{ store }} exists as a sub-directory of /, and (2) {{ build_path }}/store is now being bind mounted over the newroot {{ store }} path as intended.

Also, if it helps, opening a shell in the newroot environment may be useful for tinkering and debugging.

$ cd /dev/shm/jogas.build
$ unshare -U -r -m ./newroot.sh /bin/bash
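
If the shell fails to start, a quick probe can tell whether unprivileged user namespaces are permitted on the host at all. A minimal sketch: `userns_check` is a hypothetical name, and it only checks that uid 0 is mapped inside the new namespace.

```shell
#!/bin/bash
# userns_check (hypothetical name): report whether an unprivileged
# user + mount namespace can be created on this host.
userns_check() {
    # Same flags as the workflow: -U user namespace, -m mount namespace,
    # -r map the current user to root inside the namespace.
    if [ "$(unshare -U -m -r id -u 2>/dev/null)" = "0" ]; then
        echo "available"
    else
        # Covers both a missing unshare binary and a kernel or site policy
        # that forbids unprivileged user namespaces.
        echo "unavailable"
    fi
}

userns_check
```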

I was able to build a gcc + cray-mpich environment to completion without issue on our Cray EX test bed. Let me know if you bump into any more problems. Hope all is well.

simonpintarelli commented 1 year ago

Thank you @j-ogas, it works! Looks good to me. For the sake of consistency with the existing approach, one could probably also mount a tmpfs over the home directory.

bcumming commented 1 year ago

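Given the following observations:

* Stackinator doesn't require `bwrap` be installed as setuid - it is not installed like this on the CSCS clusters or our workstations.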

How about bootstrapping bwrap as part of the stack build on systems on which it isn't installed? In my opinion, a script like the following that bootstraps bwrap is simpler to maintain:

#!/bin/bash
# Bootstrap libcap and bubblewrap into ./local without root privileges.
set -e

basepath=$(pwd)
installpath="${basepath}/local"
# Absolute log path, so all output lands in one file after each cd.
log="${basepath}/log"
mkdir -p "${installpath}"

echo "log file" > "${log}"

# libcap is a build dependency of bubblewrap.
libcapversion=2.68
libcaptarfile=libcap-${libcapversion}.tar.gz
libcapurl=https://www.kernel.org/pub/linux/libs/security/linux-privs/libcap2/${libcaptarfile}
wget -nv "${libcapurl}" >> "${log}"
tar -xzvf "${libcaptarfile}" >> "${log}"

cd "libcap-${libcapversion}"
make RAISE_SETFCAP=no GOLANG=no USE_GPERF=no SHARED=yes lib=lib prefix="${installpath}" install >> "${log}"

cd "${basepath}"

# Build bubblewrap against the libcap installed above.
bwrapversion=0.8.0
bwraptarfile=bubblewrap-${bwrapversion}.tar.xz
bwrapurl=https://github.com/containers/bubblewrap/releases/download/v${bwrapversion}/${bwraptarfile}
wget -nv "${bwrapurl}" >> "${log}"
tar -xvf "${bwraptarfile}" >> "${log}"

cd "bubblewrap-${bwrapversion}"
CFLAGS="-I${installpath}/include -L${installpath}/lib" ./configure --disable-sudo --disable-man --without-bash-completion-dir --prefix="${installpath}" >> "${log}"
make install >> "${log}"
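
To bootstrap only on systems where bwrap is missing, a wrapper along these lines could pick between a system install and a locally built one. This is a sketch under assumptions: `ensure_bwrap` is a hypothetical name, and the `./local` prefix is taken from the script above.

```shell
#!/bin/bash
# ensure_bwrap PREFIX (hypothetical name): make bwrap resolvable, preferring
# a system install. PREFIX is where a bootstrap script installed it
# (assumed default: ./local).
ensure_bwrap() {
    local prefix="${1:-$(pwd)/local}"
    if command -v bwrap >/dev/null 2>&1; then
        return 0    # already on PATH, nothing to do
    fi
    if [ -x "${prefix}/bin/bwrap" ]; then
        export PATH="${prefix}/bin:${PATH}"
        return 0
    fi
    echo "bwrap not found; run the bootstrap script first" >&2
    return 1
}

if ensure_bwrap "$(pwd)/local"; then
    echo "using bwrap at: $(command -v bwrap)"
fi
```
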
j-ogas commented 1 year ago
> * Stackinator doesn't require `bwrap` be installed as setuid - it is not installed like this on the CSCS clusters or our workstations.

Ah, I wasn't aware. The bubblewrap README specifies sudo in its source build instructions, so it didn't occur to me that it could be built otherwise.

I was able to use your bootstrap script above to build bwrap. There was some error chatter regarding ldconfig but it wasn't fatal. I was able to build my stack with the standard workflow using the bootstrapped bwrap. Thanks!

Closing this PR.