SomeoneSerge opened this issue 2 years ago
I would probably have more spare time after June 21.
> To speed up the merging of #158486, I would prefer limiting the scope of this PR to `singularity`, `singularity.nix` and `singularity-tools`. Further improvements can be done in successive PRs.
I agree with limiting the scope of the PR; I'll have time to help in a couple of weeks.
> - [ ] Annoyance: we can compute `diskSize` from the built `contents` instead of choosing an arbitrary constant

Is there a way to compute `diskSize` from `contents` at eval time with no IFD?
Regarding `singularity-tools`, a significant problem is the closure size being doubled unnecessarily by `mkLayer`.

`singularity-tools.mkLayer` generates a whole new derivation by copying all the files and directories of each package into `$out`, and then we use `writeReferencesToFile` to get the list of derivations in the dependency tree of the generated layer package. Why don't we get the list of references of all the packages directly?

Here's my implementation, which merges the `writeReferencesToFile` results of all the packages in the list while removing duplicates. There should be a better implementation than the O(n^2) duplicate removal, but it's still much faster than the O(n) content copying of `mkLayer` anyway.
```nix
{
  writeMultipleReferencesToFile = paths: runCommand "runtime-deps-multiple" {
    referencesFiles = map writeReferencesToFile paths;
  } ''
    touch "$out"
    declare -a paths=()
    for refFile in $referencesFiles; do
      while read -r path; do
        # Linear scan over the paths collected so far (the O(n^2) part).
        isPathIncluded=0
        for pathIncluded in "''${paths[@]}"; do
          if [[ "$path" == "$pathIncluded" ]]; then
            isPathIncluded=1
            break
          fi
        done
        if (( ! isPathIncluded )); then
          echo "$path" >> "$out"
          paths+=( "$path" )
        fi
      done < "$refFile"
    done
  '';
}
```
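For what it's worth, `sort -u` could do the deduplication in O(n log n) instead; a minimal sketch under the same assumptions (`runCommand` and `writeReferencesToFile` in scope):

```nix
{
  writeMultipleReferencesToFile = paths: runCommand "runtime-deps-multiple" {
    referencesFiles = map writeReferencesToFile paths;
  } ''
    # Each references file lists one store path per line, so sorting all of
    # them together with -u removes duplicates across the whole set at once.
    sort -u $referencesFiles > "$out"
  '';
}
```

The output order changes from first-seen to lexicographic, which presumably doesn't matter for a list of references.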
> Is there a way to compute `diskSize` from `contents` at eval time with no IFD?
I cannot say what's a good way to compute it, but the trivial baseline is a derivation that takes, in `buildInputs`, a `buildEnv` with the to-be image's `contents`, and `du`s it. The output times some constant is an upper bound on the `diskSize`.

EDIT: i.e. we wouldn't know `diskSize` at Nix eval time, but we'd know it at build time, which appears to be sufficient.
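A minimal sketch of that baseline (the helper name is hypothetical; it uses `closureInfo` to enumerate the runtime closure before `du`ing it, all at build time):

```nix
imageDiskSizeMb = contents: runCommand "image-disk-size-mb" {
  closure = closureInfo { rootPaths = contents; };
} ''
  # Measure the whole runtime closure, not just the top-level paths.
  sizeMb=$(du -csm $(cat $closure/store-paths) | tail -n1 | cut -f1)
  # "times some constant": take 2x as a crude upper bound for diskSize.
  echo $(( sizeMb * 2 )) > $out
'';
```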
> EDIT: i.e. we wouldn't know `diskSize` at Nix eval time, but we'd know it at build time, which appears to be sufficient.
Then we would no longer be able to use

```nix
vmTools.runInLinuxVM (
  runCommand "build-image" {
    preVM = vmTools.createEmptyImage {
      size = diskSize;
      fullName = "${projectName}-run-disk";
    };
  } ''
    mkfs -t ext3 -b 4096 "/dev/${vmTools.hd}"
    mount "/dev/${vmTools.hd}" disk
  ''
)
```
I see now. It appears that `createEmptyImage` never uses `size` at eval time, so we could rewrite it to relax the constraint:

https://github.com/NixOS/nixpkgs/blob/4b31cc7551cbc795e30670d09845acdeb0f41651/pkgs/build-support/vm/default.nix#L280
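Something like this, perhaps (a hypothetical rewrite, not the actual `vmTools` code; assumes a `qemu` in scope): if `size` is substituted into the builder's script unevaluated, it can be a shell expression computed at build time rather than an eval-time integer.

```nix
createEmptyImage = { size, fullName }: ''
  mkdir $out
  diskImage=$out/disk-image.qcow2
  # `size` may now be a build-time expression such as
  # "$(du -csm $(cat $closure/store-paths) | tail -n1 | cut -f1)".
  ${qemu}/bin/qemu-img create -f qcow2 "$diskImage" "${toString size}M"
  echo "${fullName}" > $out/full-name
'';
```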
> I see now. It appears that `createEmptyImage` never uses `size` at eval time, so we could rewrite it to relax the constraint: https://github.com/NixOS/nixpkgs/blob/4b31cc7551cbc795e30670d09845acdeb0f41651/pkgs/build-support/vm/default.nix#L280
Great! `dockerTools` would also benefit from that.

For consistency with `dockerTools.buildImage` it would also be nice to rename `contents` -> `copyToRoot`.

@SomeoneSerge any other HPC pain points?
~I don't mind changing the interface of singularity-tools. (That would be a breaking change.)~ Sorry for not noticing the change to `dockerTools.buildImage`.

There's another change lining up that builds the image through a Singularity definition (Apptainer recipe) file, to make the image more declarative and the build process more transparent. It could be a drop-in replacement for the current Singularity-sandbox-based implementation.
I also went on and made a generator function that turns a `settings`-like Nix attrset into a definition string. The parser, which does the reverse, is still a work in progress.
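Presumably something along these lines (a toy sketch, not the actual generator; real definition files have section ordering and argument types this glosses over):

```nix
{ lib }:
rec {
  # Render { header = { ... }; sections = { ... }; } into a definition string.
  toDefinitionString = { header, sections ? { } }:
    lib.concatStringsSep "\n" (
      lib.mapAttrsToList (key: value: "${key}: ${value}") header
      ++ lib.mapAttrsToList (name: body: "%${name}\n${body}") sections
    );

  # Example:
  example = toDefinitionString {
    header = { Bootstrap = "docker-archive"; From = "layer.tar"; };
    sections.runscript = ''exec /bin/sh "$@"'';
  };
}
```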
I'm sorry for the long absence, my priorities had shifted somewhat

@dmadisetti On the high level I've exactly one pain point, and that is an unsolved (underinvested) use-case:

- [ ] I want to use singularity to bind-mount `/nix/store` on a cluster that doesn't support user namespaces nor overlayfs, but has a setuid singularity binary
- [ ] I want to ship a pre-built Nix in a singularity image
- [ ] I want to be able to build that image using Nix, e.g. via `singularity-tools.buildImage`

I think I might give this a shot again. The issues I had were:

- [ ] As I said, the cluster's singularity installation doesn't come with `--overlay` enabled, so I have to use `--bind`
- [ ] Using `--bind /tmp/blah:/nix/store` hides the container's `/nix/store` -> `singularity run` fails, unable to locate the symlinked `sh` and such
- [ ] Because `singularity-tools.buildImage` doesn't give the user full control over `contents`, I cannot easily replace the whole thing with static coreutils and a static Nix

Shouldn't be hard to alleviate.
This now suggests another point: maybe we want a `buildImage` that is extendable and overridable, including the possibility to override the default `contents`. One direction could be `makeOverridable`, and I think there's a similar effort being undertaken for `dockerTools`: https://github.com/NixOS/nixpkgs/pull/208944

Another possibility is the module system with support for `mkMerge`/`mkForce` etc., similar to NixOS. This could also be a viable approach to re-implement the upstream's "definition" files in pure Nix, so as to achieve a declarative interface to `buildImage`.
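To make the `makeOverridable` direction concrete, a toy sketch (the real `buildImage` does far more; the point is only that the default `contents` becomes replaceable):

```nix
{ lib, runCommand, bashInteractive, coreutils, pkgsStatic }:
rec {
  buildImage = lib.makeOverridable (
    { name
    , contents ? [ bashInteractive coreutils ]  # now a replaceable default
    }:
    runCommand "${name}.img" { inherit contents; } ''
      # Stand-in for the real image build: just record the requested roots.
      printf '%s\n' $contents > $out
    ''
  );

  # Replace, rather than merely extend, the default contents:
  minimal = (buildImage { name = "minimal"; }).override {
    contents = [ pkgsStatic.busybox ];
  };
}
```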
@ShamrockLee, the `settings` approach sounds great; I think this should feel very native in Nixpkgs. Has your work gone into any PRs yet?
> @ShamrockLee, the `settings` approach sounds great; I think this should feel very native in Nixpkgs. Has your work gone into any PRs yet?
Not yet, but I already have the change integrated into my HEP analysis workflow.

It's time to also rethink the `buildImage` interface, IMO.
@ShamrockLee are you on any of the nixos matrix channels, btw?
Hopefully not adding to the noise. My current workflow is making a Docker tar with Nix, unpacking it, and turning it into a Singularity image. A bit of a hack, but it works?
```nix
packages.docker = pkgs.dockerTools.buildNixShellImage {
  name = "pre-sif-container";
  tag = "latest";
  drv = devShells.default;
};

packages.singularity = pkgs.stdenv.mkDerivation {
  name = "container.sif";
  src = ./.;
  installPhase = ''
    mkdir unpack
    tar xzvf ${packages.docker}/image.tgz -C unpack
    # Singularity can't handle .gz
    tar -C unpack/ -cvf layer.tar .
    # TODO: Allow for module of user-defined nightly, as opposed to using src
    singularity build $out Singularity.nightly
  '';
};
```
with `Singularity.nightly` containing:

```
Bootstrap: docker-archive
From: layer.tar
...
```
Big fan of using the Singularity file to define hooks etc.
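Presumably the flow is then driven along these lines (output names as in the snippet above):

```
$ nix build .#singularity
$ singularity run ./result
```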
> @ShamrockLee are you on any of the nixos matrix channels, btw?
I don't have experience using Matrix yet. How do I join one?
There's a bunch of options, including a web client. I think you can just follow either of the links.
By the way, I was meaning to ask: why do we have to `runInLinuxVM`? I remember seeing @jbedo mention this allowed setting setuid flags, but I'm not sure where we need them. I presume QEMU takes its performance toll.
> It's time to also rethink the `buildImage` interface, IMO. (@ShamrockLee)

Oh, I'll just throw some bait in. Have you noticed https://discourse.nixos.org/t/working-group-member-search-module-system-for-packages/26574/8 and https://github.com/DavHau/drv-parts in particular?
> My current workflow is making a Docker tar with Nix. (@dmadisetti)

I guess your post further proves there's a use-case :)
> By the way, I was meaning to ask: why do we have to `runInLinuxVM`?
It was not until last year that the unprivileged image-building workflow started to be implemented in the Apptainer project. The program used to assert `UID == 0` when building the image.

We are close to unprivileged image generation with Apptainer. The remaining obstacle is its use of `/var/apptainer/mnt/session` as the container mount point. See https://github.com/apptainer/apptainer/issues/215

Sylabs's Singularity fork seems to have caught up some of the progress on unprivileged image builds, but it still expects a bunch of top-level directories `/var/singularity/mnt/{container,final,overlay,session,source}`, IIRC.
> It was not until last year that the unprivileged image-building workflow started to be implemented in the Apptainer project. The program used to assert UID == 0 when building the image.

I see. So, in principle, we could have run everything except `${projectName} build $out ./img` outside QEMU?
> I see. So, in principle, we could have run everything except `${projectName} build $out ./img` outside QEMU?
It's true when it comes to the definition-based build, but it won't help much, since generating the definition file from the definition attrset should be trivial in terms of resources.

As for the current Apptainer-sandbox-based `buildImage`, I'm not sure if we could run the `unshare ...` lines for `runAsRoot` outside QEMU. (Update: currently, `runAsRootScript` uses the `mount --rbind`-ed `/nix/store`, and it simply cannot run without some kind of emulation.)
> It won't help much

I was rather wondering if we could prepare the file tree outside QEMU and somehow pack the whole batch into an ext3/squashfs image without the `mount`. But then again, I didn't measure; maybe that too is insignificant.
> I was rather wondering if we could prepare the file tree outside QEMU and somehow pack the whole batch into an ext3/squashfs image without the `mount`. But then again, I didn't measure; maybe that too is insignificant.
I also prefer an approach that doesn't involve creating and running virtual machines. singularity/apptainer can run filesystems in squashfs, and I use this script to create containers:
```nix
{ pkgs
, contents
, runscript ? "#!/bin/sh\nexec ${pkgs.hello}/bin/hello"
, startscript ? "#!/bin/sh\nexec ${pkgs.hello}/bin/hello"
}:

pkgs.runCommand "make-container" {} ''
  set -o pipefail
  closureInfo=${pkgs.closureInfo { rootPaths = contents ++ [ pkgs.bashInteractive ]; }}
  mkdir -p $out/r/{bin,etc,dev,proc,sys,usr,.singularity.d/{actions,env,libs}}
  cd $out/r
  # Copy the runtime closure into the image root
  cp -na --parents $(cat $closureInfo/store-paths) .
  touch etc/{passwd,group}
  ln -s /bin usr/
  ln -s ${pkgs.bashInteractive}/bin/bash bin/sh
  # Link each package's executables into /bin
  for p in ${pkgs.lib.concatStringsSep " " contents}; do
    ln -sn $p/bin/* bin/ || true
  done
  echo "${runscript}" > .singularity.d/runscript
  echo "${startscript}" > .singularity.d/startscript
  chmod +x .singularity.d/{runscript,startscript}
  cd $out
  ${pkgs.squashfsTools}/bin/mksquashfs r container.sqfs -no-hardlinks -all-root
''
```
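Once built, the image can presumably be used directly, since singularity/apptainer accept squashfs images as-is:

```
$ singularity shell ./result/container.sqfs
```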
FYI: With https://github.com/apptainer/apptainer/pull/1284, Apptainer images can be built as a derivation without a VM. The code already works (tested with `singularity-tools.buildImageFromDef` from #224636, specifying `buildImageFlags = [ "--resolv ${pkgs.emptyFile}" "--hosts ${pkgs.emptyFile}" ];`).

The upstream maintainer expects something more general (such as `--no-mount`), so the current change is not likely to get accepted. Nevertheless, it proves that a fully-unprivileged Apptainer image build is possible.
> I'm sorry for the long absence, my priorities had shifted somewhat
>
> @dmadisetti On the high level I've exactly one pain point, and that is an unsolved (underinvested) use-case:
>
> - [ ] I want to use singularity to bind-mount `/nix/store` on a cluster that doesn't support user namespaces nor overlayfs, but has a setuid singularity binary
> - [ ] I want to ship a pre-built Nix in a singularity image
> - [ ] I want to be able to build that image using Nix, e.g. via `singularity-tools.buildImage`
>
> I think I might give this a shot again. The issues I had were:
>
> - [ ] As I said, the cluster's singularity installation doesn't come with `--overlay` enabled, so I have to use `--bind`
> - [ ] Using `--bind /tmp/blah:/nix/store` hides the container's `/nix/store` -> `singularity run` fails, unable to locate the symlinked `sh` and such
> - [ ] Because `singularity-tools.buildImage` doesn't give the user full control over `contents`, I cannot easily replace the whole thing with static coreutils and a static Nix
>
> Shouldn't be hard to alleviate.
It's a bit hacky, but I think this achieves your goals:

```nix
singularity-tools.buildImage {
  name = "minimal-nix";
  runAsRoot = "${rsync}/bin/rsync -a ${pkgsStatic.nix}/ ./";
}
```
> @ShamrockLee are you on any of the nixos matrix channels, btw?
@SomeoneSerge I finally got a Matrix account (`@shamrocklee:matrix.org`) and joined the Nix HPC room, thanks to the Summer of Nix.
I managed to get a CUDA-capable container built by adjusting `memSize` along with `diskSize`.
Running it with env vars isn't solved yet.
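For reference, the kind of adjustment meant here (the CUDA package is hypothetical; `diskSize` and `memSize` are the existing `buildImage` knobs, in MiB):

```nix
singularity-tools.buildImage {
  name = "cuda-app";
  contents = [ myCudaApplication ];  # hypothetical CUDA-enabled package
  diskSize = 10240;  # CUDA closures easily exceed the default image size
  memSize = 4096;    # the build VM needs headroom for mksquashfs, too
}
```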
apptainer has merged a PR that allows using apptainer to build containers in the Nix sandbox: https://github.com/apptainer/apptainer/pull/2394

With that change, it's possible to build containers with

```
$ nix-build
```
`default.nix`:

```nix
{ pkgs ? import <nixpkgs> { } }:
pkgs.callPackage ./make-container.nix {
  inherit pkgs;
  contents = with pkgs; [
    busybox
    nginx
  ];
}
```
`make-container.nix`:

```nix
{ apptainer ? pkgs.apptainer, contents, pkgs }:

pkgs.runCommand "make-container" {} ''
  closureInfo=${pkgs.closureInfo { rootPaths = contents ++ [ pkgs.bashInteractive ]; }}
  set -x
  mkdir -p $out/r/{bin,etc,dev,proc,sys,usr,var/log}
  cd $out/r
  cp -na --parents $(cat $closureInfo/store-paths) .
  touch etc/{passwd,group,resolv.conf}
  ln -s /bin usr/
  ln -s ${pkgs.bashInteractive}/bin/bash bin/sh
  for p in ${pkgs.lib.concatStringsSep " " contents}; do
    ln -sn $p/bin/* bin/ || true
  done
  touch $out/apptainer.conf $out/resolv.conf
  export HOME=$out
  find . -ls
  ${apptainer}/bin/apptainer --config $out/apptainer.conf --debug --verbose build -B $out/resolv.conf:/etc/resolv.conf --disable-cache --fakeroot $out/container.sif $out/r
''
```
This copies the closure of $contents to $out/r, links all bin/* to /bin/, creates dummy apptainer.conf and resolv.conf files, and finally runs apptainer build.
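Assuming the two files above sit next to each other, the result might then be exercised like:

```
$ nix-build default.nix
$ apptainer exec ./result/container.sif nginx -v
```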
Let's land #268199 by splitting it into smaller PRs. We could then add the unprivileged Apptainer image build flow as one of its reusable components.
Here's the first one: #332168
> Let's land #268199 by splitting it into smaller PRs. We could then add the unprivileged Apptainer image build flow as one of its reusable components.
By the way, maybe we should consider dropping support for choosing between apptainer and singularity for building images. For one thing, I suspect we'll have to introduce a separate attribute (like `_apptainer-derandomized` or `_siftool-derandomized`; https://github.com/NixOS/nixpkgs/issues/279250) for a tool patched to leave out all the UUIDs and timestamps, and it's probably not worth it to maintain patches for both forks...
If the images built by one can be run by the other, and are expected to remain compatible going forward, then I don't see a problem with that.
> For one thing, I suspect we'll have to introduce a separate attribute (like `_apptainer-derandomized` or `_siftool-derandomized`; #279250) for a tool patched to leave out all the UUIDs and timestamps, and it's probably not worth it to maintain patches for both forks...
How does patching Apptainer and SingularityCE (the `apptainer` and `singularity` part) make it difficult to choose between Apptainer and SingularityCE for building images (the `singularity-tools` part)? We could define `apptainer` and `singularity` separately if their build flows differ too much, while maintaining only one `singularity-tools` for the command-line interface they share in common.
> How does patching Apptainer and SingularityCE (the `apptainer` and `singularity` part) make it difficult to choose between Apptainer and SingularityCE for building images (the `singularity-tools` part)?
It doesn't; it's just that, why would we patch them both separately if we only really need the patches for `singularity-tools`, not for the user-facing singularity?
> If the images built by one can be run by the other, and are expected to remain compatible going forward, then I don't see a problem with that.
We could even package siftool separately, and that could be enough...
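For context, `siftool` is the small CLI for inspecting and assembling SIF images that ships with the `github.com/sylabs/sif` Go module; packaging it on its own might look roughly like this sketch (version and hashes are placeholders):

```nix
{ buildGoModule, fetchFromGitHub }:

buildGoModule rec {
  pname = "siftool";
  version = "0.0.0";  # placeholder: pick an actual sylabs/sif tag
  src = fetchFromGitHub {
    owner = "sylabs";
    repo = "sif";
    rev = "v${version}";
    hash = "";  # to be filled in
  };
  vendorHash = "";  # to be filled in
  subPackages = [ "cmd/siftool" ];
}
```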
The development would be a lot easier if the reproducible image build functionality could be implemented upstream.
> We could even package siftool separately, and that could be enough...
I seem to have lost track of this. What is siftool?
Issue description

I intend to start using nixpkgs' `singularity-tools` for HPC applications. What follows is a list of hindrances and minor annoyances that I've immediately encountered. The list is mostly for myself: I'm opening the issue to make this visible and maybe motivate people to voice ideas and comments. Cf. this read on singularity with Nix for more inspiration.

- [ ] VM-free image builds: https://github.com/NixOS/nixpkgs/issues/177908#issuecomment-2263349565
- [ ] Singularity needs patching to make images reproducible (`mkfs` generates random `UUID`s): https://github.com/NixOS/nixpkgs/issues/279250
- [ ] Give users control over `contents`, in particular allow removing `bash`: currently, including `bash` manually results in `singularity-tools.buildImage` throwing obscure errors
- [ ] Annoyance: we can compute `diskSize` from the built `contents` instead of choosing an arbitrary constant
- [ ] Hindrance: failing to pack any cuda-enabled dependencies. The error says `... Cannot allocate memory`. My `/tmp` is on disk, and I don't seem to be running out of RAM, so this message might be just another version of "not enough space left on (squashfs) device"
- [ ] Hindrance: the `buildImage` interface doesn't expose `apphelp`
- [ ] ...
- [x] Get this merged: https://github.com/NixOS/nixpkgs/pull/158486

CC (possibly interested) @ShamrockLee @jbedo