Open Chroxvi opened 10 months ago
If this happens on a login node on the LUMI supercomputer, you can, as a workaround, try to build your container on another node: either another login node or a compute node. To use a LUMI-C compute node, you can submit a job using srun, e.g. something along the lines of:

srun --account=project_<your_project_id> --time=00:15:00 --mem=64G --cpus-per-task=32 --partition=small cotainr build lumi_pytorch_rocm_demo.sif --system=lumi-g --conda-env py311_rocm542_pytorch.yml

Note that you have to request enough memory, since /tmp is mounted as an in-memory filesystem on LUMI compute nodes. Also note that building on a compute node will consume CPUh/GPUh resources from your LUMI project.
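Since the space on /tmp determines whether the build can succeed, a quick check before building may help decide which node to use. The following is a minimal sketch, not part of cotainr: the helper name `free_kib` and the 16 GiB threshold are assumptions for illustration, and the threshold should be adjusted to the expected size of your container.

```shell
#!/bin/sh
# Print the space available on the filesystem holding a directory, in KiB.
# Relies on POSIX `df -P -k`, where column 4 of the second line is "Available".
# (free_kib is a hypothetical helper, not a cotainr command.)
free_kib() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Assumed threshold of 16 GiB; adjust to the size of the sandbox you expect.
required_kib=$((16 * 1024 * 1024))

avail_kib=$(free_kib /tmp)
if [ "$avail_kib" -lt "$required_kib" ]; then
    echo "/tmp has only ${avail_kib} KiB free; consider building on another node" >&2
else
    echo "/tmp has ${avail_kib} KiB free"
fi
```

On a compute node where /tmp is an in-memory filesystem, the reported space is bounded by the job's memory allocation, so running this check inside the srun job gives the relevant number.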
If you try to build a container using cotainr, e.g.

cotainr build lumi_pytorch_rocm_demo.sif --system=lumi-g --conda-env py311_rocm542_pytorch.yml

on a system that does not have sufficient space on /tmp to store the entire container, you will encounter an error.

Cotainr does not provide a CLI option, environment variable, or similar for changing the location of the temporary sandbox directory, created during the build phase, to a location other than /tmp. This might be a problem if: