Hi @rcaspart - this is a bit of an uncommon scenario. It'd be good to know what situation you have where you need to run singularity nested inside itself?
We generally don't want to encourage it if not necessary, as you lose the advantages of the SIF format (the single file has to be extracted to a sandbox) among other things.
We might consider a PR for this, but it's unlikely to be a high priority. As a workaround you can probably use -B/--bind to bind the libs into their standard locations in the outer container, for the inner Singularity version to pick them up... but this isn't something I've tried.
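A minimal sketch of that bind approach, assuming the host driver libraries live under /usr/lib64/nvidia and that outer.sif / inner.sif are placeholder image names (not from this thread):
```
# Bind the host NVIDIA libraries into a standard library path of the outer
# container so the nested singularity can find them via its ldconfig lookup.
singularity exec --nv -B /usr/lib64/nvidia:/usr/lib64/nvidia outer.sif \
    singularity exec --nv inner.sif nvidia-smi
```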
Hi @dctrud,
I think this is actually not that uncommon a scenario. If you want to integrate HPC resources into WLCG computing for multiple experiments, you need the first layer to provide the traditional WLCG-like software environment (Grid middleware), and the experiments themselves partly start their own environment in a second singularity layer.
Hi @dctrud, thanks a lot for your reply, and thanks to Manuel for outlining the situation. I second Manuel's opinion: while it most certainly is not the most common scenario, I think it is not the most uncommon one either. While I agree that it is not an ideal solution and brings some disadvantages with it, it unfortunately is the only way we could envisage for these kinds of situations.
Thanks for your suggested workaround; I have given it a try and it works. However, I suspect this is only the case due to the feature introduced in #5670, which in turn breaks some use cases for us (see #5766).
Regarding a PR, if you (or others) agree that also checking /.singularity.d/libs for the libraries is a reasonable and viable way to proceed, I am happy to look into this, give it a try, and eventually contribute it as a PR to singularity.
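For illustration, this is the situation inside the first container that such a fallback would address (the grep pattern is just an example):
```
# Inside the first container started with --nv: the NVIDIA libraries are
# present under /.singularity.d/libs ...
ls /.singularity.d/libs | grep -i nvidia
# ... but the ld cache, which the nested "singularity --nv" consults via
# ldconfig -p, does not list them.
ldconfig -p | grep -i nvidia
```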
Hi @rcaspart @giffels
> I think this is actually not that uncommon a scenario. If you want to integrate HPC resources into WLCG computing for multiple experiments, you need the first layer to provide the traditional WLCG-like software environment (Grid middleware), and the experiments themselves partly start their own environment in a second singularity layer.
I'm afraid that this is almost certainly an uncommon scenario if we consider our entire user base. I'm not aware of the full details of the WLCG computing configuration, but the vast majority of users are running Singularity directly on an HPC host. In cases where containerized middleware and nesting are used, I'm afraid it may be required for that middle layer (in the container) to implement some workarounds on occasion. The complex nested setups that we know some sites like WLCG use are varied, and we just don't have the details of how they are implemented to know how changes may affect them.
As an example - this is the first time I've come across anyone using NVIDIA within a nested container setup, and I don't think we've ever considered even testing that. With limited resources I'm afraid that we can't anticipate or test every possible scenario, and we do need to change behavior to move forward.
> Regarding a PR, if you (or others) agree that also checking /.singularity.d/libs for the libraries is a reasonable and viable way to proceed, I am happy to look into this, give it a try, and eventually contribute it as a PR to singularity.
I'd definitely be happy to consider a PR like this. In general, I'd encourage sites who have nested / complex setups to look at ways in which they can contribute test cases to the code, so that the behavior you need is something that we consider automatically. We'd be very glad to accept these unless they are in conflict with the needs of the broader user base.
Thanks.
Hi @dtrudg, thanks for the suggestion on the Slack channel! I am providing another use case here: we are trying to build a containerised desktop environment for our HPC users and would like to provide access to our already containerised singularity applications, for example Relion or ChimeraX. Starting the desktop container works, and starting containerised apps without the Nvidia libraries also works, but without access to the GPU most of the applications are not going to operate properly.
Hello,
This is a templated response that is being sent out to all open issues. We are working hard on 'rebuilding' the Singularity community, and a major task on the agenda is finding out what issues are still outstanding.
Please consider the following:
Thanks, Carter
@carterpeel As outlined in detail in the earlier comments, this issue affects several different use cases and is not solved yet. So this templated comment is not really helpful: it does not even specify how to respond, nor does it ease reading through the issue; it just interrupts the existing discussion.
This issue has been automatically marked as stale because it has not had activity in over 60 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
Dear stale-bot, this regression is still relevant to multiple users.
We had another use case for this today, and I found that doing these commands inside the first singularity container before invoking the nested singularity exec --nv ... works around the issue:
```
# Wrapper ldconfig that writes to a private, writable cache file, since the
# container's own ld.so.cache is read-only.
TMPD=$(mktemp -d)
(echo '#!/bin/bash'; echo 'exec /usr/sbin/ldconfig -C '"$TMPD"'/ld.so.cache "$@"') > "$TMPD/ldconfig"
chmod +x "$TMPD/ldconfig"
# Put the wrapper first on PATH and index the directories on LD_LIBRARY_PATH
# (which includes the bind-mounted /.singularity.d/libs).
PATH=$TMPD:$PATH
ldconfig $LD_LIBRARY_PATH
```
This works because the second singularity invokes ldconfig -p to locate the nvidia libraries. I'm not sure what would be a good solution for changing singularity to make this work automatically.
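For reference, once the wrapper above is in place inside the first container, the nested call might look like this (the image name is a placeholder):
```
# The private cache now lists the libraries bound to /.singularity.d/libs,
# so the nested singularity can locate them through "ldconfig -p".
ldconfig -p | grep -i nvidia
singularity exec --nv inner.sif nvidia-smi
```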
Any update on this? I'm currently trying to get a nested setup to work with Apptainer (1.0.2) and it seems that the workaround from @DrDaveD does not work there anymore.
(I am also a bit unsure if it makes sense to continue the discussion here or if a new issue should be opened in the Apptainer repo.)
The workaround still works for me when I use apptainer both for shell --nv docker://matterminers/wlcg-wn:latest on the outside and for another --nv on the inside. Since the container doesn't have apptainer in it, I bind-mounted /cvmfs and ran it from /cvmfs/oasis.opensciencegrid.org/mis/apptainer/bin/apptainer.
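A rough reconstruction of that test, with the inner image as a placeholder:
```
# Outer container: the WLCG worker-node image, with /cvmfs bind-mounted so
# an apptainer binary is available inside it.
apptainer shell --nv -B /cvmfs docker://matterminers/wlcg-wn:latest
# Inside that shell, run the nested container with --nv as well, using the
# apptainer installation from CVMFS.
/cvmfs/oasis.opensciencegrid.org/mis/apptainer/bin/apptainer exec --nv inner.sif nvidia-smi
```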
I created apptainer/apptainer#464 as a follow-up; please continue the discussion there.
We are trying to run jobs requiring access to Nvidia GPUs in singularity containers. For organisational and technical reasons we need to be able to run these jobs within a cascade of containers, i.e. running an additional singularity container within the first singularity container.
For a single singularity container, using the --nv flag works fine and the required libraries and binaries are bind-mounted into the container. However, when starting an additional container inside the first one, none of the Nvidia libraries gets bind-mounted into the second container (the binaries are available).
At first glance, my assumption is that this is caused by nvidia-container-cli not being available within the first container, so that singularity falls back to nvliblist.conf, where the names of the libraries (and binaries) are specified. Singularity then relies on the information from the ld cache to find the respective paths of these libraries. However, the ld cache does not include information about the libraries bind-mounted by singularity to /.singularity.d/libs (and, given the read-only nature of our containers, it cannot include them to the best of my knowledge). As a result, singularity fails to find the required libraries and does not bind-mount them into the second container.
My naive suggestion would be for singularity, in addition to relying on the ld cache, to also check the /.singularity.d/libs directory for bind-mounted libraries as a fallback. Or is there a point I am missing here?
Version of Singularity:
3.7.0-1.el7
Expected behavior
Nvidia libraries and binaries are bind-mounted into and available in both containers.
Actual behavior
Only the Nvidia binaries are bind-mounted into and available in both containers. The libraries are only available in the first container.
Steps to reproduce this behavior
Start a singularity container with the --nv flag. Start a second singularity container inside the first one with the --nv flag.
What OS/distro are you running
Scientific Linux 7.9 (Nitrogen)
How did you install Singularity
From EPEL repository
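A reproduction sketch of the steps above, with placeholder image names:
```
# First container, started with --nv (libraries are bound to /.singularity.d/libs):
singularity shell --nv outer.sif
# Nested second container, also started with --nv:
Singularity> singularity shell --nv inner.sif
# In the second container the NVIDIA binaries are available, but the
# libraries are not, so this lookup comes back empty:
Singularity> ldconfig -p | grep -i nvidia
```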