apptainer / singularity

Singularity has been renamed to Apptainer as part of the project's move to the Linux Foundation. This repo has been preserved as a snapshot taken right before the changes.
https://github.com/apptainer/apptainer

--nv flag does not work in singularity nested inside singularity #5759

Closed: rcaspart closed this issue 2 years ago

rcaspart commented 3 years ago

We are trying to run jobs requiring access to Nvidia GPUs in singularity containers. For organisational and technical reasons we need to be able to run these jobs within a cascade of containers, i.e. running an additional singularity container within the first singularity container.

For a single singularity container using the --nv flag works fine and the required libraries and binaries are bind-mounted into the container. However, when starting an additional container inside the first one, none of the Nvidia libraries gets bind-mounted into the second container (the binaries are available).

At first look, my assumption is that this is caused by nvidia-container-cli not being available within the first container, with singularity as a result falling back to nvliblist.conf, where the names of the libraries (and binaries) are specified. Singularity then relies on the information from the ld cache to find the respective paths of these libraries. However, the ld cache does not include information about the libraries bind-mounted by singularity to /.singularity.d/libs (and, given the read-only nature of our containers, cannot to the best of my knowledge include them). As a result, singularity fails to find the required libraries and does not bind-mount them into the second container.

My naive suggestion would be to have singularity, in addition to relying on the ld cache, also check the /.singularity.d/libs directory as a fallback for libraries bind-mounted by singularity. Or is there any point I am missing here?
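To illustrate the mismatch, here is roughly what it looks like from inside the first container (an illustrative session; the exact file names depend on the host driver):

Singularity> ls /.singularity.d/libs/ | grep libnvidia-ml   # the library was bind-mounted here
libnvidia-ml.so.1
Singularity> ldconfig -p | grep libnvidia-ml                # ...but the ld cache does not know about it
Singularity>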

Version of Singularity:

3.7.0-1.el7

Expected behavior

Nvidia libraries and binaries are bind-mounted into and available in both containers.

Actual behavior

Only the Nvidia binaries are bind-mounted into and available in both containers. The libraries are only available in the first container.

$ singularity shell --nv docker://matterminers/wlcg-wn\:latest
INFO:    Using cached SIF image
Singularity> singularity shell --nv docker://matterminers/wlcg-wn\:latest
INFO:    Using cached SIF image
INFO:    Convert SIF file to sandbox...
WARNING: underlay of /usr/bin/nvidia-smi required more than 50 (924) bind mounts
Singularity> nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

Steps to reproduce this behavior

Start singularity container with the --nv flag. Start second singularity container inside the first one with the --nv flag.
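As a command sequence, matching the transcript above (this assumes, as in our setup, that singularity is also available inside the image):

singularity shell --nv docker://matterminers/wlcg-wn\:latest              # outer container: GPU libraries present
Singularity> singularity shell --nv docker://matterminers/wlcg-wn\:latest
Singularity> nvidia-smi                                                   # inner container: libnvidia-ml.so is missing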

What OS/distro are you running

Scientific Linux 7.9 (Nitrogen)

How did you install Singularity

From EPEL repository

dtrudg commented 3 years ago

Hi @rcaspart - this is a bit of an uncommon scenario. It'd be good to know in what situation you need to run singularity nested inside itself.

We generally don't want to encourage it if not necessary, as you lose the advantages of the SIF format (the single file has to be extracted to a sandbox) among other things.

We might consider a PR for this, but it's unlikely to be a high priority. As a workaround you can probably use -B/--bind to bind the libs into their standard locations in the outer container, for the inner Singularity version to pick them up... but this isn't something I've tried.
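An untested sketch of that idea (the library paths are assumptions; on an EL7 host the driver libraries typically live under /usr/lib64):

singularity shell --nv \
  -B /usr/lib64/libnvidia-ml.so.1 \
  -B /usr/lib64/libcuda.so.1 \
  docker://matterminers/wlcg-wn:latest

The idea being that, with the libraries at a standard path inside the outer container, the inner Singularity has a chance of finding them there.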

giffels commented 3 years ago

Hi @dctrud,

I think this is actually not that uncommon a scenario. If you want to integrate HPC resources into WLCG computing for multiple experiments, you need the first layer to provide the traditional WLCG-like software environment (Grid middleware), and the experiments themselves partly start their own environment in a second singularity layer.

rcaspart commented 3 years ago

Hi @dctrud, thanks a lot for your reply, and thanks to Manuel for outlining the situation. I second Manuel's opinion: while it is certainly not the most common scenario, it is not the most uncommon one either. While I agree that it is not an ideal solution and brings some disadvantages with it, it unfortunately is the only way we could envisage for these kinds of situations.

Thanks for your suggested workaround; I have given it a try and it works. However, I suspect this only works due to the feature introduced in #5670, which in turn breaks some use cases for us (see #5766).

Regarding a PR: if you (or others) agree that also checking /.singularity.d/libs for the libraries is a reasonable and viable way to proceed, I am happy to look into this, give it a try, and eventually contribute it as a PR to singularity.

dtrudg commented 3 years ago

Hi @rcaspart @giffels

> I think this is actually not that uncommon a scenario. If you want to integrate HPC resources into WLCG computing for multiple experiments, you need the first layer to provide the traditional WLCG-like software environment (Grid middleware), and the experiments themselves partly start their own environment in a second singularity layer.

I'm afraid that this is almost certainly an uncommon scenario if we consider our entire user base. I'm not aware of the full details of the WLCG computing configuration, but the vast majority of users are running Singularity directly on an HPC host. In the case where containerized middleware and nesting are used, I'm afraid that middle layer (in the container) may be required to implement some workarounds on occasion. The complex nested setups we know that some sites like WLCG use are varied, and we just don't have enough detail of how they are implemented to know how changes may affect them.

As an example - this is the first time I've come across anyone using NVIDIA within a nested container setup, and I don't think we've ever considered even testing that. With limited resources I'm afraid that we can't anticipate or test every possible scenario, and we do need to change behavior to move forward.

> Regarding a PR: if you (or others) agree that also checking /.singularity.d/libs for the libraries is a reasonable and viable way to proceed, I am happy to look into this, give it a try, and eventually contribute it as a PR to singularity.

I'd definitely be happy to consider a PR like this. In general, I'd encourage sites who have nested / complex setups to look at ways in which they can contribute test cases to the code, so that the behavior you need is something that we consider automatically. We'd be very glad to accept these unless they are in conflict with the needs of the broader user base.

Thanks.

jafaruddinlie commented 3 years ago

Hi @dtrudg, thanks for the suggestion in the Slack channel! I am providing another use case here: we are trying to build a containerised desktop environment for our HPC users and would like to provide access to our already containerised singularity applications, for example Relion or ChimeraX. Starting the desktop container works, and starting containerised apps without the Nvidia library works, but without access to the GPU most of the applications will not operate properly.

carterpeel commented 3 years ago

Hello,

This is a templated response that is being sent out to all open issues. We are working hard on 'rebuilding' the Singularity community, and a major task on the agenda is finding out what issues are still outstanding.

Please consider the following:

  1. Is this issue a duplicate, or has it been fixed/implemented since being added?
  2. Is the issue still relevant to the current state of Singularity's functionality?
  3. Would you like to continue discussing this issue or feature request?

Thanks, Carter

olifre commented 3 years ago

@carterpeel As outlined in detail in the earlier comments, this issue affects various different use cases, and it is not solved yet. So this templated comment is not really helpful: it does not even specify how to respond, nor does it ease reading through the issue; it just interrupts the existing discussion.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had activity in over 60 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

olifre commented 3 years ago

Dear stale-bot, this regression is still relevant to multiple users.

DrDaveD commented 3 years ago

We had another use case for this today, and I found that running these commands inside the first singularity container, before invoking the nested singularity exec --nv ..., works around the issue:

TMPD=$(mktemp -d)                      # temporary dir for a wrapper script and a writable ld cache
# wrapper that redirects ldconfig to a cache file we can write to
# (the container's own /etc/ld.so.cache is read-only)
(echo '#!/bin/bash'; echo 'exec /usr/sbin/ldconfig -C '"$TMPD"'/ld.so.cache "$@"') >$TMPD/ldconfig
chmod +x $TMPD/ldconfig
PATH=$TMPD:$PATH                       # let the wrapper shadow the real ldconfig
# rebuild the cache from the LD_LIBRARY_PATH directory (with --nv this is /.singularity.d/libs)
ldconfig $LD_LIBRARY_PATH

This works because the second singularity invokes ldconfig -p to locate the nvidia libraries. I'm not sure what would be a good solution for changing singularity to make this work automatically.
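To check that the wrapper took effect before starting the nested container (illustrative commands; the expected results are in the comments):

type ldconfig              # should resolve to the wrapper in $TMPD
ldconfig -p | grep nvidia  # should now list the libraries bind-mounted to /.singularity.d/libs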

luator commented 2 years ago

Any update on this? I'm currently trying to get a nested setup to work with Apptainer (1.0.2), and it seems that the workaround of @DrDaveD no longer works there.

(I am also a bit unsure if it makes sense to continue the discussion here or if a new issue should be opened in the Apptainer repo.)

DrDaveD commented 2 years ago

The workaround still works for me when I use apptainer both for shell --nv docker://matterminers/wlcg-wn\:latest on the outside and for another --nv on the inside. Since the container doesn't have apptainer in it, I bind-mounted /cvmfs and ran it from /cvmfs/oasis.opensciencegrid.org/mis/apptainer/bin/apptainer.
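Roughly like this, with the ldconfig wrapper from my earlier comment run in between (prompts abbreviated):

apptainer shell --nv -B /cvmfs docker://matterminers/wlcg-wn\:latest
Apptainer> /cvmfs/oasis.opensciencegrid.org/mis/apptainer/bin/apptainer shell --nv docker://matterminers/wlcg-wn\:latest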

I created apptainer/apptainer#464 for follow-up; please continue the discussion there.