NVIDIA / cuQuantum

Home for cuQuantum Python & NVIDIA cuQuantum SDK C++ samples
https://docs.nvidia.com/cuda/cuquantum/
BSD 3-Clause "New" or "Revised" License

Sudo permission issue for cuquantum-appliance:23.10 container #125

Open namehta4 opened 3 months ago

namehta4 commented 3 months ago

Hi All,

I am trying to use cuquantum-appliance:23.10 with Shifter on the NERSC Perlmutter system. I am facing the following sudo permission issue with this container:

namehta4@perlmutter:login36:~> salloc -N 1 -G 4 -C gpu -t 120 -c 64 -A nstaff -q interactive --image=nvcr.io/nvidia/cuquantum-appliance:23.10
salloc: Granted job allocation 22896843
salloc: Waiting for resource configuration
salloc: Nodes nid200432 are ready for job
namehta4@nid200432:~> shifter /bin/bash
(base) namehta4@nid200432:~$ cd /home/cuquantum/
bash: cd: /home/cuquantum/: Permission denied
(base) namehta4@nid200432:~$ sudo cd /home/cuquantum
sudo: The "no new privileges" flag is set, which prevents sudo from running as root.
sudo: If sudo is running in a container, you may need to adjust the container configuration to disable the flag.

As far as I know, this is a new issue, as the behavior differs from that of the previous image (23.03):

namehta4@perlmutter:login36:~> salloc -N 1 -G 4 -C gpu -t 120 -c 64 -A nstaff -q interactive --image=nvcr.io/nvidia/cuquantum-appliance:23.03
salloc: Pending job allocation 22896859
salloc: job 22896859 queued and waiting for resources
salloc: job 22896859 has been allocated resources
salloc: Granted job allocation 22896859
salloc: Waiting for resource configuration
salloc: Nodes nid200436 are ready for job
namehta4@nid200436:~> shifter /bin/bash
(base) namehta4@nid200436:~$ cd /home/cuquantum/
(base) namehta4@nid200436:/home/cuquantum$ ls
LICENSE  conda  examples

May I please have your help in resolving this issue?

Thank you! Neil Mehta

erinaldiq commented 3 months ago

@namehta4 The image 23.06 does not have the sudo permission issue. It seems that the commands used for image 23.03 also work on 23.06

haidarazzam commented 3 months ago

Dear @namehta4, was this issue resolved for you? Thanks.

namehta4 commented 3 months ago

Hi @haidarazzam, no, the issue still persists:

namehta4@perlmutter:login25:~> salloc -N 1 -G 4 -C gpu -t 120 -c 64 -A nstaff -q interactive --image=nvcr.io/nvidia/cuquantum-appliance:23.10-devel-ubuntu22.04
salloc: Pending job allocation 23801046
salloc: job 23801046 queued and waiting for resources
salloc: job 23801046 has been allocated resources
salloc: Granted job allocation 23801046
salloc: Waiting for resource configuration
salloc: Nodes nid001180 are ready for job
namehta4@nid001180:~> shifter /bin/bash
(base) namehta4@nid001180:~$ cd /home/cuquantum/
bash: cd: /home/cuquantum/: Permission denied

mtjrider commented 3 months ago

--image=nvcr.io/nvidia/cuquantum-appliance:23.10-devel-ubuntu22.04

Can you confirm if the issue exists with --image=nvcr.io/nvidia/cuquantum-appliance:23.10-devel-ubuntu20.04?

namehta4 commented 3 months ago

Hi @mtjrider, Ha! The issue seems resolved. Thank you!

namehta4@perlmutter:login25:~> salloc -N 1 -G 4 -C gpu -t 120 -c 64 -A nstaff -q interactive --image=nvcr.io/nvidia/cuquantum-appliance:23.10-devel-ubuntu20.04
salloc: Pending job allocation 23801343
salloc: job 23801343 queued and waiting for resources
salloc: job 23801343 has been allocated resources
salloc: Granted job allocation 23801343
salloc: Waiting for resource configuration
salloc: Nodes nid001249 are ready for job
namehta4@nid001249:~> shifter /bin/bash
(base) namehta4@nid001249:~$ cd /home/cuquantum/
(base) namehta4@nid001249:/home/cuquantum$ cd conda/envs/cuquantum-23.10/bin/
(base) namehta4@nid001249:/home/cuquantum/conda/envs/cuquantum-23.10/bin$ ./python
Python 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:07:37) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cupy
>>> import cusvaer
>>> exit()

@erinaldiq, I will add ipykernel etc to this base image and upload it asap. Please test it out in roughly an hour or so.

Thank you again @mtjrider and @haidarazzam

mtjrider commented 3 months ago

Great. This means the root cause is a change in the default file permissions for the home directory under Ubuntu 22.04. Thanks for reporting this.
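
For anyone who wants to confirm this on their own system, here is a minimal sketch for inspecting the directory permissions from inside each image. It assumes the relevant change is Ubuntu's tighter default home-directory mode (0750 instead of 0755, the default since Ubuntu 21.04) and that Shifter runs the container as the invoking user rather than as the owner of /home/cuquantum. The helper name `describe_access` is hypothetical and is not part of cuQuantum or Shifter.

```python
import os
import stat


def describe_access(path):
    """Summarize the permissions of `path` and who is allowed to enter it."""
    mode = os.stat(path).st_mode
    return {
        # Symbolic form, as printed by `ls -ld` (for example 'drwxr-x---')
        "mode": stat.filemode(mode),
        # Permission bits only, in octal (for example '0o750')
        "octal": oct(stat.S_IMODE(mode)),
        # True if users who are neither the owner nor in the group may enter
        "others_can_enter": bool(mode & stat.S_IXOTH),
        # True if the current process may enter the directory
        "i_can_enter": os.access(path, os.X_OK),
    }
```

Running `describe_access("/home/cuquantum")` in the ubuntu20.04 and ubuntu22.04 variants of the image and comparing the `others_can_enter` field should show whether the home-directory mode is what blocks `cd` for a non-owner user.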

namehta4 commented 2 months ago

Adding for posterity: this issue is also observed in the cuda_quantum:0.6 image. Would this have to be filed separately?