dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License
1.88k stars 416 forks

opencv cannot use cuda #533

Closed JTShuai closed 1 month ago

JTShuai commented 1 month ago

The build report from OpenCV shows it was built with CUDA, but calling cv2.cuda.getCudaEnabledDeviceCount() raises an error.

Environment

Problem reproduction
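For reference, a minimal runtime check (a sketch using only the standard cv2 API) that reports whether the CUDA module is actually usable, rather than just compiled in:

```python
# Hedged sketch: verify at runtime whether OpenCV's CUDA module works,
# instead of trusting the build report. Degrades gracefully if cv2 is absent.
try:
    import cv2
    try:
        count = cv2.cuda.getCudaEnabledDeviceCount()
        status = "ok" if count > 0 else "no-device"
        print(f"OpenCV sees {count} CUDA device(s)")
    except Exception as exc:  # cv2.error / AttributeError when CUDA is broken
        status = "cuda-broken"
        print(f"cv2.cuda is unusable: {exc}")
except ImportError:
    status = "no-cv2"
    print("cv2 is not installed in this environment")
```

Running this inside the container distinguishes "built with CUDA but no device visible" from "CUDA module broken", which the build report alone cannot.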

dusty-nv commented 1 month ago

@JTShuai can you confirm that you can use CUDA in another, independent container like l4t-jetpack or l4t-pytorch, and without all the extra docker run flags you added (--privileged, etc.)?
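A non-interactive version of that sanity check, as a sketch (standard torch API only; run it inside the l4t-pytorch container):

```python
# Hedged sketch: report whether PyTorch can load and see CUDA,
# printing a diagnosis either way instead of crashing.
try:
    import torch
    available = torch.cuda.is_available()
    print(f"torch {torch.__version__}, CUDA available: {available}")
    if available:
        print(f"device 0: {torch.cuda.get_device_name(0)}")
except (ImportError, OSError) as exc:  # OSError covers missing libs like libcurand
    available = False
    print(f"torch failed to load: {exc}")
```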

JTShuai commented 1 month ago

> @JTShuai can you confirm that you can use CUDA in another, independent container like l4t-jetpack or l4t-pytorch, and without all the extra docker run flags you added (--privileged, etc.)?

Hi, I tried docker run --runtime nvidia -it --rm --network=host dustynv/l4t-pytorch:r35.4.1 and got the error:

docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall clone3: permission denied: unknown.
JTShuai commented 1 month ago

@dusty-nv I just noticed the compatibility note:

Container images are compatible with other minor versions of JetPack/L4T:
    • L4T R32.7 containers can run on other versions of L4T R32.7 (JetPack 4.6+)
    • L4T R35.x containers can run on other versions of L4T R35.x (JetPack 5.1+)

So I tried docker run --runtime nvidia -it --rm --network=host dustynv/l4t-pytorch:r32.7.1, but got the same error.

dusty-nv commented 1 month ago

Hi @JTShuai - had you recently done an apt upgrade? That adding seccomp filter rule for syscall error sounds like the same problem as this one:

https://forums.developer.nvidia.com/t/docker-containers-wont-run-after-recent-apt-get-upgrade/194369

JTShuai commented 1 month ago

> Hi @JTShuai - had you recently done an apt upgrade? That adding seccomp filter rule for syscall error sounds like the same problem as this one:
>
> https://forums.developer.nvidia.com/t/docker-containers-wont-run-after-recent-apt-get-upgrade/194369

Thanks for your help! I tried the following commands you wrote in #108:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update
sudo apt-get install nvidia-docker2=2.8.0-1
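If pinning this version fixes things, it may also help to hold the package so a later apt upgrade cannot replace it again; a sketch (standard apt-mark usage, version taken from this thread - the actual install/hold lines need root on the Jetson, so they are left commented):

```shell
# Hedged sketch: hold the pinned package so `apt upgrade` cannot replace it.
# Version is the one from this thread; adjust for your JetPack release.
PKG="nvidia-docker2"
PIN="2.8.0-1"
echo "would pin $PKG at $PIN"
# On the Jetson (needs root):
#   sudo apt-get install "$PKG=$PIN"
#   sudo apt-mark hold "$PKG"
```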

Now I can enter the container with docker run --runtime nvidia -it --rm --network=host dustynv/l4t-pytorch:r32.7.1, but I get a new error from pytorch:

root@tx2-4:/# python3
Python 3.6.9 (default, Mar 10 2023, 16:46:00) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 196, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 149, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
>>> 
dusty-nv commented 1 month ago

@JTShuai on JetPack 4, CUDA/cuDNN/TensorRT are mounted into the container from the host device when --runtime nvidia is used. You should have that libcurand.so.10 under /usr/local/cuda/lib64. If you keep having problems, given all the docker issues you've hit I'd recommend reflashing your SD card. Then try again after a fresh install, without running the apt upgrade.
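A quick way to verify that on the host, as a sketch (path taken from the comment above; adjust for your JetPack version):

```shell
# Hedged sketch: check that the host has the CUDA library the container
# expects to mount (JetPack 4 layout, per the comment above).
CUDA_LIB_DIR="/usr/local/cuda/lib64"
if [ -e "$CUDA_LIB_DIR/libcurand.so.10" ]; then
    result="present"
else
    result="missing"
fi
echo "libcurand.so.10 is $result in $CUDA_LIB_DIR"
```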

JTShuai commented 1 month ago

> @JTShuai on JetPack 4, CUDA/cuDNN/TensorRT are mounted into the container from the host device when --runtime nvidia is used. You should have that libcurand.so.10 under /usr/local/cuda/lib64. If you keep having problems, given all the docker issues you've hit I'd recommend reflashing your SD card. Then try again after a fresh install, without running the apt upgrade.

Hi, I manually downgraded Docker to docker.io=20.10.7-0ubuntu1~18.04.2 and containerd to containerd=1.5.2-0ubuntu1~18.04.3, and I confirmed that libcurand.so.10 is present under /usr/local/cuda/lib64.

I'm still getting the same error, so I will try reflashing the SD card.

JTShuai commented 1 month ago

Problem solved after reflashing the TX2.