NVIDIA / nvidia-container-runtime

NVIDIA container runtime
Apache License 2.0

OCI runtime create failed: container - Segmentation Fault #150

Closed Code-Gratefully closed 3 years ago

Code-Gratefully commented 3 years ago

Hi, I'm following the instructions here: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installing-on-ubuntu-and-debian

However, I get the following error message.

docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: signal: segmentation fault (core dumped), stdout: , stderr:: unknown.
ERRO[0000] error waiting for container: context canceled 

Here are the versions of the package that apt resolved:

Setting up nvidia-container-runtime (3.5.0-1) ...
Setting up nvidia-docker2 (2.6.0-1) ...

The CUDA version is 10.1.

I searched Google, and the closest thing I found is this: https://github.com/OE4T/meta-tegra/issues/760

However, I'm not using that package so I have no access to that source code: https://github.com/OE4T/meta-tegra/pull/763/files

All I was doing was:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

Thank you so much in advance.

elezar commented 3 years ago

Hi @LucidusC. As a matter of interest, what versions of nvidia-container-toolkit and libnvidia-container-tools are installed?
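
For anyone following along, one way to answer this on Debian or Ubuntu is to query dpkg for the installed package versions. The snippet below is only a sketch: the helper name `flag_prerelease` and the package glob patterns are assumptions made for illustration, not part of any NVIDIA tooling. It marks versions containing `~rc`, which is how Debian-style version strings denote release candidates.

```shell
# Hypothetical helper: reads "package version" pairs on stdin, one per line,
# and marks versions that contain "~rc" (Debian-style release candidates).
flag_prerelease() {
    while read -r pkg ver; do
        case "$ver" in
            *"~rc"*) echo "$pkg $ver PRERELEASE" ;;
            *)       echo "$pkg $ver" ;;
        esac
    done
}

# On Debian/Ubuntu, feed it the versions dpkg knows about.
# The glob patterns are an assumption; adjust them to the packages of interest.
if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${Package} ${Version}\n' \
        'nvidia-container*' 'libnvidia-container*' 2>/dev/null | flag_prerelease || true
fi
```

Any line tagged PRERELEASE probably came from an experimental repository rather than the stable one.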

Code-Gratefully commented 3 years ago

Thank you @elezar for looking into this.

My nvidia-container-toolkit is at version 1.6.0~rc.1-1, and my libnvidia-container-tools is at version 1.5.0~rc.1-1.

elezar commented 3 years ago

@LucidusC, as mentioned in https://github.com/NVIDIA/nvidia-docker/issues/1535#issuecomment-899363142, the rc.1 package of libnvidia-container 1.5.0 that was released to the experimental repos had a bug. I have since removed the packages from the repo. Downgrading to the latest stable versions, nvidia-container-toolkit 1.5.1 and libnvidia-container-tools 1.4.0, should address your issue.
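
A downgrade along these lines can be sketched with apt version pinning. This is a rough illustration under stated assumptions: `pin_command` is a hypothetical helper that only prints the command instead of running it, and the `-1` Debian revision suffix on each version is a guess based on the package versions shown earlier in the thread. The exact strings available in your repository can be checked with `apt-cache madison nvidia-container-toolkit`. Related packages such as libnvidia-container1 may need the same pin.

```shell
# Hypothetical helper: build (but do not run) an apt-get command that pins
# packages to specific versions. Arguments are name/version pairs.
pin_command() {
    cmd="sudo apt-get install -y --allow-downgrades"
    while [ "$#" -ge 2 ]; do
        cmd="$cmd $1=$2"
        shift 2
    done
    echo "$cmd"
}

# Versions taken from the comment above; the "-1" revision suffix is an assumption.
pin_command nvidia-container-toolkit 1.5.1-1 libnvidia-container-tools 1.4.0-1
```

After reviewing the printed command and running it, restart the Docker daemon with `sudo systemctl restart docker` so the downgraded hook is picked up.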

Code-Gratefully commented 3 years ago

It solves the problem, thanks!!