Open TamarLevy opened 2 years ago
I installed the latest NVIDIA Container Toolkit. My NVIDIA driver version is 515.76, my CUDA version on the host is 11.7, and my Docker version is 20.10.18.
I created a Dockerfile with this content:

```dockerfile
FROM nvidia/cuda:11.4-runtime-ubuntu20.04
CMD nvcc --version
```
I built the image using:

```shell
docker build . -t nvidia-test
```
and the output is:

```
Sending build context to Docker daemon  798.3MB
Step 1/2 : FROM nvidia/cuda:11.4-runtime-ubuntu20.04
Get "https://registry-1.docker.io/v2/": x509: certificate signed by unknown authority
```
and ran it using:

```shell
docker run --gpus all nvidia-test
```
and the output is:

```
NVIDIA Release 21.10 (build 28019337)
PyTorch Version 1.10.0a0+0aef44c

Container image Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2014-2021 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015 Google Inc.
Copyright (c) 2015 Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
      insufficient for PyTorch. NVIDIA recommends the use of the following flags:
      docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...

Sat Sep 24 13:13:00 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   33C    P0    54W / 250W |      0MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
I don't see that `nvcc --version` was executed; this looks more like the output of `nvidia-smi`. I also don't see CUDA 11.4 mentioned anywhere. What am I doing wrong?
I think you need to change

```dockerfile
FROM nvidia/cuda:11.4-runtime-ubuntu20.04
```

to

```dockerfile
FROM nvidia/cuda:11.4-devel-ubuntu20.04
```

in your Dockerfile. That is the developer image for NVIDIA CUDA, which gives you access to `nvcc` and other developer tools.
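For reference, a minimal sketch of the corrected Dockerfile (the exact tag available on Docker Hub may differ, so check the `nvidia/cuda` tag list for your CUDA version):

```dockerfile
# devel images ship the full CUDA toolkit, including nvcc;
# runtime images only ship the shared libraries needed to *run* CUDA apps.
FROM nvidia/cuda:11.4-devel-ubuntu20.04

# Print the CUDA compiler version when the container starts.
CMD ["nvcc", "--version"]
```

Rebuilding with `docker build . -t nvidia-test` and rerunning `docker run --gpus all nvidia-test` should then print the `nvcc` release string for CUDA 11.4 instead of the `nvidia-smi` table.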