Build NVIDIA-Docker image with proper PyTorch version to run DARTS code #10

JiaweiZhuang commented 4 years ago

The best way to run the DARTS scripts in production is probably via nvidia-docker. It is important to freeze the environment because the DARTS code requires PyTorch == 0.3.1 and torchvision == 0.2.0. Newer PyTorch versions crash for various reasons.

The same container image can run on

Here are the complete steps to install NVIDIA-Docker on an AWS p2.xlarge instance running Ubuntu 18.04. The commands are a bit dense, but they can be wrapped into a single shell script.

1. Install CUDA driver

Get the relatively new nvidia-430 version from

sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt-get update
sudo apt-get install -y nvidia-driver-430 nvidia-modprobe
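The driver commands above can be wrapped into one script that stops at the first failure. Here is a minimal sketch of that idea, written in Python rather than shell. The names `DRIVER_STEPS` and `run_steps` and the `dry_run` flag are hypothetical, and only the three commands from this step are included.

```python
import subprocess

# Commands copied from step 1 above; later steps could be appended the same way.
DRIVER_STEPS = [
    "sudo add-apt-repository ppa:graphics-drivers/ppa -y",
    "sudo apt-get update",
    "sudo apt-get install -y nvidia-driver-430 nvidia-modprobe",
]


def run_steps(steps, dry_run=True):
    """Process shell commands in order and stop at the first failure.

    With dry_run=True nothing is executed; the commands are only printed.
    Returns the list of commands that were processed.
    """
    done = []
    for cmd in steps:
        print("+ " + cmd)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, shell=True, check=True)
        done.append(cmd)
    return done


if __name__ == "__main__":
    # Dry run by default, so the sketch is safe to try on any machine.
    run_steps(DRIVER_STEPS, dry_run=True)
```

Calling `run_steps(DRIVER_STEPS, dry_run=False)` would actually execute the commands, so it should only be done on the target instance.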

Test installation:

$ nvidia-smi
Mon Oct  7 18:43:14 2019
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   47C    P0    59W / 149W |      0MiB / 11441MiB |     99%      Default |

| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No running processes found                                                 |
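
If the check needs to be automated, the header line of the output above can be parsed for the driver version. This is a small sketch that assumes the header keeps the layout shown here; `parse_smi_header` is a hypothetical helper name.

```python
import re

# Header line copied from the nvidia-smi output above.
SAMPLE_HEADER = (
    "| NVIDIA-SMI 430.50       Driver Version: 430.50       "
    "CUDA Version: 10.1     |"
)


def parse_smi_header(text):
    """Return (driver_version, cuda_version) found in nvidia-smi output, or None."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text
    )
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_smi_header(SAMPLE_HEADER))  # ('430.50', '10.1')
```

Note that the "CUDA Version" reported by nvidia-smi is the highest CUDA version the driver supports, not the toolkit version inside a container image.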

2. Install the standard Docker


sudo apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \

curl -fsSL | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository \
"deb [arch=amd64] \
$(lsb_release -cs) \

sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli

sudo groupadd docker
sudo usermod -aG docker $USER  # allow running docker without sudo, need to re-login

Test installation:

$ docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

3. Install NVIDIA-Docker


distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L | sudo apt-key add -
curl -s -L$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

Test installation:

$ docker run --gpus all nvidia/cuda:9.0-base nvidia-smi
Mon Oct  7 18:46:13 2019
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   52C    P0    69W / 149W |      0MiB / 11441MiB |     97%      Default |

| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No running processes found                                                 |

The easiest way to get a PyTorch GPU image is from NVIDIA NGC. The image bundles a lot of extras, including JupyterLab and TensorBoard (see release notes).

docker pull
docker run --gpus all -it --rm 

Inside the container, try:

$ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
1.2.0a0+afb7a16 True

This is the latest version, 1.2.0, so we need to roll back to 0.3.1.
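
The rollback decision is a plain version comparison against the pin. Here is a minimal sketch, assuming version strings start with three dot-separated numbers as in the output above; `release_tuple` and `matches_pin` are hypothetical helper names.

```python
import re

REQUIRED = (0, 3, 1)  # PyTorch pin stated at the top of this thread


def release_tuple(version):
    """Extract the leading numeric release, e.g. '1.2.0a0+afb7a16' -> (1, 2, 0)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(part) for part in match.groups())


def matches_pin(version, required=REQUIRED):
    """True only when the installed release equals the pinned release."""
    return release_tuple(version) == required


print(matches_pin("1.2.0a0+afb7a16"))  # False
print(matches_pin("0.3.1"))            # True
```

Inside the container, the same check could be run as `matches_pin(torch.__version__)`.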

Done via and

Follow the README in docker/install-nvidia-docker and docker/darts-pytorch-image. Everyone should be able to run the default script and get the expected result:

10/07 08:02:25 PM test 000 1.233736e-01 96.875000 100.000000
10/07 08:02:48 PM test 050 1.105459e-01 97.120095 99.959150
10/07 08:03:11 PM test 100 1.074739e-01 97.359733 99.948432
10/07 08:03:12 PM test_acc 97.369997
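
To compare a run against the expected result automatically, the log can be parsed. This is a sketch only: the helper names are hypothetical, and the column meanings (step, loss, top-1 accuracy, top-5 accuracy) are an assumption not stated in this thread.

```python
import re

# Log lines copied from the expected output above.
LOG = """\
10/07 08:02:25 PM test 000 1.233736e-01 96.875000 100.000000
10/07 08:02:48 PM test 050 1.105459e-01 97.120095 99.959150
10/07 08:03:11 PM test 100 1.074739e-01 97.359733 99.948432
10/07 08:03:12 PM test_acc 97.369997
"""


def final_test_acc(log_text):
    """Return the last reported test_acc as a float, or None if absent."""
    values = re.findall(r"\btest_acc\s+([\d.]+)", log_text)
    return float(values[-1]) if values else None


def parse_eval_steps(log_text):
    """Return (step, loss, top1, top5) tuples; the column meanings are assumed."""
    pattern = r"\btest (\d+) (\S+) (\S+) (\S+)"
    return [
        (int(step), float(loss), float(top1), float(top5))
        for step, loss, top1, top5 in re.findall(pattern, log_text)
    ]


print(final_test_acc(LOG))          # 97.369997
print(len(parse_eval_steps(LOG)))   # 3
```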

@dylanrandle Here's how to run DARTS on graphene data within the container:

# get data and source code
mkdir data
wget -P ./data/
git clone

# run training
docker run --rm -it --gpus all -v $(pwd):/workdir/host_files darts-pytorch
# the remaining commands run inside the container
cd host_files
python3 darts/cnn/ --data ./data/ --dataset graphene

This is absolutely awesome. Brilliant!

I changed pytorch==0.3.1 (built with CUDA 8.0) to (built with CUDA 9.0); otherwise the DARTS script crashes on newer GPU types such as the V100 on p3.2xlarge:

/usr/local/lib/python3.6/dist-packages/torch/cuda/ UserWarning:
    Found GPU0 Tesla V100-SXM2-16GB which requires CUDA_VERSION >= 9000 for
     optimal performance and fast startup time, but your PyTorch was compiled
     with CUDA_VERSION 8000. Please install the correct PyTorch binary
     using instructions from

  warnings.warn(incorrect_binary_warn % (d, name, 9000, CUDA_VERSION))
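
The warning comes down to comparing the GPU's compute capability with the CUDA version the binary was compiled against. The sketch below is a simplified illustration of that kind of check, not PyTorch's actual code. The table and function names are hypothetical; the thresholds reflect that Pascal GPUs (capability 6.x) need CUDA 8.0 or newer and Volta GPUs (7.x, such as the V100) need CUDA 9.0 or newer.

```python
# Minimum compiled CUDA version, in the integer form used by the warning
# above (9000 means 9.0), keyed by compute-capability major version.
MIN_CUDA_BY_MAJOR = {6: 8000, 7: 9000}


def required_cuda(capability_major):
    """Return the minimum CUDA version for a GPU, or None if none is listed."""
    needed = None
    for major, version in sorted(MIN_CUDA_BY_MAJOR.items()):
        if capability_major >= major:
            needed = version
    return needed


def binary_is_too_old(capability_major, compiled_cuda):
    """True when the binary was compiled with an older CUDA than the GPU needs."""
    needed = required_cuda(capability_major)
    return needed is not None and compiled_cuda < needed


print(binary_is_too_old(7, 8000))  # True: V100 (capability 7.0), CUDA 8.0 build
print(binary_is_too_old(7, 9000))  # False: V100 with a CUDA 9.0 build
print(binary_is_too_old(3, 8000))  # False: K80 (capability 3.7)
```

This matches what is observed in this thread: the CUDA 8.0 build works on the K80 of a p2.xlarge but triggers the warning on the V100 of a p3.2xlarge.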
