Closed: glebkuznetsov closed this issue 6 years ago
We saw the following in /var/log/apt/history.log:
Commandline: /usr/bin/unattended-upgrade
Install:
...
nvidia-384:amd64 (384.66-0ubuntu1, 384.90-0ubuntu0.16.04.1),
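If unattended-upgrades is what pulled in the new driver, one way to stop it from touching the driver again is to blacklist the package in its config. A sketch, assuming the package name from the log above and the stock Ubuntu 16.04 config location:

```
// /etc/apt/apt.conf.d/50unattended-upgrades
Unattended-Upgrade::Package-Blacklist {
    "nvidia-384";
};
```

Alternatively, `sudo apt-mark hold nvidia-384` pins the package at the apt level, so neither unattended-upgrades nor a manual `apt upgrade` will move it.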
This is fixed with recent commits. We can run on p3s now.
TL;DR:
Getting this error after a while of using a GPU machine built from the wyss-mlpe AMI:
nvidia-docker | 2017/10/06 22:04:12 Error: unsupported CUDA version: driver 0.0 < image 8.0.61
Once you get this error, it’s no longer possible to run nvidia-docker, so we have to go build a new machine. (Fortunately, with our workflow now in docker, this is not that painful.)
More details:
We can’t tell exactly what triggers this issue, but it has something to do with interrupting a docker container in any way other than ctrl+c. It seems the machine gets confused about where the NVIDIA driver is.
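A quick way to check whether the kernel module and the userspace tools have drifted apart (a diagnostic sketch; the /proc path is the standard one exposed by the NVIDIA kernel module, and on a machine where the module is not loaded it simply reports "missing"):

```shell
# Report the kernel-side driver version, or "missing" if the NVIDIA
# kernel module is not loaded (which would explain the "driver 0.0" error).
kernel_driver_version() {
  if [ -r /proc/driver/nvidia/version ]; then
    grep -i 'kernel module' /proc/driver/nvidia/version
  else
    echo "missing"
  fi
}

kernel_driver_version
```

If this prints "missing" while `dpkg -l 'nvidia-*'` still shows the driver packages installed, the upgrade likely replaced the userspace libraries without loading a matching kernel module, which fits the `driver 0.0 < image 8.0.61` symptom.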
I suspect we’re doing something funky in the nvidia install but can’t really tell what.
@cancan101 pointed out that they encountered a possibly related issue at Butterfly, where Ubuntu was auto-updating the NVIDIA drivers and breaking things. They are now using 384.x and the TensorFlow 1.3 docker image from Google, which works.
Possibly helpful:
https://medium.com/@flavienguillocheau/documenting-docker-with-gpu-deep-learning-for-noobs-2edd350ab2f7