aslisabanci closed 3 years ago
Trying to validate the environment with this command:
./tools/environment_validator.py -b nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04 -g python3 -s python38 -d tensorflow-gpu-2.4 -t dependency -n tensorflow-gpu-2.4 --nvidia-support 1
fails with the following error on deep purple (where we have CUDA 10.2 installed):
docker.errors.APIError: 500 Server Error for http+docker://localhost/v1.40/containers/8914501b9ed4ebd13c13dd2d79c053ae88a6abcf2f17975382d3ee720cae1fea/start: Internal Server Error ("OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.2, please update your driver to a newer version, or use an earlier cuda container\\\\n\\\"\"": unknown")
Not progressing with publishing this on test until we address this.
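The error above says the image requires cuda>=11.2 while the host driver only supports an older version. A minimal sketch of a pre-flight check that could catch this before starting the container is below. It assumes the CUDA version can be read from the image tag and that the host's supported CUDA version is already known and passed in; `cuda_version_from_tag` and `host_can_run` are hypothetical helpers, not part of environment_validator.py.

```python
import re
from typing import Tuple


def cuda_version_from_tag(image: str) -> Tuple[int, int]:
    """Extract (major, minor) from a tag like nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04.

    Assumption: the tag starts with the CUDA version, as in the nvidia/cuda images used here.
    """
    match = re.search(r":(\d+)\.(\d+)", image)
    if match is None:
        raise ValueError(f"no CUDA version found in image tag: {image!r}")
    return int(match.group(1)), int(match.group(2))


def host_can_run(image: str, host_cuda: Tuple[int, int]) -> bool:
    """Return True if the host's supported CUDA version is at least the image's version."""
    return host_cuda >= cuda_version_from_tag(image)


if __name__ == "__main__":
    image = "nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04"
    # Host with CUDA 10.2, as on the machine described above.
    print(host_can_run(image, (10, 2)))
```

With a host value of (10, 2), the check fails for the 11.2.0 image, which matches the nvidia-container-cli requirement error.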
With the updated CUDA drivers, we can now validate this package as:
./tools/environment_validator.py -b nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04 -g python3 -s python38 -d tensorflow-gpu-2.4 -t dependency -n tensorflow-gpu-2.4 --nvidia-support 1
Things to note above: we use the runtime NVIDIA image as the base, not the devel one, for efficiency (see: https://stackoverflow.com/questions/56405159/what-is-the-difference-between-devel-and-runtime-tag-for-a-docker-container)
Deleting the branch and the PR as they're already merged into develop from another branch by Daniel.
I couldn't test this on deep purple yet; I'm a bit frustrated with the errors I've been getting so far. I'm opening this PR for your review in case you notice something missing. Any help testing these would also be appreciated, to speed things up.