floydhub / dockerfiles

Deep Learning Dockerfiles
https://docs.floydhub.com/guides/environments/
Apache License 2.0

floydhub/tensorflow seems to be missing stubs from LD_LIBRARY_PATH #57

Closed damonmaria closed 6 years ago

damonmaria commented 6 years ago
$ sudo docker run -it floydhub/tensorflow:1.9.0-gpu.cuda9cudnn7-py3_aws.32 python -c "import tensorflow; print(tensorflow.__version__)"
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/local/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/local/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

But the following works:

$ sudo docker run --env "LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH"  -it floydhub/tensorflow:1.9.0-gpu.cuda9cudnn7-py3_aws.32 python -c "import tensorflow; print(tensorflow.__version__)"
1.9.0
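
(For context: /usr/local/cuda/lib64/stubs normally contains only a build-time stub of libcuda.so, so pointing LD_LIBRARY_PATH at it lets the import succeed, but GPU kernels would still fail; the real driver library is what nvidia-docker mounts in. A quick way to check, assuming a standard CUDA layout in the image, without any fabricated output shown here:

$ sudo docker run -it floydhub/tensorflow:1.9.0-gpu.cuda9cudnn7-py3_aws.32 bash -c "ls /usr/local/cuda/lib64/stubs; ldconfig -p | grep libcuda"
)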
houqp commented 6 years ago

Hi, thanks for the report. For GPU images, you need to run them with nvidia-docker; otherwise you will get this error.
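
For reference, a sketch of the same test run via nvidia-docker (nvidia-docker v1 syntax shown; on newer Docker setups the equivalent is docker run --runtime=nvidia ...):

$ sudo nvidia-docker run -it floydhub/tensorflow:1.9.0-gpu.cuda9cudnn7-py3_aws.32 python -c "import tensorflow; print(tensorflow.__version__)"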

houqp commented 6 years ago

We use the exact same image in production at FloydHub. If you are not able to get it working with nvidia-docker, please feel free to reopen the issue; I am happy to help dig into it.

damonmaria commented 6 years ago

I am using it inside AWS Batch with an AMI set up per AWS's instructions for running NVIDIA containers. Their instructions specify testing the AMI with plain docker, but I'll give it a go and see whether Batch uses nvidia-docker when it runs the actual job.

Thanks.
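
For anyone hitting the same thing on AWS Batch: Batch launches containers through plain docker, so one possible fix (assuming the nvidia-container-runtime is installed on the AMI, which is not confirmed in this thread) is to make nvidia the default Docker runtime in /etc/docker/daemon.json, e.g.:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

With that in place, containers started without nvidia-docker still get the driver libraries (including libcuda.so.1) mounted in.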