ghostplant / tensorflow-wheel-collections

Dockerfile to build Tensorflow-GPU v1.10 with native CUDA driver (e.g. CUDA 8.0/CUDA 9.0/CUDA 9.2/CUDA 10.0)

Why use this line: RUN echo "/usr/local/cuda-8.0/targets/x86_64-linux/lib/stubs" > /etc/ld.so.conf.d/cuda-8.0-stubs.conf && ldconfig #2

Open VisionTheta opened 6 years ago

VisionTheta commented 6 years ago

Hi, thanks for your suggestion to use Docker. I have been writing a Dockerfile for about a day, and building it takes a lot of time, even on an i7-6850K CPU.

While trying to build TF 1.8 against CUDA 8 on CentOS 7, I ran into many problems. My base setup is as follows:

CentOS 7.5 (1804) + CUDA 8.0 + cuDNN 6.0 (FROM nvidia/cuda:8.0-cudnn6-devel-centos7)
Python 3.5.2 (installed from source) + Bazel 0.10.0 (installed from source) + TF r1.8

One error I could not solve is the following:

/root/tensorflow/tensorflow/cc/BUILD:422:1: Linking of rule '//tensorflow/cc:ops/lookup_ops_gen_cc' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow && \
  exec env - \
    LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64 \
    PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    PWD=/proc/self/cwd \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -o bazel-out/host/bin/tensorflow/cc/ops/lookup_ops_gen_cc '-Wl,-rpath,$ORIGIN/../../../_solib_local/_U_S_Stensorflow_Scc_Cops_Slookup_Uops_Ugen_Ucc___Utensorflow' '-Wl,-rpath,$ORIGIN/../../../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccudart___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib' -Lbazel-out/host/bin/_solib_local/_U_S_Stensorflow_Scc_Cops_Slookup_Uops_Ugen_Ucc___Utensorflow -Lbazel-out/host/bin/_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccudart___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib '-Wl,-rpath,$ORIGIN/,-rpath,$ORIGIN/..,-rpath,$ORIGIN/../..' -Wl,-z,notext -Wl,-z,notext -pthread -Wl,-rpath,../local_config_cuda/cuda/lib64 -Wl,-rpath,../local_config_cuda/cuda/extras/CUPTI/lib64 -Wl,-no-as-needed -B/usr/bin/ -pie -Wl,-z,relro,-z,now -no-canonical-prefixes -pass-exit-codes '-Wl,--build-id=md5' '-Wl,--hash-style=gnu' -Wl,--gc-sections -Wl,-S -Wl,@bazel-out/host/bin/tensorflow/cc/ops/lookup_ops_gen_cc-2.params)
/usr/bin/ld: warning: libcuda.so.1, needed by bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Scc_Cops_Slookup_Uops_Ugen_Ucc___Utensorflow/libtensorflow_framework.so, not found (try using -rpath or -rpath-link)
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Scc_Cops_Slookup_Uops_Ugen_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cuMemGetAddressRange_v2'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Scc_Cops_Slookup_Uops_Ugen_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cuMemGetInfo_v2'

I searched a lot online and tried several suggested fixes, but none of them helped:

  1. Replace the python34 package installed from epel-release with Python 3.5.2 installed from source.
  2. Run the following before the first ./configure (if you forgot, run bazel clean and re-run ./configure):
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
  3. Add --action_env="LD_LIBRARY_PATH=${LD_LIBRARY_PATH}" to the bazel build command to make sure LD_LIBRARY_PATH is set correctly.
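
The linker warning in the log refers to libcuda.so.1, which normally ships with the NVIDIA driver rather than with the CUDA toolkit directories added above. One quick way to see whether the dynamic linker cache knows about it is to search the output of ldconfig -p. This is a minimal sketch assuming a glibc-style ldconfig; lib_in_cache is a hypothetical helper name, not something from the repository.

```shell
#!/bin/sh
# Sketch: report whether a library name appears in `ldconfig -p` style output.
# lib_in_cache is a hypothetical helper; it reads the cache listing from stdin
# and returns 0 if the name is listed, non-zero otherwise.
lib_in_cache() {
    # -F matches the name literally, -q suppresses output (exit status only).
    # The trailing space keeps "libcuda.so.1" from matching "libcuda.so.10".
    grep -qF -- "$1 "
}

# Usage inside the build container:
#   ldconfig -p | lib_in_cache libcuda.so.1 && echo found || echo missing
```

If the check reports "missing" inside the build container, the steps above would not be expected to change that, since they only add the toolkit's lib64 and CUPTI directories.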

Finally, I tried the line from your Dockerfile, echo "/usr/local/cuda-8.0/targets/x86_64-linux/lib/stubs" > /etc/ld.so.conf.d/cuda-8.0-stubs.conf && ldconfig, and that solved my problem.

However, that line is commented out in your CentOS Dockerfile. My questions are: why did it fix my problem, and why do you use it in your Ubuntu Dockerfile? If I leave it out, which error should I expect during the build? Is it the same crosstool_wrapper_driver_is_not_gcc failed: error executing command, or the undefined reference to `cuMemGetAddressRange_v2' (and so on)?

Thank you

ghostplant commented 6 years ago

I have just updated the Dockerfile for CentOS 7, since the previous one was not tested. Thanks for checking.

ghostplant commented 6 years ago

So the line echo "/usr/local/cuda-8.0/targets/x86_64-linux/lib/stubs" > /etc/ld.so.conf.d/cuda-8.0-stubs.conf && ldconfig is needed not only on Ubuntu but also on CentOS.
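
A likely reason the line helps, stated as an assumption rather than something confirmed in this thread: during docker build the driver's libcuda.so.1 is usually not present in the image, while the CUDA toolkit ships a stub libcuda.so in the stubs directory. Adding that directory to /etc/ld.so.conf.d and running ldconfig lets the linker resolve the library at build time. The sketch below wraps the same step in a hypothetical helper, register_cuda_stubs, with the configuration directory as a parameter so it can be tried without root.

```shell
#!/bin/sh
# Sketch: write an ld.so.conf.d entry for the CUDA stub directory.
# register_cuda_stubs is a hypothetical helper, not part of the repository.
#   $1: CUDA version, e.g. 8.0
#   $2: directory for the .conf file (defaults to /etc/ld.so.conf.d)
# Prints the path of the file it wrote.
register_cuda_stubs() {
    cuda_version="$1"
    conf_dir="${2:-/etc/ld.so.conf.d}"
    stubs_dir="/usr/local/cuda-${cuda_version}/targets/x86_64-linux/lib/stubs"
    conf_file="${conf_dir}/cuda-${cuda_version}-stubs.conf"

    # Same content as the Dockerfile line: one search directory per line.
    echo "${stubs_dir}" > "${conf_file}" || return 1
    echo "${conf_file}"
}

# In a Dockerfile RUN step (as root), refresh the linker cache afterwards:
#   register_cuda_stubs 8.0 && ldconfig
```

The stub is only meant for linking. At run time the real driver library should be the one that gets loaded, so it may be worth removing the stubs entry from the final image once the build has finished.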

ghostplant commented 6 years ago

I'll release the prebuilt TF binary for CentOS 7 shortly.

VisionTheta commented 6 years ago

Thanks. I forked your repo and will update my CentOS 7 Dockerfile as well.