TomHeaven / tensorflow-osx-build

Off-the-shelf python package of tensorflow with CUDA support for Mac OS.

Cannot find cuda library libcublas.10.1.dylib #7

Closed: yangliuyu closed this issue 5 years ago

yangliuyu commented 5 years ago

My env is: Python 3.7, CUDA 10.1 with cuDNN 7.5. Since TensorFlow 1.13.1 supports Python 3.7, @TomHeaven could you please release a tensorflow-1.13.1-py27-py36-py37-cuda10.1-cudnn75 build? I got the following error when building TensorFlow on my machine:

(base) ➜  tensorflow git:(6612da8951) bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
INFO: Invocation ID: e2e443d2-5393-4644-a4d6-42c8a9d28193
ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': error loading package 'tensorflow/tools/pip_package': in /Users/yangliu/Documents/workspace/tensorflow/tensorflow/tensorflow.bzl: Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
    File "/Users/yangliu/Documents/workspace/tensorflow/third_party/gpus/cuda_configure.bzl", line 1556
        _create_local_cuda_repository(repository_ctx)
    File "/Users/yangliu/Documents/workspace/tensorflow/third_party/gpus/cuda_configure.bzl", line 1302, in _create_local_cuda_repository
        _find_libs(repository_ctx, cuda_config)
    File "/Users/yangliu/Documents/workspace/tensorflow/third_party/gpus/cuda_configure.bzl", line 840, in _find_libs
        _find_cuda_lib("cublas", repository_ctx, cpu_value, c..., ...)
    File "/Users/yangliu/Documents/workspace/tensorflow/third_party/gpus/cuda_configure.bzl", line 752, in _find_cuda_lib
        auto_configure_fail(("Cannot find cuda library %s" %...))
    File "/Users/yangliu/Documents/workspace/tensorflow/third_party/gpus/cuda_configure.bzl", line 342, in auto_configure_fail
        fail(("\n%sCuda Configuration Error:%...)))

Cuda Configuration Error: Cannot find cuda library libcublas.10.1.dylib
WARNING: Target pattern parsing failed.
ERROR: error loading package 'tensorflow/tools/pip_package': in /Users/yangliu/Documents/workspace/tensorflow/tensorflow/tensorflow.bzl: Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
    File "/Users/yangliu/Documents/workspace/tensorflow/third_party/gpus/cuda_configure.bzl", line 1556
        _create_local_cuda_repository(repository_ctx)
    File "/Users/yangliu/Documents/workspace/tensorflow/third_party/gpus/cuda_configure.bzl", line 1302, in _create_local_cuda_repository
        _find_libs(repository_ctx, cuda_config)
    File "/Users/yangliu/Documents/workspace/tensorflow/third_party/gpus/cuda_configure.bzl", line 840, in _find_libs
        _find_cuda_lib("cublas", repository_ctx, cpu_value, c..., ...)
    File "/Users/yangliu/Documents/workspace/tensorflow/third_party/gpus/cuda_configure.bzl", line 752, in _find_cuda_lib
        auto_configure_fail(("Cannot find cuda library %s" %...))
    File "/Users/yangliu/Documents/workspace/tensorflow/third_party/gpus/cuda_configure.bzl", line 342, in auto_configure_fail
        fail(("\n%sCuda Configuration Error:%...)))

Cuda Configuration Error: Cannot find cuda library libcublas.10.1.dylib
INFO: Elapsed time: 0.278s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
    currently loading: tensorflow/tools/pip_package
    Fetching @local_config_cuda; fetching

I disabled SIP to make LD_LIBRARY_PATH work. These are the relevant exports in my .zshrc file:

export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH="$CUDA_HOME/lib:$CUDA_HOME/nvvm/lib:$CUDA_HOME/extras/CUPTI/lib:/usr/local/nccl/lib"
export LD_LIBRARY_PATH=$DYLD_LIBRARY_PATH
export PATH=$CUDA_HOME/bin:$PATH

Only libcublas.10.dylib exists under /Developer/NVIDIA/CUDA-10.1/lib. Even after I created a symlink named libcublas.10.1.dylib pointing to libcublas.10.dylib, I got the same error: "Cannot find cuda library libcublas.10.1.dylib".
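
For reference, the symlink I made looked roughly like this (the paths are specific to my CUDA 10.1 install location, so adjust as needed):

# create libcublas.10.1.dylib as a symlink to the existing libcublas.10.dylib
cd /Developer/NVIDIA/CUDA-10.1/lib
sudo ln -s libcublas.10.dylib libcublas.10.1.dylib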

TomHeaven commented 5 years ago

I'm using CUDA 10 and cuDNN 7.4 now and suggest you do the same, since many existing projects may not be compatible with the latest version of CUDA. In fact, a lot of projects still rely on CUDA 9, so I have to keep CUDA 9 as part of my runtime libraries.

I will try compiling TF 1.13.1 with CUDA 10 and cuDNN 7.4. However, it will take some time.

TomHeaven commented 5 years ago

Please try the latest release, v1.13.1_cu100, to see if it works for you.
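
Installation is the usual pip wheel install; the exact wheel filename depends on your Python version, so the path below is just a placeholder for whatever you download from the release page:

# download the matching .whl from the v1.13.1_cu100 release, then:
pip install --upgrade /path/to/downloaded-tensorflow-wheel.whl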

yangliuyu commented 5 years ago

@TomHeaven you are awesome man, thanks a lot.

In [1]: import tensorflow as tf

In [2]: sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-03-26 19:57:53.201805: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:959] OS X does not support NUMA - returning NUMA node zero
2019-03-26 19:57:53.201926: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1070 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 5.55GiB
2019-03-26 19:57:53.201938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-26 19:57:53.456242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-26 19:57:53.456257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-03-26 19:57:53.456261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-03-26 19:57:53.456321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5316 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1070 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
2019-03-26 19:57:53.456863: I tensorflow/core/common_runtime/direct_session.cc:317] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1070 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1

So far so good with v1.13.1_cu100, CUDA 10.0 and cuDNN 7.4. I think "memoryClockRate(GHz): 1.683" is actually the core clock rate; the memory clock rate should be 8 GHz according to the specs here. I also compiled TensorFlow 1.13.1 with CUDA 10.1 and cuDNN 7.5 just now, but the process got stuck finding the NCCL .so files even though I created the dylib symlinks according to your build instructions. Is there any way to disable NCCL when compiling?

TomHeaven commented 5 years ago

You can disable NCCL by passing an extra parameter --config=nonccl. So the command looks like this:

bazel build --config=opt --config=nonccl //tensorflow/tools/pip_package:build_pip_package

yangliuyu commented 5 years ago

@TomHeaven Thanks a lot