TomHeaven / tensorflow-osx-build

Off-the-shelf python package of tensorflow with CUDA support for Mac OS.

Tensorflow 1.13.1 build error #10

Closed: hwiorn closed this issue 5 years ago

hwiorn commented 5 years ago

I followed the steps in build_instructions_1.10, and the build failed with the following error (specifically, Symbol not found: _ncclAllReduce).

I am using Python 3.7 and TensorFlow 1.13.1.

How can I solve this problem?

Execution platform: @bazel_tools//platforms:host_platform
Traceback (most recent call last):
  File "/private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/Users/ormak/anaconda3/lib/python3.7/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/Users/ormak/anaconda3/lib/python3.7/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: dlopen(/private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so, 6): Symbol not found: _ncclAllReduce
  Referenced from: /private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so
  Expected in: flat namespace
 in /private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/tools/api/generator/create_python_api.py", line 27, in <module>
    from tensorflow.python.tools.api.generator import doc_srcs
  File "/private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/Users/ormak/anaconda3/lib/python3.7/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/Users/ormak/anaconda3/lib/python3.7/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: dlopen(/private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so, 6): Symbol not found: _ncclAllReduce
  Referenced from: /private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so
  Expected in: flat namespace
 in /private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 16.382s, Critical Path: 13.67s
INFO: 2 processes: 2 local.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully

TomHeaven commented 5 years ago

Try adding the --config=nonccl flag, as follows:

bazel build --config=opt --config=nonccl  //tensorflow/tools/pip_package:build_pip_package

hwiorn commented 5 years ago

The same problem occurred. Is there something wrong with the Bazel options I set?

bazel shutdown
bazel clean --expunge
bazel -c opt --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0 --action_env PATH --action_env DYLD_LIBRARY_PATH --action_env LD_LIBRARY_PATH --verbose_failures --config=cuda --config=mkl --config=nonccl
/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/Users/ormak/anaconda3/lib/python3.7/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/Users/ormak/anaconda3/lib/python3.7/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: dlopen(/private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so, 6): Symbol not found: _ncclAllReduce
  Referenced from: /private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so
  Expected in: flat namespace
 in /private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so

build_tensorflow.log.zip

TomHeaven commented 5 years ago

This is not expected once you add --config=nonccl as a build flag. Do you have an installed version of TensorFlow? It may interfere with the build, since you pass --action_env DYLD_LIBRARY_PATH. Try uninstalling all TensorFlow versions and rebuilding.
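
A rough way to check for a leftover TensorFlow install before rebuilding is sketched below. The helper name find_tf_packages is made up for illustration, and the sketch assumes that `python` on your PATH is the same interpreter the build uses. It only lists what is there; the uninstall line is left commented out so nothing is removed by accident.

```shell
# Hypothetical helper: keep only lines of `pip list` output whose
# package name starts with "tensorflow" (case-insensitive).
find_tf_packages() {
    grep -i '^tensorflow' || true
}

# List TensorFlow-related packages visible to the active Python, if any.
installed="$(python -m pip list 2>/dev/null | find_tf_packages)"
if [ -n "$installed" ]; then
    echo "TensorFlow packages found:"
    echo "$installed"
    # Uncomment to remove the main package (repeat for others listed above):
    # python -m pip uninstall -y tensorflow
else
    echo "No TensorFlow packages found."
fi

# Show the library search paths that --action_env forwards into the build.
echo "DYLD_LIBRARY_PATH=${DYLD_LIBRARY_PATH:-<unset>}"
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"
```

If anything shows up, uninstall it, then run bazel clean --expunge and rebuild.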

yrwy commented 5 years ago

DON'T DO THIS STEP:

export TF_NCCL_VERSION=2.2.13
export NCCL_INSTALL_PATH=/usr/local/nccl

JUST use --config=nonccl
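
If those variables were already exported in the current shell, something like the following sketch should clear them before rebuilding. It assumes you run it from the TensorFlow source directory, and that ./configure recorded its answers in .tf_configure.bazelrc; if that file is missing or named differently in your checkout, the second check just reports that nothing was found.

```shell
# Drop the NCCL settings from the current shell session.
unset TF_NCCL_VERSION NCCL_INSTALL_PATH

# Confirm nothing NCCL-related is still exported.
env | grep -i nccl || echo "No NCCL variables in the environment."

# Assumption: ./configure saved earlier answers in this file. If NCCL
# entries appear here, re-run ./configure before building.
grep -i nccl .tf_configure.bazelrc 2>/dev/null \
    || echo "No NCCL entries found in .tf_configure.bazelrc."
```

After that, run bazel clean --expunge and build again with --config=nonccl.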

hwiorn commented 5 years ago

@yrwy Thank you. I will try it.

hwiorn commented 5 years ago

Thank you. It worked!