Closed by hwiorn 5 years ago
Try adding the flag --config=nonccl, as follows:
bazel build --config=opt --config=nonccl //tensorflow/tools/pip_package:build_pip_package
The same problem occurred. Is there something wrong with the Bazel options I set?
bazel shutdown
bazel clean --expunge
bazel -c opt --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0 --action_env PATH --action_env DYLD_LIBRARY_PATH --action_env LD_LIBRARY_PATH --verbose_failures --config=cuda --config=mkl --config=nonccl
/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/Users/ormak/anaconda3/lib/python3.7/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/Users/ormak/anaconda3/lib/python3.7/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: dlopen(/private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so, 6): Symbol not found: _ncclAllReduce
Referenced from: /private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so
Expected in: flat namespace
in /private/var/tmp/_bazel_ormak/b1b58fea2a85dc5b7cee7637f479189d/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/create_tensorflow.python_api_1_tf_python_api_gen_v1.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so
This is not expected once you add --config=nonccl as a build flag. Do you have an installed version of TensorFlow? It may interfere with the build, since you pass --action_env DYLD_LIBRARY_PATH. Try uninstalling all TensorFlow versions and rebuilding.
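To see whether an installed TensorFlow might be lying around before rebuilding, something like the following could help. It is only a sketch: the function name is made up for illustration, and it assumes the leftover copy was installed as a regular Python package whose distribution name starts with "tensorflow". It lists only what is visible to the interpreter you run it with.

```python
from importlib import metadata


def installed_tensorflow_packages():
    """Return sorted (name, version) pairs for installed distributions
    whose name starts with 'tensorflow' (hypothetical helper)."""
    found = set()
    for dist in metadata.distributions():
        # Some broken installs have no Name field; treat those as empty.
        name = dist.metadata.get("Name") or ""
        if name.lower().startswith("tensorflow"):
            found.add((name, dist.version))
    return sorted(found)


if __name__ == "__main__":
    for name, version in installed_tensorflow_packages():
        print(f"{name}=={version}  (candidate for: pip uninstall {name})")
```

If it prints nothing, no such package is registered with that interpreter, though a copy could still exist in another environment.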
DON'T DO THIS STEP
export TF_NCCL_VERSION=2.2.13
export NCCL_INSTALL_PATH=/usr/local/nccl
JUST use --config=nonccl
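If those two variables were exported in an earlier attempt, they may still be set in the current shell. A small check, sketched below, reports them. The helper name is hypothetical, and whether the variables actually affect a --config=nonccl build is an assumption; the advice above only says not to export them.

```python
import os

# Variables the advice above says not to export when using --config=nonccl.
NCCL_ENV_VARS = ("TF_NCCL_VERSION", "NCCL_INSTALL_PATH")


def leftover_nccl_settings(env=None):
    """Return the NCCL-related variables that are set to a non-empty value.

    `env` defaults to the current process environment; pass a dict to
    check something else (hypothetical helper, for illustration only).
    """
    env = os.environ if env is None else env
    return {name: env[name] for name in NCCL_ENV_VARS if env.get(name)}


if __name__ == "__main__":
    for name, value in leftover_nccl_settings().items():
        print(f"{name}={value} is still set; consider: unset {name}")
```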
@yrwy Thank you. I will try it.
Thank you. It worked!
I followed the steps in build_instructions_1.10 and hit exactly this error during compilation: Symbol not found: _ncclAllReduce. I am using Python 3.7 and TensorFlow 1.13.1.
How can I solve this problem?
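One way to narrow this down is to check whether the built _pywrap_tensorflow_internal.so still references NCCL at all. Running nm -u on the library lists its undefined symbols, and the sketch below filters that output for NCCL names. The helper name is made up, and it assumes the symbol name is the last whitespace-separated field on each line; macOS prefixes C symbols with an underscore, which is why the traceback shows _ncclAllReduce.

```python
def undefined_nccl_symbols(nm_output):
    """Return the symbols in `nm -u` output whose name starts with 'nccl'.

    Hypothetical helper: assumes the symbol is the last field on each
    line, and ignores leading underscores when matching.
    """
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        symbol = parts[-1]
        if symbol.lstrip("_").startswith("nccl"):
            symbols.append(symbol)
    return symbols


if __name__ == "__main__":
    import subprocess
    import sys

    # Usage: python check_nccl.py path/to/_pywrap_tensorflow_internal.so
    output = subprocess.run(
        ["nm", "-u", sys.argv[1]], capture_output=True, text=True, check=True
    ).stdout
    print(undefined_nccl_symbols(output) or "no undefined NCCL symbols found")
```

If NCCL symbols still show up after building with --config=nonccl, the flag probably did not take effect for that target, or stale build outputs were reused.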