Closed: qppp558 closed this issue 5 years ago
It seems that I did not set $CUDA_HOME when building the package.
I set $CUDA_HOME, but I still have the same problem.
Same error here with Python 3.6.0 and CUDA 9.0.
I set $CUDA_HOME, but I still have the same problem.
So, how did you solve it?
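Since several replies hinge on whether $CUDA_HOME was actually set when the package was built, a quick sanity check before running the build may help. This is only a minimal sketch: the helper name `check_cuda_home` is hypothetical, and the check for `bin/nvcc` assumes the standard CUDA toolkit layout. It does not reproduce what the binding's setup.py itself checks.

```python
import os


def check_cuda_home(env=None):
    """Return (ok, message) describing whether CUDA_HOME looks usable.

    Hypothetical helper, not part of the binding. It assumes the standard
    toolkit layout, where nvcc lives in $CUDA_HOME/bin.
    """
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        return False, "CUDA_HOME is not set"
    if not os.path.isdir(cuda_home):
        return False, "CUDA_HOME does not point to a directory: " + cuda_home
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    if not os.path.isfile(nvcc):
        return False, "nvcc not found at " + nvcc
    return True, "CUDA_HOME looks usable: " + cuda_home


if __name__ == "__main__":
    ok, message = check_cuda_home()
    print(message)
```

Run it in the same shell that will run the build, so it sees the same environment. As the replies above note, setting $CUDA_HOME did not fix the failure for everyone, so a passing check only rules out one possible cause.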
OS: Ubuntu 16.04.3 LTS
CUDA version: 9.0
GPU: Tesla P100
I built and installed the TensorFlow binding, and the build reported no errors. However, when I run the unit tests with
python setup.py test
one test fails. The full output is:
setup.py:63: UserWarning: Assuming tensorflow was compiled without C++11 ABI. It is generally true if you are using binary pip package. If you compiled tensorflow from source with gcc >= 5 and didn't set -D_GLIBCXX_USE_CXX11_ABI=0 during compilation, you need to set environment variable TF_CXX11_ABI=1 when compiling this bindings. Also be sure to touch some files in src to trigger recompilation. Also, you need to set (or unsed) this environment variable if getting undefined symbol: _ZN10tensorflow... errors
  warnings.warn("Assuming tensorflow was compiled without C++11 ABI. "
running test
running egg_info
writing warprnnt_tensorflow.egg-info/PKG-INFO
writing top-level names to warprnnt_tensorflow.egg-info/top_level.txt
writing dependency_links to warprnnt_tensorflow.egg-info/dependency_links.txt
reading manifest file 'warprnnt_tensorflow.egg-info/SOURCES.txt'
writing manifest file 'warprnnt_tensorflow.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-3.5/warprnnt_tensorflow/kernels.cpython-35m-x86_64-linux-gnu.so -> warprnnt_tensorflow
/data/gengjie/workspace/warp-transducer/tensorflow_binding/setup.py:63: UserWarning: Assuming tensorflow was compiled without C++11 ABI. It is generally true if you are using binary pip package. If you compiled tensorflow from source with gcc >= 5 and didn't set -D_GLIBCXX_USE_CXX11_ABI=0 during compilation, you need to set environment variable TF_CXX11_ABI=1 when compiling this bindings. Also be sure to touch some files in src to trigger recompilation. Also, you need to set (or unsed) this environment variable if getting undefined symbol: _ZN10tensorflow... errors
  warnings.warn("Assuming tensorflow was compiled without C++11 ABI. "
running test
running egg_info
writing warprnnt_tensorflow.egg-info/PKG-INFO
writing top-level names to warprnnt_tensorflow.egg-info/top_level.txt
writing dependency_links to warprnnt_tensorflow.egg-info/dependency_links.txt
reading manifest file 'warprnnt_tensorflow.egg-info/SOURCES.txt'
writing manifest file 'warprnnt_tensorflow.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-3.5/warprnnt_tensorflow/kernels.cpython-35m-x86_64-linux-gnu.so -> warprnnt_tensorflow
2019-07-16 08:09:39.373288: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-16 08:09:39.811372: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:22:00.0
totalMemory: 15.90GiB freeMemory: 15.34GiB
2019-07-16 08:09:39.811462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-16 08:09:40.169127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-16 08:09:40.169209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-07-16 08:09:40.169237: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-07-16 08:09:40.169744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14862 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:22:00.0, compute capability: 6.0)
[4.280653 3.938437]
[array([[[[-1.86843961e-01, -6.25548363e-02, 2.49398738e-01],
[-2.03376651e-01, 2.02399358e-01, 9.77352262e-04],
[-1.41016066e-01, 7.91234598e-02, 6.18926175e-02]],
test_forward (test_warprnnt_op.WarpRNNTTest) ...
2019-07-16 08:09:40.466216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-16 08:09:40.466285: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-16 08:09:40.466299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-07-16 08:09:40.466309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-07-16 08:09:40.466501: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14862 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:22:00.0, compute capability: 6.0)
[4.4956665]
ok
test_multiple_batches_cpu (test_warprnnt_op.WarpRNNTTest) ...
/data/gengjie/workspace/warp-transducer/tensorflow_binding/tests/test_warprnnt_op.py:14: DeprecationWarning: Please use assertEqual instead.
  self.assertEquals(acts.shape, expected_grads.shape)
2019-07-16 08:09:40.505122: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-16 08:09:40.505221: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-16 08:09:40.505236: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-07-16 08:09:40.505245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-07-16 08:09:40.505559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14862 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:22:00.0, compute capability: 6.0)
ok
test_multiple_batches_gpu (test_warprnnt_op.WarpRNNTTest) ...
2019-07-16 08:09:40.522426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-16 08:09:40.522482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-16 08:09:40.522506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-07-16 08:09:40.522522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-07-16 08:09:40.522782: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 14862 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:22:00.0, compute capability: 6.0)
2019-07-16 08:09:40.538106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-16 08:09:40.538146: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-16 08:09:40.538159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-07-16 08:09:40.538169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-07-16 08:09:40.538347: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14862 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:22:00.0, compute capability: 6.0)
FAIL
test_session (test_warprnnt_op.WarpRNNTTest)
Use cached_session instead. ... ok
======================================================================
FAIL: test_multiple_batches_gpu (test_warprnnt_op.WarpRNNTTest)
Traceback (most recent call last):
  File "/data/gengjie/workspace/warp-transducer/tensorflow_binding/tests/test_warprnnt_op.py", line 92, in test_multiple_batches_gpu
    self._test_multiple_batches(use_gpu=True)
  File "/data/gengjie/workspace/warp-transducer/tensorflow_binding/tests/test_warprnnt_op.py", line 85, in _test_multiple_batches
    self._run_rnnt(acts, labels, input_lengths, label_lengths, expected_costs, expected_grads, 0, use_gpu)
  File "/data/gengjie/workspace/warp-transducer/tensorflow_binding/tests/test_warprnnt_op.py", line 27, in _run_rnnt
    self.assertAllClose(tf_costs, expected_costs, atol=1e-6)
  File "/data/gengjie/env/lib/python3.5/site-packages/tensorflow/python/framework/test_util.py", line 1591, in assertAllClose
    self._assertAllCloseRecursive(a, b, rtol=rtol, atol=atol, msg=msg)
  File "/data/gengjie/env/lib/python3.5/site-packages/tensorflow/python/framework/test_util.py", line 1561, in _assertAllCloseRecursive
    (path_str, path_str, msg)))
  File "/data/gengjie/env/lib/python3.5/site-packages/tensorflow/python/framework/test_util.py", line 1496, in _assertArrayLikeAllClose
    a, b, rtol=rtol, atol=atol, err_msg="\n".join(msgs), equal_nan=True)
  File "/data/gengjie/env/lib/python3.5/site-packages/numpy/testing/_private/utils.py", line 1501, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/data/gengjie/env/lib/python3.5/site-packages/numpy/testing/_private/utils.py", line 827, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-06, atol=1e-06
Mismatched value: a is different from b.
not close where = (array([0, 1]),)
not close lhs = [-5.3799906 -5.5812006]
not close rhs = [4.28065 3.93844]
not close dif = [9.660641 9.519641]
not close tol = [5.28065e-06 4.93844e-06]
dtype = float32, shape = (2,)
Mismatch: 100%
Max absolute difference: 9.660641
Max relative difference: 2.4171095
 x: array([-5.379991, -5.581201], dtype=float32)
 y: array([4.28065, 3.93844], dtype=float32)
Ran 4 tests in 0.095s
FAILED (failures=1)
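To put the numbers in the assertion message in context, here is a small worked example that redoes the closeness check by hand. It is only a sketch: the helper name `allclose_report` is made up for illustration, the values are copied from the message above, and the criterion `|actual - expected| <= atol + rtol * |expected|` is the one `numpy.testing.assert_allclose` documents.

```python
# Values copied from the assertion message in the test output.
lhs = [-5.3799906, -5.5812006]  # costs returned by the GPU run (tf_costs)
rhs = [4.28065, 3.93844]        # expected costs
rtol = 1e-6
atol = 1e-6


def allclose_report(actual, expected, rtol, atol):
    """Return (difference, tolerance, is_close) for each pair of values.

    Hypothetical helper that mirrors the elementwise criterion
    |actual - expected| <= atol + rtol * |expected|.
    """
    rows = []
    for a, b in zip(actual, expected):
        diff = abs(a - b)
        tol = atol + rtol * abs(b)
        rows.append((diff, tol, diff <= tol))
    return rows


report = allclose_report(lhs, rhs, rtol, atol)
for diff, tol, ok in report:
    print("dif = %.6f  tol = %.5e  close = %s" % (diff, tol, ok))
```

The differences (about 9.66 and 9.52) and tolerances (about 5.28e-06 and 4.94e-06) match the `not close dif` and `not close tol` lines in the log. The gap is roughly a million times the tolerance and the GPU costs have the opposite sign from the expected ones, so this does not look like floating-point rounding. The GPU path appears to compute a different result altogether.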