TomHeaven / tensorflow-osx-build

Off-the-shelf python package of tensorflow with CUDA support for Mac OS.

Hi, bro! Need help with xla_compile. #18

Closed: yrwy closed this issue 4 years ago

yrwy commented 4 years ago

Hi Tom, sorry to bother you again. As I recall, XLA has been unusable since 1.3: as soon as it is enabled, an error is raised saying the device cannot be found. TF 2.2 now turns on the XLA compilation option by default. Running TF's official benchmark fails for me, and since GPU support for TF on macOS has been dropped upstream, you're the only one I can turn to. Have you run into this?

Details: download benchmarks-cnn_tf_v2.1_compatible from https://github.com/tensorflow/benchmarks

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --variable_update=parameter_server --xla_compile=True

2020-05-30 15:31:34.302936: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at xla_ops.cc:368 : Not found: could not find registered platform with id: 0x141f70298
This error might be occurring with the use of xla.compile. If it is not necessary that every Op be compiled with XLA, an alternative is to use auto_jit with OptimizerOptions.global_jit_level = ON_2 or the environment variable TF_XLA_FLAGS="tf_xla_auto_jit=2" which will attempt to use xla to compile as much of the graph as the compiler is able to.
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.NotFoundError'>, 2 root error(s) found.
(0) Not found: could not find registered platform with id: 0x141f70298
This error might be occurring with the use of xla.compile. If it is not necessary that every Op be compiled with XLA, an alternative is to use auto_jit with OptimizerOptions.global_jit_level = ON_2 or the environment variable TF_XLA_FLAGS="tf_xla_auto_jit=2" which will attempt to use xla to compile as much of the graph as the compiler is able to.
  [[{{node tower_0/v/cluster}}]]
  [[main_fetch_group/_566]]
(1) Not found: could not find registered platform with id: 0x141f70298
This error might be occurring with the use of xla.compile. If it is not necessary that every Op be compiled with XLA, an alternative is to use auto_jit with OptimizerOptions.global_jit_level = ON_2 or the environment variable TF_XLA_FLAGS="tf_xla_auto_jit=2" which will attempt to use xla to compile as much of the graph as the compiler is able to.
  [[{{node tower_0/v/cluster}}]]
0 successful operations.
0 derived errors ignored.

It runs fine once the --xla_compile=True flag is removed.
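
For reference, the error message itself suggests auto-clustering as a fallback when not every op has to be XLA-compiled. A minimal sketch of that workaround, assuming TF 2.2 (the TF_XLA_FLAGS variable must be set before TensorFlow is imported):

# Sketch only: enable XLA auto-clustering instead of forcing --xla_compile,
# as the error message suggests. Set the flag before importing TensorFlow.
import os
os.environ["TF_XLA_FLAGS"] = "--tf_xla_auto_jit=2"  # let XLA compile what it can

import tensorflow as tf

# A related per-process toggle for graph-level JIT in TF 2.x:
tf.config.optimizer.set_jit(True)

print(tf.config.list_physical_devices("GPU"))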

TomHeaven commented 4 years ago

Try the v2.2 release. Here is my test result:

 python3 tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --variable_update=parameter_server --xla_compile=True 
2020-05-31 20:59:59.454636: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.10.0.dylib
WARNING: Logging before flag parsing goes to stderr.
W0531 21:00:02.565242 140735744557952 deprecation.py:323] From /usr/local/lib/python3.7/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2020-05-31 21:00:02.632062: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.dylib
2020-05-31 21:00:02.639004: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:943] OS X does not support NUMA - returning NUMA node zero
2020-05-31 21:00:02.639154: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1544] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: TITAN V computeCapability: 7.0
coreClock: 1.455GHz coreCount: 80 deviceMemorySize: 12.00GiB deviceMemoryBandwidth: 607.97GiB/s
2020-05-31 21:00:02.639501: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.10.0.dylib
2020-05-31 21:00:02.660028: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.10.0.dylib
2020-05-31 21:00:02.679666: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.10.0.dylib
2020-05-31 21:00:02.680897: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.10.0.dylib
2020-05-31 21:00:02.744991: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.10.0.dylib
2020-05-31 21:00:02.765552: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.10.0.dylib
2020-05-31 21:00:02.787741: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.7.dylib
2020-05-31 21:00:02.787873: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:943] OS X does not support NUMA - returning NUMA node zero
2020-05-31 21:00:02.788137: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:943] OS X does not support NUMA - returning NUMA node zero
2020-05-31 21:00:02.788250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1686] Adding visible gpu devices: 0
2020-05-31 21:00:03.547819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1085] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-31 21:00:03.547838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1091]      0 
2020-05-31 21:00:03.547842: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1104] 0:   N 
2020-05-31 21:00:03.548349: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:943] OS X does not support NUMA - returning NUMA node zero
2020-05-31 21:00:03.548599: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:943] OS X does not support NUMA - returning NUMA node zero
2020-05-31 21:00:03.548862: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:943] OS X does not support NUMA - returning NUMA node zero
2020-05-31 21:00:03.549003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1230] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5444 MB memory) -> physical GPU (device: 0, name: TITAN V, pci bus id: 0000:01:00.0, compute capability: 7.0)
2020-05-31 21:00:03.557051: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fcf6fc35ad0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-31 21:00:03.557072: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): TITAN V, Compute Capability 7.0
2020-05-31 21:00:03.568181: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fcf76a14ea0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-31 21:00:03.568199: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
TensorFlow:  2.2
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  64 global
             64 per device
Num batches: 100
Num epochs:  0.00
Devices:     ['/gpu:0']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   parameter_server
==========
Generating training model
W0531 21:00:03.581869 140735744557952 deprecation.py:323] From /Volumes/Data/libraries/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:134: conv2d (from tensorflow.python.keras.legacy_tf_layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.keras.layers.Conv2D` instead.
W0531 21:00:03.586750 140735744557952 deprecation.py:323] From /usr/local/lib/python3.7/site-packages/tensorflow/python/keras/legacy_tf_layers/convolutional.py:424: Layer.apply (from tensorflow.python.keras.engine.base_layer_v1) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W0531 21:00:03.621324 140735744557952 deprecation.py:323] From /Volumes/Data/libraries/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:266: max_pooling2d (from tensorflow.python.keras.legacy_tf_layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.MaxPooling2D instead.
Initializing graph
W0531 21:00:06.072226 140735744557952 deprecation.py:323] From /Volumes/Data/libraries/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2268: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2020-05-31 21:00:06.395814: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:943] OS X does not support NUMA - returning NUMA node zero
2020-05-31 21:00:06.396008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1544] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: TITAN V computeCapability: 7.0
coreClock: 1.455GHz coreCount: 80 deviceMemorySize: 12.00GiB deviceMemoryBandwidth: 607.97GiB/s
2020-05-31 21:00:06.396435: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.10.0.dylib
2020-05-31 21:00:06.396737: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.10.0.dylib
2020-05-31 21:00:06.397049: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.10.0.dylib
2020-05-31 21:00:06.397340: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.10.0.dylib
2020-05-31 21:00:06.397631: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.10.0.dylib
2020-05-31 21:00:06.397920: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.10.0.dylib
2020-05-31 21:00:06.398186: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.7.dylib
2020-05-31 21:00:06.398295: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:943] OS X does not support NUMA - returning NUMA node zero
2020-05-31 21:00:06.398555: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:943] OS X does not support NUMA - returning NUMA node zero
2020-05-31 21:00:06.398670: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1686] Adding visible gpu devices: 0
2020-05-31 21:00:06.398693: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1085] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-31 21:00:06.398698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1091]      0 
2020-05-31 21:00:06.398702: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1104] 0:   N 
2020-05-31 21:00:06.398846: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:943] OS X does not support NUMA - returning NUMA node zero
2020-05-31 21:00:06.399059: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:943] OS X does not support NUMA - returning NUMA node zero
2020-05-31 21:00:06.399178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1230] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5444 MB memory) -> physical GPU (device: 0, name: TITAN V, pci bus id: 0000:01:00.0, compute capability: 7.0)
I0531 21:00:07.086311 140735744557952 session_manager.py:505] Running local_init_op.
I0531 21:00:07.158162 140735744557952 session_manager.py:508] Done running local_init_op.
Running warm up
2020-05-31 21:00:08.948844: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.10.0.dylib
2020-05-31 21:00:09.161863: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.7.dylib
2020-05-31 21:00:22.647080: I tensorflow/compiler/jit/xla_compilation_cache.cc:314] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
Done warm up
Step    Img/sec total_loss
1   images/sec: 354.3 +/- 0.0 (jitter = 0.0)    7.608
10  images/sec: 350.1 +/- 1.0 (jitter = 4.2)    7.849
20  images/sec: 350.6 +/- 0.6 (jitter = 3.9)    8.013
30  images/sec: 350.5 +/- 0.6 (jitter = 3.8)    7.939
40  images/sec: 347.0 +/- 1.7 (jitter = 5.0)    8.135
50  images/sec: 347.9 +/- 1.4 (jitter = 4.4)    8.051
60  images/sec: 348.7 +/- 1.2 (jitter = 3.1)    7.786
70  images/sec: 349.4 +/- 1.0 (jitter = 1.7)    7.857
80  images/sec: 349.9 +/- 0.9 (jitter = 1.4)    8.007
90  images/sec: 350.3 +/- 0.8 (jitter = 1.2)    7.842
100 images/sec: 349.2 +/- 0.9 (jitter = 1.6)    8.088
----------------------------------------------------------------
total images/sec: 349.07
----------------------------------------------------------------
yrwy commented 4 years ago

It's probably a CUDA 10.1 issue. I'll try rebuilding it. -。-

yrwy commented 4 years ago

This is bizarre: after downgrading to CUDA 10, my own build still behaves the same, yet your build works without any problems at all. Apart from using Anaconda for the Python environment, everything else is identical to your setup.

TomHeaven commented 4 years ago

@yrwy When you built TF 2.2 with CUDA 10.1, did you hit the following error?

ERROR: /Volumes/Data/github/tensorflow/tensorflow/stream_executor/cuda/BUILD:448:1: C++ compilation of rule '//tensorflow/stream_executor/cuda:cusparse_stub' failed (Exit 1)
In file included from tensorflow/stream_executor/cuda/cusparse_stub.cc:59:
./tensorflow/stream_executor/cuda/cusparse_10_1.inc:7786:21: error: unknown type name 'cusparseSpVecDescr_t'
cusparseCreateSpVec(cusparseSpVecDescr_t *spVecDescr, int64_t size, int64_t nnz,
                    ^
./tensorflow/stream_executor/cuda/cusparse_10_1.inc:7790:7: error: unknown type name 'cusparseSpVecDescr_t'; did you mean 'cusparseSpMatDescr_t'?
      cusparseSpVecDescr_t *, int64_t, int64_t, void *, void *,
      ^~~~~~~~~~~~~~~~~~~~
      cusparseSpMatDescr_t
bazel-out/host/bin/external/local_config_cuda/cuda/_virtual_includes/cuda_headers_virtual/third_party/gpus/cuda/include/cusparse.h:6964:36: note: 'cusparseSpMatDescr_t' declared here
typedef struct cusparseSpMatDescr* cusparseSpMatDescr_t;
                                   ^
In file included from tensorflow/stream_executor/cuda/cusparse_stub.cc:59:
./tensorflow/stream_executor/cuda/cusparse_10_1.inc:7799:22: error: unknown type name 'cusparseSpVecDescr_t'; did you mean 'cusparseSpMatDescr_t'?
cusparseDestroySpVec(cusparseSpVecDescr_t spVecDescr) {
                     ^~~~~~~~~~~~~~~~~~~~
                     cusparseSpMatDescr_t
bazel-out/host/bin/external/local_config_cuda/cuda/_virtual_includes/cuda_headers_virtual/third_party/gpus/cuda/include/cusparse.h:6964:36: note: 'cusparseSpMatDescr_t' declared here
typedef struct cusparseSpMatDescr* cusparseSpMatDescr_t;
                                   ^
In file included from tensorflow/stream_executor/cuda/cusparse_stub.cc:59:
./tensorflow/stream_executor/cuda/cusparse_10_1.inc:7800:51: error: unknown type name 'cusparseSpVecDescr_t'; did you mean 'cusparseSpMatDescr_t'?
  using FuncPtr = cusparseStatus_t(CUSPARSEAPI *)(cusparseSpVecDescr_t);
                                                  ^~~~~~~~~~~~~~~~~~~~
                                                  cusparseSpMatDescr_t
bazel-out/host/bin/external/local_config_cuda/cuda/_virtual_includes/cuda_headers_virtual/third_party/gpus/cuda/include/cusparse.h:6964:36: note: 'cusparseSpMatDescr_t' declared here
typedef struct cusparseSpMatDescr* cusparseSpMatDescr_t;
                                   ^
yrwy commented 4 years ago

I haven't seen that error. I'm on Xcode 10.1. After applying your patch, apart from having to tweak absl by hand, everything builds fine; it's just that XLA can't find the device.

yrwy commented 4 years ago

@TomHeaven Could you show me your environment variables? I'm really puzzled why XLA can't find the device with my build. Thanks in advance.
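
One quick way to narrow this down might be to check whether the XLA platform registers at all in the failing build. A sketch, assuming this TF 2.2 build still exposes the XLA_GPU device type:

# Hedged diagnostic sketch: if XLA's CUDA platform failed to register,
# the physical GPU still shows up but no XLA_GPU device is listed.
import tensorflow as tf

print(tf.version.VERSION)
print(tf.config.list_physical_devices())           # expect a GPU:0 entry
print(tf.config.list_physical_devices("XLA_GPU"))  # empty when the XLA platform is missing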

TomHeaven commented 4 years ago

Here are my environment variables:

iMac13:cuda tomheaven$ env
TERM_PROGRAM=Apple_Terminal
DYLD_FALLBACK_LIBRARY_PATH=/usr/local:/usr/local/cuda9/lib:/usr/local/cuda/lib:/usr/local/lib:/Library/Python/2.7/site-packages/tensorflow:
SHELL=/bin/bash
TERM=xterm-256color
TMPDIR=/var/folders/xf/sbm5h3410gq9vgp1lksnh7xw0000gn/T/
Apple_PubSub_Socket_Render=/private/tmp/com.apple.launchd.UgyJOZIMM3/Render
TERM_PROGRAM_VERSION=404.1
CUDA_INC_DIR=/usr/local/cuda/include
OLDPWD=/Users/tomheaven
TERM_SESSION_ID=4BC09B9C-7BFD-4B46-AC91-8D1C533B2852
LC_ALL=en_US.UTF-8
USER=tomheaven
LD_LIBRARY_PATH=/usr/local/lib/lua/5.1:/usr/local/lib/lua/5.2:/usr/local/cudnn5/lib:
SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.lswMNX3rum/Listeners
PATH=/usr/local/sbin:/usr/local/cuda/bin:/Users/tomheaven/Documents/caffe-master/build/install/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/Library/TeX/texbin:/Applications/Wireshark.app/Contents/MacOS
C_INCLUDE_PATH=/usr/local/include:/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include:/usr/local/cuda/include
PWD=/usr/local/cuda
LUA_PATH=/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua;;
LANG=en_US.UTF-8
LUA_CPATH=/usr/local/lib/lua/5.1/?.so;;
XPC_FLAGS=0x0
XPC_SERVICE_NAME=0
SHLVL=1
HOME=/Users/tomheaven
DYLD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/nccl/lib::/Library/Python/2.7/site-packages/tensorflow
PYTHONPATH=/Users/tomheaven/Documents/caffe-master/python:
LOGNAME=tomheaven
CXX_INCLUDE_PATH=/usr/local/include:/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include:/usr/local/cuda/include
DISPLAY=/private/tmp/com.apple.launchd.e9gi25auA6/org.macosforge.xquartz:0
_=/usr/bin/env

Building TF 2.2 with CUDA 10.1 + Xcode 10.1 gives me that cusparse error. It is the same on two different machines. I have no idea why you don't hit it.
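
One thing that might be worth comparing across the two machines (a sketch; the header path is an assumption): whether the cusparse.h the build picks up actually declares cusparseSpVecDescr_t, which the generated cusparse_10_1.inc in the error above expects.

# Hedged check: does the installed cusparse.h declare the generic SpVec types
# referenced by cusparse_10_1.inc? The path below is an assumption.
header = "/usr/local/cuda/include/cusparse.h"
with open(header) as f:
    text = f.read()

print("cusparseSpVecDescr_t declared:", "cusparseSpVecDescr_t" in text)
for line in text.splitlines():
    # Print version macros if the header defines them.
    if line.startswith("#define CUSPARSE_VER"):
        print(line)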

yrwy commented 4 years ago

I suspect this is the problem:

DYLD_FALLBACK_LIBRARY_PATH=/usr/local:/usr/local/cuda9/lib:/usr/local/cuda/lib:/usr/local/lib:/Library/Python/2.7/site-packages/tensorflow:
C_INCLUDE_PATH=/usr/local/include:/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include:/usr/local/cuda/include
CXX_INCLUDE_PATH=/usr/local/include:/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include:/usr/local/cuda/include

Did you include cuda9 in there as well? Also, my Xcode SDK is MacOSX10.14.sdk. Such a headache...

yrwy commented 4 years ago

The latest CUDA framework does include a driver for the newer macOS version, but NVIDIA hasn't released it. The CUDA 10.1 notes say it can be built with Xcode 10.2.

/Library/Frameworks/CUDA.framework/Versions/A/Libraries contains libcuda_418.15.10_mercury.dylib, which should be the matching driver version, but I've never seen it before.

TomHeaven commented 4 years ago

I already deleted that cuda9 directory; it's a historical leftover and has no effect. If it's convenient, could you email me your WeChat QR code? I'd like to ask in detail how you got TF 2.2 to build under CUDA 10.1. I can't get past that cusparse error.

yrwy commented 4 years ago

Just add yrwy1982 directly.

TomHeaven commented 4 years ago

I also tested a 1080 Ti with CUDA 10.1 + cuDNN 7.6.5:

python3  tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --variable_update=parameter_server --xla_compile=True
2020-06-16 09:50:57.440574: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.10.1.dylib
WARNING: Logging before flag parsing goes to stderr.
W0616 09:50:59.839060 140736053945216 deprecation.py:323] From /usr/local/lib/python3.7/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2020-06-16 09:50:59.948163: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.dylib
2020-06-16 09:51:00.028161: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:942] OS X does not support NUMA - returning NUMA node zero
2020-06-16 09:51:00.028379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.645GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-06-16 09:51:00.028562: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.10.1.dylib
2020-06-16 09:51:00.128385: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.10.dylib
2020-06-16 09:51:00.198901: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.10.dylib
2020-06-16 09:51:00.221266: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.10.dylib
2020-06-16 09:51:00.326854: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.10.dylib
2020-06-16 09:51:00.381549: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.10.dylib
2020-06-16 09:51:00.496866: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.7.dylib
2020-06-16 09:51:00.497052: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:942] OS X does not support NUMA - returning NUMA node zero
2020-06-16 09:51:00.497469: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:942] OS X does not support NUMA - returning NUMA node zero
2020-06-16 09:51:00.497699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-06-16 09:51:01.137055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-16 09:51:01.137077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-06-16 09:51:01.137081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2020-06-16 09:51:01.137326: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:942] OS X does not support NUMA - returning NUMA node zero
2020-06-16 09:51:01.137600: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:942] OS X does not support NUMA - returning NUMA node zero
2020-06-16 09:51:01.137834: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:942] OS X does not support NUMA - returning NUMA node zero
2020-06-16 09:51:01.137985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8002 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-06-16 09:51:01.146234: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f9a398abff0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-06-16 09:51:01.146258: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-06-16 09:51:01.156092: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f9a36daea60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-16 09:51:01.156119: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
TensorFlow:  2.2
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  64 global
             64 per device
Num batches: 100
Num epochs:  0.00
Devices:     ['/gpu:0']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   parameter_server
==========
Generating training model
W0616 09:51:01.177154 140736053945216 deprecation.py:323] From /Volumes/Data/library/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:134: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.keras.layers.Conv2D` instead.
W0616 09:51:01.181925 140736053945216 deprecation.py:323] From /usr/local/lib/python3.7/site-packages/tensorflow/python/layers/convolutional.py:424: Layer.apply (from tensorflow.python.keras.engine.base_layer_v1) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W0616 09:51:01.187433 140736053945216 deprecation.py:506] From /usr/local/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py:1666: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0616 09:51:01.217278 140736053945216 deprecation.py:323] From /Volumes/Data/library/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:266: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.MaxPooling2D instead.
Initializing graph
W0616 09:51:03.815936 140736053945216 deprecation.py:323] From /Volumes/Data/library/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2268: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2020-06-16 09:51:04.098956: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:942] OS X does not support NUMA - returning NUMA node zero
2020-06-16 09:51:04.099159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.645GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-06-16 09:51:04.099391: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.10.1.dylib
2020-06-16 09:51:04.099541: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.10.dylib
2020-06-16 09:51:04.099666: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.10.dylib
2020-06-16 09:51:04.099788: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.10.dylib
2020-06-16 09:51:04.099930: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.10.dylib
2020-06-16 09:51:04.100050: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.10.dylib
2020-06-16 09:51:04.100191: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.7.dylib
2020-06-16 09:51:04.100372: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:942] OS X does not support NUMA - returning NUMA node zero
2020-06-16 09:51:04.100613: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:942] OS X does not support NUMA - returning NUMA node zero
2020-06-16 09:51:04.100741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-06-16 09:51:04.100771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-16 09:51:04.100786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-06-16 09:51:04.100790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2020-06-16 09:51:04.100965: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:942] OS X does not support NUMA - returning NUMA node zero
2020-06-16 09:51:04.101202: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:942] OS X does not support NUMA - returning NUMA node zero
2020-06-16 09:51:04.101337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8002 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
I0616 09:51:04.831177 140736053945216 session_manager.py:505] Running local_init_op.
I0616 09:51:04.933616 140736053945216 session_manager.py:508] Done running local_init_op.
Running warm up
2020-06-16 09:51:06.680705: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.10.dylib
2020-06-16 09:51:07.345785: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.7.dylib
2020-06-16 09:51:19.267932: I tensorflow/compiler/jit/xla_compilation_cache.cc:241] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
Done warm up
Step    Img/sec total_loss
1   images/sec: 240.0 +/- 0.0 (jitter = 0.0)    7.608
10  images/sec: 241.3 +/- 0.3 (jitter = 0.8)    7.849
20  images/sec: 241.3 +/- 0.2 (jitter = 0.7)    8.013
30  images/sec: 241.2 +/- 0.2 (jitter = 0.9)    7.940
40  images/sec: 241.0 +/- 0.2 (jitter = 0.9)    8.136
50  images/sec: 240.9 +/- 0.2 (jitter = 0.9)    8.052
60  images/sec: 240.9 +/- 0.2 (jitter = 0.9)    7.783
70  images/sec: 240.9 +/- 0.2 (jitter = 0.8)    7.852
80  images/sec: 241.0 +/- 0.1 (jitter = 0.8)    8.011
90  images/sec: 241.0 +/- 0.1 (jitter = 0.8)    7.842
100 images/sec: 241.0 +/- 0.1 (jitter = 0.8)    8.089
----------------------------------------------------------------
total images/sec: 240.91
----------------------------------------------------------------