Failed to use the tf.profiler.experimental.Profile API to profile a model

alphaRGB commented 4 years ago

System information

OS Platform: Ubuntu 18.04
TensorFlow installed from source
TensorFlow version: tf==2.1.0, merge-200413-0-g129dd9a34e 2.1.0
Rocm version: rocm==3.3.0
Python version: python==3.6.9
Bazel version (if compiling from source):
GCC/Compiler version: 7.5.0
GPU model and memory: Vega 20, 16GB

Describe the current behavior

I want to profile a CNN model on an AMD GPU. The model is implemented with the tf.keras API, but the profiler output contains errors. Here is my profiler test code (it works and produces a correct profile on an NVIDIA GPU):

import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras import Model
import os

# os.environ['CUDA_VISIBLE_DEVICES'] = '0'

class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = Conv2D(filters=512, kernel_size=3, activation='relu')
        self.conv2 = Conv2D(filters=256, kernel_size=3, activation='relu')
        self.conv3 = Conv2D(filters=256, kernel_size=7, activation='relu')
        self.flatten = Flatten()
        self.d1 = Dense(units=128, activation='relu')
        self.d2 = Dense(units=10, activation='softmax')

    @tf.function
    def call(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.flatten(x)
        x = self.d1(x)
        return self.d2(x)

model = MyModel()
input_shape = [1, 64, 64, 3]
# model = MyModel()
model(tf.ones(input_shape))

with tf.device('/GPU:0'):
    with tf.profiler.experimental.Profile(logdir='temp'):
        outs = model(tf.ones(input_shape))

print('Profile v2 done!')

After executing the code above, TensorFlow prints many messages like this: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id. It seems that many ops failed to be traced. How can I profile a model on an AMD GPU with tensorflow>=2.1.0? Many thanks.

tf outputs:

python3.6 test_profile.py 
2020-07-06 13:19:13.492291: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libhip_hcc.so
2020-07-06 13:19:13.580602: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:1c:00.0 name: Vega 20     ROCm AMD GPU ISA: gfx906
coreClock: 1.801GHz coreCount: 60 deviceMemorySize: 15.98GiB deviceMemoryBandwidth: 953.67GiB/s
2020-07-06 13:19:13.631531: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2020-07-06 13:19:13.632684: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
2020-07-06 13:19:13.635016: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
2020-07-06 13:19:13.635192: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
2020-07-06 13:19:13.635262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1685] Adding visible gpu devices: 0
2020-07-06 13:19:13.635525: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance-critical operations:  AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-07-06 13:19:13.640835: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3500000000 Hz
2020-07-06 13:19:13.642456: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4d73780 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-06 13:19:13.642503: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-07-06 13:19:13.645587: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x73311c0 initialized for platform ROCM (this does not guarantee that XLA will be used). Devices:
2020-07-06 13:19:13.645622: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Vega 20, AMDGPU ISA version: gfx906
2020-07-06 13:19:13.645916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:1c:00.0 name: Vega 20     ROCm AMD GPU ISA: gfx906
coreClock: 1.801GHz coreCount: 60 deviceMemorySize: 15.98GiB deviceMemoryBandwidth: 953.67GiB/s
2020-07-06 13:19:13.645969: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2020-07-06 13:19:13.645995: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
2020-07-06 13:19:13.646018: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
2020-07-06 13:19:13.646040: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
2020-07-06 13:19:13.646123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1685] Adding visible gpu devices: 0
2020-07-06 13:19:13.646146: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-06 13:19:13.646160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1090]      0 
2020-07-06 13:19:13.646172: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] 0:   N 
2020-07-06 13:19:13.646305: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1229] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15145 MB memory) -> physical GPU (device: 0, name: Vega 20, pci bus id: 0000:1c:00.0)
2020-07-06 13:19:19.304584: I tensorflow/core/graph/gpu_fusion_pass.cc:505] ROCm Fusion is enabled.
2020-07-06 13:19:19.306426: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2020-07-06 13:19:19.307372: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
MIOpen(HIP): Warning [SQLiteBase] Unable to read system database file:/opt/rocm-3.3.0/miopen/share/miopen/db/miopen.db Performance may degrade
MIOpen(HIP): Warning [FindDataDirectSolutions] /root/driver/MLOpen/src/include/miopen/sqlite_db.hpp:209: Internal error while accessing SQLite database: unable to open database file
MIOpen(HIP): Warning [FindDataDirectSolutions] /root/driver/MLOpen/src/include/miopen/sqlite_db.hpp:209: Internal error while accessing SQLite database: unable to open database file
MIOpen(HIP): Warning [FindDataDirectSolutions] /root/driver/MLOpen/src/include/miopen/sqlite_db.hpp:209: Internal error while accessing SQLite database: unable to open database file
2020-07-06 13:19:27.832672: I tensorflow/core/profiler/lib/profiler_session.cc:154] Profiler session started.
2020-07-06 13:19:27.832742: I tensorflow/core/profiler/internal/gpu/rocm_tracer.cc:743] Profiler found 1 GPUs
2020-07-06 13:19:27.832763: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:583] GpuTracer created.
2020-07-06 13:19:27.836269: I tensorflow/core/profiler/internal/gpu/rocm_tracer.cc:757] GpuTracer started
2020-07-06 13:19:27.839595: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : Activity event encountered before a corresponding API event.
2020-07-06 13:19:27.840591: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : Activity event encountered before a corresponding API event.
2020-07-06 13:19:27.840633: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id.
2020-07-06 13:19:27.840683: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id.
2020-07-06 13:19:27.840701: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id.
2020-07-06 13:19:27.840716: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id.
2020-07-06 13:19:27.840732: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id.
2020-07-06 13:19:27.840746: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:199] Mapping physical device id 0 to logical device id 0
2020-07-06 13:19:27.840760: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id.
2020-07-06 13:19:27.840775: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id.
2020-07-06 13:19:27.840790: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id.
2020-07-06 13:19:27.840816: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id.
2020-07-06 13:19:27.840840: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id.
2020-07-06 13:19:27.840855: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id.
2020-07-06 13:19:27.840869: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id.
2020-07-06 13:19:27.840883: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id.
2020-07-06 13:19:27.840899: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id.
2020-07-06 13:19:27.840927: I tensorflow/core/profiler/internal/gpu/rocm_tracer.cc:768] GpuTracer stopped
2020-07-06 13:19:27.842305: I tensorflow/core/profiler/rpc/client/save_profile.cc:168] Creating directory: temp/plugins/profile/2020_07_06_13_19_27
2020-07-06 13:19:27.842913: I tensorflow/core/profiler/rpc/client/save_profile.cc:174] Dumped gzipped tool data for trace.json.gz to temp/plugins/profile/2020_07_06_13_19_27/ubuntu.trace.json.gz
2020-07-06 13:19:27.842992: E tensorflow/core/profiler/utils/hardware_type_utils.cc:60] Invalid GPU compute capability.
2020-07-06 13:19:27.843468: I tensorflow/core/profiler/utils/event_span.cc:288] Generation of step-events took 0 ms

2020-07-06 13:19:27.844651: I tensorflow/python/profiler/internal/profiler_wrapper.cc:91] Creating directory: temp/plugins/profile/2020_07_06_13_19_27
Dumped tool data for overview_page.pb to temp/plugins/profile/2020_07_06_13_19_27/ubuntu.overview_page.pb
Dumped tool data for input_pipeline.pb to temp/plugins/profile/2020_07_06_13_19_27/ubuntu.input_pipeline.pb
Dumped tool data for tensorflow_stats.pb to temp/plugins/profile/2020_07_06_13_19_27/ubuntu.tensorflow_stats.pb
Dumped tool data for kernel_stats.pb to temp/plugins/profile/2020_07_06_13_19_27/ubuntu.kernel_stats.pb

Profile v2 done!
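One way to check whether the dumped trace actually contains GPU activity is to open the trace.json.gz file and count events per name. The following is a minimal sketch, assuming the file uses the Chrome trace-event JSON layout (a top-level object with a "traceEvents" list); the function name count_trace_events is hypothetical and not part of TensorFlow.

```python
import gzip
import json
from collections import Counter


def count_trace_events(trace_path):
    """Count events per name in a gzipped trace file.

    Assumption: the file holds Chrome trace-event JSON, i.e. a top-level
    object with a "traceEvents" list whose entries may carry a "name".
    """
    with gzip.open(trace_path, "rt", encoding="utf-8") as fh:
        trace = json.load(fh)
    names = Counter()
    for event in trace.get("traceEvents", []):
        name = event.get("name")
        if name:  # skip empty placeholder entries
            names[name] += 1
    return names
```

For example, count_trace_events("temp/plugins/profile/2020_07_06_13_19_27/ubuntu.trace.json.gz").most_common(20) would list the most frequent event names, which makes it easy to see whether any device kernels were recorded at all.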

Describe the expected behavior

Standalone code to reproduce the issue

alphaRGB commented 4 years ago

I also ran this model using tensorflow.python.profiler.model_analyzer.Profiler with tf.compat.v1.disable_eager_execution, and the message is the same: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id. I am not sure whether the bug is in TensorFlow's GPU profiler or in the ROCm profiler.

jerryyin commented 4 years ago

@deven-amd Does this sound familiar to you? I remember you had a recent PR that fixed rocTracer, too. I want to double-check because this seems to be in a ROCm 3.3 context.

deven-amd commented 4 years ago

@alphaRGB

Can you try a newer TF version? We have only recently (within the last couple of months) finished implementing support for profiling on TF-ROCm, and I am unsure whether the version you are using has everything needed for profiling.

The following error will still be there and can be ignored for now

2020-07-06 13:19:27.842992: E tensorflow/core/profiler/utils/hardware_type_utils.cc:60] Invalid GPU compute capability.

That is because the TF code expects a CUDA-style compute capability number in the profiling data (which obviously won't be present on the ROCm platform).

The other message, i.e.

2020-07-06 13:19:27.840899: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id.

is somewhat problematic, and ideally you should not see those at all. As the message indicates, the data for some event had to be dropped because an inconsistency was found within it (an invalid stream id in this case). If you get a few of these when the number of events being collected is in the thousands, it is no big deal (since the amount of lost data is insignificant), but you should not be seeing any for a test case this small.
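To judge how many events were lost, the "dropped" messages can be tallied by reason from a saved copy of the log. This is a minimal sketch that relies only on the message format visible in the output above; the names DROPPED_RE and tally_dropped_events are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id.
DROPPED_RE = re.compile(r"RocmTracerEvent\(s\) dropped \((\d+)\) : (.+?)\.?\s*$")


def tally_dropped_events(log_lines):
    """Return a Counter mapping each drop reason to the number of dropped events."""
    dropped = Counter()
    for line in log_lines:
        match = DROPPED_RE.search(line)
        if match:
            # group(1) is the count in parentheses, group(2) the reason text
            dropped[match.group(2)] += int(match.group(1))
    return dropped
```

Calling tally_dropped_events(open("tf_output.log")) on a saved log (the file name is only an example) gives per-reason totals that can be compared against the number of events in the trace.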

deven-amd commented 4 years ago

Another suggestion I have is to use the method outlined in the TF tutorial here: https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras

to dump the profiling data. I just tried it on a simple example (using TF built from source on the develop-upstream branch) and it works.
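For reference, the callback-based approach looks roughly like the sketch below. It is not the tutorial's code: the tiny model, the synthetic data, and the helper names make_log_dir and profile_with_keras_callback are assumptions made for illustration. Only tf.keras.callbacks.TensorBoard with its log_dir and profile_batch arguments is taken from the Keras API.

```python
import os
from datetime import datetime


def make_log_dir(root="logs"):
    """Build a timestamped log directory path (hypothetical helper)."""
    return os.path.join(root, datetime.now().strftime("%Y%m%d-%H%M%S"))


def profile_with_keras_callback(log_dir, num_samples=32, batch_size=4):
    """Train a tiny model for one epoch while the TensorBoard callback profiles batch 2."""
    # Imported lazily so that make_log_dir stays usable without TensorFlow.
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(filters=8, kernel_size=3, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units=10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    # Synthetic data, only there to drive a few training steps.
    images = tf.random.uniform([num_samples, 64, 64, 3])
    labels = tf.zeros([num_samples], dtype=tf.int32)

    # profile_batch=2 asks the callback to profile the second batch.
    callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, profile_batch=2)
    model.fit(images, labels, batch_size=batch_size, epochs=1, callbacks=[callback])
    return log_dir
```

Running profile_with_keras_callback(make_log_dir()) and then pointing TensorBoard at the logs directory should show the captured profile; depending on the TensorBoard version, the profile view may require a separately installed profiler plugin.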

alphaRGB commented 4 years ago

@deven-amd Thank you, I will test the TF tutorial demo first. Which commit should I use to compile TF: the latest, or a specific commit id on the develop-upstream branch? And which ROCm version did you try?

alphaRGB commented 4 years ago

I have tested the "tensorboard_profiling_keras" demo from the URL with TF 2.1 (merge-200413-0-g129dd9a34e 2.1.0) + ROCm 3.3.0.
Due to network restrictions, I'm sorry that I can't provide detailed screenshots of TensorBoard. On the PROFILER page of TensorBoard, the Performance Summary values are all zero on the AMD GPU, which looks wrong. When I run the same code on NVIDIA, most values are non-zero:

Performance Summary (NVIDIA)

Average Step Time: 7.5 ms
All Others Time: 1.0 ms
Compilation Time: 0.0 ms
Output Time: 0.0 ms
Input Time: 0.0 ms
Kernel Launch Time: 0.9 ms
Host Compute Time: 0.1 ms
Device to Device Time: 0.0 ms
Device Compute Time: 5.4 ms

It seems that the profiler in the TensorFlow-ROCm build I used may have a problem. I'll try a newer TF with a newer ROCm.

jerryyin commented 4 years ago

@alphaRGB TF 2.1 is probably still too old. Could you maybe try one of the following two docker images? Or, if you'd rather build it yourself, use the TensorFlow commit id that follows the ROCm version in the image tag.

For ROCm 3.5: docker pull rocm/tensorflow-autobuilds:rocm3.5-760bec0

For ROCm 3.3: docker pull rocm/tensorflow-autobuilds:rocm3.3-9ca344d

alphaRGB commented 4 years ago

@jerryyin Thanks for your advice and the TF images. However, we can't use docker, so I tried to build TensorFlow (commit id = 760bec0) with ROCm==3.5.0, but the compilation failed. Could you help me check the TF compile errors? It would be great if you could share a prebuilt TF (760bec0) whl package with us, if you have built it successfully.

ROCm

I installed ROCm using the commands below, then set LD_LIBRARY_PATH=/opt/rocm-3.5.0/libs and PATH=/opt/rocm-3.5.0/bin. rocm-smi, rocminfo and clinfo work fine.

sudo apt-get install rocm-dkms3.5.0
sudo apt install rocm-libs3.5.0 miopen-hip3.5.0 rccl3.5.0
sudo apt-get install miopengemm3.5.0
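Since the setup above relies on manually set search paths, a quick sanity check is to list any entries of LD_LIBRARY_PATH and PATH that do not exist on disk. The following is a minimal sketch; the helper names missing_path_entries and report_env are hypothetical, and the check says nothing about whether the directories contain the right libraries.

```python
import os


def missing_path_entries(value, sep=os.pathsep):
    """Return the entries of a PATH-style string that are not existing directories."""
    entries = [entry for entry in value.split(sep) if entry]
    return [entry for entry in entries if not os.path.isdir(entry)]


def report_env(names=("LD_LIBRARY_PATH", "PATH")):
    """Map each variable name to its list of entries that do not exist on disk."""
    return {name: missing_path_entries(os.environ.get(name, "")) for name in names}
```

Printing report_env() in the shell that runs the build shows at a glance whether a path entry points at a directory that is not there.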

Build Tensorflow

bazel==3.1.0
python==3.6.9
ROCm==3.5.0

TF build commands (I had downloaded the third-party dependencies beforehand, so I set --distdir when compiling):

ROCM_PATH=/opt/rocm-3.5.0 TF_NEED_ROCM=1 PYTHON_BIN_PATH=/usr/bin/python3 ./configure
bazel build --distdir /data/tf_thirdpart_downloads/downloads/ --config=opt --config=rocm //tensorflow/tools/pip_package:build_pip_package --verbose_failures

Errors

ERROR: /home/fimhbm/penghui_wei/Tensorflow/tf-amd/tensorflow-upstream-760bec08ba01c374b44015493b975c6d52beb324/tensorflow/core/kernels/rnn/BUILD:54:1: C++ compilation of rule '//tensorflow/core/kernels/rnn:gru_ops_gpu' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (cd /home/fimhbm/.cache/bazel/_bazel_fimhbm/aff401f30ba583f7f12006ac9f35b87c/execroot/org_tensorflow && \ exec env - \ LD_LIBRARY_PATH=/opt/rocm/lib: \ external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' -MD -MF bazel
clang-11: warning: argument unused during compilation: '--hip-device-lib-path=/opt/rocm-3.5.0/lib' [-Wunused-command-line-argument]
lld: error: undefined symbol: ldexp(float, int)

The error "'//tensorflow/core/kernels/rnn:gru_ops_gpu' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command" seems similar to #1036.

The error "lld: error: undefined symbol: ldexp(float, int)" and the warning "clang-11: warning: argument unused during compilation" seem to be caused by clang. I thought the clang-11 compiler might not be finding some .so or .a files, so I set LD_LIBRARY_PATH=/opt/rocm-3.5.0/llvm, but it did not help; the error is the same.

I also thought the error might be caused by the compiler flag -std=c++11, so I tried passing --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" to bazel build, but the error is the same.

error log

INFO: From Compiling tensorflow/core/kernels/data/experimental/dense_to_sparse_batch_dataset_op.cc [for host]:
In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX512.h:4:0,
                 from ./third_party/eigen3/unsupported/Eigen/CXX11/FixedPoint:35,
                 from ./tensorflow/core/framework/numeric_types.h:24,
                 from ./tensorflow/core/framework/allocator.h:26,
                 from ./tensorflow/core/framework/tensor.h:23,
                 from ./tensorflow/core/framework/attr_value_util.h:24,
                 from ./tensorflow/core/framework/dataset.h:24,
                 from tensorflow/core/kernels/data/experimental/dense_to_sparse_batch_dataset_op.cc:15:
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h:30:41: warning: ignoring attributes on template argument '__m256i {aka __vector(4) long long int}' [-Wignored-attributes]
 typedef eigen_packet_wrapper<__m256i, 20> Packet32q8i;
                                         ^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h:31:41: warning: ignoring attributes on template argument '__m256i {aka __vector(4) long long int}' [-Wignored-attributes]
 typedef eigen_packet_wrapper<__m256i, 21> Packet16q16i;
                                         ^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h:32:41: warning: ignoring attributes on template argument '__m256i {aka __vector(4) long long int}' [-Wignored-attributes]
 typedef eigen_packet_wrapper<__m256i, 22> Packet32q8u;
                                         ^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h:33:41: warning: ignoring attributes on template argument '__m128i {aka __vector(2) long long int}' [-Wignored-attributes]
 typedef eigen_packet_wrapper<__m128i, 23> Packet16q8i;
                                         ^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h:34:41: warning: ignoring attributes on template argument '__m128i {aka __vector(2) long long int}' [-Wignored-attributes]
 typedef eigen_packet_wrapper<__m128i, 25> Packet16q8u;
                                         ^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h:35:41: warning: ignoring attributes on template argument '__m128i {aka __vector(2) long long int}' [-Wignored-attributes]
 typedef eigen_packet_wrapper<__m128i, 26> Packet8q16i;
                                         ^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h:36:41: warning: ignoring attributes on template argument '__m256i {aka __vector(4) long long int}' [-Wignored-attributes]
 typedef eigen_packet_wrapper<__m256i, 27> Packet8q32i;
                                         ^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h:37:41: warning: ignoring attributes on template argument '__m128i {aka __vector(2) long long int}' [-Wignored-attributes]
 typedef eigen_packet_wrapper<__m128i, 28> Packet4q32i;
                                         ^
In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/FixedPoint:35:0,
                 from ./tensorflow/core/framework/numeric_types.h:24,
                 from ./tensorflow/core/framework/allocator.h:26,
                 from ./tensorflow/core/framework/tensor.h:23,
                 from ./tensorflow/core/framework/attr_value_util.h:24,
                 from ./tensorflow/core/framework/dataset.h:24,
                 from tensorflow/core/kernels/data/experimental/dense_to_sparse_batch_dataset_op.cc:15:
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX512.h:9:41: warning: ignoring attributes on template argument '__m512i {aka __vector(8) long long int}' [-Wignored-attributes]
 typedef eigen_packet_wrapper<__m512i, 30> Packet64q8i;
                                         ^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX512.h:10:41: warning: ignoring attributes on template argument '__m512i {aka __vector(8) long long int}' [-Wignored-attributes]
 typedef eigen_packet_wrapper<__m512i, 31> Packet32q16i;
                                         ^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX512.h:11:41: warning: ignoring attributes on template argument '__m512i {aka __vector(8) long long int}' [-Wignored-attributes]
 typedef eigen_packet_wrapper<__m512i, 32> Packet64q8u;
                                         ^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX512.h:12:41: warning: ignoring attributes on template argument '__m512i {aka __vector(8) long long int}' [-Wignored-attributes]
 typedef eigen_packet_wrapper<__m512i, 33> Packet16q32i;
                                         ^
ERROR: /home/fimhbm/penghui_wei/Tensorflow/tf-amd/tensorflow-upstream-760bec08ba01c374b44015493b975c6d52beb324/tensorflow/core/kernels/rnn/BUILD:54:1: C++ compilation of rule '//tensorflow/core/kernels/rnn:gru_ops_gpu' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /home/fimhbm/.cache/bazel/_bazel_fimhbm/aff401f30ba583f7f12006ac9f35b87c/execroot/org_tensorflow && \
  exec env - \
    LD_LIBRARY_PATH=/opt/rocm/lib: \
    PATH=/usr/local/bin:/home/fimhbm/.vscode-server/bin/cd9ea6488829f560dc949a8b2fb789f3cdc05f5d/bin:/home/fimhbm/.local/bin:/home/fimhbm/bin:/usr/local/bin:/home/WPH/Softwares/anaconda3/condabin:/home/fimhbm/.vscode-server/bin/cd9ea6488829f560dc949a8b2fb789f3cdc05f5d/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/rocm/bin:/opt/rocm/opencl/bin \
    PWD=/proc/self/cwd \
  external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' -MD -MF bazel-out/host/bin/tensorflow/core/kernels/rnn/_objs/gru_ops_gpu/gru_ops_gpu.cu.pic.d '-frandom-seed=bazel-out/host/bin/tensorflow/core/kernels/rnn/_objs/gru_ops_gpu/gru_ops_gpu.cu.pic.o' -fPIC -DTENSORFLOW_USE_CUSTOM_CONTRACTION_KERNEL -DTENSORFLOW_USE_MKLDNN_CONTRACTION_KERNEL -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' '-DEIGEN_HAS_TYPE_TRAITS=0' -D__CLANG_SUPPORT_DYN_ANNOTATION__ -iquote . -iquote bazel-out/host/bin -iquote external/com_google_absl -iquote bazel-out/host/bin/external/com_google_absl -iquote external/eigen_archive -iquote bazel-out/host/bin/external/eigen_archive -iquote external/local_config_sycl -iquote bazel-out/host/bin/external/local_config_sycl -iquote external/nsync -iquote bazel-out/host/bin/external/nsync -iquote external/gif -iquote bazel-out/host/bin/external/gif -iquote external/libjpeg_turbo -iquote bazel-out/host/bin/external/libjpeg_turbo -iquote external/com_google_protobuf -iquote bazel-out/host/bin/external/com_google_protobuf -iquote external/com_googlesource_code_re2 -iquote bazel-out/host/bin/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/host/bin/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/host/bin/external/fft2d -iquote external/highwayhash -iquote bazel-out/host/bin/external/highwayhash -iquote external/zlib -iquote bazel-out/host/bin/external/zlib -iquote external/local_config_rocm -iquote bazel-out/host/bin/external/local_config_rocm -iquote external/local_config_cuda -iquote bazel-out/host/bin/external/local_config_cuda -iquote external/local_config_tensorrt -iquote bazel-out/host/bin/external/local_config_tensorrt -iquote external/mkl_dnn -iquote 
bazel-out/host/bin/external/mkl_dnn -Ibazel-out/host/bin/external/local_config_cuda/cuda/_virtual_includes/cuda_headers_virtual -Ibazel-out/host/bin/external/local_config_tensorrt/_virtual_includes/tensorrt_headers -isystem external/eigen_archive -isystem bazel-out/host/bin/external/eigen_archive -isystem external/nsync/public -isystem bazel-out/host/bin/external/nsync/public -isystem external/gif -isystem bazel-out/host/bin/external/gif -isystem external/com_google_protobuf/src -isystem bazel-out/host/bin/external/com_google_protobuf/src -isystem external/farmhash_archive/src -isystem bazel-out/host/bin/external/farmhash_archive/src -isystem external/zlib -isystem bazel-out/host/bin/external/zlib -isystem external/local_config_rocm/rocm -isystem bazel-out/host/bin/external/local_config_rocm/rocm -isystem external/local_config_rocm/rocm/rocm/include -isystem bazel-out/host/bin/external/local_config_rocm/rocm/rocm/include -isystem external/local_config_rocm/rocm/rocm/include/rocrand -isystem bazel-out/host/bin/external/local_config_rocm/rocm/rocm/include/rocrand -isystem external/local_config_rocm/rocm/rocm/include/roctracer -isystem bazel-out/host/bin/external/local_config_rocm/rocm/rocm/include/roctracer -isystem external/local_config_cuda/cuda -isystem bazel-out/host/bin/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/host/bin/external/local_config_cuda/cuda/cuda/include -isystem external/mkl_dnn/include -isystem bazel-out/host/bin/external/mkl_dnn/include -isystem external/mkl_dnn/src -isystem bazel-out/host/bin/external/mkl_dnn/src -isystem external/mkl_dnn/src/common -isystem bazel-out/host/bin/external/mkl_dnn/src/common -isystem external/mkl_dnn/src/cpu -isystem bazel-out/host/bin/external/mkl_dnn/src/cpu -isystem external/mkl_dnn/src/cpu/gemm -isystem bazel-out/host/bin/external/mkl_dnn/src/cpu/gemm -isystem external/mkl_dnn/src/cpu/xbyak -isystem bazel-out/host/bin/external/mkl_dnn/src/cpu/xbyak -g0 
'-march=native' -g0 '-std=c++14' -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare '-ftemplate-depth=900' -fno-exceptions '-DTENSORFLOW_USE_XLA=1' '-DTENSORFLOW_USE_ROCM=1' -msse3 -pthread -x rocm -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' '-DTENSORFLOW_USE_ROCM=1' -D__HIP_PLATFORM_HCC__ -DEIGEN_USE_HIP '-DTENSORFLOW_COMPILER_IS_HIP_CLANG=1' -no-canonical-prefixes -fno-canonical-system-headers -c tensorflow/core/kernels/rnn/gru_ops_gpu.cu.cc -o bazel-out/host/bin/tensorflow/core/kernels/rnn/_objs/gru_ops_gpu/gru_ops_gpu.cu.pic.o)
Execution platform: @local_execution_config_platform//:platform
clang-11: warning: argument unused during compilation: '--hip-device-lib-path=/opt/rocm-3.5.0/lib' [-Wunused-command-line-argument]
lld: error: undefined symbol: ldexp(float, int)
>>> referenced by /tmp/gru_ops_gpu-919f0d-gfx906-9f8e5c.o:(void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_logistic_op<float>, Eigen::TensorSlicingOp<Eigen::array<long, 2ul> const, Eigen::array<long, 2ul> const, Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer> > const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_logistic_op<float>, Eigen::TensorSlicingOp<Eigen::array<long, 2ul> const, Eigen::array<long, 2ul> const, Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer> > const> const> const, Eigen::GpuDevice>, long))
>>> referenced by /tmp/gru_ops_gpu-919f0d-gfx906-9f8e5c.o:(void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_logistic_op<float>, Eigen::TensorSlicingOp<Eigen::array<long, 2ul> const, Eigen::array<long, 2ul> const, Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer> > const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_logistic_op<float>, Eigen::TensorSlicingOp<Eigen::array<long, 2ul> const, Eigen::array<long, 2ul> const, Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer> > const> const> const, Eigen::GpuDevice>, long))
clang-11: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)
Target //tensorflow/tools/pip_package:build_pip_package failed to build
ERROR: /home/fimhbm/penghui_wei/Tensorflow/tf-amd/tensorflow-upstream-760bec08ba01c374b44015493b975c6d52beb324/tensorflow/tools/pip_package/BUILD:66:1 C++ compilation of rule '//tensorflow/core/kernels/rnn:gru_ops_gpu' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /home/fimhbm/.cache/bazel/_bazel_fimhbm/aff401f30ba583f7f12006ac9f35b87c/execroot/org_tensorflow && \
  exec env - \
    LD_LIBRARY_PATH=/opt/rocm/lib: \
    PATH=/usr/local/bin:/home/fimhbm/.vscode-server/bin/cd9ea6488829f560dc949a8b2fb789f3cdc05f5d/bin:/home/fimhbm/.local/bin:/home/fimhbm/bin:/usr/local/bin:/home/WPH/Softwares/anaconda3/condabin:/home/fimhbm/.vscode-server/bin/cd9ea6488829f560dc949a8b2fb789f3cdc05f5d/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/rocm/bin:/opt/rocm/opencl/bin \
    PWD=/proc/self/cwd \
  external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' -MD -MF bazel-out/host/bin/tensorflow/core/kernels/rnn/_objs/gru_ops_gpu/gru_ops_gpu.cu.pic.d '-frandom-seed=bazel-out/host/bin/tensorflow/core/kernels/rnn/_objs/gru_ops_gpu/gru_ops_gpu.cu.pic.o' -fPIC -DTENSORFLOW_USE_CUSTOM_CONTRACTION_KERNEL -DTENSORFLOW_USE_MKLDNN_CONTRACTION_KERNEL -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' '-DEIGEN_HAS_TYPE_TRAITS=0' -D__CLANG_SUPPORT_DYN_ANNOTATION__ -iquote . -iquote bazel-out/host/bin -iquote external/com_google_absl -iquote bazel-out/host/bin/external/com_google_absl -iquote external/eigen_archive -iquote bazel-out/host/bin/external/eigen_archive -iquote external/local_config_sycl -iquote bazel-out/host/bin/external/local_config_sycl -iquote external/nsync -iquote bazel-out/host/bin/external/nsync -iquote external/gif -iquote bazel-out/host/bin/external/gif -iquote external/libjpeg_turbo -iquote bazel-out/host/bin/external/libjpeg_turbo -iquote external/com_google_protobuf -iquote bazel-out/host/bin/external/com_google_protobuf -iquote external/com_googlesource_code_re2 -iquote bazel-out/host/bin/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/host/bin/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/host/bin/external/fft2d -iquote external/highwayhash -iquote bazel-out/host/bin/external/highwayhash -iquote external/zlib -iquote bazel-out/host/bin/external/zlib -iquote external/local_config_rocm -iquote bazel-out/host/bin/external/local_config_rocm -iquote external/local_config_cuda -iquote bazel-out/host/bin/external/local_config_cuda -iquote external/local_config_tensorrt -iquote bazel-out/host/bin/external/local_config_tensorrt -iquote external/mkl_dnn -iquote 
bazel-out/host/bin/external/mkl_dnn -Ibazel-out/host/bin/external/local_config_cuda/cuda/_virtual_includes/cuda_headers_virtual -Ibazel-out/host/bin/external/local_config_tensorrt/_virtual_includes/tensorrt_headers -isystem external/eigen_archive -isystem bazel-out/host/bin/external/eigen_archive -isystem external/nsync/public -isystem bazel-out/host/bin/external/nsync/public -isystem external/gif -isystem bazel-out/host/bin/external/gif -isystem external/com_google_protobuf/src -isystem bazel-out/host/bin/external/com_google_protobuf/src -isystem external/farmhash_archive/src -isystem bazel-out/host/bin/external/farmhash_archive/src -isystem external/zlib -isystem bazel-out/host/bin/external/zlib -isystem external/local_config_rocm/rocm -isystem bazel-out/host/bin/external/local_config_rocm/rocm -isystem external/local_config_rocm/rocm/rocm/include -isystem bazel-out/host/bin/external/local_config_rocm/rocm/rocm/include -isystem external/local_config_rocm/rocm/rocm/include/rocrand -isystem bazel-out/host/bin/external/local_config_rocm/rocm/rocm/include/rocrand -isystem external/local_config_rocm/rocm/rocm/include/roctracer -isystem bazel-out/host/bin/external/local_config_rocm/rocm/rocm/include/roctracer -isystem external/local_config_cuda/cuda -isystem bazel-out/host/bin/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/host/bin/external/local_config_cuda/cuda/cuda/include -isystem external/mkl_dnn/include -isystem bazel-out/host/bin/external/mkl_dnn/include -isystem external/mkl_dnn/src -isystem bazel-out/host/bin/external/mkl_dnn/src -isystem external/mkl_dnn/src/common -isystem bazel-out/host/bin/external/mkl_dnn/src/common -isystem external/mkl_dnn/src/cpu -isystem bazel-out/host/bin/external/mkl_dnn/src/cpu -isystem external/mkl_dnn/src/cpu/gemm -isystem bazel-out/host/bin/external/mkl_dnn/src/cpu/gemm -isystem external/mkl_dnn/src/cpu/xbyak -isystem bazel-out/host/bin/external/mkl_dnn/src/cpu/xbyak -g0 
'-march=native' -g0 '-std=c++14' -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare '-ftemplate-depth=900' -fno-exceptions '-DTENSORFLOW_USE_XLA=1' '-DTENSORFLOW_USE_ROCM=1' -msse3 -pthread -x rocm -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' '-DTENSORFLOW_USE_ROCM=1' -D__HIP_PLATFORM_HCC__ -DEIGEN_USE_HIP '-DTENSORFLOW_COMPILER_IS_HIP_CLANG=1' -no-canonical-prefixes -fno-canonical-system-headers -c tensorflow/core/kernels/rnn/gru_ops_gpu.cu.cc -o bazel-out/host/bin/tensorflow/core/kernels/rnn/_objs/gru_ops_gpu/gru_ops_gpu.cu.pic.o)
Execution platform: @local_execution_config_platform//:platform
INFO: Elapsed time: 1188.356s, Critical Path: 132.95s
INFO: 11806 processes: 11806 local.
FAILED: Build did NOT complete successfully
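Because the real failure is buried between pages of warnings and very long command lines, it can help to filter a saved build log down to the lines that look like errors. This is a minimal sketch based only on the patterns visible in the log above (lines starting with ERROR: or FAILED:, and tool messages containing "error:"); the names ERROR_RE and extract_errors are hypothetical.

```python
import re

# Lines that carry the failure in the log above: bazel "ERROR:"/"FAILED:"
# prefixes and compiler/linker messages containing "error:".
ERROR_RE = re.compile(r"^(ERROR:|FAILED:)|\berror:")
MAX_WIDTH = 200  # truncate very long command lines for readability


def extract_errors(log_lines, max_width=MAX_WIDTH):
    """Return (line_number, text) pairs for lines that look like errors."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        text = line.rstrip("\n")
        if ERROR_RE.search(text):
            if len(text) > max_width:
                text = text[:max_width] + " ..."
            hits.append((number, text))
    return hits
```

Applied to the log above, a filter like this would keep the lld and clang-11 error lines and drop the Eigen -Wignored-attributes warnings.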

deven-amd commented 4 years ago

You have run into the first of two known build errors with ROCm 3.5.

Both can be worked around as shown here: https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/tensorflow/tools/ci_build/Dockerfile.rocm#L121-L126
