Closed alphaRGB closed 4 years ago
I also ran this model using tensorflow.python.profiler.model_analyzer.Profiler with tf.compat.v1.disable_eager_execution, and the error is the same: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id. I am not sure whether the bug is in TensorFlow's GPU profiler or in the ROCm profiler.
@deven-amd Does this sound familiar to you? I remember you have a recent PR that fixed rocTracer, too. I want to double-check because this seems to be in a ROCm 3.3 context.
@alphaRGB
Can you try a newer TF version? We have only recently (within the last couple of months) finished implementing support for profiling on TF-ROCm, and I am unsure whether the version you are using has everything needed for profiling.
The following error will still be there and can be ignored for now
2020-07-06 13:19:27.842992: E tensorflow/core/profiler/utils/hardware_type_utils.cc:60] Invalid GPU compute capability.
That is because the TF code expects a CUDA-style compute capability number in the profiling data (which obviously won't be present on the ROCm platform).
The other message, i.e.
2020-07-06 13:19:27.840899: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id.
is somewhat problematic, and ideally you should not see it at all. As the message indicates, the data for some event had to be dropped because an inconsistency was found within it (an invalid stream id in this case). If you get a few of these when the number of events being collected is in the thousands, it is no big deal (since the amount of lost data is insignificant), but you should not be seeing any for a test case this small.
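To judge whether the dropped events are a handful or a flood, the log can be tallied with a few lines of Python. This is only a sketch: `summarize_drops` is a hypothetical helper, and it assumes that the number in parentheses in the message is the count of events dropped for the stated reason.

```python
import re

# Matches the drop message quoted above. Assumption: the number in
# parentheses is how many events were dropped for the given reason.
DROP_RE = re.compile(r"RocmTracerEvent\(s\) dropped \((\d+)\) : (.+?)\.?\s*$")


def summarize_drops(log_lines):
    """Return a dict mapping each drop reason to the total events dropped."""
    totals = {}
    for line in log_lines:
        match = DROP_RE.search(line)
        if match:
            count = int(match.group(1))
            reason = match.group(2)
            totals[reason] = totals.get(reason, 0) + count
    return totals
```

Feeding it the captured stderr lines of a run gives one total per reason, which can then be compared against the number of events the profiler reports.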
Another suggestion I have is to use the method outlined in the TF tutorial here: https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras to dump the profiling data. I just tried it on a simple example (using TF built from source on the develop-upstream branch) and it works.
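For readers who have not seen that tutorial, the approach boils down to attaching a TensorBoard callback to `model.fit`. The following is a minimal sketch, not the tutorial's code: the tiny model, the random data, and the helpers `make_log_dir` and `train_with_profiling` are made up for illustration, and it assumes a TF 2.x build where `tf.keras.callbacks.TensorBoard` accepts an integer `profile_batch`.

```python
import datetime
import os


def make_log_dir(root="logs", now=None):
    """Build a timestamped log directory path (hypothetical helper)."""
    now = now or datetime.datetime.now()
    return os.path.join(root, now.strftime("%Y%m%d-%H%M%S"))


def train_with_profiling(log_dir, profile_batch=2):
    """Train a tiny model for one epoch while profiling a single batch."""
    # Imported lazily so make_log_dir can be used without TF installed.
    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

    # profile_batch selects which training batch the profiler captures.
    callback = tf.keras.callbacks.TensorBoard(
        log_dir=log_dir, profile_batch=profile_batch)

    # Random placeholder data, only there to drive a few training steps.
    x = np.random.rand(256, 8).astype("float32")
    y = np.random.rand(256, 1).astype("float32")
    model.fit(x, y, batch_size=32, epochs=1, callbacks=[callback], verbose=0)
    return log_dir
```

After calling `train_with_profiling(make_log_dir())`, running `tensorboard --logdir logs` should show the captured trace under the profile tab, provided the profiler plugin is available in that TensorBoard install.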
@deven-amd Thank you, I will test the TF tutorial demo first. Which commit should I use to compile TF: the latest, or is there a specific commit id on the develop-upstream branch? And which ROCm version did you try?
I have tested the "tensorboard_profiling_keras" demo from that URL with TF 2.1 (merge-200413-0-g129dd9a34e 2.1.0) + ROCm 3.3.0.
Sorry, due to network issues I can't provide detailed screenshots of TensorBoard. On the PROFILER page of TensorBoard, the Performance Summary values are all zeros on the AMD GPU, which looks like an error, while most values are non-zero when I run the same code on NVIDIA:
Performance Summary (NVIDIA)
It seems the profiler in the tensorflow-rocm build I used may have a problem; I'll try a newer TF with a newer ROCm.
@alphaRGB TF 2.1 is probably still too old. Could you try one of the following two docker images? Or, if you'd rather build it yourself, use the TensorFlow commit id that follows the ROCm version in the image tag.
For ROCm 3.5:
docker pull rocm/tensorflow-autobuilds:rocm3.5-760bec0
For ROCm 3.3:
docker pull rocm/tensorflow-autobuilds:rocm3.3-9ca344d
@jerryyin Thanks for your advice and the TF images. We can't use docker, so I tried to build TensorFlow (commit id = 760bec0) with ROCm 3.5.0, but compilation failed. Could you help me check the TF compile errors? It would be great if you could share a prebuilt TF (760bec0) whl package with us if you have built it successfully.
I installed ROCm using the commands below, then set LD_LIBRARY_PATH=/opt/rocm-3.5.0/libs and PATH=/opt/rocm-3.5.0/bin; rocm-smi, rocminfo and clinfo all work.
sudo apt-get install rocm-dkms3.5.0
sudo apt install rocm-libs3.5.0 miopen-hip3.5.0 rccl3.5.0
sudo apt-get install miopengemm3.5.0
TF build commands
I downloaded the third-party dependencies beforehand, then set --distdir when compiling:
ROCM_PATH=/opt/rocm-3.5.0 TF_NEED_ROCM=1 PYTHON_BIN_PATH=/usr/bin/python3 ./configure
bazel build --distdir /data/tf_thirdpart_downloads/downloads/ --config=opt --config=rocm //tensorflow/tools/pip_package:build_pip_package --verbose_failures
ERROR: /home/fimhbm/penghui_wei/Tensorflow/tf-amd/tensorflow-upstream-760bec08ba01c374b44015493b975c6d52beb324/tensorflow/core/kernels/rnn/BUILD:54:1: C++ compilation of rule '//tensorflow/core/kernels/rnn:gru_ops_gpu' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (cd /home/fimhbm/.cache/bazel/_bazel_fimhbm/aff401f30ba583f7f12006ac9f35b87c/execroot/org_tensorflow && \ exec env - \ LD_LIBRARY_PATH=/opt/rocm/lib: \ external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' -MD -MF bazel
clang-11: warning: argument unused during compilation: '--hip-device-lib-path=/opt/rocm-3.5.0/lib' [-Wunused-command-line-argument]
lld: error: undefined symbol: ldexp(float, int)
The error "'//tensorflow/core/kernels/rnn:gru_ops_gpu' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command" seems similar to #1036.
The error "lld: error: undefined symbol: ldexp(float, int)" and the warning "clang-11: warning: argument unused during compilation" seem to be caused by clang. I suspected that clang-11 could not find the .so or .a files, so I set LD_LIBRARY_PATH=/opt/rocm-3.5.0/llvm, but that did not help; the error is the same.
I also suspected the compiler flag -std=c++11, so I tried adding --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" to bazel build, but the error is the same.
error log
INFO: From Compiling tensorflow/core/kernels/data/experimental/dense_to_sparse_batch_dataset_op.cc [for host]:
In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX512.h:4:0,
from ./third_party/eigen3/unsupported/Eigen/CXX11/FixedPoint:35,
from ./tensorflow/core/framework/numeric_types.h:24,
from ./tensorflow/core/framework/allocator.h:26,
from ./tensorflow/core/framework/tensor.h:23,
from ./tensorflow/core/framework/attr_value_util.h:24,
from ./tensorflow/core/framework/dataset.h:24,
from tensorflow/core/kernels/data/experimental/dense_to_sparse_batch_dataset_op.cc:15:
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h:30:41: warning: ignoring attributes on template argument '__m256i {aka __vector(4) long long int}' [-Wignored-attributes]
typedef eigen_packet_wrapper<__m256i, 20> Packet32q8i;
^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h:31:41: warning: ignoring attributes on template argument '__m256i {aka __vector(4) long long int}' [-Wignored-attributes]
typedef eigen_packet_wrapper<__m256i, 21> Packet16q16i;
^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h:32:41: warning: ignoring attributes on template argument '__m256i {aka __vector(4) long long int}' [-Wignored-attributes]
typedef eigen_packet_wrapper<__m256i, 22> Packet32q8u;
^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h:33:41: warning: ignoring attributes on template argument '__m128i {aka __vector(2) long long int}' [-Wignored-attributes]
typedef eigen_packet_wrapper<__m128i, 23> Packet16q8i;
^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h:34:41: warning: ignoring attributes on template argument '__m128i {aka __vector(2) long long int}' [-Wignored-attributes]
typedef eigen_packet_wrapper<__m128i, 25> Packet16q8u;
^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h:35:41: warning: ignoring attributes on template argument '__m128i {aka __vector(2) long long int}' [-Wignored-attributes]
typedef eigen_packet_wrapper<__m128i, 26> Packet8q16i;
^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h:36:41: warning: ignoring attributes on template argument '__m256i {aka __vector(4) long long int}' [-Wignored-attributes]
typedef eigen_packet_wrapper<__m256i, 27> Packet8q32i;
^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h:37:41: warning: ignoring attributes on template argument '__m128i {aka __vector(2) long long int}' [-Wignored-attributes]
typedef eigen_packet_wrapper<__m128i, 28> Packet4q32i;
^
In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/FixedPoint:35:0,
from ./tensorflow/core/framework/numeric_types.h:24,
from ./tensorflow/core/framework/allocator.h:26,
from ./tensorflow/core/framework/tensor.h:23,
from ./tensorflow/core/framework/attr_value_util.h:24,
from ./tensorflow/core/framework/dataset.h:24,
from tensorflow/core/kernels/data/experimental/dense_to_sparse_batch_dataset_op.cc:15:
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX512.h:9:41: warning: ignoring attributes on template argument '__m512i {aka __vector(8) long long int}' [-Wignored-attributes]
typedef eigen_packet_wrapper<__m512i, 30> Packet64q8i;
^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX512.h:10:41: warning: ignoring attributes on template argument '__m512i {aka __vector(8) long long int}' [-Wignored-attributes]
typedef eigen_packet_wrapper<__m512i, 31> Packet32q16i;
^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX512.h:11:41: warning: ignoring attributes on template argument '__m512i {aka __vector(8) long long int}' [-Wignored-attributes]
typedef eigen_packet_wrapper<__m512i, 32> Packet64q8u;
^
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX512.h:12:41: warning: ignoring attributes on template argument '__m512i {aka __vector(8) long long int}' [-Wignored-attributes]
typedef eigen_packet_wrapper<__m512i, 33> Packet16q32i;
^
ERROR: /home/fimhbm/penghui_wei/Tensorflow/tf-amd/tensorflow-upstream-760bec08ba01c374b44015493b975c6d52beb324/tensorflow/core/kernels/rnn/BUILD:54:1: C++ compilation of rule '//tensorflow/core/kernels/rnn:gru_ops_gpu' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command
(cd /home/fimhbm/.cache/bazel/_bazel_fimhbm/aff401f30ba583f7f12006ac9f35b87c/execroot/org_tensorflow && \
exec env - \
LD_LIBRARY_PATH=/opt/rocm/lib: \
PATH=/usr/local/bin:/home/fimhbm/.vscode-server/bin/cd9ea6488829f560dc949a8b2fb789f3cdc05f5d/bin:/home/fimhbm/.local/bin:/home/fimhbm/bin:/usr/local/bin:/home/WPH/Softwares/anaconda3/condabin:/home/fimhbm/.vscode-server/bin/cd9ea6488829f560dc949a8b2fb789f3cdc05f5d/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/rocm/bin:/opt/rocm/opencl/bin \
PWD=/proc/self/cwd \
external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' -MD -MF bazel-out/host/bin/tensorflow/core/kernels/rnn/_objs/gru_ops_gpu/gru_ops_gpu.cu.pic.d '-frandom-seed=bazel-out/host/bin/tensorflow/core/kernels/rnn/_objs/gru_ops_gpu/gru_ops_gpu.cu.pic.o' -fPIC -DTENSORFLOW_USE_CUSTOM_CONTRACTION_KERNEL -DTENSORFLOW_USE_MKLDNN_CONTRACTION_KERNEL -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' '-DEIGEN_HAS_TYPE_TRAITS=0' -D__CLANG_SUPPORT_DYN_ANNOTATION__ -iquote . -iquote bazel-out/host/bin -iquote external/com_google_absl -iquote bazel-out/host/bin/external/com_google_absl -iquote external/eigen_archive -iquote bazel-out/host/bin/external/eigen_archive -iquote external/local_config_sycl -iquote bazel-out/host/bin/external/local_config_sycl -iquote external/nsync -iquote bazel-out/host/bin/external/nsync -iquote external/gif -iquote bazel-out/host/bin/external/gif -iquote external/libjpeg_turbo -iquote bazel-out/host/bin/external/libjpeg_turbo -iquote external/com_google_protobuf -iquote bazel-out/host/bin/external/com_google_protobuf -iquote external/com_googlesource_code_re2 -iquote bazel-out/host/bin/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/host/bin/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/host/bin/external/fft2d -iquote external/highwayhash -iquote bazel-out/host/bin/external/highwayhash -iquote external/zlib -iquote bazel-out/host/bin/external/zlib -iquote external/local_config_rocm -iquote bazel-out/host/bin/external/local_config_rocm -iquote external/local_config_cuda -iquote bazel-out/host/bin/external/local_config_cuda -iquote external/local_config_tensorrt -iquote bazel-out/host/bin/external/local_config_tensorrt -iquote external/mkl_dnn -iquote 
bazel-out/host/bin/external/mkl_dnn -Ibazel-out/host/bin/external/local_config_cuda/cuda/_virtual_includes/cuda_headers_virtual -Ibazel-out/host/bin/external/local_config_tensorrt/_virtual_includes/tensorrt_headers -isystem external/eigen_archive -isystem bazel-out/host/bin/external/eigen_archive -isystem external/nsync/public -isystem bazel-out/host/bin/external/nsync/public -isystem external/gif -isystem bazel-out/host/bin/external/gif -isystem external/com_google_protobuf/src -isystem bazel-out/host/bin/external/com_google_protobuf/src -isystem external/farmhash_archive/src -isystem bazel-out/host/bin/external/farmhash_archive/src -isystem external/zlib -isystem bazel-out/host/bin/external/zlib -isystem external/local_config_rocm/rocm -isystem bazel-out/host/bin/external/local_config_rocm/rocm -isystem external/local_config_rocm/rocm/rocm/include -isystem bazel-out/host/bin/external/local_config_rocm/rocm/rocm/include -isystem external/local_config_rocm/rocm/rocm/include/rocrand -isystem bazel-out/host/bin/external/local_config_rocm/rocm/rocm/include/rocrand -isystem external/local_config_rocm/rocm/rocm/include/roctracer -isystem bazel-out/host/bin/external/local_config_rocm/rocm/rocm/include/roctracer -isystem external/local_config_cuda/cuda -isystem bazel-out/host/bin/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/host/bin/external/local_config_cuda/cuda/cuda/include -isystem external/mkl_dnn/include -isystem bazel-out/host/bin/external/mkl_dnn/include -isystem external/mkl_dnn/src -isystem bazel-out/host/bin/external/mkl_dnn/src -isystem external/mkl_dnn/src/common -isystem bazel-out/host/bin/external/mkl_dnn/src/common -isystem external/mkl_dnn/src/cpu -isystem bazel-out/host/bin/external/mkl_dnn/src/cpu -isystem external/mkl_dnn/src/cpu/gemm -isystem bazel-out/host/bin/external/mkl_dnn/src/cpu/gemm -isystem external/mkl_dnn/src/cpu/xbyak -isystem bazel-out/host/bin/external/mkl_dnn/src/cpu/xbyak -g0 
'-march=native' -g0 '-std=c++14' -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare '-ftemplate-depth=900' -fno-exceptions '-DTENSORFLOW_USE_XLA=1' '-DTENSORFLOW_USE_ROCM=1' -msse3 -pthread -x rocm -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' '-DTENSORFLOW_USE_ROCM=1' -D__HIP_PLATFORM_HCC__ -DEIGEN_USE_HIP '-DTENSORFLOW_COMPILER_IS_HIP_CLANG=1' -no-canonical-prefixes -fno-canonical-system-headers -c tensorflow/core/kernels/rnn/gru_ops_gpu.cu.cc -o bazel-out/host/bin/tensorflow/core/kernels/rnn/_objs/gru_ops_gpu/gru_ops_gpu.cu.pic.o)
Execution platform: @local_execution_config_platform//:platform
clang-11: warning: argument unused during compilation: '--hip-device-lib-path=/opt/rocm-3.5.0/lib' [-Wunused-command-line-argument]
lld: error: undefined symbol: ldexp(float, int)
>>> referenced by /tmp/gru_ops_gpu-919f0d-gfx906-9f8e5c.o:(void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_logistic_op<float>, Eigen::TensorSlicingOp<Eigen::array<long, 2ul> const, Eigen::array<long, 2ul> const, Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer> > const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_logistic_op<float>, Eigen::TensorSlicingOp<Eigen::array<long, 2ul> const, Eigen::array<long, 2ul> const, Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer> > const> const> const, Eigen::GpuDevice>, long))
>>> referenced by /tmp/gru_ops_gpu-919f0d-gfx906-9f8e5c.o:(void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_logistic_op<float>, Eigen::TensorSlicingOp<Eigen::array<long, 2ul> const, Eigen::array<long, 2ul> const, Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer> > const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_logistic_op<float>, Eigen::TensorSlicingOp<Eigen::array<long, 2ul> const, Eigen::array<long, 2ul> const, Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer> > const> const> const, Eigen::GpuDevice>, long))
clang-11: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)
Target //tensorflow/tools/pip_package:build_pip_package failed to build
ERROR: /home/fimhbm/penghui_wei/Tensorflow/tf-amd/tensorflow-upstream-760bec08ba01c374b44015493b975c6d52beb324/tensorflow/tools/pip_package/BUILD:66:1 C++ compilation of rule '//tensorflow/core/kernels/rnn:gru_ops_gpu' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command
(cd /home/fimhbm/.cache/bazel/_bazel_fimhbm/aff401f30ba583f7f12006ac9f35b87c/execroot/org_tensorflow && \
exec env - \
LD_LIBRARY_PATH=/opt/rocm/lib: \
PATH=/usr/local/bin:/home/fimhbm/.vscode-server/bin/cd9ea6488829f560dc949a8b2fb789f3cdc05f5d/bin:/home/fimhbm/.local/bin:/home/fimhbm/bin:/usr/local/bin:/home/WPH/Softwares/anaconda3/condabin:/home/fimhbm/.vscode-server/bin/cd9ea6488829f560dc949a8b2fb789f3cdc05f5d/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/rocm/bin:/opt/rocm/opencl/bin \
PWD=/proc/self/cwd \
external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' -MD -MF bazel-out/host/bin/tensorflow/core/kernels/rnn/_objs/gru_ops_gpu/gru_ops_gpu.cu.pic.d '-frandom-seed=bazel-out/host/bin/tensorflow/core/kernels/rnn/_objs/gru_ops_gpu/gru_ops_gpu.cu.pic.o' -fPIC -DTENSORFLOW_USE_CUSTOM_CONTRACTION_KERNEL -DTENSORFLOW_USE_MKLDNN_CONTRACTION_KERNEL -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' '-DEIGEN_HAS_TYPE_TRAITS=0' -D__CLANG_SUPPORT_DYN_ANNOTATION__ -iquote . -iquote bazel-out/host/bin -iquote external/com_google_absl -iquote bazel-out/host/bin/external/com_google_absl -iquote external/eigen_archive -iquote bazel-out/host/bin/external/eigen_archive -iquote external/local_config_sycl -iquote bazel-out/host/bin/external/local_config_sycl -iquote external/nsync -iquote bazel-out/host/bin/external/nsync -iquote external/gif -iquote bazel-out/host/bin/external/gif -iquote external/libjpeg_turbo -iquote bazel-out/host/bin/external/libjpeg_turbo -iquote external/com_google_protobuf -iquote bazel-out/host/bin/external/com_google_protobuf -iquote external/com_googlesource_code_re2 -iquote bazel-out/host/bin/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/host/bin/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/host/bin/external/fft2d -iquote external/highwayhash -iquote bazel-out/host/bin/external/highwayhash -iquote external/zlib -iquote bazel-out/host/bin/external/zlib -iquote external/local_config_rocm -iquote bazel-out/host/bin/external/local_config_rocm -iquote external/local_config_cuda -iquote bazel-out/host/bin/external/local_config_cuda -iquote external/local_config_tensorrt -iquote bazel-out/host/bin/external/local_config_tensorrt -iquote external/mkl_dnn -iquote 
bazel-out/host/bin/external/mkl_dnn -Ibazel-out/host/bin/external/local_config_cuda/cuda/_virtual_includes/cuda_headers_virtual -Ibazel-out/host/bin/external/local_config_tensorrt/_virtual_includes/tensorrt_headers -isystem external/eigen_archive -isystem bazel-out/host/bin/external/eigen_archive -isystem external/nsync/public -isystem bazel-out/host/bin/external/nsync/public -isystem external/gif -isystem bazel-out/host/bin/external/gif -isystem external/com_google_protobuf/src -isystem bazel-out/host/bin/external/com_google_protobuf/src -isystem external/farmhash_archive/src -isystem bazel-out/host/bin/external/farmhash_archive/src -isystem external/zlib -isystem bazel-out/host/bin/external/zlib -isystem external/local_config_rocm/rocm -isystem bazel-out/host/bin/external/local_config_rocm/rocm -isystem external/local_config_rocm/rocm/rocm/include -isystem bazel-out/host/bin/external/local_config_rocm/rocm/rocm/include -isystem external/local_config_rocm/rocm/rocm/include/rocrand -isystem bazel-out/host/bin/external/local_config_rocm/rocm/rocm/include/rocrand -isystem external/local_config_rocm/rocm/rocm/include/roctracer -isystem bazel-out/host/bin/external/local_config_rocm/rocm/rocm/include/roctracer -isystem external/local_config_cuda/cuda -isystem bazel-out/host/bin/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/host/bin/external/local_config_cuda/cuda/cuda/include -isystem external/mkl_dnn/include -isystem bazel-out/host/bin/external/mkl_dnn/include -isystem external/mkl_dnn/src -isystem bazel-out/host/bin/external/mkl_dnn/src -isystem external/mkl_dnn/src/common -isystem bazel-out/host/bin/external/mkl_dnn/src/common -isystem external/mkl_dnn/src/cpu -isystem bazel-out/host/bin/external/mkl_dnn/src/cpu -isystem external/mkl_dnn/src/cpu/gemm -isystem bazel-out/host/bin/external/mkl_dnn/src/cpu/gemm -isystem external/mkl_dnn/src/cpu/xbyak -isystem bazel-out/host/bin/external/mkl_dnn/src/cpu/xbyak -g0 
'-march=native' -g0 '-std=c++14' -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare '-ftemplate-depth=900' -fno-exceptions '-DTENSORFLOW_USE_XLA=1' '-DTENSORFLOW_USE_ROCM=1' -msse3 -pthread -x rocm -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' '-DTENSORFLOW_USE_ROCM=1' -D__HIP_PLATFORM_HCC__ -DEIGEN_USE_HIP '-DTENSORFLOW_COMPILER_IS_HIP_CLANG=1' -no-canonical-prefixes -fno-canonical-system-headers -c tensorflow/core/kernels/rnn/gru_ops_gpu.cu.cc -o bazel-out/host/bin/tensorflow/core/kernels/rnn/_objs/gru_ops_gpu/gru_ops_gpu.cu.pic.o)
Execution platform: @local_execution_config_platform//:platform
INFO: Elapsed time: 1188.356s, Critical Path: 132.95s
INFO: 11806 processes: 11806 local.
FAILED: Build did NOT complete successfully
You have run into the first of two known build errors with ROCm 3.5.
Both can be worked around as shown here: https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/tensorflow/tools/ci_build/Dockerfile.rocm#L121-L126
System information
Describe the current behavior
I want to profile a CNN model on an AMD GPU. My model is implemented with the tf.keras API, but the profiler output contains errors. Here is my profiler test code: (This works fine and produces correct profiling results on an NVIDIA GPU.) After executing the above code, TensorFlow prints many errors: I tensorflow/core/profiler/internal/gpu/device_tracer_rocm.cc:177] RocmTracerEvent(s) dropped (1) : invalid stream id. It seems that many ops failed to trace. So I want to know how to profile a model on an AMD GPU with tensorflow>=2.1.0. Many thanks.
tf outputs:
Describe the expected behavior
Standalone code to reproduce the issue