Closed camierjs closed 4 years ago
Hi Jean-Sylvain. I think what you are looking for is the rocprofiler. Easy mistake ;)
Hi Noah,
You're right, I was first trying to use the rocprof shipped with the rocm/2.10 software toolchain.
I switched to the rocprofiler GitHub version, compiled it, and ran it with the --hip-trace option.
Here is the output:
corona_hip/mfem4_bps> ~/usr/local/bin/rocprof --hip-trace ./bp3 -o 2 -l 8 -d hip
RPL: on '191218_154552' from '/g/g91/camier1/usr/local/rocprofiler' in '~/home/benchmarks_corona/builds/corona_hip/mfem4_bps'
RPL: profiling '"./bp3" "-o" "2" "-l" "8" "-d" "hip"'
RPL: input file ''
RPL: output dir '/tmp/rpl_data_191218_154552_114455'
RPL: result dir '/tmp/rpl_data_191218_154552_114455/input_results_191218_154552'
Tool lib "~/usr/local/roctracer/tool/libtracer_tool.so" failed to load.
Options used:
--mesh-dimension 3
--refinement-level 8
--order 2
--device hip
Device configuration: hip,cpu
Processor partitioning: 1 1 1
Mesh dimensions: 8 8 4
Total number of elements: 256
Number of finite element unknowns: 2601
Iteration : 0 (B r, r) = 0.000604889 ...
Iteration : 56 (B r, r) = 3.78127e-28
Average reduction factor = 0.607985
Total CG time: 0.0621552 (0.0621552) sec.
Time per CG step: 0.00110991 (0.00110991) sec.
"DOFs/sec" in CG: 2.34342 (2.34342) million.
A results.db file is produced, but the libtracer_tool message suggests that I need to compile the roctracer tool. Building its master branch fails with the following error:
In file included from /g/g91/camier1/home/roctracer/src/core/roctracer.cpp:30:
/g/g91/camier1/home/roctracer/inc/roctracer_kfd.h:30:10: fatal error: inc/kfd_ostream_ops.h: No such file or directory
#include "inc/kfd_ostream_ops.h"
^~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
I'd like to get the JSON file, as I see it is possible to produce one.
Thank you,
Jean-Sylvain
Ahh yes, your system is missing roctracer. Do you have sudo access on the system? If so, then the easiest method is to run sudo apt install roctracer-dev.
Otherwise, if you don't have sudo access and need to build from source, perhaps try building the rocm-2.10.x branch of roctracer since you're using ROCm 2.10. The master branch has already pulled in many changes for ROCm 3.0. The software stack is moving fast.
Ok, thank you for your answer.
I don't have sudo access, so I'll try to rebuild it.
Hi,
I've rebuilt both roc-2.10.x branches of rocprofiler and roctracer, but I'm still hitting the same error:
Tool lib "/g/g91/camier1/usr/local/rocprofiler/roctracer/tool/libtracer_tool.so" failed to load.
ldd looks good, LD_LIBRARY_PATH too.
My command line looks like this:
~/usr/local/rocprofiler/bin/rocprof -i counterfile_HSA_Vega.txt --stats --hip-trace -t tmp -d data ./bp1 -o 2 -l 8 -d hip
Thank you for any suggestion,
Jean-Sylvain
Hi Camierjs, could you share what system and which compiler you use?
I'm on the Corona cluster, with the ROCm 2.10 stack installed on the system.
I tried on MI25 and MI60.
@camierjs:
1) There was a bug in the library paths of roctracer for RHEL + ROCm 2.10. This can be fixed by setting export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/roctracer/lib in your environment. This may resolve your tool lib error.
2) Regardless, I believe that the hsa-amd-aqlprofile.x86_64 package must be installed via sudo yum install hsa-amd-aqlprofile.x86_64, as this library isn't open-sourced yet. On my system this lives in /opt/rocm/hsa-amd-aqlprofile/lib/libhsa-amd-aqlprofile64.so.1.0.0.
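A minimal sketch of the path fix from item 1, assuming a POSIX shell. Shell variables expand as $VAR or ${VAR}; the form $(VAR) is command substitution and would try to run a command named LD_LIBRARY_PATH. The helper name append_ld_path is hypothetical, and /opt/rocm/roctracer/lib is the path quoted above.

```shell
# Hypothetical helper: append a directory to LD_LIBRARY_PATH once,
# without leaving an empty entry when the variable starts out unset.
append_ld_path() {
    dir=$1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$dir" ;;
    esac
    export LD_LIBRARY_PATH
}

# Path taken from the suggestion above; adjust for your installation.
append_ld_path /opt/rocm/roctracer/lib
```

Calling the helper twice with the same directory leaves the variable unchanged, so it should be safe to keep in a shell startup file.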
ROCr runtime failed to dlopen '/g/g91/camier1/usr/local/rocprofiler/roctracer/tool/libtracer_tool.so' library.
If, according to the message above, "ldd looks good" and all dependencies were resolved, then it might be a symbol linking problem.
@camierjs: Could you try a simple test, for example from '/opt/rocm/hip/samples/2_Cookbook/0_MatrixTranspose', and use the LD_DEBUG environment variable to check whether some symbols were not found by the dynamic linker: $ rocprof --cmd-qts off LD_DEBUG=all ./MatrixTranspose
or, to debug only symbols: $ rocprof --cmd-qts off LD_DEBUG=symbols ./MatrixTranspose
A link for LD_DEBUG description: http://www.bnikolic.co.uk/blog/linux-ld-debug.html
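LD_DEBUG output is written to stderr and is very verbose, so it may help to capture it in a file and keep only the loader's error lines. A minimal sketch, assuming a POSIX shell with grep; the function name ld_debug_errors and the log file name are hypothetical, and the pattern is only a guess at which lines are worth keeping.

```shell
# Hypothetical helper: keep only the lines of an LD_DEBUG log that report
# an error, an unresolved symbol, or a library that could not be opened.
ld_debug_errors() {
    grep -E 'error|undefined symbol|cannot open shared object' "$@"
}

# Usage (LD_DEBUG goes to stderr, so redirect it into a file first):
#   LD_DEBUG=all ./MatrixTranspose 2> ld_debug.log
#   ld_debug_errors ld_debug.log
```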
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/roctracer/lib didn't help.
The hsa-amd-aqlprofile.x86_64 package seems to be installed:
/opt/rocm/hsa-amd-aqlprofile/lib> ls
lrwxrwxrwx 1 root root 28 Mar 25 2019 libhsa-amd-aqlprofile64.so -> libhsa-amd-aqlprofile64.so.1
lrwxrwxrwx 1 root root 32 Mar 25 2019 libhsa-amd-aqlprofile64.so.1 -> libhsa-amd-aqlprofile64.so.1.0.0
-rwxr-xr-x 1 root root 220064 May 6 2018 libhsa-amd-aqlprofile64.so.1.0.0
rocprof (system and the recompiled 2.10.x) doesn't have the --cmd-qts option. However, here are the outputs of the four runs:
LD_DEBUG=all ./MatrixTranspose: MatrixTranspose.LD_DEBUG.all.gz
LD_DEBUG=symbols ./MatrixTranspose: MatrixTranspose.LD_DEBUG.symbols.gz
LD_DEBUG=symbols ~/usr/local/rocprofiler/bin/rocprof --stats --hip-trace ./MatrixTranspose: rocprof.MatrixTranspose.LD_DEBUG.all.gz
LD_DEBUG=all ~/usr/local/rocprofiler/bin/rocprof --stats --hip-trace ./MatrixTranspose: rocprof.MatrixTranspose.LD_DEBUG.symbols.gz
We do see the error with rocprof:
102080: /lib64/libstdc++.so.6: error: version lookup error: version `GLIBCXX_3.4.20' not found (required by /g/g91/camier1/usr/local/rocprofiler/roctracer/tool/libtracer_tool.so) (fatal)
102080: file=/g/g91/camier1/usr/local/rocprofiler/roctracer/tool/libtracer_tool.so [0]; destroying link map
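The version lookup error above says that the libstdc++.so.6 picked up at run time does not provide GLIBCXX_3.4.20, which libtracer_tool.so requires. One way to confirm this is to list the version tags embedded in a candidate library. A minimal sketch, assuming GNU grep; the function names are hypothetical, and scanning the binary with grep -a is just one convenient approach.

```shell
# Hypothetical helper: list the GLIBCXX_* version tags embedded in a library.
glibcxx_versions() {
    grep -a -o 'GLIBCXX_[0-9][0-9.]*' "$1" | sort -u
}

# Hypothetical helper: succeed if the library provides the given version tag.
has_glibcxx() {
    glibcxx_versions "$1" | grep -x -F "$2" > /dev/null
}

# Usage, with the path and version from the error message:
#   has_glibcxx /lib64/libstdc++.so.6 GLIBCXX_3.4.20 || echo "missing"
```

Running the same check against the libstdc++ that ships with the compiler used for the build would show whether the two libraries differ.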
Which OS do you have and which compiler do you use?
Linux corona141 3.10.0-1062.7.1.1chaos.ch6.x86_64, and the module list is:
Currently Loaded Modules:
1) texlive/2016 2) StdEnv 3) opt 4) gcc/8.1.0 5) rocm/2.10 6) mvapich2/2.3
Could you try to enable devtoolset-7 according to the link below? https://www.softwarecollections.org/en/scls/rhscl/devtoolset-7/
According to ROCm GitHub https://github.com/RadeonOpenCompute/ROCm#supported-operating-systems:
• CentOS v7.7 (Using devtoolset-7 runtime support)
• RHEL v7.7 (Using devtoolset-7 runtime support)
Thank you, I've forwarded your request to the admins, I'll let you know their answer.
@camierjs: Thank you! And I would appreciate if you could send me output from the following commands in your current environment: $ gcc --version $ ldd --version
$ gcc --version:
Reading specs from /usr/tce/packages/gcc/gcc-8.1.0/lib64/gcc/x86_64-pc-linux-gnu/8.1.0/specs
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/tce/packages/gcc/gcc-8.1.0/libexec/gcc/x86_64-pc-linux-gnu/8.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /builddir/build/BUILD/gccspack/spack/var/spack/stage/gcc-8.1.0-yf4dn5leietjepntgrnkv4syhgmb2nmm/gcc-8.1.0/configure --prefix=/usr/tce/packages/gcc/gcc-8.1.0 --libdir=/usr/tce/packages/gcc/gcc-8.1.0/lib64 --disable-multilib --enable-languages=c,obj-c++,c++,fortran,objc,go,lto --with-mpfr=/ --with-gmp=/usr --enable-lto --with-quad --with-sysroot=/ --with-stage1-ldflags='-Wl,-rpath,/usr/tce/packages/gcc/gcc-8.1.0/lib -Wl,-rpath,/usr/tce/packages/gcc/gcc-8.1.0/lib64 -Wl,-rpath,/lib -Wl,-rpath,/builddir/build/BUILD/gccspack/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/isl-0.18-a6bgwfhlamdrd6tbb7l6oonhnxruvlfh/lib -Wl,-rpath,/usr/lib -Wl,-rpath,/usr/tce/packages/binutils/binutils-2.30/lib -Wl,-rpath,/lib -Wl,-rpath,/lib64 -Wl,-rpath,/usr/lib64 -Wl,-rpath,/usr/tce/packages/binutils/binutils-2.30/lib64 -Wl,-rpath,/lib64 -static-libstdc++ -static-libgcc' --with-boot-ldflags='-Wl,-rpath,/usr/tce/packages/gcc/gcc-8.1.0/lib -Wl,-rpath,/usr/tce/packages/gcc/gcc-8.1.0/lib64 -Wl,-rpath,/lib -Wl,-rpath,/builddir/build/BUILD/gccspack/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/isl-0.18-a6bgwfhlamdrd6tbb7l6oonhnxruvlfh/lib -Wl,-rpath,/usr/lib -Wl,-rpath,/usr/tce/packages/binutils/binutils-2.30/lib -Wl,-rpath,/lib -Wl,-rpath,/lib64 -Wl,-rpath,/usr/lib64 -Wl,-rpath,/usr/tce/packages/binutils/binutils-2.30/lib64 -Wl,-rpath,/lib64 -static-libstdc++ -static-libgcc' --with-gnu-ld --with-gnu-as --with-ld=/usr/tce/packages/gcc/gcc-8.1.0/bin/ld --with-as=/usr/tce/packages/gcc/gcc-8.1.0/bin/as --with-mpc=/ --with-isl=/builddir/build/BUILD/gccspack/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/isl-0.18-a6bgwfhlamdrd6tbb7l6oonhnxruvlfh
Thread model: posix
gcc version 8.1.0 (GCC)
$ ldd --version:
ldd (GNU libc) 2.17
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
Could you try just enabling 'devtoolset-7'? It might already be installed, and it seems you don't need privileged access just to enable it: $ scl enable devtoolset-7 bash
Unable to open /etc/scl/conf/devtoolset-7!
There is no devtoolset-7 in the directory.
I see, so please just wait for your admins to respond. And thank you very much for trying!
You might also consider asking the admins to install the 'roctracer-dev' Linux package.
Ok, I'll do that: thank you for looking into this!
No problem, thank you!
Hi,
I waited for the software stack to be updated, and it is now at rocm/3.0.
Unfortunately, I'm getting the same issue:
/opt/rocm/profiler/bin/rcprof -A ./MatrixTranspose
Radeon Compute Profiler V5.6.7262 is enabled
Device name Vega 20
aqlprofile API table load failed: HSA_STATUS_ERROR: A generic error has occurred.
I'll try rebuilding it from source as we did with 2.10.
Hi, 'roctracer' has to be installed manually. So you can contact your admins to install the 'roctracer-dev' Linux package, or compile it from GitHub. To compile it you need 'devtoolset-7' on SLES/RHEL platforms. You also need the Python modules CppHeaderParser and argparse. To install them: sudo pip install CppHeaderParser argparse
The plan is to install 'roctracer-dev' by default in a future ROCm release.
Thank you for all the answers, it's now more on our installation side to get up to date.
In case anyone else runs into the error aqlprofile API table load failed, installing the hsa-amd-aqlprofile package seemed to solve it. This issue pops up when you search for aqlprofile API table load failed. I only realised you can install that package because calling https://github.com/ROCm-Developer-Tools/rocprofiler/blob/207458f251f223803dbbce64821dde15107a1781/test/util/hsa_rsrc_factory.cpp#L131 in debug reports that the library name isn't found.
Mine was failing because ctrl was falling back to trying to load HSA_EXTENSION_AMD_AQLPROFILE and then failing on that. This is where the HSA_STATUS_ERROR was coming from, which is raised from https://github.com/ROCm-Developer-Tools/rocprofiler/blob/207458f251f223803dbbce64821dde15107a1781/test/util/hsa_rsrc_factory.cpp#L133
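A quick way to check for the library before running the profiler is to look for it in the directories where it is expected to be installed. A minimal sketch, assuming a POSIX shell; the helper name find_shared_lib is hypothetical, and the directory in the usage note is the one quoted earlier in this thread, which may differ on other installations.

```shell
# Hypothetical helper: print the first matching shared library found in the
# given directories; return non-zero if none of them contains it.
find_shared_lib() {
    name=$1
    shift
    for dir in "$@"; do
        for f in "$dir"/"$name"*; do
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
                return 0
            fi
        done
    done
    return 1
}

# Usage:
#   find_shared_lib libhsa-amd-aqlprofile64.so /opt/rocm/hsa-amd-aqlprofile/lib \
#       || echo "hsa-amd-aqlprofile does not appear to be installed"
```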
Hi,
I am trying to profile some CEED benchmarks. I'm using a gfx906 card with ROCm 2.10.
I'm using hipcc and the compilation and run seem fine, but I can't get any output from the profiler. I tried different options, but I keep getting this generic error:
Have you seen this kind of error?
Thank you for your help,
Jean-Sylvain