GPUOpen-Tools / radeon_compute_profiler

The Radeon Compute Profiler (RCP) is a performance analysis tool that gathers data from the API run-time and GPU for OpenCL™ and ROCm/HSA applications. This information can be used by developers to discover bottlenecks in the application and to find ways to optimize the application's performance.
MIT License

/opt/rocm/include/hip/hip_profile.h --> include <CXLActivityLogger.h> #1

Closed ptsant closed 7 years ago

ptsant commented 7 years ago

I am a bit lost. HIP seems to want the CXLActivityLogger file, but if I understand correctly, this is no longer part of the RCP.

Do I need to get CodeXL? Is this an error in the hip_profile header?

If it helps, the error occurs when I try to compile hipCaffe:

CXX src/caffe/common.cpp
In file included from src/caffe/common.cpp:7:
In file included from ./include/caffe/common.hpp:19:
In file included from ./include/caffe/util/device_alternate.hpp:39:
/opt/rocm/include/hip/hip_profile.h:31:10: fatal error: 'CXLActivityLogger.h' file not found
#include <CXLActivityLogger.h>
         ^~~~~~~~~~~~~~~~~~~~~
1 error generated.
Died at /opt/rocm/bin/hipcc line 452.
Makefile:624: recipe for target '.build_release/src/caffe/common.o' failed
make: *** [.build_release/src/caffe/common.o] Error 1

Code from hip_profile.h below:

#ifndef HIP_INCLUDE_HIP_HIP_PROFILE_H
#define HIP_INCLUDE_HIP_HIP_PROFILE_H

#if not defined (ENABLE_HIP_PROFILE)
#define ENABLE_HIP_PROFILE 1
#endif

#if defined(__HIP_PLATFORM_HCC__) and (ENABLE_HIP_PROFILE==1)
#include <CXLActivityLogger.h>
#define HIP_SCOPED_MARKER(markerName, group) amdtScopedMarker __scopedMarker(markerName, group, nullptr);
#define HIP_BEGIN_MARKER(markerName, group) amdtBeginMarker(markerName, group, nullptr);
#define HIP_END_MARKER() amdtEndMarker();
#else
#define HIP_SCOPED_MARKER(markerName, group)
#define HIP_BEGIN_MARKER(markerName, group)
#define HIP_END_MARKER()
#endif

#endif

chesik-amd commented 7 years ago

The CXLActivityLogger.h header file should be installed when you install rocm.

It should be available in /opt/rocm/profiler/CXLActivityLogger/include

However, it looks like 2 symlinks are missing from the current DEB and RPM packages.

Can you try creating these 2 symlinks and see if it makes a difference?

cd /opt/rocm
sudo ln -s ../../profiler/CXLActivityLogger/include include/profiler/CXLActivityLogger
sudo ln -s ../profiler/CXLActivityLogger/bin/x86_64/libCXLActivityLogger.so lib/libCXLActivityLogger.so

Let me know if this helps, and we will get the packages fixed for the next ROCm release.
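
A quick way to confirm that both links resolve is sketched below. It assumes the default /opt/rocm prefix used in the commands above and that CXLActivityLogger.h sits directly inside the linked include directory. ROCM_ROOT and check_link are names made up for this sketch, not part of ROCm.

```shell
#!/bin/sh
# Sketch: check that the two suggested symlinks point at real files.
# ROCM_ROOT and check_link are hypothetical names used only in this example.
ROCM_ROOT="${ROCM_ROOT:-/opt/rocm}"

check_link() {
    # [ -e ] follows symlinks, so a dangling link is reported as missing.
    if [ -e "$1" ]; then
        echo "ok: $1 -> $(readlink -f "$1")"
    else
        echo "missing or dangling: $1"
        return 1
    fi
}

# "|| true" keeps the script going so that both results are printed.
check_link "$ROCM_ROOT/include/profiler/CXLActivityLogger/CXLActivityLogger.h" || true
check_link "$ROCM_ROOT/lib/libCXLActivityLogger.so" || true
```

A "missing or dangling" line for either path suggests that the link target does not exist, which usually means the files themselves were never installed.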

ptsant commented 7 years ago

Thanks for answering so quickly.

The CXLActivityLogger.{h,so} files are not on my system. Under /opt/rocm/profiler I have the following:

total 12
drwxr-xr-x 2 root root 4096 Jul 10 16:10 bin
drwxr-xr-x 2 root root 4096 Jul 10 16:10 counterfiles
drwxr-xr-x 2 root root 4096 Jul 10 16:10 jqPlot

This is also what I find in the rocm-profiler package:

hagakure archives # dpkg -x rocm-profiler_5.1.6386_amd64.deb /tmp/prof5.1
hagakure prof5.1 # ls -lR
.:
total 8
drwxrwxr-x 3 root root 4096 Jun 17 22:11 opt
drwxrwxr-x 3 root root 4096 Jun 17 22:11 usr

./opt:
total 4
drwxrwxr-x 6 root root 4096 Jun 17 22:11 rocm

./opt/rocm:
total 16
drwxrwxr-x 2 root root 4096 Jun 17 22:11 bin
drwxrwxr-x 3 root root 4096 Jun 17 22:11 include
drwxrwxr-x 2 root root 4096 Jun 17 22:11 lib
drwxrwxr-x 4 root root 4096 Jun 17 22:11 profiler

./opt/rocm/bin:
total 0
lrwxrwxrwx 1 root root 22 Jun 17 22:11 rocm-profiler -> ../profiler/bin/rcprof

./opt/rocm/include:
total 4
drwxrwxr-x 2 root root 4096 Jun 17 22:11 profiler

./opt/rocm/include/profiler:
total 0

./opt/rocm/lib:
total 0

./opt/rocm/profiler:
total 8
drwxr-xr-x 2 root root 4096 Jun 17 22:11 bin
drwxr-xr-x 2 root root 4096 Jun 17 22:11 jqPlot

./opt/rocm/profiler/bin:
total 47652
-rwxr-xr-x 1 root root 16582372 Jun 17 22:11 libGPUPerfAPICounters.so
-rwxr-xr-x 1 root root 16815926 Jun 17 22:11 libGPUPerfAPIHSA.so
-rwxr-xr-x 1 root root  3995862 Jun 17 22:11 libRCPHSAProfileAgent.so
-rwxr-xr-x 1 root root  5064308 Jun 17 22:11 libRCPHSATraceAgent.so
-rwxr-xr-x 1 root root     7708 Jun 17 22:11 libRCPPreloadXInitThreads.so
-rwxr-xr-x 1 root root  6319006 Jun 17 22:11 rcprof

./opt/rocm/profiler/jqPlot:
total 316
-rwxr-xr-x 1 root root  21698 Jun 17 22:11 excanvas.min.js
-rwxr-xr-x 1 root root   4810 Jun 17 22:11 jqplot.canvasAxisLabelRenderer.min.js
-rwxr-xr-x 1 root root  17993 Jun 17 22:11 jqplot.canvasTextRenderer.min.js
-rwxr-xr-x 1 root root   9373 Jun 17 22:11 jqplot.highlighter.min.js
-rwxr-xr-x 1 root root   3545 Jun 17 22:11 jquery.jqplot.min.css
-rwxr-xr-x 1 root root 158994 Jun 17 22:11 jquery.jqplot.min.js
-rwxr-xr-x 1 root root  91669 Jun 17 22:11 jquery.min.js

./usr:
total 4
drwxrwxr-x 3 root root 4096 Jun 17 22:11 share

./usr/share:
total 4
drwxrwxr-x 3 root root 4096 Jun 17 22:11 doc

./usr/share/doc:
total 4
drwxrwxr-x 2 root root 4096 Jun 17 22:11 rocm-profiler

./usr/share/doc/rocm-profiler:
total 4
-rw-r--r-- 1 root root 152 Jun 17 22:11 changelog.Debian.gz

I found the header and library in the older ROCm-Profiler (not RCP!) repository and cloned them from there (https://github.com/RadeonOpenCompute/ROCm-Profiler.git). Using these files, I managed to compile almost everything, but the build then fails at link time on symbols that libCXLActivityLogger.so provides:

ponos@hagakure:~/Documents/DeepLearn/hipCaffe$ make 
CXX/LD -o .build_release/tools/upgrade_net_proto_text.bin
/tmp/tmp.JKvsCkTjEM/dummy_data_layer.host.o: In function `caffe::Layer<float>::Forward(std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&, std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&)':
(.text+0x1d36): undefined reference to `amdtBeginMarker'
/tmp/tmp.JKvsCkTjEM/dummy_data_layer.host.o: In function `caffe::Layer<float>::Forward(std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&, std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&)':
(.text+0x1d50): undefined reference to `amdtEndMarker'
[.... many more of the same ....]

This happens even though the library is in the path and the symbol is provided. No idea why. Incompatibility?

hagakure lib # ldconfig -p | grep CXLA
    libCXLActivityLogger.so (libc6,x86-64) => /opt/rocm/lib/libCXLActivityLogger.so
hagakure lib # nm libCXLActivityLogger.so | grep Marker
000000000003a710 T amdtBeginMarker
000000000003b3b0 T amdtEndMarker
000000000003ae90 T amdtEndMarkerEx
000000000027ddc0 B g_perfMarkerItemMap
0000000000039870 T _Z17GetPerfMarkerItemPP14PerfMarkerItem
000000000002f0f0 T _ZN9FileUtils30GetDefaultPerfMarkerOutputFileEv
000000000003c900 W _ZNSt3mapImP14PerfMarkerItemSt4lessImESaISt4pairIKmS1_EEED1Ev
000000000003c900 W _ZNSt3mapImP14PerfMarkerItemSt4lessImESaISt4pairIKmS1_EEED2Ev
000000000003c5f0 W _ZNSt8_Rb_treeImSt4pairIKmP14PerfMarkerItemESt10_Select1stIS4_ESt4lessImESaIS4_EE16_M_insert_uniqueIS0_ImS3_EEES0_ISt17_Rb_tree_iteratorIS4_EbEOT_
000000000003c730 W _ZNSt8_Rb_treeImSt4pairIKmP14PerfMarkerItemESt10_Select1stIS4_ESt4lessImESaIS4_EE8_M_eraseEPSt13_Rb_tree_nodeIS4_E

Should I try to reinstall some package? Install RCP from github instead of the .deb package?
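
Two things are worth keeping apart here. `ldconfig -p` lists what the run-time loader can find, while an "undefined reference" error comes from the link step, which resolves symbols only from the objects and libraries passed to it (for example with -l). Whether that is the cause in this build is an assumption; nothing in the thread confirms it. The sketch below only double-checks the symbol side by filtering `nm` output for a symbol defined in the text section (type `T`). defines_symbol and LIB are names invented for the example.

```shell
#!/bin/sh
# Sketch: does a library define a given symbol in its text section?
# defines_symbol and LIB are hypothetical names used only in this example.
LIB="${LIB:-/opt/rocm/lib/libCXLActivityLogger.so}"

defines_symbol() {
    # Reads nm output on stdin; succeeds if a line "<addr> T <symbol>" is present.
    awk -v sym="$1" '$2 == "T" && $3 == sym { found = 1 } END { exit !found }'
}

if command -v nm >/dev/null 2>&1 && [ -r "$LIB" ]; then
    for sym in amdtBeginMarker amdtEndMarker; do
        if nm -D --defined-only "$LIB" | defines_symbol "$sym"; then
            echo "$sym: defined in $LIB"
        else
            echo "$sym: not defined in $LIB"
        fi
    done
else
    echo "skipped: nm or $LIB not available"
fi
```

If both symbols are reported as defined, the library itself is probably fine and the link command line is the next place to look.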

chesik-amd commented 7 years ago

Hmm, I'm not sure why you don't have the activitylogger files. They are supposed to be installed automatically when you install rocm (i.e. using "sudo apt-get install rocm").

Does running "sudo apt-get install cxlactivitylogger" help (you may still need to recreate the symlinks after doing this)?

For recent ROCm releases, the CXLActivityLogger files were split out of the rocm-profiler package and into their own package (called cxlactivitylogger).
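
On a Debian-based system, dpkg can show whether that package is present and which files it owns. This is a minimal sketch, assuming the package name cxlactivitylogger given above; pkg_installed is a helper name invented for this example.

```shell
#!/bin/sh
# Sketch: check whether the cxlactivitylogger package is installed (Debian/Ubuntu).
# pkg_installed is a hypothetical helper name used only in this example.
pkg_installed() {
    # Succeeds only if dpkg reports the package as fully installed.
    dpkg -s "$1" 2>/dev/null | grep -q '^Status: install ok installed'
}

if pkg_installed cxlactivitylogger; then
    # List the files the package owns and show the header and library, if any.
    dpkg -L cxlactivitylogger | grep -E 'CXLActivityLogger\.(h|so)$' \
        || echo "no header or library listed"
else
    echo "cxlactivitylogger is not installed"
fi
```

If the package turns out to be missing, the "sudo apt-get install cxlactivitylogger" command above is the next step.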

The https://github.com/RadeonOpenCompute/ROCm-Profiler.git repo is obsolete at this point, and the version of cxlactivitylogger there is outdated. If you grab the RCP 5.1 release tarball, it will have the most recent version of the activitylogger: https://github.com/GPUOpen-Tools/RCP/releases/download/v5.1/RadeonComputeProfiler-v5.1.6396.tgz

You can also get the activitylogger sources directly from this repo: https://github.com/GPUOpen-Tools/common-src-AMDTActivityLogger

Chris

ptsant commented 7 years ago

I forced a reinstall of the cxlactivitylogger package, which I probably removed somehow while debugging the installation, and then added the links you suggested. It worked just fine.

Thanks a lot!