GPUOpen-Tools / radeon_compute_profiler

The Radeon Compute Profiler (RCP) is a performance analysis tool that gathers data from the API run-time and GPU for OpenCL™ and ROCm/HSA applications. This information can be used by developers to discover bottlenecks in the application and to find ways to optimize the application's performance.
MIT License
84 stars 19 forks source link

Failed to generate profile result #25

Open alexfriman opened 5 years ago

alexfriman commented 5 years ago

I'm trying to profile an app and RCP shows "Failed to generate profile result /tmp/Session2.csv." I use it with: rcprof-d -o "/tmp/Session2.csv" -w "/home/alex/CPP_code/testrun/" -p -c "/home/alex/CPP_code/testrun/counters_OpenCL_gfx900_26751.txt" [myapp] strace shows this:

"[{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV && WCOREDUMP(s)}], 0, NULL) = 31785
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=31785, si_uid=1000, si_status=SIGSEGV, si_utime=84, si_stime=15} ---
readlink("/proc/self/exe", "/usr/bin/rcprof-d", 4095) = 17
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fee195e9e50) = 31807
wait4(31807, NULL, 0, NULL)             = 31807
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31807, si_uid=1000, si_status=255, si_utime=0, si_stime=0} ---
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fee19622e00}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fee19622e00}, {sa_handler=SIG_DFL, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fee19622e00}, 8) = 0
openat(AT_FDCWD, "/tmp/Session2.csv", O_RDONLY) = -1 ENOENT (No such file or directory)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}) = 0
write(1, "Failed to generate profile resul"..., 53Failed to generate profile result /tmp/Session2.csv.
) = 53
unlink("/home/alex/.rcpdata")           = 0
unlink("/home/alex/.rcpdata.amdtperfmarker") = 0
exit_group(11)                          = ?
+++ exited with 11 +++
"

The GPU is Vega 64, kernel version 4.20, amdgpu (mesa) 18.3

chesik-amd commented 5 years ago

It's hard to tell what might be going wrong from this info. If I'm reading the strace info correctly, it looks like the app being profiled is seg-faulting.

Assuming it runs ok when not using the profiler, and since you are using the debug version of the profiler (rcprof-d), try this (this is the same command line you gave above, but it first sets the GPUProfilerCheckForDebuggerAttach=true environment variable):

GPUProfilerCheckForDebuggerAttach=true rcprof-d -o "/tmp/Session2.csv" -w "/home/alex/CPP_code/testrun/" -p -c "/home/alex/CPP_code/testrun/counters_OpenCL_gfx900_26751.txt" [myapp]

With this env var set, the profiler will pause the application after the profiler library is loaded. You'll see a message saying "The application has been paused to allow a debugger to be attached to the process". It will also show the process id.

In a separate terminal window, execute "sudo gdb attach [pid]" using the process id shown. Once gdb attaches, press "c" to continue. Switch back to the terminal window where the profiler is running and press "y [enter]".

Hopefully, the SIGSEGV will occur and gdb will stop and you'll be able to see where the crash is occurring.

If you can share info about where the SIGSEGV is occurring, we can hopefully figure out what might be going wrong.

Thanks, Chris

alexfriman commented 5 years ago

Thank you Chris. Changing GPUProfilerCheckForDebuggerAttach did not help. I've ran it again but put strace before myapp. It can not find "/usr/lib/libatiadlxx.so". This library does not exist on my computer. At least find / -name libatiadlxx.so does not show anything. There is also a log file

[alex@desktop testrun]$ cat /home/alex/clprofileagent.log
Logging Started: Thu Mar  7 13:24:56 2019

Message: GPA: Logging callback registered successfully.
Message: GPA: Cannot get asicInfoList from ADL.
Delaunay commented 5 years ago

I have a similar issue output. The issue was reproducible using a simple python script provided here

rcprof -o result.atp -p -w . micro.py --network resnet18 --batch-size 32 --iterations 5
INFO: running forward and backward for warmup.
INFO: running the benchmark..
OK: finished running benchmark..
--------------------SUMMARY--------------------------
Microbenchmark for network : resnet18
Mini batch size [img] : 32
Time per mini-batch : 0.04629902839660645
Throughput [img/sec] : 691.1592123679529
Failed to generate profile result /home/mila/source/scratch/result.csv.

strace end output

[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 6344
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=6344, si_uid=1000, si_status=0, si_utime=3826, si_stime=1622} ---
readlink("/proc/self/exe", "/home/mila/source/rcp/bin/rcprof", 4095) = 32
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fdff4f22a10) = 6416
wait4(6416, NULL, 0, NULL)              = 6416
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=6416, si_uid=1000, si_status=127, si_utime=0, si_stime=0} ---
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fdff3a0df20}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fdff3a0df20}, {sa_handler=SIG_DFL, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fdff3a0df20}, 8) = 0
openat(AT_FDCWD, "/home/mila/source/scratch/result.csv", O_RDONLY) = -1 ENOENT (No such file or directory)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
write(1, "Failed to generate profile resul"..., 72Failed to generate profile result /home/mila/source/scratch/result.csv.
) = 72
unlink("/home/mila/.rcpdata")           = 0
unlink("/home/mila/.rcpdata.amdtperfmarker") = 0
exit_group(0)                           = ?
+++ exited with 0 +++
Delaunay commented 5 years ago

Actually I have an even smaller example. It does not even use the GPU

> vi hello.py
#!/usr/bin/env python
print('hello')
> chmod +x hello.py
> rcprof -o result.atp -p -w . hello.py
hello
Failed to generate profile result /home/mila/source/scratch/result.csv.
JMadgwick commented 5 years ago

I can also confirm this problem. Was not happening when I last tested in March, so this is a regression. I have tried a few different GPU programs, 2 HIP and a HCC program and both have the same problem. Version used is "Radeon Compute Profiler V5.6.7262". Attempting to generate occupancy information is also impossible as it depends on performance counters and so also gives the same Failed to generate profile result

I also ran strace and the output is the same as in the comment above. rcprof appears to try and access the result but the file doesn't exist. Running grep on the trace reveals that no attempt is made to create the output file, only to read from it. I downloaded and ran the latest github release version and the result was the same.

daviddpruitt commented 5 years ago

I've run into the same problem using the examples in the HIP repository. The examples work properly on their own and complete but I get the same error message.

rcprof --hsapmc ./square.hip.out Radeon Compute Profiler V5.6.0 is enabled info: running on device Vega 20 info: allocate host mem ( 7.63 MB) info: allocate device mem ( 7.63 MB) info: copy Host2Device info: launch 'vector_square' kernel info: copy Device2Host info: check result PASSED! Failed to generate profile result /home/ddpruitt/Session1.csv.

rNoz commented 4 years ago

Any idea why is this failing in Windows? (tested both with 5.6 and 5.2)

It cannot generate neither .atp nor .csv files, giving the error "Failed to generate profile result ...".

I also tried giving the -o session.csv or -o C:\session.csv and nothing.

rur0 commented 3 years ago

I am also experiencing similar issue on linux

biergaizi commented 12 months ago

Same problem on Gentoo when using the official pre-built binaries. But the problem disappears after building rocprof locally from source, so I assume it's simply a library/binary compatibility problem.

However, be warned that compiling from source is not for the faint-hearted, I consider myself an experienced sysadmin but it took me 10 hours before I got a successful build. It seems that this software package is no longer maintained, there are many outdated identifiers, missing library includes, and other problems.

The final Boost linking undefined reference problem is the most tricky one, it turned out to be a C++ ABI compatibility problem because rcprof was building using the bundled Boost headers from Boost 1.59, but at link time I was using the system Boost library binaries, creating ABI conflicts. This is clearly not guaranteed to work. To fix that, there are two ways.

  1. The proper way. Change BOOST_DIR = $(COMMON_LIB_EXT)/Boost/boost_1_59_0 to the Boost source tree, the version should be the same as your system's version, for example BOOST_DIR = /home/user/boost_1_83_0/. If your system doesn't provide static .a files for Boost, just compile Boost from source, then set BOOST_LIB_DIR to be the build output of the same Boost version, such as BOOST_LIB_DIR = /home/user/boost_1_83_0/stage/lib. Otherwise, there will be undefined references to boost::filesystem::path_traits.

  2. The coincidence. My experiment showed that if Boost 1.76 is built using --std=c++11, linking against Boost 1.76 would work even if the source is compiled with Boost 1.59. But this is entirely a coincidence. If Boost is built using a higher C++ standard, there will be undefined reference to boost::system::generic_category().

Unfortunately, upon further investigation, even the "fixed" build still has problems. Only HSA profiling works, but not OpenCL or occupancy profiling. It turned out that rocprof is just a wrapper for various underlying modules, such as HSAFdnTrace, HSAFdnPMC, CLOccupancyAgent, CLProfileAgent, CLTraceAgent - all rocprof does is registers these modules and writes a configuration file before the user application runs, it's up to the modules themselves to do the actual work. This is why rocprof only reports Failed to generate profile result - it only reads an output file and has no idea about how the modules work by themselves - in this case, they're never executed.

For HSA and OpenCL, different loading mechanisms are used. For OpenCL and occupancy, it uses a method similar to LD_PRELOAD by replacing the OpenCL functions with instrumented versions. However, for reason unknown, the entry point is no longer invoked (possibility due to incompatible SDK change)? So now CLOccupancyAgent, CLProfileAgent, CLTraceAgent can no longer be invoked anywhere, their entry point cl_int CL_CALLBACK clAgent_OnLoad(cl_agent* agent) never executes.

Without someone who has a deeper understand about how the library preloading actually works, I'm afraid that this problem cannot be fixed. If anyone's interested and wishes to continue the investigation, reply to this thread and I can send a list of patches to replicate my build.