ROCm / rocprofiler

ROC profiler library. Profiling with perf-counters and derived metrics.
https://rocm.docs.amd.com/projects/rocprofiler/en/latest/

Issues when using rocprofiler kernel interception mode #45

Open wrwilliams opened 3 years ago

wrwilliams commented 3 years ago

With ROCm 4.0.1, I am seeing a SIGILL or SIGSEGV at what appears to be the point where a rocprofiler kernel dispatch callback should be invoked. This behavior has been consistent since at least ROCm 3.8.

Backtrace (unfortunately not terribly useful without debug symbols for libamdhip or librocprofiler):

>>> bt
#0  0x00007ffffffeab2a in ?? ()
#1  0x00007ffffffeac6d in ?? ()
#2  0x00002aaaacb50f9b in ?? () from /opt/rocm-4.0.1/rocprofiler/lib/librocprofiler64.so.1
#3  0x00002aaaacb5619d in ?? () from /opt/rocm-4.0.1/rocprofiler/lib/librocprofiler64.so.1
#4  0x00002aaaab6cf8be in ?? () from /opt/rocm-4.0.1/lib/libamdhip64.so.4
#5  0x00002aaaab6dc435 in ?? () from /opt/rocm-4.0.1/lib/libamdhip64.so.4
#6  0x00002aaaab6ccecc in ?? () from /opt/rocm-4.0.1/lib/libamdhip64.so.4
#7  0x00002aaaab6b22d9 in ?? () from /opt/rocm-4.0.1/lib/libamdhip64.so.4
#8  0x00002aaaab56ce96 in ?? () from /opt/rocm-4.0.1/lib/libamdhip64.so.4
#9  0x00002aaaab6b3cbf in ?? () from /opt/rocm-4.0.1/lib/libamdhip64.so.4
#10 0x00002aaaaacd6e65 in start_thread () from /lib64/libpthread.so.0
#11 0x00002aaaad07688d in clone () from /lib64/libc.so.6

The sequence of roctracer/rocprofiler calls is as follows:

<library init time>
    roctracer_set_properties( ACTIVITY_DOMAIN_HIP_API, NULL );
    roctracer_properties_t properties = { 0 };
    properties.buffer_size         = 0x1000;
    properties.buffer_callback_fun = scorep_hip_activity_callback;
    ROCTRACER_CALL( roctracer_open_pool( &properties ) );
    // note: roctracer callbacks are not registered in this build
<OnLoadTool and OnLoadToolProp>
    void* callback_data = NULL;
    rocprofiler_queue_callbacks_t cbs;
    cbs.dispatch = &dispatch_cb;
    ROCPROFILER_CALL( rocprofiler_set_queue_callbacks(cbs, callback_data) );

Environment variables for rocprofiler:

HSA_TOOLS_LIB=/opt/rocm-4.0.1/rocprofiler/lib/librocprofiler64.so.1
ROCP_METRICS=/opt/rocm-4.0.1/rocprofiler/lib/metrics.xml
ROCP_HSA_INTERCEPT=2
ROCP_TOOL_LIB=<path to library containing roctracer and rocprofiler code>

This matches the interception library test case in rocprofiler to the best of my knowledge.

With this environment, OnLoadTool is called; with any other environment I have tried, it is not. If the dispatch callback is set at library load instead of via OnLoadTool, ROCP_TOOL_LIB is not necessary, but with the other three variables set the same crash occurs. OnLoadToolProp is never called, and OnLoadTool is called exactly once. The crash occurs consistently at the first kernel launch, with a similar-looking stack each time.

Equivalent rocprofiler code compiled and linked directly into the application worked fine for me, although I have not yet tested that case with an additional dummy set of roctracer calls (and the roctracer link dependency).

ROCmSupport commented 3 years ago

Thanks @wrwilliams for reaching out. Can you please share the exact steps to reproduce the problem, step by step? Please also share the ASIC details, kernel version, ROCm version, and the outputs of /opt/rocm/bin/rocminfo and /opt/rocm/opencl/bin/clinfo. Thank you.

wrwilliams commented 3 years ago

Starting with the configuration details:

bash-4.2$ /opt/rocm/bin/rocminfo > rocminfo.out
bash-4.2$ /opt/rocm/opencl/bin/clinfo
ERROR: clGetPlatformIDs(-1001)

/proc/version:
Linux version 3.10.0-1062.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Wed Aug 7 18:08:02 UTC 2019

The rocminfo output is attached as rocminfo.txt. ROCm is version 4.0.1, as stated above. I'm not sure what ASIC details you're looking for that are not captured in the above; can you please specify further?

Step-by-step reproducer with an actual tool MWE:

  1. Build the Quicksilver benchmark for AMD GPU, with MPI disabled. This should be straightforward per their directions.
  2. Build the attached tool source with: hipcc break-rocprofiler.c -I/opt/rocm/include/rocprofiler -L/opt/rocm/lib -lrocprofiler64 --shared -fPIC -o break-rocprofiler.so
  3. export HSA_TOOLS_LIB=/opt/rocm/rocprofiler/lib/librocprofiler64.so
  4. export ROCP_METRICS=/opt/rocm/rocprofiler/lib/metrics.xml
  5. export ROCP_HSA_INTERCEPT=2
  6. export ROCP_TOOL_LIB=$PWD/break-rocprofiler.so
  7. Run ./qs. This will print "init metrics" and then segfault.

(You'll need to rename break-rocprofiler.txt back to a .c file, because GitHub refuses to let me attach a source file to the issue.) Attachment: break-rocprofiler.txt
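Steps 3 to 7 can also be packaged as a small C launcher rather than typed into a shell, which is one way to turn them into a self-contained reproducer. This is a sketch under assumptions: set_rocprofiler_env and run_qs are hypothetical helper names, the library and metrics paths are assumed to sit under the same ROCm root as in steps 3 and 4, and ./qs is assumed to be in the current directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: exports the variables from steps 3-6.
   rocm_root is e.g. "/opt/rocm"; tool_lib is the full path to
   break-rocprofiler.so. Returns 0 on success, -1 on failure. */
int set_rocprofiler_env(const char *rocm_root, const char *tool_lib)
{
    char path[4096];
    int n;

    assert(rocm_root != NULL && tool_lib != NULL);
    if (strlen(tool_lib) == 0)
        return -1; /* ROCP_TOOL_LIB must name an actual library */

    n = snprintf(path, sizeof path, "%s/rocprofiler/lib/librocprofiler64.so", rocm_root);
    if (n < 0 || (size_t)n >= sizeof path || setenv("HSA_TOOLS_LIB", path, 1) != 0)
        return -1;

    n = snprintf(path, sizeof path, "%s/rocprofiler/lib/metrics.xml", rocm_root);
    if (n < 0 || (size_t)n >= sizeof path || setenv("ROCP_METRICS", path, 1) != 0)
        return -1;

    if (setenv("ROCP_HSA_INTERCEPT", "2", 1) != 0 ||
        setenv("ROCP_TOOL_LIB", tool_lib, 1) != 0)
        return -1;
    return 0;
}

/* Hypothetical helper for step 7: replaces the current process with ./qs,
   which inherits the environment set above. Returns only if execv fails. */
int run_qs(void)
{
    char *const argv[] = { "./qs", NULL };
    execv(argv[0], argv);
    perror("execv ./qs");
    return -1;
}
```

A main() would call set_rocprofiler_env("/opt/rocm", path_to_break_rocprofiler_so) and then run_qs(); because execv replaces the process image, qs starts with the profiler variables already in its environment.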

This is not a perfect proxy for what we're doing in a full Score-P pipeline, but even in simplest form it appears to have the same failure mode. The failure is stable across building the library with GCC 9.3.0, Clang 9, and ROCm 4.0.1's hipcc. Note that I've been able to exclude roctracer from the MWE. This is modeled after the intercept_test.cpp test in this repository.

ROCmSupport commented 3 years ago

Thanks @wrwilliams for the steps. I am able to do steps 2 to 6, but steps 1 and 7 are not clear. It would be good if you could share detailed steps for these two:

Step 1: Build the Quicksilver benchmark for AMD GPU, with MPI disabled. This should be straightforward per their directions.
Step 7: ./qs --> This will print "init metrics", and then segfault.

Thank you.

wrwilliams commented 3 years ago

The Quicksilver benchmark may be found here: https://github.com/moes1/Quicksilver

Clone the repository and check out the AMD-HIP branch. Go into src/ and tweak the Makefile so that a block like this is active (adjusted for the local environment):

#####################################################################################
#hip, no MPI
#####################################################################################
CXX=hipcc
CXXFLAGS = -I$(HIP)/include/ -g -O2
CPPFLAGS = -DHAVE_HIP=1 -DMaxIt=15
LDFLAGS = -L$(HIP)/lib

This should already be present in the Makefile but most likely commented out; my local repo is as of commit ef799be7ddcb3def098f5fea2fe69d4cc78671d3.

Running make in src/ should produce a Quicksilver binary, qs, in the src directory. It may be run with no arguments (step 7), which is sufficient to produce the crash deterministically for me when steps 2-6 are applied. When kernel interception is not enabled, Quicksilver runs to completion.

Hope that helps.

ROCmSupport commented 3 years ago

Thanks @wrwilliams, I am able to reproduce this issue. Can you please share the passing history for this issue? I also need to find out whether it is a profiler issue or comes from another component. Thank you.

wrwilliams commented 3 years ago

Can you please share passing history for this issue.

As in "has this ever worked"? Not to the best of my knowledge. If I'm misunderstanding what you need, please clarify.

And also I need to find out whether its profiler issue or comes from any other component.

If there are specific things you want me to try locally, let me know.

ROCmSupport commented 3 years ago

Yes @wrwilliams, I am looking for the answer to "has this ever worked?"

wrwilliams commented 3 years ago

Then no; to the best of my knowledge the specific combination of:

has never worked.

ROCmSupport commented 3 years ago

Thanks @wrwilliams for the additional information. I will work with the dev team. Thank you.

wrwilliams commented 3 years ago

Just a quick update from my side: we've upgraded to ROCm 4.2 locally and are still able to reproduce this bug.

bertwesarg commented 2 years ago

OS on our failing nodes:

Distributor ID: CentOS
Description:    CentOS Linux release 7.7.1908 (Core)
Release:        7.7.1908
Codename:       Core

bertwesarg commented 2 years ago

And it is still broken on ROCm 4.3.

ROCmSupport commented 2 years ago

There are many profiler issues, both internal and external, so the dev team is taking time to fix them one by one. I will ping them again. Thank you.

kikimych commented 2 years ago

Checked with export ROCP_TOOL_LIB=/opt/rocm-4.5.0/rocprofiler/tool/libtool.so, and it works. The problem is in your init_metrics function:

void init_metrics()
{
  fprintf(stderr, "Init metrics\n");
  void* callback_data = NULL;
  rocprofiler_queue_callbacks_t cbs;
  cbs.dispatch = &dispatch_cb;
  rocprofiler_set_queue_callbacks( cbs, callback_data );
}

The cbs variable is not initialized, so the create() or destroy() callbacks of rocprofiler_queue_callbacks_t may be called somewhere. I expect a sanity check for zero in the caller's context, so what happens next depends on whatever is on the stack.

void* callback_data = NULL; fails when compiled in debug mode; it is better to use a typed pointer or char* according to the strict aliasing rule.

hipcc requires a target in ROCm 4.5. This compile string works for me:

hipcc -O2 --amdgpu-target=gfx900 break-rocprofiler.c -I/opt/rocm/include/rocprofiler -I/opt/rocm-4.5.0/include/hsa -L/opt/rocm/lib -lrocprofiler64 --shared -fPIC -o break-rocprofiler.so

This is a working example of init_metrics:

void init_metrics()
{
  fprintf(stderr, "Init metrics\n");
  rocprofiler_callback_data_t* callback_data = NULL;
  rocprofiler_queue_callbacks_t cbs = {0};
  cbs.dispatch = dispatch_cb;
  rocprofiler_set_queue_callbacks( cbs, callback_data );
}
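Why "= {0}" matters can be shown without rocprofiler at all. The C sketch below is purely illustrative: demo_queue_callbacks_t, demo_fire_all, demo_dispatch_cb and demo_make_callbacks are hypothetical stand-ins whose member names are borrowed from rocprofiler_queue_callbacks_t, but whose simplified signatures are not the real rocprofiler ones. It assumes, as suggested above, that the caller checks each member against zero before invoking it.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified, hypothetical callback signature (not the rocprofiler one). */
typedef int (*demo_callback_t)(void *data);

/* Hypothetical stand-in for rocprofiler_queue_callbacks_t. */
typedef struct {
    demo_callback_t dispatch;
    demo_callback_t create;
    demo_callback_t destroy;
} demo_queue_callbacks_t;

/* Mimics a caller that sanity-checks for zero: each member is invoked only
   if it is non-NULL. Returns how many callbacks actually ran. */
int demo_fire_all(const demo_queue_callbacks_t *cbs, void *data)
{
    int ran = 0;
    assert(cbs != NULL);
    if (cbs->dispatch) { cbs->dispatch(data); ran++; }
    if (cbs->create)   { cbs->create(data);   ran++; }
    if (cbs->destroy)  { cbs->destroy(data);  ran++; }
    return ran;
}

int demo_dispatch_cb(void *data)
{
    (void)data;
    fprintf(stderr, "In dispatch callback\n");
    return 0;
}

/* "= {0}" zero-initializes every member, so create and destroy are NULL
   and the check in demo_fire_all skips them. */
demo_queue_callbacks_t demo_make_callbacks(void)
{
    demo_queue_callbacks_t cbs = {0};
    cbs.dispatch = demo_dispatch_cb;
    return cbs;
}
```

The NULL check only protects the caller if the unused members really are zero. With an uninitialized local such as `demo_queue_callbacks_t cbs;`, create and destroy hold indeterminate values left on the stack, which is consistent with the crash depending on stack contents.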

wrwilliams commented 2 years ago

This is working example of init_metrics:

void init_metrics() 
{ 
  fprintf(stderr, "Init metrics\n"); 
  rocprofiler_callback_data_t* callback_data = NULL; 
  rocprofiler_queue_callbacks_t cbs = {0}; 
  cbs.dispatch = dispatch_cb; 
  rocprofiler_set_queue_callbacks( cbs, callback_data ); 
}

Thanks for the hint; this makes progress but does not fix the MWE in my environment. The GPUs in the system use gfx906; I've of course adjusted both qs and the MWE .so CXXFLAGS for that.

What I now get consistently is that the first dispatch callback in Quicksilver succeeds, and after entry to the second, we get this:

In dispatch callback
Packets(0x2b0c312f0100, 1):
0, packet(0x2b0c312f0100):
  0x2b0c312f0100: 0x00000001
  0x2b0c312f0104: 0x00000000
  0x2b0c312f0108: 0x00000000
  0x2b0c312f010c: 0x00000000
  0x2b0c312f0110: 0x00000000
  0x2b0c312f0114: 0x00000000
  0x2b0c312f0118: 0x00000000
  0x2b0c312f011c: 0x00000000
  0x2b0c312f0120: 0x00000000
  0x2b0c312f0124: 0x00000000
  0x2b0c312f0128: 0x00000000
  0x2b0c312f012c: 0x00000000
  0x2b0c312f0130: 0x00000000
  0x2b0c312f0134: 0x00000000
  0x2b0c312f0138: 0x00000000
  0x2b0c312f013c: 0x00000000
error(41) "queue_event_callback(), queue(0x2b0c3127d000:0x2b0c312f0100)"
HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address.
./reproducer.sh: line 10: 44867 Aborted                 ./qs
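To help read the packet dump above, here is a small C sketch that decodes the type field of an AQL packet header. Per the HSA runtime specification, each AQL packet is 64 bytes (the 16 words dumped above), its first 16 bits are the header, and the low 8 bits of the header hold the packet type (hsa_packet_type_t). The enum re-declares those values under hypothetical demo_ names so the sketch builds without hsa.h. Applied to the dump, the first word 0x00000001 gives type 1 (HSA_PACKET_TYPE_INVALID); a kernel dispatch packet would carry type 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hsa_packet_type_t values from the HSA runtime specification, re-declared
   under hypothetical demo_ names so this builds without hsa.h. */
enum demo_packet_type {
    DEMO_PACKET_TYPE_VENDOR_SPECIFIC = 0,
    DEMO_PACKET_TYPE_INVALID         = 1,
    DEMO_PACKET_TYPE_KERNEL_DISPATCH = 2,
    DEMO_PACKET_TYPE_BARRIER_AND     = 3,
    DEMO_PACKET_TYPE_AGENT_DISPATCH  = 4,
    DEMO_PACKET_TYPE_BARRIER_OR      = 5
};

/* Returns the type field (bits 0..7) of the 16-bit header at the start of a
   raw AQL packet. Bytes are combined little-endian, matching the x86-64
   host memory layout that produced the dump. */
unsigned demo_packet_type(const uint8_t *packet, size_t len)
{
    assert(packet != NULL && len >= 2);
    uint16_t header = (uint16_t)(packet[0] | (packet[1] << 8));
    return header & 0xFFu;
}

/* Human-readable name for a packet type, for printing next to a dump. */
const char *demo_packet_type_name(unsigned type)
{
    switch (type) {
    case DEMO_PACKET_TYPE_VENDOR_SPECIFIC: return "VENDOR_SPECIFIC";
    case DEMO_PACKET_TYPE_INVALID:         return "INVALID";
    case DEMO_PACKET_TYPE_KERNEL_DISPATCH: return "KERNEL_DISPATCH";
    case DEMO_PACKET_TYPE_BARRIER_AND:     return "BARRIER_AND";
    case DEMO_PACKET_TYPE_AGENT_DISPATCH:  return "AGENT_DISPATCH";
    case DEMO_PACKET_TYPE_BARRIER_OR:      return "BARRIER_OR";
    default:                               return "UNKNOWN";
    }
}
```

For a 64-byte buffer whose first byte is 0x01 and the rest zero, as in the dump, demo_packet_type returns 1 and demo_packet_type_name reports "INVALID".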

I of course considered whether callback_data cannot be a null pointer (and thus be ignored), but must instead point to an actual struct. So I tested that:

rocprofiler_callback_data_t callback_data = { 0 }; 
void init_metrics()
{
...
  rocprofiler_set_queue_callbacks(cbs, &callback_data);
}

The result does not change. I also tested ROCP_HSA_INTERCEPT=1 vs. ROCP_HSA_INTERCEPT=2, with no change in behavior.

wrwilliams commented 2 years ago

Same behavior with 4.5.1.

Also of note since 4.5.x: the hsa-amd-aqlprofile package is necessary to get this MWE to function as described. Otherwise it fails with:

aqlprofile API table load failed: HSA_STATUS_ERROR: A generic error has occurred.

Is hsa-amd-aqlprofile still the correct package? If so, which meta package(s) should it be part of? If not, what is the up-to-date replacement?

kikimych commented 2 years ago

Could you please provide a reproducer script for this issue?

Tested on gfx900 with ROCm 4.5.x and Quicksilver commit 65a53818f0bb7669a691d87c5dbe53d16a99bf86.

Interception library: break-rocprofiler-working.txt

HSA_TOOLS_LIB=/opt/rocm/rocprofiler/lib/librocprofiler64.so
ROCP_METRICS=/opt/rocm/rocprofiler/lib/metrics.xml
ROCP_HSA_INTERCEPT=2
ROCP_TOOL_LIB=$QuicksilverPath/src/break-rocprofiler.so

Compile string:

hipcc break-rocprofiler.c -O0 -g3 -I/opt/rocm/include/rocprofiler -I/opt/rocm/include/hsa -L/opt/rocm/lib -lrocprofiler64 --shared -fPIC -o break-rocprofiler.so

You need to purge ROCm from the system completely and reinstall it from scratch to meet the dependencies.

Thank you.

wrwilliams commented 2 years ago

I can verify that this is fixed in 5.1 (tested on Oregon machines). I haven't had a chance to test with a good 5.0 environment yet, but I can check whether Crusher is now able to build useful things with PrgEnv-amd and ROCm 5.0.