ROCm / rocprofiler

ROC profiler library. Profiling with perf-counters and derived metrics.
https://rocm.docs.amd.com/projects/rocprofiler/en/latest/
Other
116 stars 44 forks source link

Rocprofiler does not allow to change metrics when using intercept mode #71

Open gcongiu opened 2 years ago

gcongiu commented 2 years ago

Currently, rocprofiler does not allow to change metrics at runtime for intercepted kernels, so the following example won't work:

  rocprofiler_feature_t features[4];
  features[0].kind = ROCPROFILER_FEATURE_KIND_METRIC;
  features[0].name = "SQ_WAVES";
  unsigned feature_count = 1;

  init_intercept(features, feature_count);
  start_intercept();

  hipLaunchKernelGGL(vectoradd_float,
                  dim3(WIDTH/THREADS_PER_BLOCK_X, HEIGHT/THREADS_PER_BLOCK_Y),
                  dim3(THREADS_PER_BLOCK_X, THREADS_PER_BLOCK_Y),
                  0, 0,
                  deviceA ,deviceB ,deviceC ,WIDTH ,HEIGHT);

  hipDeviceSynchronize();
  stop_intercept();
  shutdown_intercept();

  features[1].kind = ROCPROFILER_FEATURE_KIND_METRIC;
  features[1].name = "SQ_INSTS_VALU";
  feature_count += 1;

  init_intercept(features, feature_count);
  start_intercept();

  hipLaunchKernelGGL(vectoradd_float,
                  dim3(WIDTH/THREADS_PER_BLOCK_X, HEIGHT/THREADS_PER_BLOCK_Y),
                  dim3(THREADS_PER_BLOCK_X, THREADS_PER_BLOCK_Y),
                  0, 0,
                  deviceA ,deviceB ,deviceC ,WIDTH ,HEIGHT);

  hipDeviceSynchronize();
  stop_intercept();
  shutdown_intercept();

Above, init_intercept() initializes the queue callbacks for intercept mode and calls rocprofiler_set_queue_callbacks(). start_intercept() and stop_intercept() call rocprofiler_start_queue_callbacks() and rocprofiler_stop_queue_callbacks(), respectively, and shutdown_intercept() calls rocprofiler_remove_queue_callbacks().

Rocprofiler does not allow users to call rocprofiler_set_queue_callbacks() if this has been already called. Thus, the second call to init_intercept() in the example code above causes the following error message:

> error(4096) "SetCallbacks(), reassigning queue callbacks - not supported”

The ability to change metrics at runtime (while using intercept mode) is a feature highly desirable for tools like PAPI. With the current implementation of rocprofiler PAPI users would have to define metrics once and have them applied to all the kernels being intercepted.

gcongiu commented 2 years ago

Example source code: vectoradd_hip.cpp.txt

kikimych commented 2 years ago

Could you please check small fix? https://github.com/ROCm-Developer-Tools/rocprofiler/pull/77 Value of feature_count in https://github.com/ROCm-Developer-Tools/rocprofiler/files/7933103/vectoradd_hip.cpp.txt. can't be greater than actual number of features in vector.

gcongiu commented 2 years ago

@kikimych I will check it. Thanks

gcongiu commented 2 years ago

Previously I was getting:

LD_LIBRARY_PATH=$HOME/rocm-4.5.0/rocprofiler/lib:$LD_LIBRARY_PATH ./vectoradd_hip.exe
 System minor 0
 System major 9
 agent prop name
hip Device prop succeeded
kernel symbol(0x7f051040a800) name("vectoradd_float(float*, float const*, float const*, int, int) [clone .kd]") tid(67149) queue-id(0) gpu-id(0) > SQ_WAVES = (16384)
kernel symbol(0x7f051040a800) name("vectoradd_float(float*, float const*, float const*, int, int) [clone .kd]") tid(67149) queue-id(0) gpu-id(0) > SQ_WAVES = (16384)
kernel symbol(0x7f051040a800) name("vectoradd_float(float*, float const*, float const*, int, int) [clone .kd]") tid(67149) queue-id(0) gpu-id(0) > SQ_WAVES = (16384)
kernel symbol(0x7f051040a800) name("vectoradd_float(float*, float const*, float const*, int, int) [clone .kd]") tid(67149) queue-id(0) gpu-id(0) > SQ_WAVES = (16384)
error(4096) "SetCallbacks(), reassigning queue callbacks - not supported"
HSA_STATUS_ERROR: A generic error has occurred.
Aborted (core dumped)

Now, with the fix that sets callback_data_ = NULL I am getting:

$ LD_LIBRARY_PATH=$HOME/rocm-4.5.0/rocprofiler/lib:$LD_LIBRARY_PATH ./vectoradd_hip.exe
 System minor 0
 System major 9
 agent prop name
hip Device prop succeeded
kernel symbol(0x7f5dfe40a800) name("vectoradd_float(float*, float const*, float const*, int, int) [clone .kd]") tid(67314) queue-id(0) gpu-id(0) > SQ_WAVES = (16384)
kernel symbol(0x7f5dfe40a800) name("vectoradd_float(float*, float const*, float const*, int, int) [clone .kd]") tid(67314) queue-id(0) gpu-id(0) > SQ_WAVES = (16384)
kernel symbol(0x7f5dfe40a800) name("vectoradd_float(float*, float const*, float const*, int, int) [clone .kd]") tid(67314) queue-id(0) gpu-id(0) > SQ_WAVES = (16384)
kernel symbol(0x7f5dfe40a800) name("vectoradd_float(float*, float const*, float const*, int, int) [clone .kd]") tid(67314) queue-id(0) gpu-id(0) > SQ_WAVES = (16384)
vectoradd_hip.exe: vectoradd_hip.cpp:160: hsa_status_t _rocp_dispatch_callback(const rocprofiler_callback_data_t *, void *, rocprofiler_group_t *): Assertion `status == HSA_STATUS_SUCCESS' failed.
Aborted (core dumped)
kikimych commented 2 years ago

You are adding 2 features to profile, but setting feature count to 1. I have fixed it locally, but forgot to report. Could you please check with feature_count set to number of actual features?

gcongiu commented 2 years ago

The feature_count=1 was intended not a typo. However, if I set it to 2 instead, start/stop monitoring, then change it back to 1 and start/stop monitoring again, I get the same error(4096) "SetCallbacks(), reassigning queue callbacks - not supported" I used to see.

kikimych commented 2 years ago

My bad. It's not related to feature_count. You have a typo in metric name. "SQ_INSTS_VALU", not "SQ_INST_VALU".

gcongiu commented 1 year ago

[Update] Still not resolved with ROCm 5.4.x:

$ ./kernel-intercept
Tool lib "/home/gcongiu/rocprofiler/build/librocprofiler64.so" failed to load.
 System minor 0
 System major 9
 agent prop name AMD Instinct MI210
hip Device prop succeeded
error(4096) "SetCallbacks(), reassigning queue callbacks - not supported"
HSA_STATUS_ERROR: A generic error has occurred.
Aborted (core dumped)