ROCm / omnitrace

Omnitrace: Application Profiling, Tracing, and Analysis
https://rocm.docs.amd.com/projects/omnitrace/en/latest/
MIT License

feature request - Energy profiling #273

Closed by TomMelt 1 day ago

TomMelt commented 1 year ago

Hi,

ArmForge has a feature (perf-report) that can estimate power usage of a binary.

Is it possible to do something like this in omnitrace?

I tried using AMDuProf, but it is not supported on Linux (see section 10.3, Limitations, p. 179). I raised an issue on the community discussion forum.

I think it has something to do with the RAPL drivers. I can see some references to them in the source, but I don't know how to use them.

source/docs/runtime.md:124:`amd64_rapl::RAPL_ENERGY_PKG`, `perf::PERF_COUNT_HW_CPU_CYCLES`, etc.
source/docs/runtime.md:698:| amd64_rapl::RAPL_ENERGY_PKG           | Number of Joules consumed by all c... |
source/docs/runtime.md:699:| amd64_rapl::RAPL_ENERGY_PKG:u=0       | amd64_rapl::RAPL_ENERGY_PKG + moni... |
source/docs/runtime.md:700:| amd64_rapl::RAPL_ENERGY_PKG:k=0       | amd64_rapl::RAPL_ENERGY_PKG + moni... |
source/docs/runtime.md:701:| amd64_rapl::RAPL_ENERGY_PKG:period=0  | amd64_rapl::RAPL_ENERGY_PKG + samp... |
source/docs/runtime.md:702:| amd64_rapl::RAPL_ENERGY_PKG:freq=0    | amd64_rapl::RAPL_ENERGY_PKG + samp... |
source/docs/runtime.md:703:| amd64_rapl::RAPL_ENERGY_PKG:excl=0    | amd64_rapl::RAPL_ENERGY_PKG + excl... |
source/docs/runtime.md:704:| amd64_rapl::RAPL_ENERGY_PKG:mg=0      | amd64_rapl::RAPL_ENERGY_PKG + moni... |
source/docs/runtime.md:705:| amd64_rapl::RAPL_ENERGY_PKG:mh=0      | amd64_rapl::RAPL_ENERGY_PKG + moni... |
source/docs/runtime.md:706:| amd64_rapl::RAPL_ENERGY_PKG:cpu=0     | amd64_rapl::RAPL_ENERGY_PKG + CPU ... |
source/docs/runtime.md:707:| amd64_rapl::RAPL_ENERGY_PKG:pinned=0  | amd64_rapl::RAPL_ENERGY_PKG + pin ... |

jrmadsen commented 1 year ago

Add OMNITRACE_PAPI_EVENTS = amd64_rapl::RAPL_ENERGY_PKG to a config file and you should see the counter in the trace timeline, assuming your machine has the privileges to read it (which it sounds like it does). You should be able to verify that with: omnitrace-avail -H -r RAPL

TomMelt commented 1 year ago

Hi @jrmadsen , thanks a lot for the quick reply. I'll give this a go when I get back to the office and let you know. For now I will mark this issue as closed.

TomMelt commented 1 year ago

Hi @jrmadsen , I managed to get omnitrace installed on an HPC system with the correct permissions. I followed your instructions, but I don't see anything in the trace when I open it in the Perfetto UI.

I will include the command I run and the config.

omnitrace-instrument -o solver.inst -- ./bin/solver
omnitrace-run -- ./solver.inst 10 10000

The app is a simple OpenMP threaded application. Ideally, I want to estimate the energy usage using the RAPL hardware counter.

Am I doing something wrong?

Below is my config:

# auto-generated by omnitrace-avail (version 1.10.0) on 2023-04-28 @ 12:15

OMNITRACE_CONFIG_FILE                              =
OMNITRACE_USE_PERFETTO                             = true
OMNITRACE_USE_TIMEMORY                             = true
OMNITRACE_USE_SAMPLING                             = false
OMNITRACE_USE_PROCESS_SAMPLING                     = true
OMNITRACE_USE_KOKKOSP                              = false
OMNITRACE_USE_CAUSAL                               = false
OMNITRACE_USE_MPIP                                 = true
OMNITRACE_USE_PID                                  = true
OMNITRACE_USE_RCCLP                                = false
OMNITRACE_OUTPUT_PATH                              = omnitrace-%tag%-output
OMNITRACE_OUTPUT_PREFIX                            =
OMNITRACE_CAUSAL_BACKEND                           = auto
OMNITRACE_CAUSAL_BINARY_EXCLUDE                    =
OMNITRACE_CAUSAL_BINARY_SCOPE                      = %MAIN%
OMNITRACE_CAUSAL_DELAY                             = 0
OMNITRACE_CAUSAL_DURATION                          = 0
OMNITRACE_CAUSAL_FUNCTION_EXCLUDE                  =
OMNITRACE_CAUSAL_FUNCTION_SCOPE                    =
OMNITRACE_CAUSAL_MODE                              = function
OMNITRACE_CAUSAL_RANDOM_SEED                       = 0
OMNITRACE_CAUSAL_SOURCE_EXCLUDE                    =
OMNITRACE_CAUSAL_SOURCE_SCOPE                      =
OMNITRACE_CRITICAL_TRACE                           = false
OMNITRACE_PAPI_EVENTS                              = amd64_rapl::RAPL_ENERGY_PKG
OMNITRACE_PERFETTO_BACKEND                         = inprocess
OMNITRACE_PERFETTO_BUFFER_SIZE_KB                  = 1024000
OMNITRACE_PERFETTO_FILL_POLICY                     = discard
OMNITRACE_PROCESS_SAMPLING_DURATION                = -1
OMNITRACE_PROCESS_SAMPLING_FREQ                    = 0
OMNITRACE_SAMPLING_CPUS                            = 1
OMNITRACE_SAMPLING_DELAY                           = 0.5
OMNITRACE_SAMPLING_DURATION                        = 0
OMNITRACE_SAMPLING_FREQ                            = 300
OMNITRACE_SAMPLING_OVERFLOW_EVENT                  = perf::PERF_COUNT_HW_CACHE_REFERENCES
OMNITRACE_TIME_OUTPUT                              = true
OMNITRACE_TIMEMORY_COMPONENTS                      = wall_clock
OMNITRACE_TRACE_DELAY                              = 0
OMNITRACE_TRACE_DURATION                           = 0
OMNITRACE_TRACE_PERIOD_CLOCK_ID                    = CLOCK_REALTIME
OMNITRACE_TRACE_PERIODS                            =
OMNITRACE_VERBOSE                                  = 0
OMNITRACE_ENABLED                                  = true
OMNITRACE_SUPPRESS_CONFIG                          = false
OMNITRACE_SUPPRESS_PARSING                         = false

jrmadsen commented 1 year ago

Set OMNITRACE_USE_SAMPLING = true and, optionally, increase or decrease OMNITRACE_SAMPLING_FREQ.

TomMelt commented 1 year ago

Thanks. Unfortunately, I now get:

[omnitrace][305059] [timemory][papi] Warning!! Failure to add named event amd64_rapl::RAPL_ENERGY_PKG to event set 0 :: PAPI_error -1 : Invalid argument

jrmadsen commented 1 year ago

Is it showing up in omnitrace-avail -H?

TomMelt commented 1 year ago

Yes (see the output of omnitrace-avail -H -r RAPL below).

FYI, I found this link, which suggests I need to specify the CPU number, e.g. amd64_rapl::RAPL_ENERGY_PKG:cpu=0.

I tried it and it runs without error, but I don't have time this evening to check whether the result is correct. I will take a look tomorrow, but ideally I want the energy for the whole processor, not just one core.

$ omnitrace-avail -H -r RAPL
|-----------------------------------------|---------|-----------|----------------------------------------------------------------------|
|            HARDWARE COUNTER             | DEVICE  | AVAILABLE |                               SUMMARY                                |
|-----------------------------------------|---------|-----------|----------------------------------------------------------------------|
| amd64_rapl::RAPL_ENERGY_PKG             |   CPU   |   true    | Number of Joules consumed by all cores and Last level cache on the   |
|                                         |         |           |   package                                                            |
| amd64_rapl::RAPL_ENERGY_PKG:u=0         |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + monitor at user level                  |
| amd64_rapl::RAPL_ENERGY_PKG:k=0         |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + monitor at kernel level                |
| amd64_rapl::RAPL_ENERGY_PKG:period=0    |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + sampling period                        |
| amd64_rapl::RAPL_ENERGY_PKG:freq=0      |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + sampling frequency (Hz)                |
| amd64_rapl::RAPL_ENERGY_PKG:excl=0      |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + exclusive access                       |
| amd64_rapl::RAPL_ENERGY_PKG:mg=0        |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + monitor guest execution                |
| amd64_rapl::RAPL_ENERGY_PKG:mh=0        |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + monitor host execution                 |
| amd64_rapl::RAPL_ENERGY_PKG:cpu=0       |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + CPU to program                         |
| amd64_rapl::RAPL_ENERGY_PKG:pinned=0    |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + pin event to counters                  |
|-----------------------------------------|---------|-----------|----------------------------------------------------------------------|

jrmadsen commented 1 year ago

Ah, yeah, you may just have to specify all the CPUs if you have more than one, e.g. OMNITRACE_PAPI_EVENTS = amd64_rapl::RAPL_ENERGY_PKG:cpu=0 amd64_rapl::RAPL_ENERGY_PKG:cpu=1 (etc.). That said, I highly doubt the qualifier would be labeled "cpu" if it were actually per-core.
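For more than a couple of CPUs, writing out the event list by hand gets tedious. The following is a minimal sketch of generating it, assuming the space-separated form shown in the comment above is what the config file expects; the helper name `rapl_events` is hypothetical and not part of omnitrace.

```python
def rapl_events(num_cpus, event="amd64_rapl::RAPL_ENERGY_PKG"):
    """Return one ':cpu=N'-qualified event name per CPU index, space-separated.

    Assumption: the separator is a single space, as in the example in the thread.
    """
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    return " ".join(f"{event}:cpu={i}" for i in range(num_cpus))


if __name__ == "__main__":
    # Prints a line that could be pasted into an omnitrace config file.
    print(f"OMNITRACE_PAPI_EVENTS = {rapl_events(2)}")
```

Whether the index after `:cpu=` refers to a core or a socket is exactly what the thread leaves open, so `num_cpus` should be chosen with that uncertainty in mind.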

jrmadsen commented 1 year ago

@TomMelt have you gotten a chance to verify that adding the :cpu=X qualifier provided the information you were seeking?

TomMelt commented 1 year ago

Hi @jrmadsen . It looks like this is similar to how omnitrace handles other CPU variables: for example, OMNITRACE_SAMPLING_CPUS is, if I understand correctly, at the core level and not the socket level.

So I would need to use :cpu=0 ... :cpu=n etc. if I have multiple threads.

However, the result I get in omnitrace is either wrong or something weird is going on. Would it help to arrange a Teams/Zoom call at some point? It might be easier to troubleshoot and discuss that way.

image

TomMelt commented 1 year ago

Ideally, I don't need the trace of energy usage over time, just the final value, similar to the ArmForge perf-report.

Are you able to get energy usage from a simple program?
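If the sampled values were exported from the trace, a single final figure could be derived from them. The sketch below is only illustrative and rests on assumptions not confirmed in this thread: that each sample is a cumulative energy reading in joules for one counter, that the samples are sorted by time, and that the counter does not wrap during the run. The function name `energy_summary` is hypothetical.

```python
def energy_summary(samples):
    """Reduce (time_seconds, cumulative_joules) samples to a single summary.

    Assumptions: samples are sorted by time, the readings are cumulative,
    and the counter does not wrap around between the first and last sample.
    Returns (total_joules, average_watts).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, e0), (t1, e1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    total_joules = e1 - e0
    return total_joules, total_joules / (t1 - t0)


if __name__ == "__main__":
    # Made-up readings, purely to show the call shape.
    demo = [(0.0, 100.0), (1.0, 112.5), (2.0, 125.0)]
    print(energy_summary(demo))  # (25.0, 12.5)
```

If the exported values turn out to be per-sample deltas rather than cumulative readings, the total would instead be their sum.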

jrmadsen commented 1 year ago

Hmmm... It's hard to tell if it is per core or not. Three of those bars look similar in magnitude when their samples are taken at overlapping timestamps. The per-thread samples are taken with respect to the CPU clock of each thread, so it makes sense that they don't line up exactly.

I think for this particular use case, PAPI would ideally not initialize per-thread support, and the counters would be read in the background "process sampling" thread instead of the per-thread interrupt sampler.

Before we hop on a call, let me experiment a bit with doing the above.
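The idea in the comment above, reading a process-wide counter from one background thread rather than from per-thread samplers, can be sketched generically as follows. This is not omnitrace's or PAPI's implementation; the class `BackgroundSampler` and the `read_counter` callable are hypothetical stand-ins for whatever would actually read the RAPL value.

```python
import threading
import time


class BackgroundSampler:
    """Poll a counter from a single background thread at a fixed interval."""

    def __init__(self, read_counter, interval_s=0.01):
        self._read = read_counter          # hypothetical callable returning the counter value
        self._interval = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.samples = []                  # list of (monotonic_time_s, value)

    def _sample(self):
        self.samples.append((time.monotonic(), self._read()))

    def _run(self):
        self._sample()                                 # opening sample
        while not self._stop.wait(self._interval):     # sleeps, wakes early on stop
            self._sample()
        self._sample()                                 # closing sample

    def start(self):
        self._thread.start()

    def stop(self):
        """Signal the thread to finish, wait for it, and return all samples."""
        self._stop.set()
        self._thread.join()
        return self.samples
```

Because every reading comes from the same thread and the same clock, the samples form one time-ordered series, which avoids the misaligned per-thread timestamps described above.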

TomMelt commented 1 year ago

Hi @jrmadsen , did you have any luck?

jrmadsen commented 1 year ago

Sorry for the delay, I started a long vacation right around when you posted the last comment.

I haven’t gotten a chance yet but I’ll look into it shortly.

ppanchad-amd commented 1 week ago

@TomMelt Apologies for the lack of response. Do you still need assistance with this ticket? Thanks!

TomMelt commented 1 day ago

Hi @ppanchad-amd , thanks for replying. I have since moved on to another project. I never found a solution, but it has been over a year since I last tried.

It's possible this issue has been resolved in a newer version of the tool, but I wouldn't be able to tell.

Seeing as I am not currently working on this, I will close this issue. If I get time to look into it again, I can always re-open it.