Closed TomMelt closed 1 day ago
Add OMNITRACE_PAPI_EVENTS = amd64_rapl::RAPL_ENERGY_PKG to a config file and you should see the counter values in the trace timeline, assuming your machine has the privileges to read them (which it sounds like it does). You should be able to verify that with: omnitrace-avail -H -r RAPL
Hi @jrmadsen , thanks a lot for the quick reply. I'll give this a go when I get back to the office and let you know. For now I will mark this issue as closed.
Hi @jrmadsen, I managed to get omnitrace installed on HPC with the correct permissions. I followed your instructions, but I don't see anything in the trace when I open it in the Perfetto UI.
The commands I ran and my config are below.
omnitrace-instrument -o solver.inst -- ./bin/solver
omnitrace-run -- ./solver.inst 10 10000
The app is a simple OpenMP threaded application. Ideally I want to estimate the energy usage using the RAPL hardware counter.
Am I doing something wrong?
Below is my config:
# auto-generated by omnitrace-avail (version 1.10.0) on 2023-04-28 @ 12:15

OMNITRACE_CONFIG_FILE =
OMNITRACE_USE_PERFETTO = true
OMNITRACE_USE_TIMEMORY = true
OMNITRACE_USE_SAMPLING = false
OMNITRACE_USE_PROCESS_SAMPLING = true
OMNITRACE_USE_KOKKOSP = false
OMNITRACE_USE_CAUSAL = false
OMNITRACE_USE_MPIP = true
OMNITRACE_USE_PID = true
OMNITRACE_USE_RCCLP = false
OMNITRACE_OUTPUT_PATH = omnitrace-%tag%-output
OMNITRACE_OUTPUT_PREFIX =
OMNITRACE_CAUSAL_BACKEND = auto
OMNITRACE_CAUSAL_BINARY_EXCLUDE =
OMNITRACE_CAUSAL_BINARY_SCOPE = %MAIN%
OMNITRACE_CAUSAL_DELAY = 0
OMNITRACE_CAUSAL_DURATION = 0
OMNITRACE_CAUSAL_FUNCTION_EXCLUDE =
OMNITRACE_CAUSAL_FUNCTION_SCOPE =
OMNITRACE_CAUSAL_MODE = function
OMNITRACE_CAUSAL_RANDOM_SEED = 0
OMNITRACE_CAUSAL_SOURCE_EXCLUDE =
OMNITRACE_CAUSAL_SOURCE_SCOPE =
OMNITRACE_CRITICAL_TRACE = false
OMNITRACE_PAPI_EVENTS = amd64_rapl::RAPL_ENERGY_PKG
OMNITRACE_PERFETTO_BACKEND = inprocess
OMNITRACE_PERFETTO_BUFFER_SIZE_KB = 1024000
OMNITRACE_PERFETTO_FILL_POLICY = discard
OMNITRACE_PROCESS_SAMPLING_DURATION = -1
OMNITRACE_PROCESS_SAMPLING_FREQ = 0
OMNITRACE_SAMPLING_CPUS = 1
OMNITRACE_SAMPLING_DELAY = 0.5
OMNITRACE_SAMPLING_DURATION = 0
OMNITRACE_SAMPLING_FREQ = 300
OMNITRACE_SAMPLING_OVERFLOW_EVENT = perf::PERF_COUNT_HW_CACHE_REFERENCES
OMNITRACE_TIME_OUTPUT = true
OMNITRACE_TIMEMORY_COMPONENTS = wall_clock
OMNITRACE_TRACE_DELAY = 0
OMNITRACE_TRACE_DURATION = 0
OMNITRACE_TRACE_PERIOD_CLOCK_ID = CLOCK_REALTIME
OMNITRACE_TRACE_PERIODS =
OMNITRACE_VERBOSE = 0
OMNITRACE_ENABLED = true
OMNITRACE_SUPPRESS_CONFIG = false
OMNITRACE_SUPPRESS_PARSING = false
Set OMNITRACE_USE_SAMPLING = true and optionally increase/decrease OMNITRACE_SAMPLING_FREQ.
Thanks. Unfortunately I now get
[omnitrace][305059] [timemory][papi] Warning!! Failure to add named event amd64_rapl::RAPL_ENERGY_PKG to event set 0 :: PAPI_error -1 : Invalid argument
Is it showing up in omnitrace-avail -H?
Yes (see the output of omnitrace-avail -H -r RAPL below).
FYI, I found this link which suggests I need to specify the CPU number, e.g. amd64_rapl::RAPL_ENERGY_PKG:cpu=0.
I tried it and it runs without error, but I don't have time to check whether the output is correct this evening. I will take a look tomorrow, but ideally I want the energy for the whole processor, not just one core.
$ omnitrace-avail -H -r RAPL
|-----------------------------------------|---------|-----------|----------------------------------------------------------------------|
| HARDWARE COUNTER | DEVICE | AVAILABLE | SUMMARY |
|-----------------------------------------|---------|-----------|----------------------------------------------------------------------|
| amd64_rapl::RAPL_ENERGY_PKG | CPU | true | Number of Joules consumed by all cores and Last level cache on the |
| | | | package |
| amd64_rapl::RAPL_ENERGY_PKG:u=0 | CPU | true | amd64_rapl::RAPL_ENERGY_PKG + monitor at user level |
| amd64_rapl::RAPL_ENERGY_PKG:k=0 | CPU | true | amd64_rapl::RAPL_ENERGY_PKG + monitor at kernel level |
| amd64_rapl::RAPL_ENERGY_PKG:period=0 | CPU | true | amd64_rapl::RAPL_ENERGY_PKG + sampling period |
| amd64_rapl::RAPL_ENERGY_PKG:freq=0 | CPU | true | amd64_rapl::RAPL_ENERGY_PKG + sampling frequency (Hz) |
| amd64_rapl::RAPL_ENERGY_PKG:excl=0 | CPU | true | amd64_rapl::RAPL_ENERGY_PKG + exclusive access |
| amd64_rapl::RAPL_ENERGY_PKG:mg=0 | CPU | true | amd64_rapl::RAPL_ENERGY_PKG + monitor guest execution |
| amd64_rapl::RAPL_ENERGY_PKG:mh=0 | CPU | true | amd64_rapl::RAPL_ENERGY_PKG + monitor host execution |
| amd64_rapl::RAPL_ENERGY_PKG:cpu=0 | CPU | true | amd64_rapl::RAPL_ENERGY_PKG + CPU to program |
| amd64_rapl::RAPL_ENERGY_PKG:pinned=0 | CPU | true | amd64_rapl::RAPL_ENERGY_PKG + pin event to counters |
|-----------------------------------------|---------|-----------|----------------------------------------------------------------------|
Ah, yeah, you may just have to specify all the CPUs if you have multiple CPUs, e.g. OMNITRACE_PAPI_EVENTS = amd64_rapl::RAPL_ENERGY_PKG:cpu=0 amd64_rapl::RAPL_ENERGY_PKG:cpu=1 (etc.), but I highly doubt the qualifier would be labeled "cpu" if it was actually per-core.
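Writing out one entry per CPU by hand gets tedious on a many-core node. A minimal Python sketch that generates the config line is shown here; it assumes the space-separated format from the example above, and the helper name build_papi_events and the CPU count of 4 are placeholders, not part of omnitrace.

```python
def build_papi_events(event, cpus):
    """Return a space-separated event list with one ':cpu=N' entry per CPU."""
    return " ".join(f"{event}:cpu={cpu}" for cpu in cpus)


# Example: cover CPUs 0..3 (the count is a placeholder for your machine).
line = "OMNITRACE_PAPI_EVENTS = " + build_papi_events(
    "amd64_rapl::RAPL_ENERGY_PKG", range(4)
)
print(line)
```

The printed line can be pasted into the config file in place of the existing OMNITRACE_PAPI_EVENTS entry.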
@TomMelt have you gotten a chance to verify that adding the :cpu=X qualifier provided the information you were seeking?
Hi @jrmadsen. It looks like it's similar to how omnitrace handles other CPU variables, e.g. OMNITRACE_SAMPLING_CPUS, which is core-level rather than socket-level, if I understand correctly.
So I would need to use :cpu=0 ... :cpu=n if I have multiple threads.
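Whether the :cpu=N qualifier selects a core or a socket is not settled in this thread. One way to probe it would be to request a single CPU per socket and compare against the all-CPUs run. The Python sketch below picks the lowest-numbered CPU in each package; it assumes the standard Linux sysfs topology files, and both function names are hypothetical helpers, not omnitrace or PAPI APIs.

```python
from pathlib import Path


def read_package_ids(sysfs_root="/sys/devices/system/cpu"):
    """Map CPU index -> physical package (socket) id from Linux sysfs."""
    mapping = {}
    for topo in Path(sysfs_root).glob("cpu[0-9]*/topology/physical_package_id"):
        cpu = int(topo.parent.parent.name[3:])  # directory name is "cpuN"
        mapping[cpu] = int(topo.read_text().strip())
    return mapping


def first_cpu_per_package(package_of):
    """Pick the lowest-numbered CPU in each package."""
    chosen = {}
    for cpu in sorted(package_of):
        chosen.setdefault(package_of[cpu], cpu)
    return sorted(chosen.values())


# Illustrative mapping for a two-socket machine with two CPUs per socket.
example = {0: 0, 1: 0, 2: 1, 3: 1}
print(first_cpu_per_package(example))
```

On a real node, first_cpu_per_package(read_package_ids()) would give the CPU numbers to try with the :cpu=N qualifier.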
However, the result I get in omnitrace is either wrong or doing something weird. Would it be easier if we arranged a Teams/Zoom call at some point? It might be easier to troubleshoot and discuss.
Ideally I don't need the trace of energy usage over time, just the final value, similar to the ArmForge perf-report.
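If only the final value matters, the sampled timeline can be reduced to a single number after the run. The Python sketch below assumes the samples are cumulative energy readings in joules, in time order; whether omnitrace reports cumulative values or per-sample deltas for this counter is not confirmed in this thread. The wrap_at parameter is a hypothetical rollover value for the counter.

```python
def total_energy_joules(readings, wrap_at=None):
    """Sum the increases between consecutive cumulative counter readings.

    readings: counter values in joules, in time order.
    wrap_at:  value at which the counter rolls over to zero, if known.
    """
    total = 0.0
    for prev, curr in zip(readings, readings[1:]):
        delta = curr - prev
        if delta < 0:
            # A decrease means the counter rolled over between samples.
            if wrap_at is None:
                raise ValueError("counter decreased; pass wrap_at to handle rollover")
            delta += wrap_at
        total += delta
    return total


def average_power_watts(readings, elapsed_seconds, wrap_at=None):
    """Average power over the run: total energy divided by elapsed time."""
    return total_energy_joules(readings, wrap_at) / elapsed_seconds
```

Summing deltas rather than subtracting the first reading from the last keeps the result correct across a rollover, provided the counter wraps at most once between two consecutive samples.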
Are you able to get energy usage from a simple program?
Hmmm... It's hard to tell if it is per core or not. Three of those bars look similar in magnitude when their samples are taken at overlapping timestamps -- those per-thread samples are taken with respect to the CPU-clock of the thread so it makes sense why they don't line up exactly.
I think for this particular use case, PAPI would ideally need to not initialize per-thread support and reading the counters should be done in the background "process sampling" thread instead of the per-thread interrupt sampler.
Before we hop on a call, let me experiment a bit with doing the above.
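The idea of reading the counter from one background thread, rather than from every application thread, can be sketched generically in Python. This is not how omnitrace is implemented; BackgroundSampler is a hypothetical class, and the fake counter below stands in for a real package-level energy read.

```python
import itertools
import threading
import time


class BackgroundSampler:
    """Poll a counter from a single background thread at a fixed interval."""

    def __init__(self, read_counter, interval_s=0.1):
        self._read = read_counter
        self._interval = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.samples = []

    def _run(self):
        # Event.wait returns False on timeout and True once stop() sets it.
        while not self._stop.wait(self._interval):
            self.samples.append(self._read())

    def start(self):
        self.samples.append(self._read())  # baseline before the workload
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
        self.samples.append(self._read())  # final value after the workload
        return self.samples[-1] - self.samples[0]


# Fake cumulative counter that grows by 5 on every read (illustration only).
fake = itertools.count(start=100, step=5)
sampler = BackgroundSampler(lambda: next(fake), interval_s=0.01)
sampler.start()
time.sleep(0.05)  # stands in for the application workload
delta = sampler.stop()
print(f"{len(sampler.samples)} samples, counter increase: {delta}")
```

Because one thread owns all the reads, the samples share a single timeline and the start-to-stop difference gives one process-wide total instead of a set of per-thread values that need to be reconciled.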
Hi @jrmadsen , did you have any luck?
Sorry for the delay, I started a long vacation right around when you posted the last comment.
I haven’t gotten a chance yet but I’ll look into it shortly.
@TomMelt Apologies for the lack of response. Do you still need assistance with this ticket? Thanks!
Hi @ppanchad-amd , thanks for replying. I have since moved onto another project. I never found a solution but it has been over a year since I last tried.
It's possible this issue has been resolved in a newer version of the tool, but I wouldn't be able to tell.
Seeing as I am not currently working on this, I will close this issue. If I get time to look back into it, I can always re-open it.
Hi,
ArmForge has a feature (perf-report) that can estimate power usage of a binary.
Is it possible to do something like this in omnitrace?
I tried using AMDuProf, but this feature is not supported on Linux (see section 10.3 Limitations, p. 179). I raised an issue on the Community discussion forum.
I think it has something to do with the RAPL drivers. I can see some reference to them in the source, but I don't know how to use it.