Closed dendibakh closed 1 year ago
@Lally, please review. Notice TODO: @Lally
comments, feel free to edit and send your patches.
@lally, ping
I've got a draft I'll put up this weekend for the TODOs
On Fri, May 5, 2023, 10:26 AM Denis Bakhvalov @.***> wrote:
@lally https://github.com/lally, ping
— Reply to this email directly, view it on GitHub https://github.com/dendibakh/perf-book/pull/20#issuecomment-1536344026, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAK4SZDM4CTWJ25LAOSUNDXEUEZHANCNFSM6AAAAAAW5T6RGI . You are receiving this because you were mentioned.Message ID: @.***>
@dendibakh this is getting confusing. I couldn't push to the branch for this PR. I've pushed to this branch: https://github.com/lally/perf-book/tree/upstream/lally/marker-api
@lally, I pulled your changes and made some more edits. From my perspective, we are good to go. Please review one last time and let me know if you have any comments. Anything you would like to add/change?
I'll put in 2 comments here instead of ping-ponging with branches. I think this is good.
"It is possible to detect when the thread wasn't running by requesting CPU cycles consumed (UNHALTED_CORE_CYCLES, only counts when the thread is running) and comparing against wall-clock. " is missing a word. Perhaps "the wall-clock" or "wall-clock time?"
The setup code in render()
between pfm_initialize()
and the declaration of read_format
should really be called only once. Say in main()
or some other setup_render_instrumentation()
function? Something like:
static int s_event_fd = -1;
void setup_render_instrumentation() {
if (s_event_fd < 0) {
pfm_initialize();
struct perf_event_attr perf_attr;
memset(&perf_attr, 0, sizeof(perf_attr));
perf_attr.size = sizeof(struct perf_event_attr);
perf_attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
PERF_FORMAT_TOTAL_TIME_RUNNING | PERF_FORMAT_GROUP;
pfm_perf_encode_arg_t arg;
memset(&arg, 0, sizeof(pfm_perf_encode_arg_t));
arg.size = sizeof(pfm_perf_encode_arg_t);
arg.attr = &perf_attr;
pfm_get_os_event_encoding("instructions", PFM_PLM3, PFM_OS_PERF_EVENT_EXT, &arg);
int leader_fd = perf_event_open(&perf_attr, 0, -1, -1, 0);
pfm_get_os_event_encoding("cycles", PFM_PLM3, PFM_OS_PERF_EVENT_EXT, &arg);
s_event_fd = perf_event_open(&perf_attr, 0, -1, leader_fd, 0);
pfm_get_os_event_encoding("branches", PFM_PLM3, PFM_OS_PERF_EVENT_EXT, &arg);
s_event_fd = perf_event_open(&perf_attr, 0, -1, leader_fd, 0);
pfm_get_os_event_encoding("branch-misses", PFM_PLM3, PFM_OS_PERF_EVENT_EXT, &arg);
s_event_fd = perf_event_open(&perf_attr, 0, -1, leader_fd, 0);
}
}
struct read_format { uint64_t nr, time_enabled, time_running, values[4]; };
And then call setup_render_instrumentation()
at the head of render()
instead of all that code that's other now and use s_event_fd
instead of event_fd
in the read()
calls.
I'll put in 2 comments here instead of ping-ponging with branches. I think this is good.
- "It is possible to detect when the thread wasn't running by requesting CPU cycles consumed (UNHALTED_CORE_CYCLES, only counts when the thread is running) and comparing against wall-clock. " is missing a word. Perhaps "the wall-clock" or "wall-clock time?"
Thanks. Will fix.
- The setup code in
render()
betweenpfm_initialize()
and the declaration ofread_format
should really be called only once. Say inmain()
or some othersetup_render_instrumentation()
function? Something like:
Good catch. In this case, render()
is called once, so it should be OK. But I will leave a note in the text.
Hey @lally, for some reason I couldn't push changes into your branch (protected branch?), so I opened a new PR here. Previous PR is here: https://github.com/dendibakh/perf-book/pull/17.