dendibakh / perf-book

The book "Performance Analysis and Tuning on Modern CPU"
https://book.easyperf.net/perf_book
Creative Commons Zero v1.0 Universal
2.12k stars 156 forks source link

Lally/marker api #20

Closed dendibakh closed 1 year ago

dendibakh commented 1 year ago

Hey @lally, for some reason I couldn't push changes into your branch (protected branch?), so I opened a new PR here. Previous PR is here: https://github.com/dendibakh/perf-book/pull/17.

dendibakh commented 1 year ago

@Lally, please review. Notice TODO: @Lally comments, feel free to edit and send your patches.

dendibakh commented 1 year ago

@lally, ping

lally commented 1 year ago

I've got a draft I'll put up this weekend for the TODOs

On Fri, May 5, 2023, 10:26 AM Denis Bakhvalov @.***> wrote:

@lally https://github.com/lally, ping

— Reply to this email directly, view it on GitHub https://github.com/dendibakh/perf-book/pull/20#issuecomment-1536344026, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAK4SZDM4CTWJ25LAOSUNDXEUEZHANCNFSM6AAAAAAW5T6RGI . You are receiving this because you were mentioned.Message ID: @.***>

lally commented 1 year ago

@dendibakh this is getting confusing. I couldn't push to the branch for this PR. I've pushed to this branch: https://github.com/lally/perf-book/tree/upstream/lally/marker-api

dendibakh commented 1 year ago

@lally, I pulled your changes and made some more edits. From my perspective, we are good to go. Please review one last time and let me know if you have any comments. Anything you would like to add/change?

lally commented 1 year ago

I'll put in 2 comments here instead of ping-ponging with branches. I think this is good.

  1. "It is possible to detect when the thread wasn't running by requesting CPU cycles consumed (UNHALTED_CORE_CYCLES, only counts when the thread is running) and comparing against wall-clock. " is missing a word. Perhaps "the wall-clock" or "wall-clock time?"

  2. The setup code in render() between pfm_initialize() and the declaration of read_format should really be called only once. Say in main() or some other setup_render_instrumentation() function? Something like:

static int s_event_fd = -1;
void setup_render_instrumentation() {
  if (s_event_fd < 0) {
   pfm_initialize();
   struct perf_event_attr perf_attr;
   memset(&perf_attr, 0, sizeof(perf_attr));
   perf_attr.size = sizeof(struct perf_event_attr);
   perf_attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED | 
                           PERF_FORMAT_TOTAL_TIME_RUNNING | PERF_FORMAT_GROUP;

   pfm_perf_encode_arg_t arg;
   memset(&arg, 0, sizeof(pfm_perf_encode_arg_t));
   arg.size = sizeof(pfm_perf_encode_arg_t);
   arg.attr = &perf_attr;

   pfm_get_os_event_encoding("instructions", PFM_PLM3, PFM_OS_PERF_EVENT_EXT, &arg);
   int leader_fd = perf_event_open(&perf_attr, 0, -1, -1, 0);
   pfm_get_os_event_encoding("cycles", PFM_PLM3, PFM_OS_PERF_EVENT_EXT, &arg);
   s_event_fd = perf_event_open(&perf_attr, 0, -1, leader_fd, 0);
   pfm_get_os_event_encoding("branches", PFM_PLM3, PFM_OS_PERF_EVENT_EXT, &arg);
   s_event_fd = perf_event_open(&perf_attr, 0, -1, leader_fd, 0);
   pfm_get_os_event_encoding("branch-misses", PFM_PLM3, PFM_OS_PERF_EVENT_EXT, &arg);
   s_event_fd = perf_event_open(&perf_attr, 0, -1, leader_fd, 0);
  }
}

struct read_format { uint64_t nr, time_enabled, time_running, values[4]; };

And then call setup_render_instrumentation() at the head of render() instead of all that code that's other now and use s_event_fd instead of event_fd in the read() calls.

dendibakh commented 1 year ago

I'll put in 2 comments here instead of ping-ponging with branches. I think this is good.

  1. "It is possible to detect when the thread wasn't running by requesting CPU cycles consumed (UNHALTED_CORE_CYCLES, only counts when the thread is running) and comparing against wall-clock. " is missing a word. Perhaps "the wall-clock" or "wall-clock time?"

Thanks. Will fix.

  1. The setup code in render() between pfm_initialize() and the declaration of read_format should really be called only once. Say in main() or some other setup_render_instrumentation() function? Something like:

Good catch. In this case, render() is called once, so it should be OK. But I will leave a note in the text.