Open spiermar opened 9 years ago
PCP already has two pmdas related to performance counters: papi and perfevent. The former is more automated out-of-the-box, except that it currently requires privileged access. The latter needs hand-configuration.
Hi! Is this being worked on? Today I tried to set up on-CPU flame graphs with Vector without much success. Digging through the mailing lists, it looks like the event type is not even supported in pmwebd. Is that something that needs to be done before this can be supported in Vector?
The perfevents pmda (like the papi pmda) supplies normal scalar numeric sorts of metrics, which pmwebd happily relays. PCP events are a different sort of thing.
Ah, so perfevents does perf_type_hardware only and doesn't support complex sample types like callchain?
Correct.
At this point, there's a very rudimentary PMDA that allows users to generate flame graphs from Vector. In the future, I hope to have a more flexible PMDA that allows users not only to generate flame graphs, but also visualize more metrics from perf.
I see, that's what I am after. Since there's not much infrastructure built yet, I'd rather pursue getting BPF probe output (maps and collapsed+aggregated callstacks) into Vector instead. If I understand it right, there's already a d3-based widget to display flame graphs and heatmaps, and having pseudo-real-time data from probes is much more compelling.
Yes, I already have the d3 plugin ready and should be able to come up with a Vector widget pretty quickly, assuming I can get the right data format out of PCP. BPF probes would probably be the ideal solution at this point. If you're looking to pursue this route, I'm in. @brendangregg would probably be interested in this too.
@vavrusa Yes, I was looking into perf_events, but couldn't help but think it might be wasted effort when we'd prefer to do this with BPF. I'll (probably) start by updating the BPF/bcc profiler (@4ast's profiling support has now made it to net-next), so we have an example to borrow from.
Edit: we'll still want perf_events access for everything else, like PMCs, in Vector...
@brendangregg nice, ultimately what I want is the output from BPF probes (and the profiler). It wasn't clear to me from the beginning whether there's any support for that in pmcd/pmwebd. I think, in the end, it might work better for me to spec what kind of data Vector could consume (tabular data from a BPF map, heatmaps, callstacks), and generate this data from edge applications directly or through a proxy (e.g. NGINX/OpenResty reusing lj-bpf to monitor itself and its sockets) instead of having special daemon(s). I'm not sure how you secure access to pmcd/pmwebd on edge nodes at Netflix; if you could share some insight on that, that would be fantastic. I think a gateway + OAuth + gw-browser encryption is the bare minimum. Anyhow, I'll be on PTO for a few weeks, so that'll give me some time to think it through.
The format consumed by the flame graph d3 plugin is roughly this: https://github.com/spiermar/d3-flame-graph/blob/master/example/stacks.json
I also have a tool to convert the folded stacks into JSON, so Vector could consume that too.
For the other visualizations (heatmaps, etc) it depends.
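For illustration, the folded-stack to JSON conversion mentioned above could be sketched roughly as follows. This is not the actual converter tool, just a minimal sketch; it assumes the nested `name`/`value`/`children` layout used in the linked example, with each node's `value` including its children, and the function name `folded_to_tree` and the sample stacks are made up for the example.

```python
import json


def folded_to_tree(lines, root_name="root"):
    """Build a nested name/value/children tree from folded stack lines.

    Each input line looks like "frame1;frame2;frame3 COUNT".
    Assumption: a node's value is inclusive (self plus all children).
    """
    root = {"name": root_name, "value": 0, "children": []}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The sample count is the last space-separated field.
        stack, _, count_text = line.rpartition(" ")
        count = int(count_text)
        root["value"] += count
        node = root
        for frame in stack.split(";"):
            # Reuse an existing child with the same frame name, if any.
            child = next(
                (c for c in node["children"] if c["name"] == frame), None
            )
            if child is None:
                child = {"name": frame, "value": 0, "children": []}
                node["children"].append(child)
            child["value"] += count
            node = child
    return root


# Hypothetical sample input in folded format.
folded = ["main;foo;bar 3", "main;foo 2", "main;baz 5"]
tree = folded_to_tree(folded)
print(json.dumps(tree, indent=2))
```

Frames are merged by name at each depth, so two stacks sharing a prefix end up under the same parent node.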
I tried out Vector on my servers today. Pretty cool! @vavrusa I am also looking for a way to have BCC integration so that we could write custom widgets based on BPF map data. Can you point me to the spec you are going to maintain? Maybe I can add some stuff to it too as I explore Vector more.
I'm trying to write a PMDA that interfaces with BPF. Has anyone else started working on this? I'm writing a PMDA as a python "library", which would allow me to use BCC directly (at least that's my idea). At the moment I'm stuck since all PMDAs run under the pcp user, which isn't allowed to hook onto kprobes. Although I could probably solve this with a workaround, I'm wondering if this is even the right approach. Any thoughts?
@brendangregg ^^
@tuxology AFAIK there's only a spec for callstacks, as mentioned by @spiermar above. I didn't have much time to work on that further. Mapping a linear array to a JSON array is pretty obvious, and so is a hashmap (with the exception of values). The call stack map would definitely deserve a schema.
My original idea was to have a library like Prometheus to serve contents of BPF maps on demand (as JSON), this way we can serve information both from non-privileged processes (e.g. observing what goes through sockets), and privileged processes (e.g. callstacks). I've kicked the PCP tires, but it's hard for me to justify managing another infrastructure, when we've already figured out how to get metrics through HTTP endpoints.
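To make the array and hashmap mapping concrete, here is a rough sketch using plain Python lists and dicts as stand-ins for BPF maps. Nothing here talks to BPF or BCC; the helper names, the `pid:comm` key encoding, and the sample data are all assumptions for illustration. JSON object keys must be strings, so struct-like keys need some per-map encoding, and non-scalar values would need the same treatment.

```python
import json


def array_map_to_json(values):
    """Serialize a linear array map as a JSON array."""
    return json.dumps(list(values))


def hash_map_to_json(items, key_fmt=str, value_fmt=lambda v: v):
    """Serialize a hash map as a JSON object.

    key_fmt turns a key into a string (required by JSON);
    value_fmt turns a value into something JSON can encode.
    """
    return json.dumps({key_fmt(k): value_fmt(v) for k, v in items})


# Stand-in for an array map, e.g. histogram buckets.
buckets = [0, 4, 17, 2]

# Stand-in for a hash map keyed by a (pid, comm) struct.
counts = {(1234, "nginx"): 42, (5678, "sshd"): 7}

array_json = array_map_to_json(buckets)
hash_json = hash_map_to_json(
    counts.items(),
    key_fmt=lambda k: "%d:%s" % k,  # hypothetical "pid:comm" encoding
)

print(array_json)
print(hash_json)
```

An HTTP endpoint would then only need to return these strings on request.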
@mogeb PMDAs run as root by default, although many choose to switch to pcp user (voluntarily) via the set_user call (in python). If you drop that call from your script, you should find it runs as root.
@natoscott thanks! I hadn't even noticed that.
New PCP PMDA that would be able to consume perf-events directly and respond to Vector, preferably via JSON. Will be required to support Flame Graphs.