[New Feature]: Falco native resource utilization metrics logs support

incertum commented 1 year ago

Motivation

Support for "Falco native resource utilization metrics" is high up on the wish list of SREs I have the pleasure to work with. While many end users sustaining a large deployment already pull such metrics from their systems using other mechanisms, there is always some loss of information. In addition, it can be cumbersome to join information from different sources and typically specialized metrics are not supported.

Falco could very easily emit basic aggregated resource utilization metrics scheduled on a cron tab (or simpler alternative similar to existing methods). Additional overhead should be low since many metrics are already available.

Finally, it would make it easier to perform ad-hoc performance studies, especially as it seems LSM hooks are favored as additional event sources for the next Falco iterations. That way both tool developers and end users can optimize the tool better for threat detection use cases and derive SLO (Service Level Objectives) that can be the basis for resource overhead budgeting. Such logs with more specialized metrics can also help disentangling factors that can cause higher resource utilization that are however outside of the tool developer's influence, such as hardware, kernel version or actual workload footprint (event rate).

Feature

[Edited May 23, 2023]:

Consider these key points about the new metrics feature in Falco:

It introduces a redesigned stats/metrics system.
Native support for resource utilization metrics and specialized performance metrics.
Metrics are emitted as monotonic counters at predefined intervals (snapshots).
All metrics are consolidated into a single log message, adhering to the established rules schema and naming conventions.
Additional info fields complement the metrics and facilitate customized statistical analyses and correlations.
The metrics framework is designed for easy future extension.

Additional highlights:

Includes old / established stats / counters in one place
For bpf drivers we now support libbpf stats
More extensions (syscall counters and prometheus exporter option) are planned

Navigate to the metrics key in https://github.com/falcosecurity/falco/blob/master/falco.yaml.

Example metrics snapshot schema for Falco 0.35 release using bpf driver:

{
  "hostname": "x",
  "output": "Falco metrics snapshot",
  "output_fields": {
    "evt.source": "syscall",
    "evt.time": 1684842387420399900,
    "falco.container_memory_used": 0.0,
    "falco.cpu_usage_perc": 1.500000,
    "falco.duration_sec": 26,
    "falco.evts_rate_sec": 213.677443,
    "falco.host_boot_ts": 1682760026000000000,
    "falco.host_num_cpus": 20,
    "falco.hostname": "x",
    "falco.kernel_release": "x",
    "falco.memory_pss": 60.994141,
    "falco.memory_rss": 64.152344,
    "falco.memory_vsz": 1112.585938,
    "falco.num_evts": 244792,
    "falco.num_evts_prev": 244384,
    "falco.start_ts": 1684842361326862695,
    "falco.version": "0.34.1-268+8c5ebde",
    "scap.engine_name": "bpf",
    "scap.evts_drop_rate_sec": 0,
    "scap.evts_rate_sec": 190.633797,
    "scap.n_drops": 0,
    "scap.n_drops_buffer_clone_fork_enter": 0,
    "scap.n_drops_buffer_clone_fork_exit": 0,
    "scap.n_drops_buffer_connect_enter": 0,
    "scap.n_drops_buffer_connect_exit": 0,
    "scap.n_drops_buffer_dir_file_enter": 0,
    "scap.n_drops_buffer_dir_file_exit": 0,
    "scap.n_drops_buffer_execve_enter": 0,
    "scap.n_drops_buffer_execve_exit": 0,
    "scap.n_drops_buffer_open_enter": 0,
    "scap.n_drops_buffer_open_exit": 0,
    "scap.n_drops_buffer_other_interest_enter": 0,
    "scap.n_drops_buffer_other_interest_exit": 0,
    "scap.n_drops_buffer_total": 0,
    "scap.n_drops_bug": 0,
    "scap.n_drops_page_faults": 0,
    "scap.n_drops_perc": 0.000000,
    "scap.n_drops_prev": 0,
    "scap.n_drops_scratch_map": 0,
    "scap.n_evts": 245082,
    "scap.n_evts_prev": 244718,
    "scap.page_fault_kern.avg_time_ns": 0,
    "scap.page_fault_kern.run_cnt": 0,
    "scap.page_fault_kern.run_time_ns": 0,
    "scap.page_fault_user.avg_time_ns": 0,
    "scap.page_fault_user.run_cnt": 0,
    "scap.page_fault_user.run_time_ns": 0,
    "scap.sched_process_e.avg_time_ns": 1772,
    "scap.sched_process_e.run_cnt": 45,
    "scap.sched_process_e.run_time_ns": 79757,
    "scap.sched_switch.avg_time_ns": 0,
    "scap.sched_switch.run_cnt": 0,
    "scap.sched_switch.run_time_ns": 0,
    "scap.signal_deliver.avg_time_ns": 0,
    "scap.signal_deliver.run_cnt": 0,
    "scap.signal_deliver.run_time_ns": 0,
    "scap.sys_enter.avg_time_ns": 84,
    "scap.sys_enter.run_cnt": 560929,
    "scap.sys_enter.run_time_ns": 47409167,
    "scap.sys_exit.avg_time_ns": 97,
    "scap.sys_exit.run_cnt": 559949,
    "scap.sys_exit.run_time_ns": 54803511
  },
  "priority": "Informational",
  "rule": "Falco internal: metrics snapshot",
  "source": "internal",
  "time": "2023-05-23T11:46:27.420399900Z"
}

Alternatives

End users can continue using their own mechanisms to pull Falco's resource utilization metrics or specialized metrics.

jasondellaluce commented 1 year ago

Thanks for reporting and tracking this! For more context, Falco has two options that allow something similar to this:

-s: if specified, append statistics related to Falco's reading/processing of events to this file (only useful in live mode)
--stats-interval: When using -s , write statistics every ms. This uses signals, so don't recommend intervals below 200 ms. Defaults to 5000 (5 seconds)

The output of this will be something like this in the upcoming Falco 0.33 (last update in https://github.com/falcosecurity/falco/pull/2182):

{
  "sample": 71,
  "k8s_audit": {
    "cur": {
      "events": 1
    },
    "delta": {
      "events": 1
    }
  },
  "syscall": {
    "cur": {
      "drop_pct": 0,
      "drops": 0,
      "events": 9525,
      "preemptions": 0
    },
    "delta": {
      "drop_pct": 0,
      "drops": 0,
      "events": 137,
      "preemptions": 0
    }
  }
}

So I think possible next steps for this are:

Support contab-like notation in the --stats-interval as alternative to the ms integer value
Create clean interface and types inside libsinsp/scap for gathering metrics, which would also involves the plugin API
Consider merging the current scap_stats in the new metrics interface too
Decide on one (or more) standard metrics output. Json works, but for instance a prometheus endpoint may be one of the new ones (https://github.com/falcosecurity/falco/issues/1772)

jasondellaluce commented 1 year ago

cc @leogr

incertum commented 1 year ago

@jasondellaluce awesome this sounds like a lot of really great options and certainly calls for a good amount of collaboration and needing multiple PRs to get everything in place. Maybe worth creating a new umbrella issue for "stats" to have one place for tracking?

Agreed JSON is a good starting point and more can be supported later as new endpoints are added. This is exciting.

Re cron syntax support I think this would be really nice, but it can be tracked as "nice to have", some presets like every hour, every minute etc would work fine at the beginning as well. To me it would be intuitive to support multiple stats categories like some end users may not need these utilization metrics and only want to log other metrics vs for example for us this is the most important measurement to be able to decide if the tool meets our budgeting requirements. Certain that's what you had in mind 🙃 with the re-design and more.

Could start looking into new APIs/methods to get the metrics we don't yet have in libs and maybe would you want to start a new metrics interface design since you know the project so well?

jasondellaluce commented 1 year ago

Yeah definitely. Will start researching a good way to standardize metrics in our codebase after the upcoming release.

incertum commented 1 year ago

Amazing, thanks a bunch @jasondellaluce!

incertum commented 1 year ago

Hi @jasondellaluce starting to look into the parts I signed myself up for (getting the math functions in place to derive / snapshot the metrics end users can emit on a cron tab).

Would it be possible to first discuss a few details here? There are multiple ways to derive CPU and memory usage, should we also feature multiple metrics in this regard?

%CPU ps vs top approach like nicely summarizes in this post? Both or favor ps like calculation?
For memory similarly, using %mem like in ps command, and other ones like rss,vsize,size?

@gnosek would it be ok to also ask you for some feedback in this regard? Would appreciate your thoughts on this a lot :pray: .

gnosek commented 1 year ago

Since you mention cron, I assume you don't want to poll them every 100 ms :) so maybe (just maybe) the prometheus exporter approach would work? Your cron command would just be a curl then.

Maybe we can extend --stats-interval to accept a zero interval (or even default to it when only -s is specified), meaning "whenever I get the signal", so then your cron command would be e.g. killall -FOO falco.

As you can probably guess from the above, I'm not really a fan of adding a cron-type scheduler to falco for this. You're already running cron, no reason to reinvent the wheel IMO.

For not reinventing the wheel further, I'd drop the CPU/memory metrics altogether (leaving only the falco-specific data). There are tons of tools to monitor these already. If we do want to do this ourselves, I'd prefer to keep the calculations simple, i.e. either:

return the CPU time in jiffies (1/100 sec) since startup (plus process uptime in jiffies to enable the calculation) or
return the percentage (cpu_time_in_jiffies / uptime_in_seconds)

I'd very much prefer not to do the delta calculation in falco (i.e. do it the ps way, not top). If we expose the two raw metrics (cpu time, uptime), it's very easy for whoever consumes the metrics to calculate their own deltas and it's independent of --stats-interval. And AFAIK it's easier to work with monotonic counters in promql, compared to ratios.

For memory, I'd probably only return the raw values too (rss_kb, vsize_kb would be the two obvious ones).

tl;dr: I'd rather expose the absolute minimum of stats as a prometheus exporter and do the fancy in promql (but I am all for adding new falco-specific metrics, e.g. in-kernel cpu overhead if we can determine it)

incertum commented 1 year ago

@gnosek this is excellent feedback and what I was looking for to have a more concrete idea of what could be reasonable, meaning it strikes a balance of having these metrics "natively supported" while not going crazy on host either :). Couldn't agree more: Doing the remainder of calculation in promql or in your SQL-like engine you have available in your post processing compute platform (in case it's not prometheus) is cheap either way.

Re --stats-interval yes open to options you all think more naturally fit into Falco's existing approach. Having the option to emit such metrics according to presets (e.g. every hour, every 4h, 12h or 24h) should suffice IMO, proper cron tab is not necessarily needed.

Additional question (more forward leaning for a v2 or v3 of such metrics after knowing if v1 is useful):

As eBPF evolves, there could be interesting ways to monitor eBPF perf better, such as measuring the average time spent in each bpf and trying to optimize for numbers reflecting "faster" as optimizations are added. At the moment bpftool is not too granular, meaning no tail-call resolution of stats, but hopefully the tool evolves too :)

Could anyone think of ways how we could bundle or master bpftool into Falco, or is this a bit too crazy for the time being?

# https://www.mankier.com/8/bpftool-prog
# settings for bpftool
sysctl kernel.bpf_stats_enabled=1 || true
/usr/bin/bpftool --json --pretty prog show | jq ' .[] | select(.run_time_ns | length >= 1) | {run_time_ns: .run_time_ns, run_cnt: .run_cnt}' | jq -cs . > /tmp/falco-perf

Ah and maybe additional insights into the motivation that drives the "native support" of CPU and memory performance metrics parts of this feature request could be useful for anyone reading this:

For example, we are fortunate enough to have large infrastructure and SREs teams to already have proper metrics in place over prometheus. In practice, the major overhead appears to be sustaining different data pipelines or intermediary brokers to forward performance metrics to where you (person who deploys and maintains Falco in production) would like to have this data available or preserved for custom correlations (-> to get to the bottom of perf overhead <-> detection capabilities tradeoff). Aware that this observation is based on the experience of maintaining large deployments in custom ecosystems where a simple unified approach could be a relief, so it may not apply to everyone.

gnosek commented 1 year ago

Thanks for the kind words @incertum :)

Could anyone think of ways how we could bundle or master bpftool into Falco, or is this a bit too crazy for the time being?

bpftool is a cli wrapper around libbpf, which IIRC we already bundle in libs and the underlying machinery seems to be fairly simple anyway:

bpf_obj_get_info_by_fd(), a libbpf function, does effectively a single syscall: https://github.com/libbpf/libbpf/blob/68e6f83f223ebf3fbf0d94c0f4592e5e6773f0c1/src/bpf.c#L1030 (there might be some complications around getting the right fd, 🤷‍♂️ it's late here)
example usage: https://github.com/libbpf/bpftool/blob/566f82764fc403fb021bc511ca5ce56828bdda07/src/prog.c#L579-L587
you get back a nice struct: https://github.com/libbpf/bpftool/blob/8e721a476f58cb2ce16035cbaeca0352174aaa9e/include/uapi/linux/bpf.h#L6171-L6211

To get more insight from these stats we'd need to split the one huge eBPF tracepoint into per-event ones (or come up with some meta-instrumentation for the eBPF probe; the in-kernel stats are pretty basic anyway).

the major overhead appears to be sustaining different data pipelines or intermediary brokers to forward performance metrics to where you (person who deploys and maintains Falco in production) would like to have this data available

Sure, but this feels like dragging falco into the guerilla warfare between you and your SREs ;) If you already have an officially blessed prom exporter deployed by SREs, you can scrape it and correlate the data between it and falco, or maybe you can deploy a lightweight exporter to gather the generic stats yourself. I'm wary of implementing everything inside falco, since sooner or later it will start competing with systemd ;)

incertum commented 1 year ago

🤯 re the details around bpftool you gave -> ok I clearly have some more reading and catching up to do here. Thanks a bunch for the details!

You will probably laugh at hearing this, but I currently have a nice hack and "exfiltrate" those bpftool metrics every hour and by exfiltrating I mean I use some bash tricks to make the numbers appear in syscall related data fields I can export over Falco rules lol just so I have only one data pipeline to worry about.

Re the CPU and memory stats metrics 🙃 yeah it's not an easy overall story when looking at it from an ecosystem / diverse deployments point of view ... if it's ok to export 4 more raw numbers over the new stats event - at the same time why not? Let's think more about it and discuss further. If we need to make a tradeoff then the specialized metrics you can't easily get via alternatives should take precedence.

gnosek commented 1 year ago

I'm not laughing, you have my sympathy :) At the same time, I'm not sure cron and a system stats collector are core falco features ;)

As I have finally noticed in your initial comment, you want an event with these stats. In an ideal world, the non-falco stats could be provided by a plugin. Since every engine can (and will) have its own stats, we'd probably need arbitrary k/v event data (I'm mildly reluctant to just shove json in there) and this could be extended with plugins to measure anything you need.

So, thinking somewhat longer term (not sure what timeline you have in mind), we would:

introduce a concept of stats to falco and libs (it's already there in some vestigial form but would have to be cleaned up)
allow engines, plugins etc. to add their own stats to the core ones:
- the core stats would be some generic libscap stuff (events processed, dropped etc.)
- e.g. the bpf engine could add the per-tracepoint overhead numbers
- falco itself can add e.g. metrics about rules
- an external plugin (that could be shipped with falco or libs if the community wants) can provide all the extra stats you're looking for
depending on configuration, expose the stats as scap events, prometheus, grpc api or with carrier pigeons (in a very ideal world, this would also be pluggable but right now we don't allow multiple engines in the same scap handle so we can't inject extra events with a plugin: they'd be the only events)

The extra events should probably be injected in sinsp, not scap, since scap_next is fairly limited in scope while sinsp::next already handles everything, including kitchen sinks ;)

(having multiple engines in one handle is up there on the list I'd like to see in libs, along with LSM hooks)

incertum commented 1 year ago

;) I like those suggestions better than what I initially thought and re timeline - don't think an intermediary workaround is worth it just to have something faster, meaning I think let's do it rather the proper way and maybe aim for 0.35?

Does it sound like a good plan to focus on extending the "concept of stats to falco and libs" and more specifically the core stats and bpf per-tracepoint overhead numbers at first in the more near-term? And push the other generic stats, like CPU and memory as plugin option to the longer term?

CC @jasondellaluce? re the cleaner stats interface any more thoughts? As as @gnosek pointed out lots of ingredients like the event counter or the drop counter etc are kind of already there ... but not exposed for export at a regular interval. e.g. if you don't have any drops you don't know about n_evts at all. At minimum being able to reconstruct event rate regularly and correlate with CPU overhead would be a big win.

And agreed makes more sense to inject extra events to sinsp.

(having multiple engines in one handle is up there on the list I'd like to see in libs, along with LSM hooks)

Nice, big support for this. Is there an existing issue with an outline or could more details be shared elsewhere (would be interested)?

As chats about the LSM hooks are becoming more concrete now as well it would be nice to have concrete overhead numbers than needing to rely on reputation. I am one of the folks who wants all the nice and correct data/features, but I am also constantly fighting against overhead budgeting constraints.

gnosek commented 1 year ago

Does it sound like a good plan to focus on extending the "concept of stats to falco and libs" and more specifically the core stats and bpf per-tracepoint overhead numbers at first in the more near-term? And push the other generic stats, like CPU and memory as plugin option to the longer term?

Yes, IMHO. Since we're interested in engine (bpf et al.) stats, this has to live in libscap and then the upper layers (sinsp, falco) would build on top.

Nice, big support for this. Is there an existing issue with an outline or could more details be shared elsewhere (would be interested)?

Not that I'm aware of. For a fully generic solution there would be issues with e.g. the process table (each engine can supply its own and we'd have to 1. make sense of it and ideally 2. not duplicate work if e.g. both engines scan /proc). The easy way out is to run multiple (sinsp) inspectors in parallel but that doesn't help this particular use case. My never ending patch series slowly evolves to the point where scap wouldn't need to care about system state at all (just be an event pipe; all state would be managed by sinsp) so maybe it would be easier then.

incertum commented 1 year ago

ACK

Yes, it seems that as the project is evolving scap in deed shall best be reduced to a pipe. Would be very supportive of that, as it will make many future contributions that attempt to make the tool even more "intelligent" easier. Worth the refactoring trouble would say.

jasondellaluce commented 1 year ago

Coming late to the party, but I'm supportive of all the discussion above. Let's set a milestone for this not to lose track of the conversation, and eventually move this to the next closest release in which the first changes can fit.

/milestone 0.34.0

leogr commented 1 year ago

Any updates on this? :thinking:

incertum commented 1 year ago

Agree @leogr let's try to prioritize as it seems to have become more relevant in past few weeks. Have some cycles and can start today to get a PR open by beginning of next week and hopefully collectively we can make some good progress before the holiday break 🙃

jasondellaluce commented 1 year ago

/remove-milestone 0.34.0 /milestone 0.35.0

incertum commented 1 year ago

See https://github.com/falcosecurity/falco/pull/2333#issuecomment-1358785737, explore:

adding syscall counters to libsinsp
more verbose metrics logging if CPU exceeds a threshold.

Created a public HackMD document for additional discussions / clarifications around actual implementation details.

incertum commented 1 year ago

Additional comments around syscalls counters added in this PR https://github.com/falcosecurity/falco/pull/2361#issuecomment-1399402369

incertum commented 1 year ago

Updates:

libs

libs PR Number 3: The libbpf metrics PR (new metrics 3/n) still needs a bit of a refactor wrt the existing scap_stats to support integrating libbpf stats aka no new separate libbpf stats struct.
libs PR Number 4: @jasondellaluce will take on the (new metrics 4/n) PR for libs introducing the syscalls counters. So far we discussed to base them on the PPM_SC codes and also capture the syscall string name ... depending on the refactor in libs PR number 3 the syscalls counter may or may not be integrated into scap_stats as well, but it can be done in 2 separate PRs by @jasondellaluce . CC @Happy-Dude since you requested the detailed syscall counters.

falco

falco PR Number 1: light refactor of existing stats counters and define new schema, @incertum and @jasondellaluce.
falco PR Number 2: integrate new libs metrics, @incertum rebase existing WIP Falco PR.

Everything likely will get refactored a bit more under the hood after landing a v1 of these new metrics for Falco 0.35.

incertum commented 1 year ago

For the most part relevant libs changes are concluded:

Support for libbpf stats for modern_bpf will be wrapped up in upcoming libs 0.12.0 release, not 0.11.0 release.
TBD are the syscall counters @jasondellaluce, will we support them for Falco 0.35 or shortly after the next release?

Pushed to the existing Falco PR https://github.com/falcosecurity/falco/pull/2333 integrating the current new stats v2 metrics.

jasondellaluce commented 1 year ago

TBD are the syscall counters @jasondellaluce, will we support them for Falco 0.35 or shortly after the next release?

I'm having reduced bandwidth, so I'll try my best to fit in for 0.35, but shortly after the next release would be the next target in case I'll not make it in time.

incertum commented 1 year ago

Updated the initial comment https://github.com/falcosecurity/falco/issues/2222#issue-1387088554. Closing this issue as the task is completed.

Syscall counters and prometheus exporter option planned for Falco 0.36 will be tracked in new issues as part of the new roadmap planning. Thanks everyone for the valuable input and help ❤️ !!!

falcosecurity / falco

[New Feature]: Falco native resource utilization metrics logs support #2222