falcosecurity / cncf-green-review-testing

Falco configurations intended for testing with the CNCF Green Reviews Working Group
Apache License 2.0

[Tracking] Create Optimal Synthetic Workloads / "Kernel Event Rates" for Falco Testing #11

Open incertum opened 5 months ago

incertum commented 5 months ago

Creating a formal issue on our side to track the progress we are making on tuning the synthetic workloads in relation to the CNCF testbed constraints, so that the reported Falco metrics are as realistic as possible.

@AntonioDiTuri @nikimanoledaki @rossf7

incertum commented 5 months ago

@AntonioDiTuri re https://github.com/cncf-tags/green-reviews-tooling/issues/43#issuecomment-1930332605:

Correct, it's all set up as outlined in https://falco.org/docs/metrics/falco-metrics/. Our falco.yaml is a ConfigMap, see here.

I am looking at the following metrics:

falco.cpu_usage_perc
falco.cpu_usage_perc_total_host
falco.host_num_cpus
scap.evts_rate_sec

On that note: I anticipated much larger machines (e.g. 64 or 96 CPUs, not just 16), so I believe I am currently stressing the machine far too much. This is a less realistic, very high load compared to an average server use case. Right now we are at 102K events per second when normalized to one CPU. We should be targeting something like 1-3K events per second per CPU; for example, see this new guide: https://falco.org/docs/troubleshooting/dropping/

Allow me some time to experiment now that I have access to the Falco pods. Meanwhile, happy to answer more follow-up questions.
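To make the per-CPU normalization concrete, here is a minimal sketch. It assumes that "normalized to one CPU" means dividing `scap.evts_rate_sec` by `falco.host_num_cpus`; the function names and the classification labels are hypothetical, and the 1-3K band is simply the target range mentioned above.

```python
# Target band from the comment above: roughly 1-3K events/second per CPU.
TARGET_MIN_PER_CPU = 1_000
TARGET_MAX_PER_CPU = 3_000


def events_per_cpu(evts_rate_sec: float, host_num_cpus: int) -> float:
    """Normalize the host-wide kernel event rate to a single CPU.

    Assumption: normalization is a plain division of scap.evts_rate_sec
    by falco.host_num_cpus.
    """
    if host_num_cpus <= 0:
        raise ValueError("host_num_cpus must be a positive integer")
    return evts_rate_sec / host_num_cpus


def classify_load(per_cpu_rate: float) -> str:
    """Compare a per-CPU event rate against the 1-3K target band.

    The labels returned here are hypothetical, chosen for illustration.
    """
    if per_cpu_rate < TARGET_MIN_PER_CPU:
        return "below target"
    if per_cpu_rate > TARGET_MAX_PER_CPU:
        return "above target"
    return "within target"
```

For example, `classify_load(102_000)` returns `"above target"`, which matches the observation that the current synthetic workload is far heavier than intended.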

AntonioDiTuri commented 5 months ago

Thanks Melissa for investigating.

Are those metrics that are saved in the /tmp/stats folder also accessible via REST API?

We are interested in building a Grafana dashboard that could graph those metrics

One more question: can we use the Falco Prometheus exporter to have those metrics available for the Grafana Dashboard?

incertum commented 5 months ago

> Are those metrics that are saved in the /tmp/stats folder also accessible via REST API?

No. kubectl cp could be an interim solution. However, since we are targeting Prometheus support in Falco 0.38.0 (the next release), I would propose skipping workarounds (https://github.com/falcosecurity/cncf-green-review-testing/issues/12). Meanwhile, we can inspect the JSONL metrics outputs we have today on an ad hoc basis. In a way, these metrics are helper metrics (nice-to-haves) for configuring Falco optimally to get the best benchmarking results.
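For the ad hoc inspection, a rough sketch along these lines could pull the four metrics of interest out of a JSONL file once it has been copied off the pod. It assumes each line is one JSON object that carries the metrics under an `output_fields` key, and falls back to the top level if that key is missing; the exact layout should be checked against the actual files. The function name `iter_snapshots` is hypothetical.

```python
import json
from typing import Iterable, Iterator

# The metrics discussed earlier in this thread.
METRICS = (
    "falco.cpu_usage_perc",
    "falco.cpu_usage_perc_total_host",
    "falco.host_num_cpus",
    "scap.evts_rate_sec",
)


def iter_snapshots(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSONL line, restricted to the metrics above.

    Assumption: metrics live under "output_fields"; if that key is
    absent, the top-level object is used instead. Metrics missing from
    a snapshot are reported as None.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        fields = record.get("output_fields", record)
        yield {name: fields.get(name) for name in METRICS}
```

Since the function accepts any iterable of strings, an open file handle for a copied metrics file can be passed directly, for instance `for snap in iter_snapshots(fh): print(snap)`.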

> We are interested in building a Grafana dashboard that could graph those metrics

Absolutely, let's discuss then.

> One more question: can we use the Falco Prometheus exporter to have those metrics available for the Grafana Dashboard?

This comment answers your question: https://github.com/falcosecurity/cncf-green-review-testing/issues/12#issue-2122460300. No, we cannot. In addition, the Falco maintainers have decided to deprecate https://github.com/falcosecurity/falco-exporter altogether.

nikimanoledaki commented 5 months ago

> plus the Falco maintainers have decided to deprecate https://github.com/falcosecurity/falco-exporter altogether.

This is good context to know and answers our questions around that repo - thank you!

AntonioDiTuri commented 5 months ago

@incertum when do you plan to release Falco 0.38.0? Is there a file I can check in the Falco repo for planned releases and related updates?

Also, do you already see a way to graph the raw data in /tmp/stats in Grafana directly from the raw file?

incertum commented 5 months ago

@AntonioDiTuri absolutely, we link to our roadmap here and our Falco 0.38.0 milestone is currently set to May 27, 2024, see here.

Please follow this issue https://github.com/falcosecurity/cncf-green-review-testing/issues/12 to track our Prometheus integration progress. Thanks for your patience.

poiana commented 2 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

incertum commented 2 months ago

/remove-lifecycle stale