falcosecurity / cncf-green-review-testing

Falco configurations intended for testing with the CNCF Green Reviews Working Group
Apache License 2.0

[Action] Decide framework for benchmark tests #16

Open · nikimanoledaki opened this issue 4 months ago

nikimanoledaki commented 4 months ago

Motivation

As part of the CNCF Green Reviews WG's milestone for KubeCon+CloudNativeCon Europe '24, our main goal is to create the first benchmark test for Falco.

Feature

Proposal 1: Self-hosted runners with Actions Runner Controller (ARC)

Self-hosted GitHub Actions runners could help us achieve this, specifically via the Actions Runner Controller (ARC). We could add a self-hosted runner in this repo (falcosecurity/cncf-green-review-testing) so that the Falco maintainers have ownership of the benchmark tests.

The benchmark tests can then be run in the cluster where Kepler and Prometheus are running and collecting energy metrics, along with other metrics for the SCI.

Stretch: The workflow could be triggered when there are new releases of Falco. GitHub Actions workflows can be triggered by the build pipeline through a workflow_dispatch event.
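A minimal, hypothetical sketch of what such a workflow could look like; the runner label, input name, and script path are placeholders, not agreed configuration:

```yaml
# Hypothetical benchmark workflow: triggered manually or by the Falco build
# pipeline via workflow_dispatch, and executed on an ARC-managed self-hosted
# runner. All names below are illustrative only.
name: falco-benchmark
on:
  workflow_dispatch:
    inputs:
      falco_version:
        description: "Falco release to benchmark"
        required: true
jobs:
  benchmark:
    runs-on: arc-runner-set   # assumed ARC runner scale set label
    steps:
      - uses: actions/checkout@v4
      - name: Run benchmark steps
        run: ./scripts/benchmark.sh "${{ inputs.falco_version }}"   # hypothetical script
```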

Alternatives

Proposal 2: bash script that runs as a Kubernetes CronJob

We could create and maintain bash scripts that run the steps. We could run these as Kubernetes Jobs.
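A minimal sketch of this option, assuming the script is shipped in a ConfigMap and run in the falco namespace; the image, schedule, and names are placeholders:

```yaml
# Hypothetical CronJob wrapping the benchmark bash script. The tooling image,
# schedule, and ConfigMap name are assumptions for illustration.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: falco-benchmark
  namespace: falco
spec:
  schedule: "0 */6 * * *"   # every six hours, for illustration only
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: benchmark
              image: bitnami/kubectl:latest   # assumed image with kubectl and bash
              command: ["/bin/bash", "/scripts/benchmark.sh"]
              volumeMounts:
                - name: scripts
                  mountPath: /scripts
          volumes:
            - name: scripts
              configMap:
                name: benchmark-scripts   # hypothetical ConfigMap holding the script
                defaultMode: 0755
```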

Additional context

Suggested steps

  1. Validate that Falco is deployed and running in the falco namespace on the isolated worker node.
  2. Validate that the microservice workload is deployed in the falco namespace.
  3. Benchmark test: for example, reach a given kernel event rate by sending requests to one of the microservice demo's endpoints for the given duration (e.g. 15 min).
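A rough sketch of these suggested steps as workflow job steps; the namespace follows the issue, while the Falco DaemonSet name, the demo Service URL, and the load tool (`hey`) are assumptions:

```yaml
# Illustrative only - each step mirrors one of the suggested steps above.
steps:
  - name: Validate Falco is running in the falco namespace
    run: |
      kubectl -n falco rollout status daemonset/falco --timeout=120s
      kubectl -n falco get pods -o wide   # check the pods landed on the isolated node
  - name: Validate the microservice workload is deployed
    run: kubectl -n falco get deployments
  - name: Drive load against one demo endpoint for 15 minutes
    run: |
      # placeholder load generator and endpoint; the actual target kernel event
      # rate and tooling are still to be decided by the WG
      hey -z 15m http://microservice-demo.falco.svc.cluster.local:8080/
```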

Benchmark Test Acceptance Criteria

incertum commented 4 months ago

Excellent outline @nikimanoledaki - I'm favoring the self-hosted GitHub Actions runners option. @maxgio92 is our infra management expert.

nikimanoledaki commented 4 months ago

Wonderful - we'll start on this immediately.

In the meantime, we will need your help with the following requirements:

nikimanoledaki commented 4 months ago

Also - @incertum, are you currently using the microservice demo that is deployed on the cluster for these stress tests, or planning to use it? Or can we remove it from the cluster for now? We can just comment it out so that Flux stops reconciling it. Please let me know :)

maxgio92 commented 4 months ago

Hi @nikimanoledaki, thank you for the detailed proposal. I also prefer the ARC way, and I like the idea of relating a green-review benchmark to a specific Falco release.

I'd propose guaranteeing quality of service for the benchmark jobs and for ARC. For the benchmark I'd provision a dedicated node pool if the cluster is shared with the energy monitoring services. For ARC I don't think a dedicated node pool is needed - the system pool used by the energy monitoring services should be fine - but we could set guaranteed QoS at the pod level.
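For context, guaranteed QoS at the pod level just means every container sets resource requests equal to its limits; a minimal sketch, in which the node label, image, and resource values are placeholders:

```yaml
# Illustrative runner pod with guaranteed QoS: requests == limits for all containers.
apiVersion: v1
kind: Pod
metadata:
  name: arc-runner
spec:
  nodeSelector:
    node-pool: system   # hypothetical label for the system node pool
  containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:latest   # assumed runner image
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
        limits:
          cpu: "1"
          memory: 2Gi
```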

WDYT?

nikimanoledaki commented 4 months ago

Hi @maxgio92! 👋

@rossf7 has been working on provisioning an isolated worker node for the falco namespace + components, which is nearly complete:

Please let us know if you have suggestions on any further isolation that could help with the benchmark tests :)

I'm not 100% sure if it would be best for the ARC runner Pod to run on the system node or the Falco-only node. I don't think it should run in the test environment - running everything ARC-related on one of the system nodes would be better. WDYT? 🤔

rossf7 commented 4 months ago

Hi @maxgio92, yes, as Niki says, separating the components onto dedicated nodes is nearly complete. We just need to add a node selector to our Flux pods.

For the benchmark I'd provision a dedicated node pool, if the cluster is shared with the energy monitoring services.

Yes, we will provision dedicated nodes for Falco using the labels defined in https://github.com/falcosecurity/cncf-green-review-testing/issues/2; this is done via our tofu automation.
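For illustration, pinning a benchmark component to the dedicated Falco nodes would look roughly like this; the label key/value is a placeholder for whatever is defined in issue #2, and the toleration is only needed if those nodes are tainted:

```yaml
# Hypothetical pod template fragment for scheduling onto the dedicated nodes.
spec:
  template:
    spec:
      nodeSelector:
        cncf-project: falco   # placeholder - use the labels from issue #2
      tolerations:
        - key: cncf-project   # only if the dedicated nodes carry a matching taint
          operator: Exists
          effect: NoSchedule
```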

I'm not 100% sure if it would be best for the ARC runner Pod to run on the system node or the Falco-only node. I don't think it should run in the test environment - running everything ARC-related on one of the system nodes would be better. WDYT?

@nikimanoledaki I think it would be better to run the ARC pods on our system node, to keep the nodes we're collecting measurements on as isolated as possible.

If we get short on resources we could move some of our internal components to the control plane node.

maxgio92 commented 4 months ago

Thanks @rossf7 and @nikimanoledaki! I agree on scheduling ARC on system nodes.

incertum commented 4 months ago

Also - @incertum are you currently using the microservice demo that is currently deployed on the cluster for these stress test or planning to use it? Or can we remove it from the cluster for now? We can just comment it out so that Flux stops reconciling it. Please let me know :)

We are not using it yet, but yes please keep it deployed. Much appreciated!

raymundovr commented 4 months ago

Hi @incertum 👋

We are not using it yet, but yes please keep it deployed. Much appreciated!

During our last discussion, we were not sure about the goal of this microservices deployment; we also noticed that there's a stress test Deployment shipped ([1] and [2]). To enrich our discussions, could you please explain a bit how these two components interact and play together with Falco, and what the plans for them are? Thanks!

incertum commented 4 months ago

@raymundovr

During our last discussion, we were not sure on the goal of this microservices deployment, it was also noticed that there's a stress test Deployment shipped [1] and [2].

We previously discussed that for a v1 we will use the following synthetic workloads:

explain a bit how these two components interact / play together with Falco and what are the plans for them? Thanks!

Hi, we added a lot of new documentation to our website (https://falco.org/) explaining what Falco does and how it works, if you are interested in more details. Falco is a Linux kernel security monitoring tool, passively hooking into syscall tracepoints. The more syscalls happen on a server, the more work Falco has to do (simplified). Notably, Falco does not interact with the synthetic workloads; rather, we use them to increase the frequency of syscalls, thereby making our testbed resemble real-life production environments where a diverse set of applications runs 24/7.

What additional questions do you have for us?

nikimanoledaki commented 4 months ago

A few questions provided by @roobre :)

raymundovr commented 4 months ago

Thank you @incertum for the clarifications. It is really helpful! @nikimanoledaki on the second point, I think it's what @incertum said:

[...]

  • stress-ng to add some static 24/7 baseline syscalls activity from our side, because Falco uses no CPU when nothing really runs on a server. [...] The more syscalls happen on a server the more work Falco has to do (simplified). Notably, Falco does not interact with synthetic workloads, rather, we use them to increase the frequency of syscalls, thereby making our testbed resemble real-life production environments where a diverse set of applications runs 24/7.
nikimanoledaki commented 4 months ago

Thanks @raymundovr & @incertum. Rewording my questions for clarity:

  1. Does any type of syscall trigger Falco, and does the type of stressor matter?

For example, we discussed specific syscalls from I/O or networking in the past. However, we're doing stress-ng --matrix 1:

--matrix N start N workers that perform various matrix operations on floating point values. Testing on 64 bit x86 hardware shows that this provides a good mix of memory, cache and floating point operations and is an excellent way to make a CPU run hot. By default, this will exercise all the matrix stress methods one by one. One can specify a specific matrix stress method with the --matrix-method option.

This uses stress-ng to stress the CPU through mathematical operations, as opposed to I/O read/writes or networking-related syscalls.

A different way to do this would be with --class, where we can track the class of stressor:

specify the class of stressors to run. Stressors are classified into one or more of the following classes: cpu, cpu-cache, device, io, interrupt, filesystem, memory, network, os, pipe, scheduler and VM.

I'm trying to understand if we want to log the type of stressor as a variable. Does it matter? Or does it not matter as long as the target kernel event rate is reached?

  2. Would stress-ng, the microservice demo, and redis be used as separate workloads or together?

This is just for me to understand how we're setting up the benchmark tests but I fully trust @incertum and team with owning the test scenarios etc. Thank you! :)
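To make the --class question above concrete, here is an illustrative fragment of how the stress-ng container args could switch from the CPU-bound matrix stressor to a syscall-heavier class; the image, chosen class, and durations are assumptions, not the current deployment:

```yaml
# Illustrative only: selecting stressors by class instead of --matrix.
containers:
  - name: stress-ng
    image: ghcr.io/colinianking/stress-ng:latest   # assumed image
    args:
      # current approach (mostly CPU/FPU work, relatively few syscalls):
      #   --matrix 1
      # alternative: run all stressors of one class, one instance each;
      # with --sequential, each stressor in the class runs for the timeout
      - "--class"
      - "io"
      - "--sequential"
      - "1"
      - "--timeout"
      - "15m"
```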

AntonioDiTuri commented 4 months ago

Hi, I'm trying to sum up here a very interesting discussion we had around the proposal for the benchmark test in the working group's public Slack channel. Thanks @leonardpahlke for suggesting public runners, @nikimanoledaki for steering the discussion, and all the others participating: @rossf7, @dipankardas011.

This is the 3rd proposal: Modular GitHub Action workflow (public runners)

Here you can find an overview drawn by @leonardpahlke:

(Diagram: 2024-02-14 TAG ENV WG Green Reviews Structure Draft)

Workflow:

News:

Having multiple pipelines is more complex, and we need to rely on others more (which is a big deal if we plan to support more projects in the future: less scalable, more operations). Security-wise it's not good either (we move away from a single point of auth -> transitive dependency).

The location and version of a reusable workflow file to run as a job. Use one of the following syntaxes: {owner}/{repo}/.github/workflows/{filename}@{ref} for reusable workflows in public and private repositories. ./.github/workflows/{filename} for reusable workflows in the same repository.
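As a hedged sketch, the green-reviews pipeline could then call a benchmark workflow kept in this repo using that reusable-workflow syntax; the workflow file name, input, and ref below are placeholders:

```yaml
# Hypothetical caller job using the reusable-workflow syntax quoted above.
jobs:
  falco-benchmark:
    uses: falcosecurity/cncf-green-review-testing/.github/workflows/benchmark.yml@main
    with:
      falco_version: "0.37.0"   # example value; the input must be defined by the callee
```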

This approach emphasizes sustainability, collaboration, and operational simplicity, which are crucial for the ongoing success and scalability of the green-reviews-tooling initiative.

incertum commented 4 months ago

@nikimanoledaki I have responded here https://github.com/falcosecurity/cncf-green-review-testing/discussions/13#discussioncomment-8551883 to your feedback re the synthetic workloads composition, thank you!

incertum commented 4 months ago

@AntonioDiTuri 🚀 thank you very much for taking the time and posting an update here https://github.com/falcosecurity/cncf-green-review-testing/issues/16#issuecomment-1957398220. Amazing, we are looking forward to receiving clearer templates or instructions. As a heads-up, we need to be mindful of @maxgio92's availability as well, not just mine, since Max is our infra expert and we will need his help 🙃.

Some initial feedback:

poiana commented 1 month ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana commented 2 weeks ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

incertum commented 2 weeks ago

/remove-lifecycle rotten

/remove-lifecycle stale