elastic / cloudbeat

Analyzing Cloud Security Posture

[Discussion] Component testing #613

Closed: olegsu closed this issue 1 year ago

olegsu commented 1 year ago

Component Test

Motivation

Tests are important: they ensure that the different components of a system function correctly when working together. Without component tests, it is difficult to identify issues that arise from the interaction of those components, which can lead to problems in the overall functionality of the system.

There are two main types of tests that we use: unit tests and end-to-end tests. Unit tests cover small, specific sub-flows of Cloudbeat. They are quick and easy to run locally, and they are written as part of the development process, but they have some limitations. For example, they often rely on mocked data (such as simulated disk IO, network, or SDK calls), which reduces the level of confidence we can place in the results. Also, because unit tests only exercise small sub-flows of code, they may not catch issues that arise when different components are combined.

On the other hand, our “end-to-end” tests (excluding Kibana) exercise Cloudbeat against live APIs and report to live Elasticsearch instances. These tests are written in Python and collect data from ES to verify the results. End-to-end tests give us more confidence because they exercise the system in a realistic environment. However, they also have drawbacks. Because they are written in Python, it is harder for people to contribute to them. They are slower to run, treat the system as a black box, and are more prone to flakiness, which makes debugging failed tests more challenging. Finally, running end-to-end tests locally is difficult.

Proposal

I would suggest adding another level of testing (Cloudbeat component tests) that focuses on specific integrations (AWS, Kubernetes) of our code. These tests would use mocked clients fed with real data, giving us a better picture of how our code will behave in production.

One of the main benefits of this approach is the quick feedback loop. By writing these tests as part of our codebase and using the go test command, we can easily run them locally and get immediate results. This will make it easier to identify any issues or bugs that may arise.

Another benefit is easier debugging. Since these tests are part of the same stack, it will be easier to track down problems and fix them. And while flakiness is still possible, it should be easier to identify and address locally.

These tests should be designed to mimic real-world scenarios as closely as possible, including factors such as memory consumption and CPU usage. This will give us a more accurate understanding of how our code will perform in a production environment.
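As a sketch of how such a check might look in practice, the following hypothetical test measures heap growth around a complete flow using runtime.ReadMemStats; runFullFetchFlow and the 64 MiB threshold are assumptions for illustration, not existing Cloudbeat code:

package cloudbeat_test

import (
    "runtime"
    "testing"
)

// runFullFetchFlow is a hypothetical helper that would execute one complete
// fetch-evaluate-report cycle against mocked clients.
func runFullFetchFlow(t *testing.T) { /* ... */ }

func TestFetchFlowMemory(t *testing.T) {
    var before, after runtime.MemStats
    runtime.GC()
    runtime.ReadMemStats(&before)

    runFullFetchFlow(t)

    runtime.GC()
    runtime.ReadMemStats(&after)

    // Fail if the flow left more than an arbitrary 64 MiB of live heap behind.
    const limit = int64(64 << 20)
    if grown := int64(after.HeapAlloc) - int64(before.HeapAlloc); grown > limit {
        t.Fatalf("heap grew by %d bytes, limit %d", grown, limit)
    }
}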

Gaps

  1. libbeat dependency
  2. Globals, init functions, and external dependencies

Libbeat

From what I know, libbeat does not provide any utilities related to testing.

What we are looking for is a way to easily run the beat with a specific configuration, mock the connection to the fleet server, and control the lifecycle behavior of the beat (meaning, start it, stop it, and reconfigure it).

I would suggest having this addressed in the future as it requires more work from multiple teams.

Meanwhile, we need to find workarounds.

Globals, init functions, and dependencies

Clear is better than clever

The key to writing maintainable programs is that they should be loosely coupled—a change to one package should have a low probability of affecting another package that does not directly depend on the first.

There are two excellent ways to achieve loose coupling in Go:

  1. Use interfaces to describe the behavior your functions or methods require.
  2. Avoid the use of a global state.

source: Practical Go
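As a minimal illustration of the second point, the following sketch (FindingsStore and Evaluator are hypothetical names, not existing Cloudbeat types) describes a dependency with a small interface and injects it through a constructor instead of reading it from a package-level variable:

// FindingsStore describes the only behavior the evaluator needs, so a test
// can substitute an in-memory fake for a real Elasticsearch-backed store.
type FindingsStore interface {
    Save(finding string) error
}

// Evaluator receives its dependency explicitly instead of reaching for
// package-level state.
type Evaluator struct {
    store FindingsStore
}

func NewEvaluator(store FindingsStore) *Evaluator {
    return &Evaluator{store: store}
}

func (e *Evaluator) Report(finding string) error {
    return e.store.Save(finding)
}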

We don't have many globals; one that does make it hard to run tests is Factories (resources/fetchersManager/factory.go).

tl;dr: magic is bad; global state is magic → no package level vars; no func init

source: A theory of modern Go

Our factories use the init function to register themselves with the Factories, which is a list of all the available fetcher factories. This makes it hard to write tests because we need to figure out a way to mock the Factories so that we can test each fetcher individually. It would be easier to write tests if we could somehow isolate the fetchers from the Factories during testing.

The init Function: Avoid if Possible

When you read Go code, it is usually clear which methods and functions are invoked and when they are called …. Go’s compatibility guarantee for its standard library means that we are stuck using it to register database drivers and image formats, but if you have a registry pattern in your own code, register your plug-ins explicitly.

Source “Learning Go An Idiomatic Approach to Real-World Go Programming”
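As a sketch of what explicit registration could look like here (FetchersRegistry and Fetcher below are simplified stand-ins, not the actual Cloudbeat types):

// Fetcher is a simplified stand-in for the real fetching interface.
type Fetcher interface {
    Fetch() error
}

// FetchersRegistry is an explicit, instantiable registry, replacing a
// package-level Factories variable populated by init functions.
type FetchersRegistry struct {
    factories map[string]func() Fetcher
}

func NewFetchersRegistry() *FetchersRegistry {
    return &FetchersRegistry{factories: map[string]func() Fetcher{}}
}

func (r *FetchersRegistry) Register(name string, factory func() Fetcher) {
    r.factories[name] = factory
}

Production code would then wire up all fetchers in one visible place, and a test could build its own registry containing only the fetcher under test.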

Besides that, the init functions construct the factories with real API clients such as AWS and Kubernetes. Since those clients are initialized deep in the code, it is hard to mock them for complete-flow testing.

To create a fetcher object or any other object in Go for testing purposes, we have a few options available to us. One option is to declare a new constructor for each configuration option. Another option is to use a custom Config struct. The third option is to use the Functional Options Pattern.

Out of these options, I recommend using the Functional Options Pattern because it allows us to have more control over the code, makes the code more explicit and easier to review, and makes it easier to inject dependencies for testing. Additionally, it includes built-in default behavior. You can find more information about these options here.

For example, this is how the ElbFetcher might look:

type ElbFetcher struct {
    log             *logp.Logger
    cfg             ElbFetcherConfig
    elbProvider     awslib.ElbLoadBalancerDescriber
    kubeClient      k8s.Interface
    lbRegexMatchers []*regexp.Regexp
    resourceCh      chan fetching.ResourceInfo
    cloudIdentity   *awslib.Identity
}

type Option func(*ElbFetcher)

func New(options ...Option) *ElbFetcher {
    fetcher := &ElbFetcher{
        // fill in all the defaults:
        // logger
        // channel
        // providers
    }
    for _, option := range options {
        option(fetcher)
    }

    return fetcher
}

// WithElbProvider overrides the default elbProvider.
func WithElbProvider(provider awslib.ElbLoadBalancerDescriber) Option {
    return func(e *ElbFetcher) {
        e.elbProvider = provider
    }
}
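A test could then construct the fetcher with only the dependencies it needs to fake, for example (mockElbProvider is a hypothetical mock of awslib.ElbLoadBalancerDescriber, shown for illustration):

func TestElbFetcher_FetchesLoadBalancers(t *testing.T) {
    // mockElbProvider is a hypothetical mock preloaded with real
    // recorded API responses.
    provider := &mockElbProvider{}

    fetcher := New(
        WithElbProvider(provider),
    )

    // exercise fetcher and assert on the resources it reports...
    _ = fetcher
}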

Final results

This is just one example of how a test might be structured. Other tests might need different implementations, requirements, or testing tools.

//go:build integration
// +build integration

// Build tags are used to separate integration tests from unit tests,
// to keep the unit tests super light.
// To run the integration tests, use "go test -tags=integration".

func TestCloudbeat(t *testing.T) {
    flag.Set("path.config", "/path/to/testing/config")
    flag.Set("c", "cloudbeat.yml")
    settings := instance.Settings{/*...*/}

    // NewSimpleClientset returns a clientset that will respond with the provided objects.
    // It's backed by a very simple object tracker that processes creates, updates and deletions as-is,
    // without applying any validations and/or defaults. It shouldn't be considered a replacement
    // for a real clientset and is mostly useful in simple unit tests.
    k8s := k8sFake.NewSimpleClientset()

    // This function does not exist; moreover, instance.Run will create the connection for us
    // and we don't have a way to override it easily.
    // The fake client should be initialized with all the findings that cloudbeat is expected
    // to send, with the expectation that at the end of the test the queue is empty,
    // meaning no expected findings are left and no unexpected findings were sent.
    es := elastic.NewFakeClient()

    // The signal will be used to gracefully stop the beat - this is a workaround,
    // as we don't control the beat lifecycle.
    signal := make(chan struct{})

    // NewCloudbeat will get all the mocked dependencies
    // and use them instead of default and real connections
    beaterCreator := NewCloudbeat(
        WithKubernetesClient(k8s),
        WithElasticClient(es),
        WithStopHandler(signal),
    )
    err := instance.Run(settings, beaterCreator)
    assert.NoError(t, err)

    // Wait for some time and send a signal to stop
}

References

  1. Practical Go, Dave Cheney
  2. A theory of modern Go, Peter Bourgon
  3. Learning Go: An Idiomatic Approach to Real-World Go Programming, Jon Bodner

eyalkraft commented 1 year ago

@olegsu Thanks for this detailed proposal! I really like it and the overall direction.

I think we already have a few tests using a mocked k8s client, so I'm not completely sure how the tests you're proposing differ from what we have in terms of what they test.

Regarding all the testability improvements you proposed, they sound like a great step towards a higher-quality codebase.

olegsu commented 1 year ago

Thank you @eyalkraft. You are right, we do use Kubernetes mocks, and we also have AWS mocks, but they are all part of small unit tests. The suggestion is to have the ability to test a complete flow, just with mocked data. For example, the test structure I suggested will start Cloudbeat and run a complete flow, including all the logic we have with OPA and other internal components that might change the data somewhere. At the end, we run the assertion against what is expected to be reported to Elasticsearch.