Refactor the sidecar integration tests to support different types of connections

rhauch commented 1 week ago

Summary of Changes

Refactor the integration tests to more easily support different types of connections and sets of test environments composed of different test containers. And more efficiently create, start and manage those across our tests to minimize the build impact by avoiding unnecessary and time consuming steps.

This does not add or remove tests, and only minimally changes existing integration test classes.

Details

Previously, we had a ConfluentLocalTestBed class that managed containers for Kafka and SR, and had utility functions (and state) for working with those containers and the sidecar process being tested.

This design had a few issues. First, it mixed together multiple things. Second, it required classes extend the ConfluentLocalTestBed class that would have made it difficult to use a different environment, say for integration testing. Third, it made it difficult to run a suite of tests on local connections AND on direct connections. Relying primarily upon inheritance is restrictive in Java due to the single base class approach.

This PR refactors the ConfluentLocalTestBed into three distinct components:

LocalTestEnvironment extends a newTestEnvironment interface and is used to manage testcontainers for a Kafka cluster and Schema Registry cluster.
SidecarClient that provides useful functions for making calls to a sidecar process that is being tested. These are mostly just wrappers around RESTEasy calls, but there is some state about current connection ID, Kafka cluster ID, and Schema Registry ID to simplify method signatures and avoid having tests keep track of those (since most tests will use one connection, one Kafka cluster, and one SR.
AbstractSidecarIT serves as a base class for integration tests that test a sidecar and that use the Kafka and Schema Registry containers made available through LocalTestEnvironment that is started once when first needed by an integration test and shutdown after all integration tests have completed. All integration tests (regardless of the number of AbstractSidecarIT subclasses that organize/structure the integration test methods) now share the same containers, which significantly reducing the build time compared to starting and stopping the containers (~7-10 seconds) for each integration test subclass (we currently have 7). The AbstractSidecarIT also has methods that subclasses can use to define when and how many connections are created and how they are reused. For example, currently our AbstractSidecarIT subclasses all create a single local connection in a @BeforeAll method and reuse it throughout all of their test methods. This sharing noticeably reduces the test times compared to creating and tearing down connections in all of our dozens of integration test methods: creating one local connection and getting GraphQL resources (to get Kafka and SR connection details) would have added takes 3-5 seconds to every test method.

Why does this matter?

Most of our integration tests use containers and local connections. In upcoming PRs, we'll double the number of those connections by subclassing the existing AbstractSidecarIT concrete classes to run those same tests against direct connections. And we plan to add more integration tests for creating and managing connections.
We will likely need to test against other containers very soon. We will be able to add other TestEnvironment implementations in the future, such as one for CP containers, maybe one for a WarpStream container and SR container, etc. With the changes in this PR, we can easily implement them, keep the logic separated from the LocalTestEnvironment (which uses Confluent Local and SR), and start them once in AbstractSidecarIT and share them in any/all integration tests that want to use them.

Pull request checklist

This PR does not add any new tests, and instead

Please check if your PR fulfills the following (if applicable):

Tests:
- [ ] Added new
- [x] Updated existing
- [ ] Deleted existing
[x] Have you validated this change locally against a running instance of the Quarkus dev server?
```
make quarkus-dev
```
[x] Have you validated this change against a locally running native executable?
```
make mvn-package-native && ./target/ide-sidecar-*-runner
```

rhauch commented 1 week ago

The build times on this are way up. I think more of the integration tests are running during unit testing w/ surefire rather than only during integration test phase w/ failsafe. I'm still looking into this.

rhauch commented 1 week ago

The changes introduced in this PR significantly added to the build time -- it was taking 30 minutes to do a regular build. It turns out that I was not reusing containers as much as I had planned, and the extra work to set up and tear down the connections (and issuing multiple GraphQL queries for each) after every test was significantly time consuming.

Prior to this change, every AbstractSidecarIT would start its own containers and shut them down after the subclass' tests were run. This added about 5-10 seconds per subclass, and since we currently have 7 subclasses, this is over minute just to restart and shutdown the extra containers. And will likely have more subclasses in the future.

Also, all of the tests in those IT test classes ended up creating a new local connection, which could add 3-5 seconds to each test. Part of this was due to the way in which I was using GraphQL queries during startup to find the clusters and during shutdown to find the clusters that were used. And soon we’ll double the tests by running those tests against direct connections. This was totally unacceptable.

This latest commit improves the AbstractSidecarIT base class to efficiently reuse containers and connections across many tests. First, the AbstractSidecarIT base class starts up the LocalTestEnvironment containers before the first integration test is run, and then shut them down after all integration tests have completed. This is true no matter how many subclasses of AbstractSidecarIT there are. Oh, and all of the subjects and topics created in one tests are cleaned up between each test.

Second, this PR introduces the concept of a “test scope” to the creation of connections, so that connections can be created once and used for all of the integration tests that use the same scope. All of the AbstractSidecarIT subclasses use a test scope named for the concrete IT class, though the technique is flexible to allow other patterns. This change can save up to five minutes across a hundred tests.

Note that this allows us to add other TestEnvironment instances (e.g., for CP or CCloud) to AbstractSidecarIT in the future, if we want to create them. Any containers used in those TestEnvironment instances would also be started once and reused for all integration tests.

rhauch commented 1 week ago

Thanks for the review, @flippingbits. I've incorporated most of your changes, and created a few issues for the others.

confluentinc / ide-sidecar