Closed plombardi89 closed 5 years ago
OK, here we go!
Like most real-world software, Ambassador comprises multiple functional units that work together to get a job done, and we need to test all of them to have confidence that development is improving the product, rather than breaking it. Of course, not all the units can be tested the same way, which makes covering them entertainingly tricky.
At a very high level, Ambassador's functional units are:
- The translation engine (ambassador/ambassador/config.py).
- The Kubernetes interface (ambassador/kubewatch.py, except for some bits that live in ambassador/ambassador_diag/diagd.py).
- The Envoy interface, which configures Envoy (ambassador/ambassador_diag/diagd.py) and manages restarting Envoy on config changes (ambassador/hot-restarter.py, ambassador/start-envoy.sh, and ambassador/kubewatch.py).
- The diagnostic service (ambassador/ambassador_diag/diagd.py, but it relies heavily on ambassador/ambassador/config.py).
Ambassador currently has two different kinds of tests:
Tests in ambassador/tests test the translation engine, which is pretty straightforward. These tests use pytest; the drivers are in ambassador/tests/ambassador_test.py and ambassador/tests/corner_case_test.py.
Tests in end-to-end currently try to test the Kubernetes interface and the Envoy interface. More about these in a moment.
The diagnostic service is partly covered by the end-to-end tests, but its UI is basically untested right now. This is a problem, but not our immediate concern.
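For context, the translation-engine tests are gold-file tests: feed an Ambassador config in, and compare the generated output against a stored known-good file. A minimal sketch of that shape — `translate` and `check_against_gold` are hypothetical stand-ins for illustration, not Ambassador's real API:

```python
import json
from pathlib import Path

def translate(ambassador_config: dict) -> dict:
    """Stand-in for the translation engine (config.py); the real
    version turns Ambassador config into Envoy config."""
    # Hypothetical mapping: each Ambassador mapping becomes one route.
    routes = [
        {"prefix": m["prefix"], "cluster": m["service"]}
        for m in ambassador_config.get("mappings", [])
    ]
    return {"routes": routes}

def check_against_gold(config: dict, gold_path: Path) -> bool:
    """Compare generated output against a stored gold file, the way
    the ambassador_test.py-style drivers do."""
    generated = translate(config)
    gold = json.loads(gold_path.read_text())
    return generated == gold
```

The value of this shape is that adding a new test case is just "add an input config and a gold file" — no cluster required.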
Right now we actually spin up a cluster and pump traffic through the Envoy that Ambassador has configured, but this conflates multiple kinds of tests:
We are actually testing Envoy's functionality.
This is actually kind of silly. We need to test our understanding of how to configure Envoy for a given task while we bring up new features, but once that's done, we probably can rely on Lyft for confidence that Envoy works.
We are testing Ambassador's Kubernetes interface.
This isn't silly at all. It's not that hard to break kubewatch, for example, and we need to know if Ambassador suddenly starts taking twice as long to notice a change.
We are testing Ambassador's Envoy interface.
This isn't exactly silly (we do need to know that the restarter will restart Envoy when appropriate) but this isn't code that changes often, either: it's probably overkill to test it on every commit.
We are testing Ambassador's runtime stability (e.g. is it filling up the disk? leaking memory? spamming the logs?).
This also isn't silly, but it's probably not necessary on every commit, either. There's a difference between a soak test and a functional test.
An additional problem with the end-to-end tests is that they're written in this awful mix of shell and Python, using a bunch of shell utilities that are, shall we say, perhaps not the most graceful bits of code around. They end up duplicating a certain amount of Forge's functionality, so really we should just use Forge for those bits.
It's worth calling out a few things that are important now, or will become important in the near future:
Spinning up a cluster is really slow.
Whatever we do with Kubernetes, we need to be able to do most or all of it without deleting and recreating whole clusters.
ADS is coming.
Whatever we do with Kubernetes, it needs to be amenable to a world where we're just feeding Kube events into something very different from what we have now. The initial ADS cut will almost certainly be a process that accepts a stream of incoming Ambassador configuration elements and performs ADS calls into a running Envoy.
Hey wait a minute, we don't have to wait for ADS to shift to that model.
If you look at kubewatch.py and squint, you see that the bits that accept Kube events and translate them into Ambassador configuration elements are already orthogonal to the bits that manage generating the Envoy config and performing the hot restart. Formalizing that separation is a small step.
Separate kubewatch explicitly into two pieces (I'm thinking of these as processes right now, but they could as easily be threads in a single process): the configmgr process will own turning configuration elements into Envoy config, and kubewatch will change to simply hand the events it sees to configmgr.
Once that's done, we can test kubewatch pretty easily: generate a bunch of Kube events, then verify that kubewatch saw them all and gave them to configmgr correctly. (This is probably most simply done by starting real Ambassador, then writing some sort of dump request into configmgr.)
We can also test configmgr pretty easily, without Kubernetes: run configmgr standalone, feed it an event stream as kubewatch would, and verify that configmgr generates the IR that we expect.
Adding new kubewatch tests will probably be a touch annoying, still, but it's not clear how often we'll need to do that: there isn't a combinatorial effect here, since kubewatch has a really simple job.
Adding new configmgr tests will require generating a new event stream, which could kind of suck unless we build some reasonable UX into Ambassador to grab real-world streams. More discussion here is likely to be relevant. The actual mechanics of adding the test, though, will be a matter of "dump a couple of text files into a new directory", and that will hopefully simplify things.
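The standalone-configmgr test described above could look something like this sketch — all names are hypothetical (configmgr doesn't exist yet); the point is an object that consumes the event stream kubewatch would hand over, keeps an IR, and exposes a dump for tests to inspect:

```python
from dataclasses import dataclass, field

@dataclass
class ConfigManager:
    """Hypothetical configmgr core: consumes Ambassador configuration
    elements (as kubewatch would hand them over) and maintains an IR."""
    elements: dict = field(default_factory=dict)

    def apply(self, event: dict) -> None:
        # An event is assumed to look like:
        # {"op": "add"|"delete", "kind": ..., "name": ..., "spec": ...}
        key = (event["kind"], event["name"])
        if event["op"] == "delete":
            self.elements.pop(key, None)
        else:
            self.elements[key] = event["spec"]

    def dump(self) -> dict:
        """The 'dump request' from the discussion above: expose the
        current IR so tests can verify what configmgr actually saw."""
        return {f"{kind}/{name}": spec
                for (kind, name), spec in sorted(self.elements.items())}
```

A test then becomes: replay a recorded event stream through `apply`, and compare `dump()` against the IR we expect — no cluster, no Envoy.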
@plombardi89 @rhs We should look over this next week and make sure that we all agree that the CoS have been met.
Phil, Flynn, and I had a meeting to discuss this issue and came up with the following items and areas to explore:
We agreed that the above items represent a good summary/breakdown of what is discussed in this issue. The plan is to create separate issues for each of these items, better define them in their own issues, and then close off this issue when the new issues exist.
@plombardi89, @kflynn please shout if this summary is inaccurate or incomplete.
Looks accurate to me. Attaching Epic label.
Thinking a bit about how we actually get this done, note that we have four major things we want to do that are all interrelated:
There are some more minor things that play into this, too:
I'm not going to recap the reasons why we want all these things, but the dependencies between bits are very relevant: nearly everything hinges on splitting kubewatch as described in my earlier comment.
So my suggestion for implementation:
Tactical E2E stuff. This is already mostly done: it splits E2E into a serial section and a parallel section, and runs the tests in the parallel section in parallel. This should actually help quite a bit.
Split kubewatch and configmgr as described in my earlier comment.
After the kubewatch/configmgr split, there are a few different things we can do:
Split E2E and Envoy tests:
More TLS-secret work:
- move secret handling into kubewatch, and let configmgr look only at snapshots that just include certs as synthesized resources
- have kubewatch watch for updates, and update snapshots with them
V2:
- teach config.py how to translate snapshots into either V1 or V2
- check config.py output against config_dump output from Envoy (config_dump only covers part of the config, so this isn't good enough by itself, but it's a valuable assist)
Note that we could do V2 support without the E2E work, but having the E2E work done will make it quite a bit easier to test the V2 work, since we'll be able to focus on individual areas more effectively.
IR happens... somewhere? maybe in my copious spare time?
The debuggability of the E2E suite is brutal. I have no idea why, at times, something failed or what it failed for. The mixture of error logging and debug logging that doesn't indicate which is which doesn't help.
For example, is this an error, a warning, or debug output? "Some pods have yet to start?"
@plombardi89 Without disagreeing with you, the exit code is the gold standard. "Some pods have yet to start?" should be followed immediately by exiting with status 1, as it is indeed an error. Is that not what happens?
try 21: 1 not running
try 20: 1 not running
try 19: 1 not running
try 18: 1 not running
try 17: 1 not running
try 16: 1 not running
try 15: 1 not running
try 14: 1 not running
try 13: 1 not running
try 12: 1 not running
try 11: 1 not running
try 10: 1 not running
try 09: 1 not running
try 08: 1 not running
try 07: 1 not running
try 06: 1 not running
try 05: 1 not running
try 04: 1 not running
try 03: 1 not running
try 02: 1 not running
try 01: 1 not running
Some pods have yet to start?
================ end captured output
================================================================
1: 005-single-namespace...
and it keeps going and going...
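A retry loop of the shape that produced the log above can make the outcome unambiguous by returning a failure that the caller turns into exit status 1, instead of just printing a question. This is a hypothetical sketch (`wait_for_pods` and `count_not_running` are illustrative names, not the harness's real functions):

```python
import sys
import time

def wait_for_pods(count_not_running, tries: int = 21, delay: float = 1.0) -> bool:
    """Poll until every pod is running, counting attempts down the way
    the 'try NN: 1 not running' log does. Returns False on timeout so
    the caller can exit nonzero -- no ambiguous 'yet to start?' message."""
    for attempt in range(tries, 0, -1):
        not_running = count_not_running()
        if not_running == 0:
            return True
        print(f"try {attempt:02d}: {not_running} not running")
        time.sleep(delay)
    print("ERROR: some pods never started", file=sys.stderr)
    return False

# Caller makes the exit code the gold standard:
#   if not wait_for_pods(count):
#       sys.exit(1)
```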
Notes from the meeting we're in the middle of:
Part of the development process for Ambassador is, of course, verifying that the Envoy configurations produced by Ambassador for a new feature or bugfix actually cause Envoy to behave as intended. An important part of the Brave New World described here is that the actual Ambassador configurations (as they appear in Kubernetes resources) used for this functional testing become unit tests, in every case, so that revalidating the behavior when we change versions of Envoy is automated.
Note also that the split between kubewatch and configmgr is meant to speed up tests (by not requiring us to run everything in Kubernetes all the time) and therefore to permit broader coverage (since we can run more input permutations in the same amount of time).
We're planning to do V2 before ADS, because configmgr can generate V2.
Running kubewatch and configmgr as two processes is probably less robust than splitting them into classes and having a single process wrangle things.
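The single-process variant might look like this sketch: a watcher thread (standing in for kubewatch) pushes events onto a queue, and a config-manager thread (standing in for configmgr) drains it. All names are hypothetical; the point is one failure domain instead of two processes to keep alive:

```python
import queue
import threading

def run_single_process(events, apply_event):
    """Run watcher and config manager as threads in one process,
    connected by a queue instead of an inter-process channel."""
    q = queue.Queue()
    done = object()  # sentinel telling the consumer to stop

    def watcher():
        for ev in events:  # the real version would watch the Kube API
            q.put(ev)
        q.put(done)

    def config_manager():
        while True:
            ev = q.get()
            if ev is done:
                break
            apply_event(ev)  # the real version would update the IR

    threads = [threading.Thread(target=watcher),
               threading.Thread(target=config_manager)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because both halves only touch the queue, either side can still be exercised alone in tests by substituting a canned event list or a recording `apply_event`.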
Flynn to generate interface specs
Shubham to generate example DevEx for working through dev process and having it magically be a regression test
I am putting some thoughts around DevEx at https://docs.google.com/document/d/1tynG8yldeIUdFXP80_1Jd1HHh39El2mwyLNvGVDQLDw/edit
Feel free to review and comment
The current E2E test harness has some issues we would like to address:
With the E2E suite we want to design and implement something that will:
Design CoS
Implementation CoS
Additional Nice To Haves