Unified test framework

ahrtr commented 1 year ago

What would you like to be added?

We reinvented some wheels on test frameworks. Currently there are functional test, e2e test and linearizability test (based on e2e). The Jepsen test is also an option, but it's written in Clojure, and out of the existing maintainers' control, so it isn't in the scope for now.

Each test framework has pros and cons.

Functional test

Pros

The functional test has a well-designed framework, and it's flexible & extensible. Currently it only checks hash (and also has lease related check), but we can easily add more checker.
It supports flexible failure injections via gofail. It lists all the available failpoints automatically, and users just need to configure the commands/terms in failpoint-commands, then it automatically creates all the possible combinations. For example, assuming there are 100 failpoints, and we define 3 commands/terms, then there will be 100*3 cases of failure injection.
It also supports other kinds of failure injection, such as delay, network blackhole etc.

Cons

It reinvented an agent to start/stop etcd clusters. Ideally it should use the same utilities as the e2e test.
It's out of maintenance, and accordingly flaky. This shouldn't be a big deal, as long as we decided to spend more time on it.

Linearizability test

Pros

Linearizability is really important for distributed system. It's good to have a test framework to focus on verifying the Linearizability of etcd.

Crons

It's based on a personal repo porcupine, which isn't actively maintained. When a test case reports failure, I don't think we even have enough confidence that the failure isn't caused by the test framework/utility.
It isn't well-designed, and of course isn't flexible & easily extensible. All the concepts (e.g. traffic player, failure injection, result checker etc.) are basically tightly coupled.
It's doing things that are beyond what the concept of Linearizability can hold. Conceptually, Linearizability test is just part of the functional test, but it's trying to do the same thing as the functional test.

So we really need to create a unified test framework, otherwise future maintainers may reinvent a new test framework. It also isn't good to add too many code to the existing Linearizability test without having a well-designed framework beforehand.

The high level diagram is as below. Actually the high level structure is basically coming from the existing functional test. The components highlighted with orange are new to functional test. We still need more breakdown on the high level design, but please always keep OCP (Open-Close principle) and DIP (Dependency inversion principle) in mind.

OCP (Open-Close principle): Close for modification, and open for extension.
DIP (Dependency inversion principle): high level framework shouldn't depend on low-level module, both of them should depend on abstraction.

unified_test_framework

Why is this needed?

See above.

cc @mitake @ptabor @serathius @spzala

ahrtr commented 1 year ago

I would expect more contributors can get involved in the design, so as to have deeper understanding on the unified test framework. I am also thinking probably we can have more maintainers to dedicated to the maintenance of the test framework, just like the maintainer (e.g. @tbg ) dedicated to raft.

serathius commented 1 year ago

Maybe I'm little biased (autored one of the framework), but I totally don't understand your point. I have totally different perception on each of the arguments you proposed that I think we need a larger alignment before moving forward.

On functional framwork

I don't agree that it is well design. It's overdesigned as it splits the test runner into multiple binaries and designing an internal rpc just to run tests that can be run locally.

It's not flexible neither extensible as adding a simple test scenario requires code changes through tens of lines and proto. For example for first data inconsistency I just wanted to add a test that does SIGKILL instead of SIGTERM https://github.com/etcd-io/etcd/pull/13924.

Supporting gofail is not a pros. It's just a feature that is already implemented by linearizabilityt tests. Fact that last data durability issue and inconsistency with panic were not discovered by funtional tests is just proof that this doesn't work at all. Both of those issues are easily discoverable via gofailure injection and were discovered/confirmed via linearizability tests.

Supporting network black holing is not a pros, it's a feature.

On linearizability tests

Whole idea about confidence on test failure is the main advantage of linearizability tests over functional tests. Functional tests have very high flakiness vs Linearizability tests have never flaked.

It's hard to me comment on whether linearizabilityt tests are well design, as I'm biased. I prioritized simplicity over everything else. OCP and DIP rules might be nice but I prefer KISS, keep it simple stupid. I though you agreed with that as you personally reviewed the code https://github.com/etcd-io/etcd/pull/14398.

As for concept of linearizability, it's just a name not con.

Overall I agree that we can't develop two frameworks, however I don't agree with proposed direction as I don't see superiority of current functional framework. My reasons to develop new test framework:

functional test are incompatible with existing e2e framework so investing into them would go against https://github.com/etcd-io/etcd/issues/13637
even with more features functional tests had still a negative impact on CI. They only give false positives breaking CI without finding any issues.
functional tests are immensely hard to extend to support even a simplest case needed to test linearizability.

This issue is in hard opposition to https://github.com/etcd-io/etcd/issues/14045

serathius commented 1 year ago

Meet with @ahrtr and we agreed on couple of things:

Having two frameworks doesn't make sense. We need to merge them at some point.
Naming is hard. Both functional and linearizability tests serve the same purpose, they should wall under same name.
Functional tests have their own cluster management framework which is bad. E2e test framework should be used.
Functional test are needlessly complicated, for example test agent separation is not needed.
Functional tests configuration via yaml is bad.

Detail about merger are not set in stone, but I my personal opinion is that best way to proceed is to fill feature gaps in linearizability tests and replace funtional tests with it. After that we can rename linearizability tests to functional.

serathius commented 1 year ago

Superseded by https://github.com/etcd-io/etcd/issues/14820

etcd-io / etcd