envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

xDS conformance test suite #11299

Open htuch opened 4 years ago

htuch commented 4 years ago

It would be helpful for control plane and client implementors to have an xDS conformance test suite that is independent of Envoy. I think this would largely look like some scripts (Python or Go) that create xDS gRPC connections and exercise some xDS exchanges.

A priority area here is delta xDS, which control plane implementors consider to be challenging to verify.

I think we could separate this work into two parts: building out the basic test infra, and then building out an increasingly complete test suite.
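A minimal sketch of what one of those scripts could look like, using go-control-plane's generated v3 protos (the address and node ID are placeholders; a real test would assert on the exchange rather than print it):

```go
// Minimal sketch of a conformance-style xDS client: open an ADS stream,
// send a wildcard CDS request, and read one response.
package main

import (
	"context"
	"fmt"
	"log"

	corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"
	discoveryv3 "github.com/envoyproxy/go-control-plane/envoy/service/discovery/v3"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Placeholder address of the control plane under test.
	conn, err := grpc.Dial("localhost:18000",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Open an ADS stream and ask for all clusters (empty resource_names == wildcard).
	stream, err := discoveryv3.NewAggregatedDiscoveryServiceClient(conn).
		StreamAggregatedResources(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	if err := stream.Send(&discoveryv3.DiscoveryRequest{
		Node:    &corev3.Node{Id: "conformance-test-client"},
		TypeUrl: "type.googleapis.com/envoy.config.cluster.v3.Cluster",
	}); err != nil {
		log.Fatal(err)
	}

	// A real test would ACK/NACK and assert on the contents; here we just print.
	resp, err := stream.Recv()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("version=%s nonce=%s resources=%d\n",
		resp.VersionInfo, resp.Nonce, len(resp.Resources))
}
```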

@snowp @howardjohn @derekargueta

htuch commented 4 years ago

@caniszczyk @mattklein123 I remember a while back we were discussing possibilities here; I think there was someone at Lyft interested in this and also CNCF funding. In any case, would be rad to learn what is possible here.

derekargueta commented 4 years ago

Thinking about this abstractly, I think starting with client conformance might be much easier, even though it might have less value since afaik Envoy is the only conforming client (it could be nice as out-of-tree integration tests, I suppose).

Control plane conformance might be a bit difficult since there isn't a universal way to set the backing resources such as endpoints, listeners, and routes: many control planes use Kubernetes ConfigMaps, some use databases like PostgreSQL, etc. So for the control plane being tested, the test suite would have to know how to set up the backing store such that the control plane will return certain resources; otherwise the results are non-deterministic. This is especially true of delta xDS, where verifying correctness means checking which resources are not being sent. It becomes even more challenging for everyone building a private internal control plane with its own idioms.

As a more concrete example, we could write a Go script that sends a DiscoveryRequest to the control plane with an empty resource_names to indicate "return all resources". But without a way to set mock return values, if the control plane returns an empty list it's difficult to discern whether there really are no resources to return or whether there's a bug and the control plane isn't actually conforming. Ideally we'd have something like

[ test client scripts ] <---> [ control plane being tested ] <--- [ test data files to populate store ]

where the control plane can consume a file specification to return mock data, but that might be a moonshot. Perhaps it's something we could bake into the (go|python|java)-control-plane libraries to make it an easy API to adopt?
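Purely as an illustration of the "test data files" box above, a hypothetical fixture format (none of these names exist today) could be a JSON file of named, versioned resources that an adapter loads and pushes into whatever backing store the control plane uses:

```go
// Hypothetical fixture format for seeding a control plane before a test case.
package fixture

import (
	"encoding/json"
	"os"
)

// Resource is one named xDS resource the control plane should serve,
// expressed as its type URL, a version the tests can assert on, and the
// resource body (e.g. the proto in JSON form).
type Resource struct {
	TypeURL string          `json:"type_url"`
	Name    string          `json:"name"`
	Version string          `json:"version"`
	Body    json.RawMessage `json:"body"`
}

// Fixture is the full set of resources to seed for one connecting node.
type Fixture struct {
	Node      string     `json:"node"`
	Resources []Resource `json:"resources"`
}

// Load reads a fixture file from disk.
func Load(path string) (*Fixture, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var f Fixture
	if err := json.Unmarshal(data, &f); err != nil {
		return nil, err
	}
	return &f, nil
}
```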

htuch commented 4 years ago

@derekargueta yeah, this is a good point. I think a well-written xDS server should have some ability to mock here, i.e. there is some support for dependency injection. Something like go-control-plane should be hiding the intricacies of things like nonce handling and delta discovery from the rest of the system, so that is the main validation target, not the entire control plane. Some kind of test harness for xDS server configuration makes sense.

snowp commented 4 years ago

I think I'm saying the same thing as Harvey here, but one thing to keep in mind is that the scope of these tests doesn't need to expand to handle all kinds of control plane interactions: the goal would be to validate the xDS implementation specifically, not the configuration pipeline etc.

In practice, this means the conformance tests run against the go-control-plane code itself, not the various control plane implementations that use it, since it doesn't leak the xDS implementation details to the control planes built on top of it.

I can imagine implementing some basic mock API (file based/gRPC/whatever) that would simply update resources for a specific client (identified by its Node struct), with the ability to update things at the required granularity to verify that the xDS server behaves as expected. Given that the xDS protocol isolates individual clients, all we need to express is changes to the resources required by the one client that's connecting.

For go-control-plane & co this could likely be a small implementation that sits on top of its API, mapping mock calls to snapshot updates. For other implementations, I can imagine this encouraging a logical split within their code base between the xDS implementation and the rest (if one doesn't already exist), allowing them to integrate their xDS bits with the mock API and run against the conformance tests.
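For the go-control-plane case, a rough sketch of that thin layer, mapping a mock call onto a snapshot update (this assumes the v3 cache API from recent releases; the NewSnapshot/SetSnapshot signatures have changed between versions, so treat it as illustrative):

```go
// Rough sketch of a "mock call -> snapshot update" layer on top of go-control-plane.
package harness

import (
	"context"

	clusterv3 "github.com/envoyproxy/go-control-plane/envoy/config/cluster/v3"
	"github.com/envoyproxy/go-control-plane/pkg/cache/types"
	cachev3 "github.com/envoyproxy/go-control-plane/pkg/cache/v3"
	resourcev3 "github.com/envoyproxy/go-control-plane/pkg/resource/v3"
)

// SetClusters replaces the cluster set served to one node at a given version.
// The conformance tests would call this between exchanges to drive
// add/update/remove scenarios for the single connecting client.
func SetClusters(ctx context.Context, c cachev3.SnapshotCache, nodeID, version string, names []string) error {
	clusters := make([]types.Resource, 0, len(names))
	for _, name := range names {
		clusters = append(clusters, &clusterv3.Cluster{Name: name})
	}
	snap, err := cachev3.NewSnapshot(version, map[resourcev3.Type][]types.Resource{
		resourcev3.ClusterType: clusters,
	})
	if err != nil {
		return err
	}
	return c.SetSnapshot(ctx, nodeID, snap)
}
```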

howardjohn commented 4 years ago

This would be super useful for us. We have a test xDS server, and often I cannot tell what is part of the xDS spec, what is an Envoy implementation detail, and what is an implementation detail of our test server. This is especially common around handling ACKs.

howardjohn commented 4 years ago

I have been playing around with this a bit; some thoughts:

I think implementing in Go will be the most useful because:

Regarding test input, there are a fair number of parameters we will need (thinking from a server conformance perspective):

I am building out some of the basic infra for this and trying to populate some super basic tests to see what this can look like at https://github.com/howardjohn/xds-conformance. I am hoping we can get some idea of how this will work there, then we can merge it into the envoyproxy org and add the actual tests.

htuch commented 4 years ago

As discussed IRL, I agree we don't need deep mocking of resources, but there will need to be some sort of shim to allow named resources of different types to be created, deleted and updated (including version).
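One entirely hypothetical shape for that shim: a small interface the conformance suite programs against, with each control plane supplying an adapter for its own backing store (none of these names exist today):

```go
package shim

// ResourceStore abstracts "make the control plane serve these resources",
// whether the store behind it is go-control-plane's snapshot cache, a
// database, or Kubernetes objects.
type ResourceStore interface {
	// Upsert creates or updates a named resource of the given type URL,
	// tagged with a version the tests can later assert on.
	Upsert(typeURL, name, version string, serialized []byte) error
	// Delete removes a named resource, so delta xDS removals can be verified.
	Delete(typeURL, name string) error
}
```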

htuch commented 3 years ago

@howardjohn where did you end up with your explorations?

htuch commented 3 years ago

CNCF and @envoyproxy/api-shepherds have an RFP for vendors to bid on a project to build this in https://github.com/envoyproxy/envoy/issues/13165.

htuch commented 3 years ago

@howardjohn do you have any additional update? HH are spinning up on this project and would like to know if there's anything to be learned from the Istio experience. Thanks.

howardjohn commented 3 years ago

@htuch beyond https://github.com/envoyproxy/envoy/issues/11299#issuecomment-647711851 I haven't explored this much. I ran into issues figuring out the "mock API" or inputs and how to make them generic. As a result, we just focused on expanding our own Istio-specific "conformance" tests, in which we have a fake ADS client with Istio-specific configurations. Most of these are in https://github.com/istio/istio/blob/8105c2bb98582ee78519dd7a19c7a5f1ab3faba6/pilot/pkg/xds/ads_test.go#L50-L49.