envoyproxy / java-control-plane

Java implementation of an Envoy gRPC control plane
Apache License 2.0

cache: Add separate cluster versioning #91

Closed jakubdyszkiewicz closed 5 years ago

jakubdyszkiewicz commented 5 years ago

Solves #89. It adds an option to version each cluster individually. I tried to keep the API backward compatible.

Currently, only a separate version for a single resource name is supported.

Let's say we've got a cluster snapshot (version c1) containing A and B, with individual versions A: a1 and B: b1.

When we ask for the version of cluster A, we return a1; for B it's b1; but when we ask about A and B in one request, we return c1. I observed that with java-control-plane, Envoy only asks for one resource in EDS, but the xDS spec allows it to be a list.

To return an individual combined version for A and B we would have to combine them somehow, and there is no easy way to do that by default. We could hash the list, but we could get hash collisions. That's why in this case we just return the version of the whole cluster snapshot.

I could code this as a strategy if you think that's better. That way, the client of the Snapshot class would choose what to do for a request with multiple resource names.
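To make the rule concrete, here is a minimal plain-Java sketch of the lookup described above (hypothetical class and names, simplified types; not the actual Snapshot/SnapshotResources API):

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the lookup rule: a request for exactly one resource
// gets that resource's individual version; anything else falls back to the
// aggregate cluster version, since combining versions safely is non-trivial.
class VersionLookup {
  private final String aggregateVersion;
  private final Map<String, String> individualVersions;

  VersionLookup(String aggregateVersion, Map<String, String> individualVersions) {
    this.aggregateVersion = aggregateVersion;
    this.individualVersions = individualVersions;
  }

  String version(List<String> resourceNames) {
    if (resourceNames.size() == 1) {
      // Unknown names fall back to the aggregate version.
      return individualVersions.getOrDefault(resourceNames.get(0), aggregateVersion);
    }
    // Zero or multiple names: return the aggregate version, as described above.
    return aggregateVersion;
  }
}
```

With versions A: a1, B: b1 and aggregate c1, asking for [A] yields a1 while [A, B] yields c1, matching the behavior described in this PR.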

codecov-io commented 5 years ago

Codecov Report

Merging #91 into master will increase coverage by 0.13%. The diff coverage is 96.29%.


@@             Coverage Diff             @@
##             master     #91      +/-   ##
===========================================
+ Coverage     94.47%   94.6%   +0.13%     
- Complexity      134     141       +7     
===========================================
  Files            14      14              
  Lines           525     538      +13     
  Branches         46      46              
===========================================
+ Hits            496     509      +13     
  Misses           21      21              
  Partials          8       8
Impacted Files                                          Coverage Δ                Complexity Δ
.../io/envoyproxy/controlplane/cache/SimpleCache.java   96.19% <100%> (ø)         29 <2> (ø)
...ava/io/envoyproxy/controlplane/cache/Snapshot.java   100% <100%> (ø)           25 <2> (+2)
...voyproxy/controlplane/cache/SnapshotResources.java   92.3% <90.9%> (+6.59%)    8 <6> (+5)

Continue to review full report at Codecov.


jakubdyszkiewicz commented 5 years ago

Thanks for the review!

It's not immediately clear to me how this will be used by the server to filter out what resources to send. If you want to split this up into several PRs that's fine, but could you outline how you imagine this being used when triggering watches?

The idea is to provide a map of versions per cluster to the Snapshot. We create a new snapshot on every change in discovery, but the versions of unchanged clusters stay the same.

Outside of java-control-plane, we've got a class that generates the Snapshot. In this class we save the last snapshot in a map and compare the new clusters to the old ones. If nothing changed, we pass the same version to the Snapshot.
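A minimal sketch of that version-reuse logic (hypothetical class and names; cluster contents reduced to plain strings for illustration):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the snapshot-generator logic described above:
// remember each cluster's last contents and only mint a new version when
// the contents actually changed.
class ClusterVersionTracker {
  private final Map<String, String> lastContents = new HashMap<>();
  private final Map<String, String> versions = new HashMap<>();
  private long counter = 0;

  // Returns the previous version if the cluster is unchanged, otherwise a new one.
  String versionFor(String clusterName, String contents) {
    if (Objects.equals(lastContents.get(clusterName), contents)) {
      return versions.get(clusterName);
    }
    lastContents.put(clusterName, contents);
    String version = "v" + (++counter);
    versions.put(clusterName, version);
    return version;
  }
}
```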

If you take a look at SimpleCache#setSnapshot, we retrieve the version by passing the resource names from the request:

String version = snapshot.version(watch.request().getTypeUrl(), watch.request().getResourceNamesList());

Then we check for the version change, if it's different, we respond with the new data.

if (!watch.request().getVersionInfo().equals(version)) {
  LOGGER.info("responding to open watch {}[{}] with new version {}",
      id,
      String.join(", ", watch.request().getResourceNamesList()),
      version);

  respond(watch, snapshot, group);
}

Note that if getResourceNamesList has zero or more than one element, we just return the "aggregated" version, as before.

I hope this answers your question!

jakubdyszkiewicz commented 5 years ago

Hello,

I'm sorry for the lack of response. I did not forget about this PR. We are exploring other possibilities, including ADS. I will get back to it when we have collected the results.

sschepens commented 5 years ago

@jakubdyszkiewicz we're very interested in this, is there any way we can help/collaborate with this?

jakubdyszkiewicz commented 5 years ago

Hi @sschepens The problem with this solution for us is mainly the startup phase.

With gRPC you've got two thread pools: NIO for sending/receiving bytes and Worker for logic, serializing protos, etc. Let's say you've got 100 Envoys and 800 services; now you've got 80k gRPC streams. A request is sent on each stream and goes to the Worker pool, so you have to process 80k tasks on that pool. 80k is somewhat fine, but what if you aim for 1000 Envoys? You cannot have an unbounded queue, because it will eventually result in an OutOfMemoryError.

The problem is that you cannot reject requests on the NIO thread in a graceful way, because Netty is so hardwired into gRPC that we haven't found a way to do it yet. You can only create an interceptor on the gRPC service, which runs in the Worker thread. You can limit the worker queue and configure the pool to throw an exception when full, but then you end up with a flood of exceptions in the logs from NIO threads trying to hand tasks to the worker pool.
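For reference, the bounded worker pool mentioned above can be sketched with plain java.util.concurrent (sizes are illustrative, not tuned values; this is not the gRPC default executor):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of a fixed-size worker pool with a bounded queue: once the queue
// fills, new tasks are rejected with RejectedExecutionException instead of
// queueing until the heap is exhausted.
class BoundedWorkers {
  static ThreadPoolExecutor create(int threads, int queueCapacity) {
    return new ThreadPoolExecutor(
        threads, threads,                            // fixed-size pool
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(queueCapacity),     // bounded queue caps memory
        new ThreadPoolExecutor.AbortPolicy());       // reject when full
  }
}
```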

With ADS you've got one gRPC stream per Envoy, which means the worker pool problem goes away. Additionally, you've got only one instance of StreamObserver and Watch per Envoy instead of one per service. Overall, far fewer objects on the heap. Of course, with ADS you have to send much more data over the network: for us this is 0.5 KB per instance change without ADS plus separate versioning, versus 250 KB with ADS. Even with that tradeoff, the whole app seems more stable and much more predictable with ADS.

I think this is a good change nevertheless, for anyone who doesn't want to use ADS. Maybe with external rate limiting/proper load balancing it makes even more sense. If you have time, feel free to help me with this. The one thing I'd change in this PR: rather than returning clusterVersion when there are multiple resources, I'd return a separate "aggregated" value, or even pass a strategy like this:

interface EndpointsVersionResolver {
  String version(List<String> resourceNames);
}

public static Snapshot create(
    Iterable<Cluster> clusters,
    String clustersVersion,
    Iterable<ClusterLoadAssignment> endpoints,
    EndpointsVersionResolver endpointVersionsResolver,
    Iterable<Listener> listeners,
    String listenersVersion,
    Iterable<RouteConfiguration> routes,
    String routesVersion,
    Iterable<Secret> secrets,
    String secretsVersion) {

  return new AutoValue_Snapshot(
      SnapshotResources.create(clusters, clustersVersion),
      SnapshotResources.create(endpoints, endpointVersionsResolver),
      SnapshotResources.create(listeners, listenersVersion),
      SnapshotResources.create(routes, routesVersion),
      SnapshotResources.create(secrets, secretsVersion));
}

This way someone could provide their own method of deriving a version for multiple resources, which could help avoid sending redundant data to multiple groups even with ADS.
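As one hypothetical example of such a custom strategy (not part of this PR), a resolver could derive a deterministic combined version by joining the individual versions of the requested resources in sorted order, avoiding the hash-collision concern at the cost of longer version strings:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical resolver: the combined version is the sorted concatenation of
// "name:version" pairs, so the same set of resources always yields the same
// version string regardless of request order.
class ConcatenatingVersionResolver {
  private final Map<String, String> individualVersions;

  ConcatenatingVersionResolver(Map<String, String> individualVersions) {
    this.individualVersions = individualVersions;
  }

  String version(List<String> resourceNames) {
    return resourceNames.stream()
        .sorted()                                   // order-independent result
        .map(name -> name + ":" + individualVersions.getOrDefault(name, ""))
        .collect(Collectors.joining(","));
  }
}
```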

sschepens commented 5 years ago

This way someone could provide their own method of deriving a version for multiple resources, which could help avoid sending redundant data to multiple groups even with ADS.

I like what you're saying, though it feels a bit strange to have a different parameter type only for endpoints; maybe we could apply this to all resources? It could theoretically also be useful in the future, right?

So we could have something like this:

interface ResourceVersionResolver {
  String version(List<String> resourceNames);
}

public static Snapshot create(
    Iterable<Cluster> clusters,
    ResourceVersionResolver clustersVersionResolver,
    Iterable<ClusterLoadAssignment> endpoints,
    ResourceVersionResolver endpointsVersionResolver,
    Iterable<Listener> listeners,
    ResourceVersionResolver listenersVersionResolver,
    Iterable<RouteConfiguration> routes,
    ResourceVersionResolver routesVersionResolver,
    Iterable<Secret> secrets,
    ResourceVersionResolver secretsVersionResolver) {

  return new AutoValue_Snapshot(
      SnapshotResources.create(clusters, clustersVersionResolver),
      SnapshotResources.create(endpoints, endpointsVersionResolver),
      SnapshotResources.create(listeners, listenersVersionResolver),
      SnapshotResources.create(routes, routesVersionResolver),
      SnapshotResources.create(secrets, secretsVersionResolver));
}

And it could be used this way:

create(
    clusters, resourceNames -> clustersVersion,
    endpoints, endpointsVersionResolver,
    listeners, resourceNames -> listenersVersion,
    routes, resourceNames -> routesVersion,
    secrets, resourceNames -> secretsVersion
)

sschepens commented 5 years ago

@jakubdyszkiewicz @snowp I don't know what you think of this.

I would like to come to a conclusion on which way to go, and then go ahead and implement it :)

jakubdyszkiewicz commented 5 years ago

@sschepens good idea, I like it :)

Just wondering whether ResourceVersionResolver#version should take List<String> resourceNames or DiscoveryRequest request.

Right now DiscoveryRequest contains versionInfo, resourceNames, typeUrl, and responseNonce. We don't need typeUrl since we pass separate version resolvers, but the previous versionInfo may be useful for establishing a new version. It may also be more future-proof if something interesting is added to DiscoveryRequest.
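A sketch of the request-based variant (DiscoveryRequestView is a hypothetical stand-in for the Envoy proto, reduced to the fields a resolver might use):

```java
import java.util.List;

// Minimal stand-in for the proto message; the real DiscoveryRequest also
// carries typeUrl and responseNonce.
class DiscoveryRequestView {
  final String versionInfo;           // previously acknowledged version
  final List<String> resourceNames;

  DiscoveryRequestView(String versionInfo, List<String> resourceNames) {
    this.versionInfo = versionInfo;
    this.resourceNames = resourceNames;
  }
}

// Resolver that sees the whole request instead of just the names, so it can
// take the previously acknowledged version into account.
interface ResourceVersionResolver {
  String version(DiscoveryRequestView request);
}
```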

What do you think about it?

sschepens commented 5 years ago

Sure, seems like a good idea.

sschepens commented 5 years ago

@jakubdyszkiewicz the only issue is that it is difficult to continue providing the #version() methods, as we would have to construct a new DiscoveryRequest instead of calling #version(Collections.emptyList())

Also, Snapshot has a #version(String typeUrl) method, which would become trivial and should be replaced by #version(DiscoveryRequest request)

jakubdyszkiewicz commented 5 years ago

It's a bit awkward, but I don't think this is a problem.

We can create a new Snapshot#version(DiscoveryRequest) and keep the old Snapshot#version(String typeUrl), which will construct an empty DiscoveryRequest containing nothing but the typeUrl and forward it to Snapshot#version(DiscoveryRequest); old calls remain compatible.
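A sketch of that compatibility shim, using stand-in types rather than the real protobuf DiscoveryRequest (names are illustrative, and the version logic is a placeholder):

```java
import java.util.Collections;
import java.util.List;

// Minimal stand-in for the proto message (not the real Envoy type).
class DiscoveryRequest {
  final String typeUrl;
  final List<String> resourceNames;

  DiscoveryRequest(String typeUrl, List<String> resourceNames) {
    this.typeUrl = typeUrl;
    this.resourceNames = resourceNames;
  }
}

class SnapshotSketch {
  // New entry point: resolvers get the full request.
  String version(DiscoveryRequest request) {
    return request.typeUrl + "-version";  // placeholder logic
  }

  // Old signature kept for compatibility: wrap the typeUrl in an otherwise
  // empty request and forward to the new overload.
  String version(String typeUrl) {
    return version(new DiscoveryRequest(typeUrl, Collections.emptyList()));
  }
}
```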

snowp commented 5 years ago

FYI, per-resource versioning is not supported in general by standard xDS; this is what https://github.com/envoyproxy/envoy/issues/4991 aims to solve. This is because CDS/LDS doesn't even provide a list of resources, but instead just requests "all" by specifying an empty resource list.

I'm somewhat tempted to say that we should wait until incremental XDS lands, in particular due to recent developments around partial EDS responses during cluster warming (https://docs.google.com/document/d/1Ca4YX9XNnV9rjTR7U0_xyCxUDKfoSun8pWSUNO9N-tY/edit). I worry that attempting to do partial EDS updates with the current XDS protocol will result in subtle and hard to debug bugs.

WDYT @jakubdyszkiewicz @sschepens ?

jakubdyszkiewicz commented 5 years ago

Yeah, I read about incremental xDS and it definitely solves the scalability issues. Actually, we operate with Consul the same way now: if service A wants to communicate with service B, it asks Consul about B and tracks changes to B. It doesn't need to know about all instances of every service.

The problem is that it seems it's going to be a while before we actually see this in Envoy. I think we can live without it, because ADS seems to work better for us for now.

sschepens commented 5 years ago

I worry that attempting to do partial EDS updates with the current XDS protocol will result in subtle and hard to debug bugs.

@snowp why do you think this will happen? The logic seems pretty straightforward.

Like @jakubdyszkiewicz, I worry that it's going to be a long while until we see this in Envoy, stable enough to rely on.

snowp commented 5 years ago

Because you're relying on undocumented implementation details in Envoy around how EDS calls are batched: it doesn't seem to be in line with the XDS protocol itself to care about the size of the resource list.

I think a more stable implementation of per-resource versioning would not care about the size of the resources requested but instead only respond with the requested resources whose version has changed, regardless of request size. By having this only apply to requests with non-empty resource hints it will naturally only apply to RDS and EDS.
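That filtering rule could be sketched like this (hypothetical helper, simplified types): compare each requested resource's current version against the one last sent, regardless of how many resources the request names:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of responding only with requested resources whose version changed:
// the resource list size never matters, only the per-resource version diff.
class ChangedResourceFilter {
  static List<String> changed(List<String> requested,
                              Map<String, String> lastSentVersions,
                              Map<String, String> currentVersions) {
    return requested.stream()
        .filter(name -> !currentVersions.getOrDefault(name, "")
            .equals(lastSentVersions.get(name)))
        .collect(Collectors.toList());
  }
}
```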

sschepens commented 5 years ago

Because you're relying on undocumented implementation details in Envoy around how EDS calls are batched: it doesn't seem to be in line with the XDS protocol itself to care about the size of the resource list.

Is it really undocumented that envoy creates a new EDS stream for each cluster if not operating in ADS mode?

I think a more stable implementation of per-resource versioning would not care about the size of the resources requested but instead only respond with the requested resources whose version has changed, regardless of request size. By having this only apply to requests with non-empty resource hints it will naturally only apply to RDS and EDS.

@snowp yeah, I agree with you, but if we made the changes @jakubdyszkiewicz suggested, then the logic for choosing the version would be on the user side, right? They would need to provide a ResourceVersionResolver and handle the versioning themselves. Also, does RDS also support per-resource versioning?

snowp commented 5 years ago

Is it really undocumented that envoy creates a new EDS stream for each cluster if not operating in ADS mode?

Envoy does, but it's not part of the XDS protocol. Now that other parties are using the XDS protocol to implement load balancing (eg gRPC is adding support for load balancing that consumes XDS APIs) I'd want to make sure that the control plane implementation isn't tied to specific XDS implementations.

@snowp yeah, I agree with you, but if we made the changes @jakubdyszkiewicz suggested, then the logic for choosing the version would be on the user side, right? They would need to provide a ResourceVersionResolver and handle the versioning themselves. Also, does RDS also support per-resource versioning?

Yeah sorry, I was still focusing on the approach of handling differently sized XDS requests differently. I'm okay with extension points that allow users of the control plane to specify their own usage - then you can put in whatever assumptions about the XDS client you want.

Yes, RDS supports the same kind of versioning that EDS does.

sschepens commented 5 years ago

Yeah sorry, I was still focusing on the approach of handling differently sized XDS requests differently. I'm okay with extension points that allow users of the control plane to specify their own usage - then you can put in whatever assumptions about the XDS client you want.

@snowp Great, I'm up for doing this. The only issue is that to make it fully extensible we would want to pass the whole DiscoveryRequest down to the ResourceVersionResolver, but that would imply changing the signature of the current #version methods. If we only pass List<String> resourceNames, that would be easier, as we can keep the current methods as aliases for passing an empty list.

What do you think, which way should we go?

snowp commented 5 years ago

I think if you provide List<String> resources and String typeUrl that would probably be sufficient; I don't see what else you would care about in the discovery request.

sschepens commented 5 years ago

@snowp well, the discovery request has the previously acknowledged version; it could prove useful, I think.

jakubdyszkiewicz commented 5 years ago

Closing. https://github.com/envoyproxy/java-control-plane/pull/94 is merged.