Closed: jakubdyszkiewicz closed this pull request 5 years ago.
Merging #91 into master will increase coverage by 0.13%. The diff coverage is 96.29%.
```diff
@@            Coverage Diff             @@
##            master      #91      +/-  ##
===========================================
+ Coverage    94.47%    94.6%    +0.13%
- Complexity     134      141        +7
===========================================
  Files           14       14
  Lines          525      538       +13
  Branches        46       46
===========================================
+ Hits           496      509       +13
  Misses          21       21
  Partials         8        8
```
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| .../io/envoyproxy/controlplane/cache/SimpleCache.java | 96.19% <100%> (ø) | 29 <2> (ø) |
| ...ava/io/envoyproxy/controlplane/cache/Snapshot.java | 100% <100%> (ø) | 25 <2> (+2) |
| ...voyproxy/controlplane/cache/SnapshotResources.java | 92.3% <90.9%> (+6.59%) | 8 <6> (+5) |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 72fed97...37702f5.
Thanks for the review!
It's not immediately clear to me how this will be used by the server to filter out what resources to send. If you want to split this up into several PRs that's fine, but could you outline how you imagine this being used when triggering watches?
The idea is to provide the map of versions per cluster to the `Snapshot`. We create a new snapshot on every change in discovery, but the versions for unchanged clusters stay the same.
Outside of java-control-plane, we've got a class that generates the `Snapshot`. We keep the last snapshot in a map in this class and compare the new clusters to the old clusters. If nothing changed, we pass the same version to the `Snapshot`.
If you take a look at `SimpleCache#setSnapshot`, we retrieve the version by passing the resource names from the request:

```java
String version = snapshot.version(watch.request().getTypeUrl(), watch.request().getResourceNamesList());
```
Then we check for a version change; if it's different, we respond with the new data:

```java
if (!watch.request().getVersionInfo().equals(version)) {
  LOGGER.info("responding to open watch {}[{}] with new version {}",
      id,
      String.join(", ", watch.request().getResourceNamesList()),
      version);
  respond(watch, snapshot, group);
}
```
Note that if `getResourceNamesList` has 0 or more than 1 element, we just return the "aggregated" version, like before.
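For illustration, here is a minimal sketch of that lookup rule; the class, field, and method names are assumed for the example, not taken from the PR:

```java
import java.util.List;
import java.util.Map;

// Hypothetical holder for per-resource versions; names are illustrative.
final class PerResourceVersions {
  private final Map<String, String> individualVersions; // resource name -> version
  private final String aggregatedVersion;               // whole-snapshot version

  PerResourceVersions(Map<String, String> individualVersions, String aggregatedVersion) {
    this.individualVersions = individualVersions;
    this.aggregatedVersion = aggregatedVersion;
  }

  // Exactly one resource name: return its individual version if known.
  // Zero or multiple names: fall back to the aggregated version, as before.
  String version(List<String> resourceNames) {
    if (resourceNames.size() == 1) {
      return individualVersions.getOrDefault(resourceNames.get(0), aggregatedVersion);
    }
    return aggregatedVersion;
  }
}
```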
I hope this answers your question!
Hello,
I'm sorry for the lack of response. I did not forget about this PR. We are exploring other possibilities, including ADS. I will get back to it when we collect the results.
@jakubdyszkiewicz we're very interested in this, is there any way we can help/collaborate with this?
Hi @sschepens. The problem with this solution for us is mainly the startup phase.
With gRPC you've got two thread pools: NIO for sending/receiving bytes and Worker for logic, serializing protos, etc. Let's say you've got 100 envoys and 800 services. Now you've got 80k gRPC streams. On each stream a request is sent and it goes to the Worker pool, so you have to process 80k tasks on this pool. 80k is somewhat fine, but what if you aim for 1,000 envoys? You cannot have an unbounded queue, because it will eventually result in an `OutOfMemoryError`.
The problem is you cannot reject requests on the NIO thread in a graceful way, because Netty is so hardcoded into gRPC that we haven't found a way to do it yet. You can only create an interceptor on the gRPC service, which runs on the Worker thread. You can limit the worker queue and configure the pool to throw an exception when full, but then you end up with a flood of exceptions in the logs from NIO threads trying to pass tasks to the worker pool.
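For concreteness, a sketch of what bounding the gRPC worker pool looks like; the port, pool size, and queue size are made-up values for illustration, not settings from the PR:

```java
import io.grpc.Server;
import io.grpc.ServerBuilder;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedWorkerPool {
  public static void main(String[] args) throws Exception {
    // Fixed-size worker pool with a bounded queue; AbortPolicy makes the pool
    // throw RejectedExecutionException when the queue is full, which is the
    // exception flood on NIO threads described above.
    ExecutorService workers = new ThreadPoolExecutor(
        8, 8,
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(10_000),
        new ThreadPoolExecutor.AbortPolicy());

    Server server = ServerBuilder.forPort(12345)
        .executor(workers) // replace gRPC's default unbounded executor
        // .addService(...) register the discovery services here
        .build()
        .start();
    server.awaitTermination();
  }
}
```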
With ADS you've got one gRPC stream per Envoy, which means you don't have a problem with a growing worker pool. Additionally, you've got only one instance of `StreamObserver` and `Watch` per Envoy instead of one per service. Overall, far fewer objects on the heap.

Of course, with ADS you have to send much more data over the network. For us this is 0.5 KB per instance change without ADS + with separate versioning, vs 250 KB with ADS. Even with that tradeoff, the whole app seems to be more stable and much more predictable with ADS.
I think this is a good change nevertheless, if someone doesn't want to use ADS. Maybe with external rate limiting / proper load balancing, it makes more sense. If you have time, feel free to help me with this. The one thing I'd change in this PR: rather than returning `clusterVersion` when there are multiple resources, I'd return a separate "aggregated" value, or even pass a strategy like this:
```java
interface EndpointsVersionResolver {
  String version(List<String> resourceNames);
}

public static Snapshot create(
    Iterable<Cluster> clusters,
    String clustersVersion,
    Iterable<ClusterLoadAssignment> endpoints,
    EndpointsVersionResolver endpointVersionsResolver,
    Iterable<Listener> listeners,
    String listenersVersion,
    Iterable<RouteConfiguration> routes,
    String routesVersion,
    Iterable<Secret> secrets,
    String secretsVersion) {
  return new AutoValue_Snapshot(
      SnapshotResources.create(clusters, clustersVersion),
      SnapshotResources.create(endpoints, endpointVersionsResolver),
      SnapshotResources.create(listeners, listenersVersion),
      SnapshotResources.create(routes, routesVersion),
      SnapshotResources.create(secrets, secretsVersion));
}
```
This way, someone can provide their own custom method of deriving a version for multiple resources, which can help avoid sending redundant data to multiple groups even with ADS.
> This way, someone can provide their own custom method of deriving a version for multiple resources, which can help avoid sending redundant data to multiple groups even with ADS.
I like what you're saying, though it kind of feels strange to have a different parameter than the rest only for endpoints; maybe we could apply this to all resources? This could theoretically also be used in the future, right? So we could have something like this:
```java
interface ResourceVersionResolver {
  String version(List<String> resourceNames);
}

public static Snapshot create(
    Iterable<Cluster> clusters,
    ResourceVersionResolver clustersVersionResolver,
    Iterable<ClusterLoadAssignment> endpoints,
    ResourceVersionResolver endpointsVersionResolver,
    Iterable<Listener> listeners,
    ResourceVersionResolver listenersVersionResolver,
    Iterable<RouteConfiguration> routes,
    ResourceVersionResolver routesVersionResolver,
    Iterable<Secret> secrets,
    ResourceVersionResolver secretsVersionResolver) {
  return new AutoValue_Snapshot(
      SnapshotResources.create(clusters, clustersVersionResolver),
      SnapshotResources.create(endpoints, endpointsVersionResolver),
      SnapshotResources.create(listeners, listenersVersionResolver),
      SnapshotResources.create(routes, routesVersionResolver),
      SnapshotResources.create(secrets, secretsVersionResolver));
}
```
And it could be used this way:

```java
create(
    clusters, names -> clusterVersion,
    endpoints, endpointVersionResolver,
    listeners, names -> listenerVersion,
    routes, names -> routesVersion,
    secrets, names -> secretsVersion);
```
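Since `ResourceVersionResolver` has a single abstract method, the fixed-version cases can stay as one-line lambdas that ignore the resource names, while endpoints plug in a real resolver.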
@jakubdyszkiewicz @snowp I don't know what you think of this.
I would like to come to a conclusion on which way to go and go ahead and implement it :)
@sschepens good idea, I like it :)
Just wondering whether `ResourceVersionResolver#version` should get `List<String> resourceNames` or `DiscoveryRequest request`.

Right now `DiscoveryRequest` contains `versionInfo`, `resourceNames`, `typeUrl`, `responseNonce`. We don't need `typeUrl`, since we pass separate version resolvers, but the previous `versionInfo` may be useful for establishing a new version. It may also be more future-proof if something interesting is added to `DiscoveryRequest`.

What do you think about it?
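To make the tradeoff concrete, a hypothetical sketch of the request-based variant; the helpers `hasChangedSince` and `nextVersion` are made-up names for illustration:

```java
interface ResourceVersionResolver {
  String version(DiscoveryRequest request);
}

// Illustration: an implementation could compare against the previously
// acknowledged versionInfo and only mint a new version on a real change.
// hasChangedSince and nextVersion are hypothetical helpers.
ResourceVersionResolver resolver = request ->
    hasChangedSince(request.getVersionInfo(), request.getResourceNamesList())
        ? nextVersion()
        : request.getVersionInfo();
```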
Sure, seems like a good idea.
@jakubdyszkiewicz the only issue is that it's difficult to continue providing the `#version()` methods, as we would have to construct a new discovery request instead of calling `#version(Collections.emptyList())`.
Also, `Snapshot` has a `#version(String typeUrl)` method, which would become trivial and should be replaced by `#version(DiscoveryRequest request)`.
It's a bit awkward, but I don't think this is a problem. We can create a new `Snapshot#version(DiscoveryRequest)` and keep the old `Snapshot#version(String typeUrl)`, which will construct an empty `DiscoveryRequest` with nothing but the `typeUrl` and forward it to `Snapshot#version(DiscoveryRequest)`, so old calls stay compatible.
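A minimal sketch of that compatibility shim, assuming the proto-generated `DiscoveryRequest` builder; the `resolverFor` helper is a made-up name standing in for whatever lookup the cache would do:

```java
// Old entry point: kept for compatibility, delegates to the new overload.
public String version(String typeUrl) {
  return version(DiscoveryRequest.newBuilder()
      .setTypeUrl(typeUrl)
      .build());
}

// New entry point: resolvers get the full request.
// resolverFor(...) is hypothetical, not from the PR.
public String version(DiscoveryRequest request) {
  return resolverFor(request.getTypeUrl()).version(request);
}
```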
FYI per-resource versioning is not supported in general for standard XDS; this is what https://github.com/envoyproxy/envoy/issues/4991 aims to solve. This is because CDS/LDS doesn't even provide a list of resources, but instead just requests "all" by specifying an empty resource list.
I'm somewhat tempted to say that we should wait until incremental XDS lands, in particular due to recent developments around partial EDS responses during cluster warming (https://docs.google.com/document/d/1Ca4YX9XNnV9rjTR7U0_xyCxUDKfoSun8pWSUNO9N-tY/edit). I worry that attempting to do partial EDS updates with the current XDS protocol will result in subtle and hard to debug bugs.
WDYT @jakubdyszkiewicz @sschepens ?
Yeah, I read about Incremental xDS and it definitely solves the scalability issues. Actually, we operate with Consul the same way now: if a service A wants to communicate with service B, it asks Consul about B and tracks changes to B. It doesn't need to know about all instances of every service.
The problem is that it seems it's going to be a while before we actually see this in Envoy. I think we can live without it, because ADS seems to work better for us for now.
> I worry that attempting to do partial EDS updates with the current XDS protocol will result in subtle and hard to debug bugs.
@snowp why do you think this will happen? It seems that the logic is pretty straightforward.
Just like @jakubdyszkiewicz, I worry that it's going to be a long while until we see this in Envoy, stable enough to rely on.
Because you're relying on undocumented implementation details in Envoy around how EDS calls are batched: it doesn't seem to be in line with the XDS protocol itself to care about the size of the resource list.
I think a more stable implementation of per-resource versioning would not care about the size of the resources requested but instead only respond with the requested resources whose version has changed, regardless of request size. By having this only apply to requests with non-empty resource hints it will naturally only apply to RDS and EDS.
> Because you're relying on undocumented implementation details in Envoy around how EDS calls are batched: it doesn't seem to be in line with the XDS protocol itself to care about the size of the resource list.
Is it really undocumented that Envoy creates a new EDS stream for each cluster when not operating in ADS mode?
> I think a more stable implementation of per-resource versioning would not care about the size of the resources requested but instead only respond with the requested resources whose version has changed, regardless of request size. By having this only apply to requests with non-empty resource hints it will naturally only apply to RDS and EDS.
@snowp yeah, I agree with you, but if we made the changes that @jakubdyszkiewicz suggested, then the logic for choosing the version would be on the user side, right? They would need to provide a `ResourceVersionResolver` and handle the versioning themselves. Also, does RDS also support per-resource versioning?
> Is it really undocumented that Envoy creates a new EDS stream for each cluster when not operating in ADS mode?
Envoy does, but it's not part of the XDS protocol. Now that other parties are using the XDS protocol to implement load balancing (e.g. gRPC is adding support for load balancing that consumes XDS APIs), I'd want to make sure that the control-plane implementation isn't tied to specific XDS client implementations.
> @snowp yeah, I agree with you, but if we made the changes that @jakubdyszkiewicz suggested, then the logic for choosing the version would be on the user side, right? They would need to provide a `ResourceVersionResolver` and handle the versioning themselves. Also, does RDS also support per-resource versioning?
Yeah sorry, I was still focusing on the approach of handling differently sized XDS requests differently. I'm okay with extension points that allow users of the control plane to specify their own usage - then you can put in whatever assumptions about the XDS client you want.
Yes, RDS supports the same kind of versioning that EDS does.
> Yeah sorry, I was still focusing on the approach of handling differently sized XDS requests differently. I'm okay with extension points that allow users of the control plane to specify their own usage - then you can put in whatever assumptions about the XDS client you want.
@snowp Great, I'm up for doing this. The only issue is that to make it fully extensible we would want to pass the whole `DiscoveryRequest` down to the `ResourceVersionResolver`, but that would imply changing the signature of the current `#version` methods. If we only pass `List<String> resourceNames`, that would be easier, as we can keep the current methods as aliases for sending an empty list. What do you think, which way should we go?
I think if you provide `List<String> resources, String typeUrl` that would probably be sufficient; I don't see what else you would care about in the discovery request.
@snowp well, the discovery request has the previously acknowledged version; it could prove useful, I think.
Closing. https://github.com/envoyproxy/java-control-plane/pull/94 is merged.
Solves #89. It adds an option to version each cluster individually. I tried to make the API backward compatible.
Currently, a separate version is supported only for a single resource name.
Let's say we've got clusters `A` and `B` with an overall clusters version `c1`, and individual versions `A`: `a1`, `B`: `b1`. When we ask about the version for cluster `A`, we return `a1`; for `B` it's `b1`; but when we ask about `A` and `B` in one request, we return `c1`. I observed that with java-control-plane, Envoy only asks for one resource in EDS, but the xDS spec defines that it can be a list.

To return an individual combined version for `A` and `B`, we would have to combine them somehow. There is no easy way to do that by default. We could hash the list, but we could get hash collisions. That's why in this case we just return the version of the cluster.

I could code the strategy here if you think it's better. That way, the client of the `Snapshot` class would choose what to do in case of a request with multiple resource names.
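For reference, the "combine by hashing" idea might look like the sketch below; this is a hypothetical illustration (the class and method names are made up), and the collision concern mentioned above applies to any hash-based scheme, though a cryptographic hash makes it negligible in practice.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CombinedVersions {
  // Derive one version string from the individual resource versions.
  // Sorting first makes the result independent of request ordering.
  static String combinedVersion(List<String> individualVersions) {
    List<String> sorted = new ArrayList<>(individualVersions);
    Collections.sort(sorted);
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      for (String version : sorted) {
        digest.update(version.getBytes(StandardCharsets.UTF_8));
        digest.update((byte) 0); // separator avoids ambiguous concatenations
      }
      StringBuilder hex = new StringBuilder();
      for (byte b : digest.digest()) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new AssertionError("SHA-256 is always available", e);
    }
  }
}
```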