Open jparklab opened 3 years ago
Please let me know if there's anything I can help with (e.g. testing a patch, reviewing an MR, or creating an MR with the changes I have).
@jparklab thanks for the solution! We've also been investigating this since you previously posted https://github.com/envoyproxy/go-control-plane/issues/446. I'll look into what you posted above and see where we can proceed. I have a PR up now that I might be able to move this into.
Seems it's not completely solved; I ran into a similar issue in the latest version using envoy-gateway. Sometimes it just gets stuck and only responds after I restart the process.
Envoy sends multiple DeltaDiscoveryRequests for the same typeURL over the same stream to keep receiving updates for that type. However, the current incrementalXDS implementation silently ignores every request after the first.
Here's what we saw in the output of our integration test (we added callbacks to print messages on delta stream open/close/request/response).
At 2021-06-11T18:40:37.591Z, the snapshot is updated to a new version that adds new routes to the RouteConfiguration along with a new Cluster/ClusterAssignment, but no response is sent to Envoy until the test times out after 10 minutes.
It seems that the logic at https://github.com/envoyproxy/go-control-plane/blob/main/pkg/server/delta/v3/server.go#L160 does not handle subsequent requests for the same typeURL well.
watch.responses
(https://github.com/envoyproxy/go-control-plane/blob/main/pkg/server/delta/v3/server.go#L182)
I managed to get our integration tests to pass by patching the code that handles the request (https://github.com/envoyproxy/go-control-plane/blob/main/pkg/server/delta/v3/server.go#L160-L199) like below.
(FYI, the tests still fail sometimes because the stream is closed unexpectedly by the controller, and Envoy does not seem to be able to recover from it. I'll post an update if I find out why the stream is closed unexpectedly.)
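For readers following along, here is a minimal sketch of the general idea discussed in this issue: keep at most one active watch per typeURL on a stream, and when a later request for the same typeURL arrives, replace the old watch instead of dropping the request. This is not the patch mentioned above and not go-control-plane code; the names watch, streamWatches, newStreamWatches, and onRequest are hypothetical and exist only for illustration.

```go
package main

// watch is a hypothetical stand-in for a per-type subscription.
// It is NOT the go-control-plane watch type.
type watch struct {
	responses chan string // updates destined for one type URL
	cancel    func()      // releases whatever the watch holds
}

// streamWatches tracks at most one active watch per type URL
// on a single delta stream (illustrative only).
type streamWatches struct {
	byType map[string]*watch
}

func newStreamWatches() *streamWatches {
	return &streamWatches{byType: make(map[string]*watch)}
}

// onRequest handles a request for typeURL. If a watch for that type
// already exists, it is cancelled and replaced with a fresh one, so a
// second or later request is acted on rather than silently ignored.
// The open callback is an assumption: it stands for "register a new
// watch with the cache" in a real server.
func (s *streamWatches) onRequest(typeURL string, open func() *watch) *watch {
	if old, ok := s.byType[typeURL]; ok && old.cancel != nil {
		old.cancel()
	}
	w := open()
	s.byType[typeURL] = w
	return w
}
```

In a real server the open callback would register the watch with the snapshot cache, and the stream loop would read from the responses channel of whichever watch is currently stored for each type.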