envoyproxy / xds-relay

Caching, aggregation, and relaying for xDS compliant clients and origin servers
Apache License 2.0

Retry upstream stream failures #159

Closed: jyotimahapatra closed this 4 years ago

jyotimahapatra commented 4 years ago

This PR proposes the following change in behavior:

gRPC concepts the implementation relies on:

  1. gRPC takes care of connection management. Once the connection is established, the client needs to do no work to reconnect when the backend closes the connection, to open a new connection on GOAWAY frames, or to handle any other connection-related scenario. The connection is picked back up when the backend starts responding again.
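  2. Streams are ephemeral. Even when the connection is closed by cancelling its context, stream creation fails with gRPC status code 14 (UNAVAILABLE), the same code returned when the backend is unavailable. Because the status code alone cannot tell these cases apart, we explicitly stop creating streams once the context is cancelled.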
  3. The default gRPC connection parameters are defined at https://github.com/grpc/grpc-go/blob/v1.32.0/backoff/backoff.go#L47-L51. These defaults work for now, and we can revisit them later if tuning becomes necessary.
  4. Retrying stream creation on failure is inexpensive, since a new stream is created over the existing connection and depends on that connection's state. We still intend to add backoff for stream retries, but that is planned for a separate PR.
  5. Once a stream encounters an error, it is aborted and cannot be reused. In our case, the receive and send paths must use the same stream, so they need to coordinate in order to bail out together.