connectrpc / connect-es

The TypeScript implementation of Connect: Protobuf RPC that works.
https://connectrpc.com/
Apache License 2.0
1.34k stars 76 forks source link

RST_STREAM Handling #1108

Closed mpataki closed 3 weeks ago

mpataki commented 3 months ago

Describe the bug

When a grpc server is abruptly closed while a streaming RPC is open, client may see a RST_STREAM http/2 frame. No grpc level status, trailer, etc. are present in the response.

I'm using a grpc transport and a PromiseClient, which represents a stream as an async iterable. This scenario results in the client terminating the iterable without error, as though the stream was closed normally/successfully.

The GRPC HTTP/2 protocol states the following:

RPC runtime implementations should interpret RST_STREAM as immediate full-closure of the stream and should propagate an error up to the calling application layer.

In my particular case, which is using a fairly vanilla Istio/Envoy proxy, the HTTP/2 code returned is NO_ERROR(0). I see no grpc status header.

I believe the expected client behaviour would be to throw an error from the iterable/stream so the application can handle it (possibly reconnecting, etc).

To Reproduce

I'm testing server restarts in a k8s deployment with Istio/Envoy in front of the services. Abruptly terminating pods, Envoy is responding with the RST_STREAM frame.

Environment:

mpataki commented 3 months ago

For some more context, this is how grpcurl handles this case:

Response trailers received:
(empty)
Sent 1 request and received 1 response
ERROR:
  Code: Internal
  Message: stream terminated by RST_STREAM with error code: NO_ERROR

And this is how postman handles it:

Pasted image 20240615075230
srikrsna-buf commented 3 months ago

Hey! Thank you reporting this, we are looking in to this. Node seems to always report a NO_ERROR code even when an RST_STREAM is not sent from the server.

mpataki commented 3 months ago

Ok! Thanks @srikrsna-buf. I'll stay posted. Please let me know if I can help at all

mpataki commented 3 months ago

Hey @srikrsna-buf, curious how your initial look at this panned out? Does it seem readily solvable or should we expect to work around this issue for the foreseeable future?

Just considering how we should proceed on our end πŸ˜ƒ. I've got a pretty rock-n-roll patch in place as a stop gap but I'll sort something else if you think a connect-es fix is not in our future.

srikrsna-buf commented 3 months ago

One solution I was exploring is waiting for the trailers event to fire and if that fires, then a no error code is a no-op. But I need to test this. Once you receive data, you are not supposed to get a no error code.

I'm curious what is the patch you have?

mpataki commented 3 months ago

I'm treating any stream closure as an error, and attempting a reconnect. This allows us to recover from this case but introduces need for bespoke handling of cases where the stream was legitimately closed without error by our backend.

srikrsna-buf commented 2 months ago

I'm treating any stream closure as an error, and attempting a reconnect. This allows us to recover from this case but introduces need for bespoke handling of cases where the stream was legitimately closed without error by our backend.

I have a similar fix implemented in #1113. It checks to see if we received any response and throws an error. If you are using Envoy one problem this fix could result in is throwing an error when it is not. grpc-js suffers from a similar problem: https://github.com/grpc/grpc-node/issues/2569

mpataki commented 2 months ago

Nice, this makes sense to me fwiw @srikrsna-buf πŸ™ thank-you!

mpataki commented 1 month ago

@srikrsna-buf checking back in on this. Anything blocking us?

srikrsna-buf commented 1 month ago

I'll pick this up next week, since we don't have a way to tell if we received an code, we will need to do it at the protocol level.