dotnet / extensions

This repository contains a suite of libraries that provide facilities commonly needed when creating production-ready applications.

HTTP resiliency features don't work with the .NET gRPC client #4923

Open DamianEdwards opened 7 months ago

DamianEdwards commented 7 months ago

The HTTP resiliency features, including those added by the IHttpClientBuilder.AddStandardResilienceHandler method, don't apply to gRPC calls despite them going through configured HttpClient instances. This is due to the gRPC stack not exposing error details at the HTTP request level in the way that the resiliency features expect (e.g. using HTTP status codes).

The following code example, typical of setting up a gRPC client in a .NET server application, will not actually result in the standard resiliency features being applied to gRPC calls:

builder.Services.AddGrpcClient<Basket.BasketClient>(o => o.Address = new("http://basket-api"))
    .AddStandardResilienceHandler();

Consider adding support for the standard resiliency patterns to the .NET gRPC client stack in a similar fashion to those added to the HttpClient stack so that resiliency features like Circuit Breaker can be easily added by default.

/Cc @JamesNK @davidfowl

joperezr commented 7 months ago

cc: @martintmk @martincostello @geeknoid

martintmk commented 7 months ago

The easy enhancement is to improve the HttpClientResiliencePredicates to also detect gRPC calls and handle retriable status codes:

https://github.com/dotnet/extensions/blob/80abb8ddf7a2454930ae2378b121f044fe3df848/src/Libraries/Microsoft.Extensions.Http.Resilience/Polly/HttpClientResiliencePredicates.cs#L46

This should make both the retry and circuit breaker strategies work for gRPC. The other issue is the handling of streamed calls, which I am not sure how to address.

JamesNK commented 7 months ago

gRPC always returns a 200 status code. Failure is communicated in the grpc-status trailer.

I haven't looked at how resilience works, but I'm guessing the retry happens inside an HTTP handler's SendAsync. gRPC supports streaming, so an error can occur long after the response status has been returned and SendAsync has completed.

I think a known limitation will be that streaming gRPC calls won't be retried. However, failing unary calls should be detectable. Look for a 200 status code and also check the response headers for grpc-status. They will both be available in SendAsync.

martintmk commented 7 months ago

> Failure is communicated in the grpc-status trailer.

The trailer is available only after the response body has been fully read, is that correct? I am wondering how we can ensure that the trailer is available for gRPC calls. Otherwise, the retries won't work.

Will buffering the content work?

JamesNK commented 7 months ago

> Will buffering the content work?

No.

If an error happens before any content is returned by the server, then grpc-status is in the headers. That is the scenario that will work. It's confusingly named Trailers-Only in the spec - https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md#responses
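
A minimal sketch of the check described above, using only System.Net.Http: look for a 200 response that carries grpc-status in its headers (the Trailers-Only case). The class and method names are hypothetical and not part of Microsoft.Extensions.Http.Resilience, and treating only status 14 (UNAVAILABLE) as retriable is an assumption made for illustration, not something decided in this thread.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;

// Hypothetical helper; not an existing API in dotnet/extensions or Grpc.Net.Client.
public static class GrpcFailureDetection
{
    // Assumption: which gRPC status codes are worth retrying is a policy choice.
    // 14 (UNAVAILABLE) is used here purely as an example.
    private static readonly HashSet<int> AssumedRetriableCodes = new HashSet<int> { 14 };

    // Returns the grpc-status value when it is present in the response *headers*
    // of a 200 response (the Trailers-Only case); returns null otherwise.
    // A status sent in real trailers after the body is not visible here.
    public static int? GetHeaderGrpcStatus(HttpResponseMessage response)
    {
        if (response.StatusCode != HttpStatusCode.OK)
        {
            return null;
        }

        if (response.Headers.TryGetValues("grpc-status", out var values)
            && int.TryParse(values.FirstOrDefault(), out int status))
        {
            return status;
        }

        return null;
    }

    // True when the call failed before any content was returned and the
    // status code is in the assumed retriable set.
    public static bool IsRetriableUnaryFailure(HttpResponseMessage response)
    {
        int? status = GetHeaderGrpcStatus(response);
        return status.HasValue && AssumedRetriableCodes.Contains(status.Value);
    }
}
```

A retry or circuit breaker predicate could call IsRetriableUnaryFailure on the response it receives in SendAsync. Streaming calls that fail after the headers have been sent would still go undetected, which matches the known limitation mentioned earlier.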

soroshsabz commented 5 months ago

ITNOA

Are there any plans to implement specific extensions to support gRPC in Microsoft.Extensions.Resilience?

thanks

joperezr commented 5 months ago

> Are there any plans to implement specific extensions to support gRPC in Microsoft.Extensions.Resilience?

That is what this issue is tracking. No committed timelines yet, so for now we just want to continue this discussion.