Fix default HTTP resilience not to retry non-idempotent requests (POST, PUT, etc.)

SteveSandersonMS commented 5 months ago

Original issue description added by @SteveSandersonMS

Currently if a backend service takes a while to handle a request (e.g., > 5 sec), the default HTTP resilience policy will retry.

This is fine for idempotent requests (e.g., GET), but dangerous/harmful for other request types (POST, PUT). While building eShopSupport I often encountered cases where it would create duplicate messages/tickets just because either:

[1] I was debugging, and hence causing things to be slow [2] Or, some lower-level service was still starting up and loading an AI model or similar, causing it to be temporarily delayed

It's fine for developers to configure a retry policy for POST/PUT/etc themselves, but I don't think the default should be to retry those and was very surprised when finding that it is.

API Proposal added by @iliar-turdushev

Background and motivation

Microsoft.Extensions.Http.Resilience package provides a delegating handler called StandardResilienceHandler that applies various resilience strategies (retry, timeout, rate limiting) to HttpClient. For example, it automatically makes retries of an HTTP request if it fails with a transient error. You can register StandardResilienceHandler for HttpClient as following:

services.AddHttpClient("test").AddStandardResilienceHandler();

or you can use an overload that accepts options configuring resilience strategies:

services.AddHttpClient("test").AddStandardResilienceHandler(options =>
{
    // You can configure resilience options here.
});

Each resilience strategy of StandardResilienceHandler has a predicate deciding whether to apply the strategy or not based on the outcome of an HTTP request, i.e. whether the outcome is a transient error or not. Retry strategy by default is configured to make retries for all HTTP methods, i.e. it will make retries for both idempotent and non-idempotent HTTP methods. That brings the issue described above at the beginning of this comment. While the current default behavior might be undesired for some services, we don't want to change it because:

In general, we cannot make assumptions about idempotency of APIs based on HTTP methods being used to access those APIs.
Such a change will be breaking and would lead to unpleasant results for existing services that already use StandardResilienceHandler.

As a compromise solution to the issue we would like to introduce a new API allowing to easily disable the default behavior and don't do retries for non-idempotent HTTP methods.

API Proposal

namespace Microsoft.Extensions.Http.Resilience;

/// <summary>
/// Extensions for HttpRetryStrategyOptionsExtensions.
/// </summary>
public static class HttpRetryStrategyOptionsExtensions
{
    /// <summary>
    /// Disables retries for POST, PATCH, PUT, DELETE, and CONNECT.
    /// </summary>
    /// <param name="option">Retry strategy options.</param>
    public static void DisableForUnsafeHttpMethods(this HttpRetryStrategyOptions option);

    /// <summary>
    /// Disables retries for the given list of HTTP methods.
    /// </summary>
    /// <param name="options">Retry strategy options.</param>
    /// <param name="methods">List of HTTP methods.</param>
    public static void DisableFor(this HttpRetryStrategyOptions options, params HttpMethod[] methods);
}

According to RFC safe HTTP methods are those whose semantics is read-only. POST, PATCH, PUT, DELETE, and CONNECT are unsafe. While for PUT and DELETE doing retries might be safe, we propose an API method DisableForUnsafeHttpMethods disabling retries for all unsafe HTTP methods, i.e. HTTP methods that could be harmful for the server. Additionally, to have some flexibility we want to introduce a method DisableFor allowing users to specify which HTTP methods their service considers as non-retriable.

API Usage

DisableForUnsafeHttpMethods:

services.AddHttpClient("test").AddStandardResilienceHandler(options =>
{
    options.Retry.DisableForUnsafeHttpMethods();
});

DisableFor:

services.AddHttpClient("test").AddStandardResilienceHandler(options =>
{
    options.Retry.DisableFor(HttpMethod.Post, HttpMethod.Patch);
});

Alternative Designs

Extend existing AddStandardResilienceHandler methods with an argument accepting a list of HTTP methods for which users don't want to do retries.

API:

namespace Microsoft.Extensions.DependencyInjection;

public static partial class ResilienceHttpClientBuilderExtensions
{
    // Registers StandardResilienceHanlder in the given IHttpClientBuilder and configures its retry strategy to not do retries for the given list of HTTP methods.
    public static IHttpStandardResiliencePipelineBuilder AddStandardResilienceHandler(this IHttpClientBuilder builder, params HttpMethod[] nonIdempotentHttpMethods);

    // Like the first method + accepts a delegate to configure resilience options.
    public static IHttpStandardResiliencePipelineBuilder AddStandardResilienceHandler(this IHttpClientBuilder builder, Action<HttpStandardResilienceOptions> configure, params HttpMethod[] nonIdempotentHttpMethods);

    // Like the first method + accepts a configuration section to configure resilience options.
    public static IHttpStandardResiliencePipelineBuilder AddStandardResilienceHandler(this IHttpClientBuilder builder, IConfigurationSection section, params HttpMethod[] nonIdempotentHttpMethods);
}

Usage:

services.AddHttpClient("test").AddStandardResilienceHandler(HttpMethod.Post, HttpMethod.Patch);

The downside of this approach is that:

We have to duplicate each overload of AddStandardResilienceHandler method with an argument accepting a list of non-idempotent HTTP methods.
A list of non-idempotent HTTP methods affects only Retry strategy, but we pass it to a method registering a resilience handler also combining other resilience strategies, e.g. timeout, rate limiting, circuit breaker.

Risks

Not aware of any risks.

davidfowl commented 5 months ago

@joperezr This should be built into the resiliency package.

iliar-turdushev commented 2 months ago

Changing the default behavior to not retry non-idempotent calls would be a breaking change, since some customers already rely on current default behavior.

What if we introduce a new method checking whether the operation is retriable or not (we already have a class for such methods):

// Returns true if the outcome of an operation is Transient and Retriable, otherwise - false.
public static bool IsRetriable(Outcome<HttpResponseMessage> outcome, CancellationToken token);

and then users would need to use it to change default behavior of Retry strategy:

services
    .AddHttpClient()
    .AddStandardResilienceHandler(options =>
    {
        options.Retry.ShouldHandle = args =>
            new ValueTask<bool>(HttpClientResiliencePredicates.IsRetriable(args.Outcome, args.Context.CancellationToken));
    });

What do you think?

martincostello commented 2 months ago

Maybe there's scope for a helper built on top of that like options.Retry.DisableRetriesForNonIdempotentRequests() or options.Retry.EnableOnlyForHttpMethods([...])? It's a terrible name and might not be the right shape, but that's the idea I'm getting at. It feels a bit verbose to get the user to use the snippet above as-is for the default case. They could always do something like that if they want to customise further or be more granular.

davidfowl commented 2 months ago

Changing the default behavior to not retry non-idempotent calls would be a breaking change, since some customers already rely on current default behavior.

I understand but this feels like a bug.

RussKie commented 2 months ago

Only POST and CONNECT methods are not idempotent. Please refer to RFC 7231.

Summarizing, the HTTP methods are classified as following:

+---------+------+------------+
| Method  | Safe | Idempotent |
+---------+------+------------+
| CONNECT | no   | no         |
| DELETE  | no   | yes        |
| GET     | yes  | yes        |
| HEAD    | yes  | yes        |
| OPTIONS | yes  | yes        |
| POST    | no   | no         |
| PUT     | no   | yes        |
| TRACE   | yes  | yes        |
+---------+------+------------+

Please refer to https://stackoverflow.com/a/45019073/2338036 for more information.

RussKie commented 2 months ago

I like the idea of disabling retry for non-idempotent HTTP methods by default and giving the flexibility of enabling/disabling retry for any HTTP method.

Since this could be a breaking change, we should bake it in to the .NET 9 release (the "dev" branch) before we snap (i.e., make the dev the main). This will be happening in a matter of days.

iliar-turdushev commented 2 months ago

I like the idea of disabling retry for non-idempotent HTTP methods by default and giving the flexibility of enabling/disabling retry for any HTTP method.

Since this could be a breaking change, we should bake it in to the .NET 9 release (the "dev" branch) before we snap (i.e., make the dev the main). This will be happening in a matter of days.

Do I understand it correctly that we are allowed to introduce breaking changes in major versions? If so, then we can disable retries for non-idempotent HTTP methods by default in .NET 9. In addition to POST and CONNECT we need to disable retries for PATCH which may be non-idempotent, see RFC 5789, Chapter 2, The PATCH method. I think at this stage we don't need to introduce new API to enable/disable retries for various HTTP methods. If a user needs that he/she we'll be able to achieve that using current API as shown above https://github.com/dotnet/extensions/issues/5248#issuecomment-2376621430. If users ask to introduce more convenient API for enabling/disabling retries for HTTP methods we'll consider their ask reactively.

Hedging has the same issue. Hence we'll need similarly disable non-idempotent HTTP methods for hedging by default.

To summarize above:

In .NET 9 we'll disable retries and hedging for POST, CONNECT, and PATCH HTTP methods.
We'll not introduce new API for enabling/disabling retries or hedging for any HTTP methods. Currently users are able to achieve that using existing API.

Question: What is the best way to communicate such a breaking change, so that users will not miss it?

What do you think?

cc @geeknoid @martintmk

RussKie commented 1 month ago

Question: What is the best way to communicate such a breaking change, so that users will not miss it?

Our release notes and the docs. We can/should probably document the breaking changes at https://learn.microsoft.com/dotnet/core/compatibility/9.0; we'll need to work with the Docs team.

martintmk commented 1 month ago

In .NET 9 we'll disable retries and hedging for POST, CONNECT, and PATCH HTTP methods. We'll not introduce new API for enabling/disabling retries or hedging for any HTTP methods. Currently users are able to achieve that using existing API.

While I am not fan of breaking changes such as this, this change seems waranted.

// Returns true if the outcome of an operation is Transient and Retriable, otherwise - false. public static bool IsRetriable(Outcome outcome, CancellationToken token);

The semantic meaning is the same as the existing IsTransient method. I wouldn't introduce a new method. What about a new overload with parameter such as evaluateHttpMethods. By default, it will be enabled, but existing clients can revert back to old behavior by doing:

services
    .AddHttpClient()
    .AddStandardResilienceHandler(options =>
    {
        options.Retry.ShouldHandle = args =>
            new ValueTask<bool>(HttpClientResiliencePredicates.IsTransient(
                  args.Outcome,
                  evaluateHttpMethods: false,
                  cancellationToken: args.Context.CancellationToken));
    });

Our release notes and the docs. We can/should probably document the breaking changes at https://learn.microsoft.com/dotnet/core/compatibility/9.0; we'll need to work with the Docs team.

For internal Microsoft services, this needs to be a very clearly documented because this has a potential to cause issues. They need to be aware before upgrading.

geeknoid commented 1 month ago

I don't think we should change the default behavior that exists today, for two reasons:

The actual idempotence/non-idempotence of API calls is very often not coupled to the actual HTTP verbs. Making broad assumptions based on the verb is simply wrong in the real world.
Given the above, it would be a potential very unpleasant surprise for services designed around the current behavior to suddenly receive different behavior due to some dependency upgrades.

I think we should keep exactly the behavior we have today, document it, and make it super-easy to disable it when needed.

iliar-turdushev commented 1 month ago

I agree with @geeknoid. Let's not change the behavior we have and try and make disabling retries for non-idempotent HTTP methods easier. What if we introduce an extension method on HttpRetryStrategyOptions allowing to disable retries for a given list of HTTP methods (as @martincostello suggested):

// Disables retries for a given list of HTTP methods.
public static void DisableForHttpMethods(this HttpRetryStrategyOptions options, params HttpMethod[] httpMethods);

And the usage will be:

services
    .AddHttpClient()
    .AddStandardResilienceHandler(options => options.Retry.DisableForHttpMethods(HttpMethod.Post, HttpMethod.Patch));

We can also introduce more specific methods:

// Disables retries for POST, PATCH, and CONNECT.
public static void DisableForNonIdempotentHttpMethods(this HttpRetryStrategyOptions options);

// Disables retries for POST, PATCH, PUT, DELETE, and CONNECT.
public static void DisableForUnsafeHttpMethods(this HttpRetryStrategyOptions option);

And we will introduce similar API for hedging because it also hedges requests regardless of the HTTP method.

What do you think? @davidfowl @martintmk @geeknoid

davidfowl commented 1 month ago

At a minimum let's make a new top-level method where this is the default behavior. That seems like a reasonable compromise.

iliar-turdushev commented 1 month ago

@davidfowl If by saying new top-level method you mean an extension method on IHttpClientBuilder then we consider the following method:

// Disables retries for POST, PATCH, and CONNECT.
public static IHttpStandardResiliencePipelineBuilder AddIdempotentStandardResilienceHandler(this IHttpClientBuilder builder);

The usage will be:

services.AddHttpClient("client").AddIdempotentStandardResilienceHandler();

What do you think about the name AddIdempotentStandardResilienceHandler?

Another name I can think of is AddRestResilienceHandler or AddRestHttpResilienceHandler. REST APIs often rely on HTTP methods, and deciding whether to do or not to do retries for REST APIs based on HTTP method being used is reasonable.

So, we have few proposals: AddIdempotentStandardResilienceHandler, AddRestResilienceHandler or AddRestHttpResilienceHandler. Can we use one of them? Or should we come up with better name? Personally, I like AddRestResilienceHandler.

cc @geeknoid @martintmk

RussKie commented 1 month ago

What about the following pseudocode?

// Existing implementation
public static IHttpStandardResiliencePipelineBuilder AddStandardResilienceHandler(this IHttpClientBuilder builder)
    => AddStandardResilienceHandler(builder, enableIdempotency: false);

// Disables retries for POST, PATCH, and CONNECT if enableIdempotency=true
public static IHttpStandardResiliencePipelineBuilder AddStandardResilienceHandler(this IHttpClientBuilder builder, bool enableIdempotency)
    => AddStandardResilienceHandler(builder, enableIdempotency, [POST, PATCH, CONNECT]);

// Disables retries for the specified verbs  if enableIdempotency=true
public static IHttpStandardResiliencePipelineBuilder AddStandardResilienceHandler(this IHttpClientBuilder builder, bool enableIdempotency, params HttpVerb disableIdempotencyVerbs);

iliar-turdushev commented 1 month ago

What about the following pseudocode?

Looks good to me. I would remove bool enableIdempotency from the third method. I think it is already implied that enableIdempotency = true when we provide a list of HTTP methods that are considered idempotent.

@davidfowl @martintmk @geeknoid Could you, please, share your feedback on the proposed APIs 1 and 2? Can we proceed with one of them or should we think of another API? Thank you.

RussKie commented 1 month ago

Looks like I made a mistake, it should read "disableIdempotencyVerbs"

public static IHttpStandardResiliencePipelineBuilder AddStandardResilienceHandler(this IHttpClientBuilder builder, bool enableIdempotency, params HttpVerb disableIdempotencyVerbs);

I fixed my previous suggestion.

iliar-turdushev commented 1 month ago

@RussKie What if we use the following parameter names:

enableIdempotency --> handleNonIdempotentHttpMethods. When handleNonIdempotentHttpMethods = true we get default/current behavior, otherwise - we get idempotent behavior, i.e. don't do retries for POST, PATCH, etc. The word handle relates to the name of the property ShouldHandle which is used by resilience policies to define whether to execute the policy or not (e.g. do retries or not).
disableIdempotencyVerbs --> nonIdempotentHttpMethods.

And we'll introduce methods with those arguments:

// Disables retries for POST, PATCH, and CONNECT if handleNonIdempotentHttpMethods=false
public static IHttpStandardResiliencePipelineBuilder AddStandardResilienceHandler(
    this IHttpClientBuilder builder, bool handleNonIdempotentHttpMethods);

// Disables retries for the specified verbs
public static IHttpStandardResiliencePipelineBuilder AddStandardResilienceHandler(
    this IHttpClientBuilder builder, params HttpMethod[] nonIdempotentHttpMethods);

I think we don't need the argument bool handleNonIdempotentHttpMethods when we pass a list of non-idempotent HTTP methods.

RussKie commented 1 month ago

// Disables retries for POST, PATCH, and CONNECT if handleNonIdempotentHttpMethods=false
public static IHttpStandardResiliencePipelineBuilder AddStandardResilienceHandler(
    this IHttpClientBuilder builder, bool handleNonIdempotentHttpMethods);

// Disables retries for the specified verbs
public static IHttpStandardResiliencePipelineBuilder AddStandardResilienceHandler(
    this IHttpClientBuilder builder, params HttpMethod[] nonIdempotentHttpMethods);

I think there's an issue here - if handleNonIdempotentHttpMethods=true (the proposed default), then it's good, no changes in the existing user code required. However, if handleNonIdempotentHttpMethods=false then we need to supply which verbs to exclude, otherwise it doesn't make much sense. This makes handleNonIdempotentHttpMethods only meaningful in 50% of cases.

The second signature is probably more meaningful - if no nonIdempotentHttpMethods supplied - attempt retry for all of those; otherwise, only retry for those not listed.

Which, I think, bring us to the following signatures:

// By default, attempts to retry for all HTTP verbs
public static IHttpStandardResiliencePipelineBuilder AddStandardResilienceHandler(this IHttpClientBuilder builder);

// Disables retries for the specified verbs
public static IHttpStandardResiliencePipelineBuilder AddStandardResilienceHandler(this IHttpClientBuilder builder, params HttpMethod[] nonIdempotentHttpMethods);

Also, there are other ways to construct a standard resilience handler - what do we do with those? Do we need to add overloads for those too?

public static IHttpStandardResiliencePipelineBuilder AddStandardResilienceHandler(this IHttpClientBuilder builder, IConfigurationSection section)
public static IHttpStandardResiliencePipelineBuilder AddStandardResilienceHandler(this IHttpClientBuilder builder, Action<HttpStandardResilienceOptions> configure)

iliar-turdushev commented 1 month ago

However, if handleNonIdempotentHttpMethods=false then we need to supply which verbs to exclude, otherwise it doesn't make much sense. This makes handleNonIdempotentHttpMethods only meaningful in 50% of cases.

The second signature is probably more meaningful - if no nonIdempotentHttpMethods supplied - attempt retry for all of those; otherwise, only retry for those not listed.

The first overload was proposed to have a method that disables retries for POST, PAST, and CONNECT. You're right it is meaningful only when handleNonIdempotentHttpMethods=false, otherwise - users wouldn't use it.

Which, I think, bring us to the following signatures:

Yeah, we can have only one new method with params HttpMethod[] nonIdempotentHttpMethods argument. The only downside that users will every time need to supply a list of HTTP methods. The purpose of the overload with nonIdempotentHttpMethods was to not make users to provide the list of HTTP methods if they wanted to disable retries default list of non-idempotent HTTP methods.

Do we need to add overloads for those too?

Yes, we do. We'll also need to create corresponding overloads for the StandardHedgingHandler.

iliar-turdushev commented 1 month ago

Honestly, the more we discuss introducing new top-level API the more I think that introducing a new API on HttpRetryStrategyOptions would be better. We want to disable retries only for the HttpRetryStrategy, not for the whole StandardResilienceHandler, hence it is more logical to have such an API on HttpRetryStrategyOptions object. Additionally, if we introduce the new API on the HttpRetryStrategyOptions then users will be able to reuse it building their custom resilience handlers, not only when they use StandardResilienceHandler.

RussKie commented 1 month ago

Honestly, the more we discuss introducing new top-level API the more I think that introducing a new API on HttpRetryStrategyOptions would be better. We want to disable retries only for the HttpRetryStrategy, not for the whole StandardResilienceHandler, hence it is more logical to have such an API on HttpRetryStrategyOptions object. Additionally, if we introduce the new API on the HttpRetryStrategyOptions then users will be able to reuse it building their custom resilience handlers, not only when they use StandardResilienceHandler.

Works for me 👍

Thought, I'd shorten the name to:

// API
public static void DisableFor(this HttpRetryStrategyOptions options, params HttpMethod[] httpMethods);

// usage
services
    .AddHttpClient()
    .AddStandardResilienceHandler(options => options.Retry.DisableFor(HttpMethod.Post, HttpMethod.Patch));

I'd start with a single API, and in the future we could add specific overloads/API, if necessary.

martintmk commented 1 month ago

// API
public static void DisableFor(this HttpRetryStrategyOptions options, params HttpMethod[] httpMethods);

// usage
services
    .AddHttpClient()
    .AddStandardResilienceHandler(options => options.Retry.DisableFor(HttpMethod.Post, HttpMethod.Patch));

I think this is the most straightforward proposal yet. Just a side note, the only way to implement this is to take existing delegate and wrap it.

Naive implementation is:

public static class HttpRetryStrategyOptionsExtensions
{
    public static HttpRetryStrategyOptions DisableFor(this  HttpRetryStrategyOptions options, params HttpMethod[] methods)
    {
        var previous = options.ShouldHandle;

        options.ShouldHandle = async args =>
        {
            if (args.Outcome.Result is HttpResponseMessage response && methods.Contains(response.RequestMessage.Method))
            {
                return false;
            }

            return await previous(args);
        };
    }
}

RussKie commented 1 month ago

@geeknoid @davidfowl @SteveSandersonMS any objections? Otherwise, I'm setting this as "api approved".

SteveSandersonMS commented 1 month ago

In my mind the main requirement is that newly-created projects should get safe behavior by default, i.e., the project template should be updated so that new projects don't retry non-idempotent methods. Existing projects can continue unaffected.

Of the API proposals above, I think DisableForUnsafeHttpMethods is the most useful. Having a DisableFor API to configure it on a per-method basis is fine but rarely likely to be useful. That is, having fine-grained control so you can do this for Post but not Put seems hard to match to a real scenario. Making people remember all the dangerous verbs is error-prone (for example the sample code above doesn't account for Delete and Patch, and if it was updated to do so it would look quite clunky). There should be a single method or flag that deals with knowing which HTTP methods are idempotent and which aren't, so that developers aren't forced to work this out.

So in summary I think:

There should be an API for disabling it for all unsafe methods, or perhaps more precisely, for disabling it for all methods except known idempotent ones. For example called DisableForUnsafeHttpMethods or EnableOnlyForSafeHttpMethods.
This should be used in the project template by default
If there's also a way to manually control it on a per-method basis (i.e., DisableFor) that's not harmful, but I'm not sure when it would be used

iliar-turdushev commented 1 month ago

Thank you @SteveSandersonMS.

So, let's proceed with the following API:

// Disables retries for POST, PATCH, PUT, DELETE, and CONNECT.
public static HttpRetryStrategyOptions DisableForUnsafeHttpMethods(this HttpRetryStrategyOptions option);

// Disables retries for the given list of HTTP methods.
public static HttpRetryStrategyOptions DisableFor(this HttpRetryStrategyOptions options, params HttpMethod[] methods);

And we'll also need to introduce similar API for hedging. But let's do that in a separate issue:

// Disables hedging for POST, PATCH, PUT, DELETE, and CONNECT.
public static HttpHedgingStrategyOptions DisableForUnsafeHttpMethods(this HttpHedgingStrategyOptions option);

// Disables hedging for the given list of HTTP methods.
public static HttpHedgingStrategyOptions DisableFor(this HttpHedgingStrategyOptions options, params HttpMethod[] methods);

RussKie commented 1 month ago

Marking as ready for review by the ARB.

terrajobst commented 1 month ago

@RussKie @iliar-turdushev could you put the API proposal into the description that includes namespace and type information? Also, is this actually ready for review? If not, it shouldn't be marked as such.

iliar-turdushev commented 1 month ago

@terrajobst I have added the API proposal. See this comment https://github.com/dotnet/extensions/issues/5248#issue-2373106206. The API was already discussed here in the issue with @RussKie and @SteveSandersonMS. In my opinion it is ready for review, unless we want to discuss it here in the comments with somebody from the API review team.

terrajobst commented 3 weeks ago

Looks good as proposed
The template team needs to sign off too, we believe this needs to be in the template to be useful.
We discussed whether changing the default would be viable but it seems we don't believe that to be possible.
We might want to be make them chainable.

namespace Microsoft.Extensions.Http.Resilience;

public static class HttpRetryStrategyOptionsExtensions
{
    public static void DisableForUnsafeHttpMethods(this HttpRetryStrategyOptions option);
    public static void DisableFor(this HttpRetryStrategyOptions options, params HttpMethod[] methods);
}

iliar-turdushev commented 3 weeks ago

The template team needs to sign off too, we believe this needs to be in the template to be useful.

@SteveSandersonMS How can we follow up on this? If the template team decide not to change the template, then is it worth introducing the proposed API changes?

davidfowl commented 3 weeks ago

We will change the template. Lets get the API in.

RussKie commented 2 weeks ago

Was the review discussion recorded?

iliar-turdushev commented 3 days ago

Was the review discussion recorded?

@RussKie Here is the recording https://www.youtube.com/watch?v=ReIPc3F3Xd0.

dotnet / extensions