Azure / azure-functions-durable-extension

Durable Task Framework extension for Azure Functions
MIT License

PurgeAllInstancesAsync throwing Grpc.Core.RpcException #2702

Open jcageman opened 6 months ago

jcageman commented 6 months ago

Description

    public class HttpTriggerTest
    {
        [Function("HttpTriggerTest")]
        public async Task<HttpResponseData> Run([HttpTrigger(AuthorizationLevel.Function, "get", "post")] HttpRequestData req, [DurableClient] DurableTaskClient client)
        {
            var result = await client.PurgeAllInstancesAsync(new PurgeInstancesFilter(null, DateTime.Today.AddDays(-30),
                new List<OrchestrationRuntimeStatus>
                {
                    OrchestrationRuntimeStatus.Completed,
                    OrchestrationRuntimeStatus.Failed,
                    OrchestrationRuntimeStatus.Terminated
                })
            );

            var response = req.CreateResponse(HttpStatusCode.OK);
            return response;
        }
    }

Exception:

    Grpc.Core.RpcException: Status(StatusCode="Unknown", Detail="Exception was thrown by handler.")
    [2023-12-22T08:10:04.660Z]    at Microsoft.DurableTask.
    [2023-12-22T08:10:04.660Z]    at Microsoft.DurableTask.Client.Grpc.GrpcDurableTaskClient.PurgeInstancesCoreAsync(PurgeInstancesRequest request, CancellationToken cancellation)

I wouldn't expect this exception. It happens in a new .NET 8 isolated function app that contains only the simple HTTP trigger above, with the package versions listed below. I get this error when running locally against Azurite.

App Details

  - .NET 8 Isolated
  - Microsoft.Azure.Functions.Worker 1.20.0
  - Microsoft.Azure.Functions.Worker.Sdk 1.16.4
  - Microsoft.Azure.Functions.Worker.Extensions.DurableTask 1.1.0

jcageman commented 5 months ago

FYI, there is a workaround: pass an explicit CreatedFrom value instead of null, like this:

            var result = await client.PurgeAllInstancesAsync(new PurgeInstancesFilter(new DateTime(new DateOnly(2000, 1, 1), TimeOnly.MinValue), DateTime.Today.AddDays(-30),
                    new List<OrchestrationRuntimeStatus>
                    {
                        OrchestrationRuntimeStatus.Completed,
                        OrchestrationRuntimeStatus.Failed,
                        OrchestrationRuntimeStatus.Terminated
                    })
                );

I do think there is a bug, though: PurgeInstancesFilter accepts null values, yet the call crashes when CreatedFrom is null or DateTime.MinValue, for example.

cgillum commented 3 months ago

I suspect what's happening here is that the purge call is timing out in the gRPC layer. I think there are a couple of ways we can address this:

  1. Increase the request timeout of the gRPC operation.
  2. Add some error handling in the SDK to handle these timeouts gracefully.

It should be safe to catch the exception and retry. Each call should be making progress. The thing to be careful about is making sure we're not accidentally retrying some other non-retriable issue.
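The catch-and-retry idea above could be sketched on the caller side roughly as follows. This is only a minimal sketch, not SDK code: `PurgeRetry`, `RunAsync`, `maxAttempts`, and `delay` are hypothetical names, and which exceptions count as retriable is left to the caller through the `isRetriable` predicate, since the trace in this issue reports `StatusCode="Unknown"` rather than an explicit timeout.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the Durable Task SDK.
public static class PurgeRetry
{
    // Runs `operation`, making at most `maxAttempts` attempts in total.
    // A failure is retried only while `isRetriable` returns true for it;
    // any other exception, or a failure on the last attempt, propagates unchanged.
    public static async Task<T> RunAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        Func<Exception, bool> isRetriable,
        int maxAttempts = 3,
        TimeSpan? delay = null,
        CancellationToken cancellation = default)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(cancellation);
            }
            catch (Exception ex) when (attempt < maxAttempts && isRetriable(ex))
            {
                // Wait briefly before the next attempt (one second by default).
                await Task.Delay(delay ?? TimeSpan.FromSeconds(1), cancellation);
            }
        }
    }
}

// Possible usage inside the function from the issue description. Treating
// DeadlineExceeded as the retriable status is an assumption, not confirmed here:
//
//   var result = await PurgeRetry.RunAsync(
//       ct => client.PurgeAllInstancesAsync(filter, ct),
//       ex => ex is RpcException { StatusCode: StatusCode.DeadlineExceeded });
```

Keeping the retriable check in a predicate makes the decision explicit at the call site, which is one way to avoid retrying an unrelated, non-retriable failure by accident.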

Fazer01 commented 2 months ago

Linking the related issue from the durabletask-dotnet repository: https://github.com/microsoft/durabletask-dotnet/issues/282

JohanKlijn commented 2 months ago

The workaround provided by @jcageman is not working for me. To me it looks like a problem in the DurableTaskClient.

The PurgeAllInstancesAsync(PurgeInstancesFilter filter, CancellationToken cancellation) overload calls this.PurgeAllInstancesAsync(new PurgeInstancesFilter(), null, cancellation). If you look closely, a new filter (new PurgeInstancesFilter()) is created when the call to the three-parameter PurgeAllInstancesAsync overload is made.

See https://github.com/microsoft/durabletask-dotnet/blob/main/src/Client/Core/DurableTaskClient.cs#L379C2-L381C88.

Code in DurableTaskClient:

    public virtual Task<PurgeResult> PurgeAllInstancesAsync(PurgeInstancesFilter filter, CancellationToken cancellation)
        => this.PurgeAllInstancesAsync(new PurgeInstancesFilter(), null, cancellation);

jcageman commented 2 months ago

The code you are referring to calls another method in the same class, which throws a not-implemented exception. So that is not the problem, although it is indeed confusing.

The actual implementation is here: https://github.com/microsoft/durabletask-dotnet/blob/963f6c8e1128f79557afed2e95ee07f4429fbec7/src/Client/Grpc/GrpcDurableTaskClient.cs#L338C4-L359C6
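To illustrate the point about which code actually runs, here is a small sketch using hypothetical stand-in types (`BaseClient`, `GrpcLikeClient`, `PlainClient`, with a plain string in place of the filter). None of these are the real durabletask-dotnet classes. It assumes the concrete client overrides the overload being called, in which case the forwarding code in the base class is never executed for that client.

```csharp
using System;

// Hypothetical stand-ins; these are NOT the real durabletask-dotnet types.
public abstract class BaseClient
{
    // Base convenience overload: forwards a fresh (empty) filter,
    // mirroring the shape of the code quoted earlier in this thread.
    public virtual string Purge(string filter)
        => this.Purge(string.Empty, null);

    // The default body throws; concrete clients are expected to override.
    public virtual string Purge(string filter, string options)
        => throw new NotSupportedException(
            $"{this.GetType().Name} does not support purging.");
}

// Overrides the one-argument overload, so the base forwarding above
// is never reached when this client is used.
public sealed class GrpcLikeClient : BaseClient
{
    public override string Purge(string filter)
        => $"purged with filter '{filter}'";
}

// Overrides nothing, so calls fall through to the throwing base method.
public sealed class PlainClient : BaseClient
{
}

public static class Demo
{
    public static void Main()
    {
        BaseClient grpcLike = new GrpcLikeClient();
        Console.WriteLine(grpcLike.Purge("completed"));

        try
        {
            new PlainClient().Purge("completed");
        }
        catch (NotSupportedException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

In this sketch the caller's filter reaches `GrpcLikeClient` intact because virtual dispatch picks the override, while `PlainClient` ends up in the throwing base method. Whether the real client behaves this way depends on which overloads it overrides at the linked commit.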