Azure / azure-functions-durable-extension

Durable Task Framework extension for Azure Functions
MIT License
711 stars 263 forks source link

Function Apps using Durable Task in Isolated worker: Inner exception being lost #2697

Closed wwalkley closed 3 months ago

wwalkley commented 6 months ago

Description

After migrating to the Isolated worker process, we've encountered an issue with handling exceptions. The FunctionFailedException is no longer accessible in the latest worker packages. Despite extensive searching, there's no clear documentation or online resources addressing this issue.

The documentation states that exceptions thrown in activity functions should be marshaled back to the orchestrator function as FunctionFailedException. However, in our case, they are propagating up as TaskFailedException's. This behavior differs from the expected outcome, as outlined here: #https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-error-handling?tabs=csharp-isolated#errors-in-activity-functions

Any exception that is thrown in an activity function is marshaled back to the orchestrator function and thrown as a FunctionFailedException. You can write error handling and compensation code that suits your needs in the orchestrator function.

Expected behavior

The exception thrown contains the root exception and we have access to the FunctionFailedException

Actual behavior

The exception thrown contains only the message of the root exception and is a TaskFailedException

Relevant source code snippets

Snippet in debug mode:

image

Stacktrace:

Microsoft.DurableTask.TaskFailedException: Task 'GetCustomerIntegrationStatusActivity' (#6) failed with an unhandled exception: One or more errors occurred. (HealthCheckFailed Xero API health check failed. This could be transient on our side, transient on their side, or a more serious problem.)
 ---> DurableTask.Core.Exceptions.TaskFailedException: Exception of type 'DurableTask.Core.Exceptions.TaskFailedException' was thrown.
   at DurableTask.Core.TaskOrchestrationContext.ScheduleTaskInternal(String name, String version, String taskList, Type resultType, Object[] parameters) in /_/src/DurableTask.Core/TaskOrchestrationContext.cs:line 118
   at DurableTask.Core.TaskOrchestrationContext.ScheduleTaskToWorker[TResult](String name, String version, String taskList, Object[] parameters) in /_/src/DurableTask.Core/TaskOrchestrationContext.cs:line 89
   at DurableTask.Core.TaskOrchestrationContext.ScheduleTask[TResult](String name, String version, Object[] parameters) in /_/src/DurableTask.Core/TaskOrchestrationContext.cs:line 81
   at DurableTask.Core.RetryInterceptor`1.Invoke() in /_/src/DurableTask.Core/RetryInterceptor.cs:line 61
   at DurableTask.Core.RetryInterceptor`1.Invoke() in /_/src/DurableTask.Core/RetryInterceptor.cs:line 96
   at Microsoft.DurableTask.Worker.Shims.TaskOrchestrationContextWrapper.CallActivityAsync[T](TaskName name, Object input, TaskOptions options)
   --- End of inner exception stack trace ---
   at Microsoft.DurableTask.Worker.Shims.TaskOrchestrationContextWrapper.CallActivityAsync[T](TaskName name, Object input, TaskOptions options)
   at Rifleman.Fintegration.Functions.Helpers.SyncExecutors.FullSyncExecutor.GetCustomerSyncStatus() in C:\Work\Rifleman.Fintegration\Rifleman.Fintegration\Rifleman.Fintegration.Functions\Helpers\SyncExecutors\FullSyncExecutor.cs:line 95
   at Rifleman.Fintegration.Functions.Helpers.SyncExecutors.FullSyncExecutor.GetCustomerSyncStatus() in C:\Work\Rifleman.Fintegration\Rifleman.Fintegration\Rifleman.Fintegration.Functions\Helpers\SyncExecutors\FullSyncExecutor.cs:line 129
   at Rifleman.Fintegration.Functions.Helpers.SyncExecutors.SyncExecutorBase.PerformPreSyncCheck() in C:\Work\Rifleman.Fintegration\Rifleman.Fintegration\Rifleman.Fintegration.Functions\Helpers\SyncExecutors\SyncExecutorBase.cs:line 79

Known workarounds

To get around this, we caught TaskFailedException's in our Orchestrators and used string comparisons against the ex.FailureDetails.ErrorMessage to handle certain expectations :cry:

App Details

<ItemGroup>
        <PackageReference Include="Microsoft.Azure.Core.NewtonsoftJson" Version="1.0.0"/>
        <PackageReference Include="Microsoft.Azure.Cosmos.Table" Version="1.0.8"/>
        <PackageReference Include="Microsoft.Azure.Functions.Worker" Version="1.19.0"/>
        <PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.DurableTask" Version="1.1.0"/>
        <PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.Http" Version="3.1.0"/>
        <PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.Http.AspNetCore" Version="1.0.0"/>
        <PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.ServiceBus" Version="5.14.1"/>
        <PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.Timer" Version="4.2.0"/>
        <PackageReference Include="Microsoft.Azure.Functions.Worker.Sdk" Version="1.15.1"/>
        <PackageReference Include="Microsoft.Azure.Functions.Extensions" Version="1.1.0"/>
        <PackageReference Include="Microsoft.Azure.Storage.Blob" Version="11.2.3"/>
        <PackageReference Include="Microsoft.Azure.Storage.Queue" Version="11.2.3"/>
        <PackageReference Include="Microsoft.DurableTask.Generators" Version="1.0.0-preview.1"/>
        <PackageReference Include="Microsoft.Extensions.Http" Version="6.0.0"/>
</ItemGroup>

Any guidance would be appreciated.

imeya commented 6 months ago

@cgillum could you please take a look at this issue? Thanks.

cgillum commented 6 months ago

Hi @imeya. As far as I can tell, this is by design, and I apologize that this was not documented clearly. TaskFailedException is the expected exception type which replaces FunctionFailedException from the .NET in-proc model.

Details about the root exception are in the TaskFailedException.FailureDetails property. We are no longer serializing the original activity exception because we found this to be completely unreliable since many exception types (including some that are part of the framework) are not serializeable. Security is also somewhat of a concern since we don't have direct control over or visibility into what exception types are being serialized or deserialized.

Can you share a code snippet from your previous orchestration logic showing how you were handling exceptions previously in the .NET in-proc model?

wwalkley commented 6 months ago

@cgillum @imeya Thanks for getting back to us, we appreciate your cooperation as we figure out how to move forward.

Following up on the documentation discrepancy regarding TaskFailedException. While your confirmation clarifies its expected behavior, the existing documentation inadvertently led us astray. It would be great to get the documentation updated.

Our code changes. Left is in-proc, Right is isolated worker image

Relying on inner/base exceptions (highlighted in red) proved ineffective when moving to the Isolated worker model. This necessitated adding a catch block for TaskFailedException. However, we encountered a challenge: the FailureDetails property didn't provide the granular information needed for type-based exception handling. We've resorted to string comparison from the error message as a temporary workaround (highlighted in yellow). Attached are screenshots showcasing the revised error handling logic and the structure of FailureDetails. We'd love your input on how we can improve this further.

Here is what the failure details look like: image

Full details:

ErrorMessage: One or more errors occurred. (HealthCheckFailed Xero API health check failed. This could be transient on our side, transient on their side, or a more serious problem.)
ErrorType: (unknown)
InnerFailure: null
ExceptionType: null
StackTrace:
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Microsoft.Azure.Functions.Worker.Invocation.DefaultFunctionInvoker`2.<>c.<InvokeAsync>b__6_0(Task`1 t) in D:\a\_work\1\s\src\DotNetWorker.Core\Invocation\DefaultFunctionInvoker.cs:line 32
   at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at Microsoft.Azure.Functions.Worker.Invocation.DefaultFunctionExecutor.ExecuteAsync(FunctionContext context) in D:\a\_work\1\s\src\DotNetWorker.Core\Invocation\DefaultFunctionExecutor.cs:line 45
   at Microsoft.Azure.Functions.Worker.OutputBindings.OutputBindingsMiddleware.Invoke(FunctionContext context, FunctionExecutionDelegate next) in D:\a\_work\1\s\src\DotNetWorker.Core\OutputBindings\OutputBindingsMiddleware.cs:line 13
   at Microsoft.Azure.Functions.Worker.Extensions.Http.AspNetCore.FunctionsHttpProxyingMiddleware.Invoke(FunctionContext context, FunctionExecutionDelegate next) in D:\a\_work\1\s\extensions\Worker.Extensions.Http.AspNetCore\src\FunctionsMiddleware\FunctionsHttpProxyingMiddleware.cs:line 34
   at Microsoft.Azure.Functions.Worker.FunctionsApplication.InvokeFunctionAsync(FunctionContext context) in D:\a\_work\1\s\src\DotNetWorker.Core\FunctionsApplication.cs:line 77
   at Microsoft.Azure.Functions.Worker.Handlers.InvocationHandler.InvokeAsync(InvocationRequest request) in D:\a\_work\1\s\src\DotNetWorker.Grpc\Handlers\InvocationHandler.cs:line 88

Custom exception from example:

[Serializable]
public class HealthCheckException : Exception
{
    public HealthCheckException( ) { }
    public HealthCheckException( string message ) : base( $"{Constants.ErrorMessage.HealthCheckFailed} {message}" ) { }
    public HealthCheckException( string message, Exception innerException ) : base( $"{Constants.ErrorMessage.HealthCheckFailed} {message}", innerException ) { }
    protected HealthCheckException( SerializationInfo info, StreamingContext context ) : base( info, context ) { }
}
cgillum commented 6 months ago

@wwalkley thanks for the detailed information. Try the following as an alternative way to do the error handling, which will more closely resemble what you had before:

try
{
    // ...
}
catch (TaskFailedException ex) when (ex.FailureDetails.IsCausedBy<ExternalAccessRevokedException>())
{
    // ...
}
catch (TaskFailedException ex) when (ex.FailureDetails.IsCausedBy<HealthCheckException>())
{
    // ...
}
wwalkley commented 6 months ago

Nice method! Unfortunately it returned false due to ExceptionType being null and ErrorType being unknown.

image

Any ideas?

jviau commented 6 months ago

I believe this is an issue of us not capturing inner exceptions in failure details. We should move this to microsoft/durabletask-dotnet as the issue lies there.

wwalkley commented 6 months ago

@jviau are the microsoft/durabletask-dotnet team across this issue?

imeya commented 6 months ago

@jviau thanks for help check the issue. how do we move, or you can help move this to microsoft/durabletask-dotnet? Thanks very much.

lilyjma commented 6 months ago

@wwalkley @imeya - looks like this issue can't be moved because this repo is in a different organization (Azure) than the other one (microsoft). But, we're the team that's maintaining both repos so this issue is under our radar.

imeya commented 6 months ago

thanks @lilyjma for looking into the issue. any ETA for this issue? Thanks.

EmilDamsbo commented 5 months ago

The documentation page here should also mention that TaskFailedException is the replacement: https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-error-handling?tabs=csharp-isolated#errors-in-activity-functions

This was the only place I could search up that mentioned the change.

wwalkley commented 5 months ago

@lilyjma @imeya . Appreciate the update! Glad the team's looking into this.

yatiac commented 4 months ago

@jviau @lilyjma do you have any ETA on this one? or an alternative/work around? Thanks in advance

miniGweek commented 3 months ago

Looking for a solution or workaround for this too.

cgillum commented 3 months ago

I created a local repro and can confirm that the problem is as described here. The issue seems to be that the code for loading the exception type only works for certain built-in exception types (like System.ApplicationException, which is what was tested) but not for things like custom exception types.

I will look into creating a fix for this.

As a workaround, you can instead do something like this:

try
{
    // ...
}
catch (TaskFailedException ex) when (ex.FailureDetails.ErrorType == typeof(ExternalAccessRevokedException).FullName)
{
    // ...
}
catch (TaskFailedException ex) when (ex.FailureDetails.ErrorType == typeof(HealthCheckException).FullName)
{
    // ...
}
cgillum commented 3 months ago

This will be fixed in the next release of the .NET Isolated nuget package (Microsoft.Azure.Functions.Worker.Extensions.DurableTask).

milkyware commented 2 months ago

Hi, one thing I've encountered is that with the EnableUserCodeException worker option enabled, the ErrorType is unknown

image

Is the updated package expected to fix this scenario as well, or is it a separate issue?

This is using the 1.2.0-rc.1 version of Microsoft.Azure.Functions.Worker.Extensions.DurableTask

wwalkley commented 2 months ago

I apologize for bringing this up again, but we recently upgraded the DurableTask package to version 1.1.2. Unfortunately, the issue we were encountering hasn't been resolved.

The error message we're receiving is still categorized as unknown which prevents us from understanding the root cause of the failure. Furthermore, the inner failure is null.

What we are seeing: image

Due to the unknown error type, we're unable to leverage the recommended, more specific error handling strategies from this thread. Instead, we're forced to rely on a string comparison against the ErrorMessage.

These errors aren't caught and instead it falls through to the general Catch at the bottom: image

Any ideas or suggestions would be greatly appreciated, thanks.