Azure / azure-functions-durable-extension

Durable Task Framework extension for Azure Functions
MIT License
713 stars 270 forks source link

In isolated mode, RetryContext.LastFailure.IsCausedBy is not useful if activity function is async #2762

Open stevendarby opened 6 months ago

stevendarby commented 6 months ago

Discussed in https://github.com/Azure/azure-functions-durable-extension/discussions/2614

Originally posted by **buckprime** October 3, 2023 It seems like we need more information about the last error in retry handlers (meaning RetryHandler/AsyncRetryHandler, that you can put into a TaskOptions to specify custom handling of retries). The retry handler gets called with a RetryContext, which has a call LastFailure.IsCausedBy() that is supposed to let you check what exception caused the failure you're retrying. But if the activity function that failed was async, the exception that the RetryContext sees is always AggregateException rather than the actual exception that was thrown. The context only contains the name of the type of the exception, so there's no way to determine what the actual exception was. I think that the context either needs to know what the original exception was, or provide more information about the exception so that, for instance, if it's an AggregateException we can tell what exceptions it contained. Otherwise IsCausedBy isn't useful if your activity function is async.
MarioMajcicaAtABNAMRO commented 3 weeks ago

Indeed, we are having the same issue here. We are moving from inProc to Isolated and aside from having the issue with TaskOptions itself (https://github.com/Azure/azure-functions-durable-extension/issues/2908) we do see that the exception that triggered a retry is not exposed in the handler.

In past we could do something like if (ex.InnerException is NotFinishedStatusException statusException && statusException.Operation.Completed)

Now, we can only check the exception type via retryContext.LastFailure.IsCausedBy<NotFinishedStatusException>()

It would be handy if the exception itself is exposed.

thomaseyde commented 3 weeks ago

I ran into the same issue today. I would think that activity functions would mostly be async, otherwise there is very little useful work they can do.

I have looked into how IsCausedBy works:

The owner record TaskFailureDetails is recursive, it has a property InnerFailure. There is a placeholder to keep inner exception details, but for some reason it is not used. The debugger also tells me this value is null, so I cannot create my own workaround.

A higher priority than P3 is desired.