Azure / azure-functions-durable-extension

Durable Task Framework extension for Azure Functions
MIT License
716 stars 271 forks source link

Rewind support seems broken #2652

Open kaibocai opened 1 year ago

kaibocai commented 1 year ago

Description

A clear and concise description of what the bug is. Please make an effort to fill in all the sections below; the information will help us investigate your issue.

We are working out the rewind support on durabletask-java, at https://github.com/microsoft/durabletask-java/pull/123. In our end-to-end test, we found an exception thrown out (more details are https://github.com/microsoft/durabletask-java/pull/123#pullrequestreview-1698252961), to me this comes from the extension. Note that this exception is thrown out both using the POST HTTP request or using the client.rewind method.

Already had a quick chat with @jviau, it seems the extension drops the value that needed to be set in the grpc message somewhere. I spent sometime look into the extension but with no luck on this due to lack of knowledge in extension, so open a issue here to track it.

I am open to have a session on this if it is preferred.

cc @davidmrdavid as well.

NOTE: JavaScript issues should be reported here: https://github.com/Azure/azure-functions-durable-js

Expected behavior

A clear and concise description of what you expected to happen.

Actual behavior

A clear and concise description of what happened.

Relevant source code snippets

// insert code snippet here

Known workarounds

Provide a description of any known workarounds you used.

App Details

Screenshots

If applicable, add screenshots to help explain your problem.

If deployed to Azure

We have access to a lot of telemetry that can help with investigations. Please provide as much of the following information as you can to help us investigate!

If you don't want to share your Function App or storage account name GitHub, please at least share the orchestration instance ID. Otherwise it's extremely difficult to look up information.

kaibocai commented 1 year ago

Related update on proto https://github.com/microsoft/durabletask-protobuf/pull/18. Though, still need to find why reason is dropped.

kaibocai commented 1 year ago

With @davidmrdavid 's help, we debug into the extension and I found here https://github.com/Azure/azure-functions-durable-extension/blob/9f750ebc77fda1f52c5042dcdb41e6eb52ee7fdb/src/WebJobs.Extensions.DurableTask/Bindings/OrchestrationTriggerAttributeBindingProvider.cs#L171-L172 theremoteContext.PastEvents already contains a few GenericEvents that have null as value of Data, and that causing the exception in ProtobufUtils.ToHistoryEventProto. I am not quite sure where are those GenericEvents come from.

image

The GenericEvent created by rewind API seems in the remoteContext.newEvents and has the correct data set to rewind reason.

image

Also, I am told the logic for handling HTTP requests for java is totally different from js and python, java is using grpc while js and python is using HTTP. That may explain why I can run rewind successfully using js.

Hopefully the PR here https://github.com/microsoft/durabletask-protobuf/pull/18 will have the issue resolved

cc @cgillum , @davidmrdavid , @jviau