Azure / azure-functions-durable-extension

Durable Task Framework extension for Azure Functions

Stuck in running status #2311

Closed CryptoTradee closed 1 year ago

CryptoTradee commented 1 year ago

I wanted to write an eternal function that ran every 8 hours. Unfortunately, I saw that I could only trigger it with a timer function (hopefully I'm correct on that).

I wrote a timer function that triggers my eternal function. Unfortunately, my singleton pattern didn't work, and I ended up with lots of eternal functions running by mistake.
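For context, here is a minimal sketch of the kind of eternal orchestration being described, assuming the in-process C# Durable Functions extension; the names `RunEveryEightHours` and `DoWorkActivity` are placeholders rather than anything from the original code:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class EternalOrchestration
{
    // Hypothetical eternal orchestrator: do some work, sleep 8 hours on a
    // durable timer, then restart with ContinueAsNew so it never completes.
    [FunctionName("RunEveryEightHours")]
    public static async Task RunEveryEightHours(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        await context.CallActivityAsync("DoWorkActivity", null);

        // The instance stays in the "Running" status while this timer is pending.
        DateTime nextRun = context.CurrentUtcDateTime.AddHours(8);
        await context.CreateTimer(nextRun, CancellationToken.None);

        // Restart with a fresh history instead of completing.
        context.ContinueAsNew(null);
    }

    // Placeholder activity; the real work would go here.
    [FunctionName("DoWorkActivity")]
    public static Task DoWorkActivity([ActivityTrigger] object input) => Task.CompletedTask;
}
```

Because the orchestrator restarts itself with `ContinueAsNew` instead of returning, it is expected to show as Running indefinitely.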

I updated the code afterwards, and now those eternal functions have failed; however, the runtime status is still Running. Am I still being billed for this? I'd also like to tidy it up, just from a maintenance perspective.

CryptoTradee commented 1 year ago

[screenshot]

CryptoTradee commented 1 year ago

I've also tried deleting the history and queues from my storage - no change. I released a code version that had the [FunctionName] attribute commented out, but when I put [FunctionName] back in, all the history comes back again.

cgillum commented 1 year ago

> I updated the code afterwards, and now those eternal functions have failed; however, the runtime status is still Running.

This is probably because the code changes you made weren't compatible with the existing history in Azure Storage for the old version of those singletons. More info on orchestration versioning can be found here.

> Am I still being billed for this?

No. You are only billed while your functions are actively executing code. A "Running" status simply means that the orchestration has started but hasn't yet completed. Even an orchestrator function that uses a durable timer to sleep for 90 days will remain in the "Running" status.

> I've also tried deleting the history and queues from my storage - no change.

The "Running" status values from your screenshot are based on historical log data, so those won't change unless you create a new instance with a non-Running status or purge your Application Insights data. FWIW, the current "truth" is in the Instances table in Azure Storage. You can use something like the Durable Functions Monitor if you want a more user-friendly, non-historical view of your orchestrations.

> I released a code version that had the [FunctionName] attribute commented out, but when I put [FunctionName] back in, all the history comes back again.

This can be expected for eternal orchestrations - removing the [FunctionName] attribute will invalidate the orchestrator function, but there will still be messages in your queues that will repeatedly try to activate a function with that name. This is why, as soon as you put [FunctionName] back in, the history comes back. You'd have to delete the queues to ensure the history doesn't come back.
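As an alternative to deleting storage artifacts by hand, here is a sketch of cleaning up a stuck instance through the client API by terminating it and then purging its history, again assuming the in-process C# extension; the function name and route are hypothetical, and since termination is processed asynchronously, in practice you may need to wait for it to take effect before the purge succeeds:

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class CleanupFunctions
{
    // Hypothetical cleanup endpoint: terminate the stuck instance so it stops
    // producing new work items, then purge its history from the task hub.
    [FunctionName("PurgeStuckInstance")]
    public static async Task<IActionResult> PurgeStuckInstance(
        [HttpTrigger(AuthorizationLevel.Function, "delete", Route = "instances/{instanceId}")] HttpRequest req,
        string instanceId,
        [DurableClient] IDurableOrchestrationClient client)
    {
        await client.TerminateAsync(instanceId, "Cleaning up an orphaned eternal orchestration");

        // Termination completes asynchronously; a real cleanup job might poll
        // GetStatusAsync until the instance is Terminated before purging.
        PurgeHistoryResult result = await client.PurgeInstanceHistoryAsync(instanceId);
        return new OkObjectResult(result.InstancesDeleted);
    }
}
```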

CryptoTradee commented 1 year ago

Thanks for your prompt reply.

The "Running" status values from your screenshot are based on historical log data, so those won't change unless you create a new instance with a non-Running status or purge your Application Insights data. FWIW, the current "truth" is in the Instances table in Azure Storage. You can use something like the Durable Functions Monitor if you want a more user-friendly, non-historical view of your orchestrations.

I've deleted the Instances table (and the history), but all the orchestrations are still there, stuck in Running. It didn't make any difference. Here are the tables I deleted (while the function app was stopped); obviously they have been created again now. [screenshot]

I've installed Durable Functions Monitor as a NuGet package in my Azure Functions project and got auth working. However, I'm now getting a 500 server error. I raised issue https://github.com/microsoft/DurableFunctionsMonitor/issues/70

Thanks very much

davidmrdavid commented 1 year ago

Hi @CryptoTradee: Apologies for the late response.

Is this issue still ongoing? If so, do you have a minimal repro you could share?

It would also help us if you could provide screenshots of the data in the Instances table before and after purging your storage so we can better understand what you're seeing. Thanks!

ghost commented 1 year ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.

CryptoTradee commented 1 year ago

[Screenshot 2022-12-11 164036] Yes, the orchestration is still stuck in Running. It doesn't show as Running within DFM; I feel the native Azure portal shows better info than DFM, however (other than this issue).

At the moment, my orchestration is working perfectly, although I can't say I have a good idea of what is happening in the background other than from talking to the developers about it. You can close the ticket if you like.

davidmrdavid commented 1 year ago

@CryptoTradee: just for clarity, is that "Running" orchestrator your eternal orchestrator that is otherwise working correctly?

CryptoTradee commented 1 year ago

Yes, that's correct. Not a great instance ID there. Here is the header for the function: [screenshot]

davidmrdavid commented 1 year ago

Thanks for clarifying @CryptoTradee. In that case, I would expect its state to show as "Running" as it is an eternal orchestrator, so it will never enter the "Completed" state. I suspect this is the intended behavior.

As mentioned earlier in this thread, the fact that it is in the "Running" state does not mean that it is always doing work, it just means that it may still perform more work. Does that make sense or am I missing something about your use case?

ltouro commented 1 year ago

I highly recommend reading this article and embracing everything in it to have a smooth experience with DF:

Regarding your singleton pattern, you just need two things: a timer that checks that your eternal workflow is in the "Running" state (or otherwise restarts it), and a regular orchestration that is always (re-)started with the same instance ID and ends with a [ContinueAsNew](https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.webjobs.extensions.durabletask.idurableorchestrationcontext.continueasnew?view=azure-dotnet#microsoft-azure-webjobs-extensions-durabletask-idurableorchestrationcontext-continueasnew(system-object-system-boolean)) call. A sketch of this pattern is below.
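A minimal sketch of that watchdog pattern, assuming the in-process C# extension; the fixed instance ID `EternalSingleton`, the 5-minute CRON schedule, and the orchestrator name `RunEveryEightHours` are illustrative choices rather than anything taken from this thread:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.Logging;

public static class SingletonWatchdog
{
    // Fixed, well-known instance ID so at most one eternal orchestration exists.
    private const string InstanceId = "EternalSingleton";

    // Hypothetical watchdog: every 5 minutes, check whether the singleton is
    // alive and only start it (under the fixed instance ID) if it is not.
    [FunctionName("EnsureEternalOrchestration")]
    public static async Task EnsureEternalOrchestration(
        [TimerTrigger("0 */5 * * * *")] TimerInfo timer,
        [DurableClient] IDurableOrchestrationClient client,
        ILogger log)
    {
        DurableOrchestrationStatus status = await client.GetStatusAsync(InstanceId);

        if (status == null
            || status.RuntimeStatus == OrchestrationRuntimeStatus.Completed
            || status.RuntimeStatus == OrchestrationRuntimeStatus.Failed
            || status.RuntimeStatus == OrchestrationRuntimeStatus.Terminated)
        {
            await client.StartNewAsync("RunEveryEightHours", InstanceId);
            log.LogInformation("Started eternal orchestration {InstanceId}.", InstanceId);
        }
    }
}
```

Because an eternal orchestrator restarts itself with `ContinueAsNew`, the watchdog normally finds it Running and does nothing.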

CryptoTradee commented 1 year ago

I have that pattern now and it's working correctly. However, as the first comment mentioned, that "Running" orchestration was from a buggy version of the code in which the timer kept starting new instances of the eternal function. As a result, I had many eternal functions all started (quite bad).

I then released an updated version in which I did the following:

The new eternal function started running as a singleton (and was in the Running state, which was fine); however, the old eternal instance stayed hanging around in the "Running" state in the portal even though it had errored (see the picture in the first comment).

This seems to have been updated now, and that old instance has gone away. I think there is something to improve in the portal, and you are welcome to look at my account if you like; however, you can also close the issue if you like.

davidmrdavid commented 1 year ago

I see, thanks for clarifying. Since it looks like the portal issue has gone away, and we're already actively working to improve the portal monitoring experience, I'll be closing this issue for now. Thanks!