Azure / azure-functions-powershell-worker

PowerShell language worker for Azure Functions.
MIT License

Wait-DurableTask broken in Powershell 7.2 functions #851

Open ruankrTs opened 2 years ago

ruankrTs commented 2 years ago

External event handling seems to be inconsistent and broken when using PowerShell 7.2 in Azure Functions.

HttpTrigger

using namespace System.Net

param($Request, $TriggerMetadata)

$inputs = @{
    Name = "$($Request.Query.Name)"
}

$FunctionName = $Request.Params.FunctionName
$InstanceId = Start-DurableOrchestration -FunctionName $FunctionName -Input $inputs 
Write-Host "Started orchestration with ID = '$InstanceId'"

$Response = New-DurableOrchestrationCheckStatusResponse -Request $Request -InstanceId $InstanceId
Push-OutputBinding -Name Response -Value $Response

Orchestration function

using namespace System.Net
param($Context)

$output = @()
$gate1 = Start-DurableExternalEventListener -EventName "Paris" -NoWait -verbose
$gate2 = Start-DurableExternalEventListener -EventName "London" -NoWait -verbose

$output = Invoke-DurableActivity -FunctionName 'Hello' -Input $Context.Input

$endResults = Wait-DurableTask -Task @($gate1, $gate2)

$finaloutput = Invoke-ActivityFunction -FunctionName 'Bye' -Input $output

$finaloutput

Hello (Activity)

using namespace System.Net
param($name, $TriggerMetadata)

$InstanceId = $TriggerMetadata.InstanceId

Write-Host "Hello $name"

Send-DurableExternalEvent -InstanceId $InstanceId -EventName "London" -verbose
Send-DurableExternalEvent -InstanceId $InstanceId -EventName "Paris" -verbose

$name

Bye (Activity)

using namespace System.Net
param($name)

Write-Host "Bye $name"

$name

Using the above code, when I set:

"FUNCTIONS_WORKER_RUNTIME_VERSION": "~7"

It works and will run Bye and produce the correct output. However, when I change it to:

"FUNCTIONS_WORKER_RUNTIME_VERSION": "7.2"

the orchestration stays in the Running state and does not run Bye. If I then send the two events again manually, it throws the exception below:

 Orchestration completed with a 'Failed' status and 314 bytes of output. Details: Unhandled exception while executing orchestration: DurableTask.Core.Exceptions.NonDeterministicOrchestrationException: Non-Deterministic workflow detected: A previous execution of this orchestration scheduled an activity task with sequence ID 1 and name 'Bye' (version ''), but the current replay execution hasn't (yet?) scheduled this task. Was a change made to the orchestrator code after this instance had already started running?
[2022-08-26T06:44:07.415Z]    at DurableTask.Core.TaskOrchestrationContext.HandleTaskScheduledEvent(TaskScheduledEvent scheduledEvent) in /_/src/DurableTask.Core/TaskOrchestrationContext.cs:line 271
[2022-08-26T06:44:07.415Z]    at DurableTask.Core.TaskOrchestrationExecutor.ProcessEvent(HistoryEvent historyEvent) in /_/src/DurableTask.Core/TaskOrchestrationExecutor.cs:line 189
[2022-08-26T06:44:07.416Z]    at DurableTask.Core.TaskOrchestrationExecutor.<ExecuteCore>g__ProcessEvents|11_0(IEnumerable`1 events) in /_/src/DurableTask.Core/TaskOrchestrationExecutor.cs:line 114
[2022-08-26T06:44:07.416Z]    at DurableTask.Core.TaskOrchestrationExecutor.ExecuteCore(IEnumerable`1 pastEvents, IEnumerable`1 newEvents) in /_/src/DurableTask.Core/TaskOrchestrationExecutor.cs:line 122
[2022-08-26T06:44:07.417Z]         at DurableTask.Core.TaskOrchestrationContext.HandleTaskScheduledEvent(TaskScheduledEvent scheduledEvent) in /_/src/DurableTask.Core/TaskOrchestrationContext.cs:line 271
[2022-08-26T06:44:07.417Z]    at DurableTask.Core.TaskOrchestrationExecutor.ProcessEvent(HistoryEvent historyEvent) in /_/src/DurableTask.Core/TaskOrchestrationExecutor.cs:line 189
[2022-08-26T06:44:07.420Z]    at DurableTask.Core.TaskOrchestrationExecutor.<ExecuteCore>g__ProcessEvents|11_0(IEnumerable`1 events) in /_/src/DurableTask.Core/TaskOrchestrationExecutor.cs:line 114
[2022-08-26T06:44:07.420Z]    at DurableTask.Core.TaskOrchestrationExecutor.ExecuteCore(IEnumerable`1 pastEvents, IEnumerable`1 newEvents) in /_/src/DurableTask.Core/TaskOrchestrationExecutor.cs:line 122

If I make the orchestrator slightly more complicated:

Orchestration function

using namespace System.Net
param($Context)

$output = @()
$gate1 = Start-DurableExternalEventListener -EventName "Paris" -NoWait -verbose
$gate2 = Start-DurableExternalEventListener -EventName "London" -NoWait -verbose

$output = Invoke-DurableActivity -FunctionName 'Hello' -Input $Context.Input
$output1 = Invoke-DurableActivity -FunctionName 'Hello1' -Input $output

$endResults = Wait-DurableTask -Task @($gate1, $gate2)

$finaloutput = Invoke-ActivityFunction -FunctionName 'Bye' -Input $output1

$finaloutput

Hello1 (Activity)

using namespace System.Net
param($name)

Write-Host "Hello again $name!!"

$name

Then it again does not complete unless I resend the events, but this time I sometimes have to send the events multiple times before it fails again (~7 still works, though). The more complex the orchestration, the more inconsistent the behavior seems to get.

Core Tools Version: 4.0.4736 Commit hash: N/A (64-bit)
Function Runtime Version: 4.8.1.18957

With ~7 it runs PowerShell 7.0.11. With 7.2 it runs PowerShell 7.2.4.

Am I doing something wrong? Has anyone else had success running external events with PowerShell 7.2 in complex orchestrations?

davidmrdavid commented 2 years ago

Thanks for the heads up @ruankr. I also seem to be able to repro this with 7.2 but not with ~7.

I'll keep this thread updated on what happened - I'm putting this as a top priority item.

davidmrdavid commented 2 years ago

Hi @ruankr - thankfully, the error and fix were easy to find. I've opened a WIP PR with the fix here: https://github.com/Azure/azure-functions-powershell-worker/pull/857

I'll let the CI run overnight, add a test for this tomorrow, and shortly after we ought to be able to merge and fix. Will keep you posted.

ruankrTs commented 2 years ago

@davidmrdavid

I noticed there's been a merge to revert the breaking change, but I'm unsure how to target this change. At the moment I am desperately waiting for this to be resolved, as I need it for my functions...

Also, as a side note: I deployed my test code above to an app in Azure, and messages are not coming in at all. Looking at the logging, it appears this is due to authentication, as can be seen from the error below. Is authentication not handled when using external messaging? I can't see anything in the docs implying I have to authenticate in order to send messages using Send-DurableExternalEvent, especially when the messages are coming from activities within the same orchestration. The same code simply works locally...

2022-09-02T10:07:32Z   [Error]   ERROR: Response status code does not indicate success: 401 (Unauthorized).

Exception             : 
    Type       : Microsoft.PowerShell.Commands.HttpResponseException
    Response   : StatusCode: 401, ReasonPhrase: 'Unauthorized', Version: 1.1, Content: System.Net.Http.HttpConnectionResponseContent, Headers:
                 {
                 Date: Fri, 02 Sep 2022 10:07:31 GMT
                 Server: Kestrel
                 WWW-Authenticate: Bearer
                 Request-Context: appId=cid-v1:a36b4303-729d-450a-b813-0b6e82adf910
                 Content-Length: 0
                 }
    TargetSite : 
        Name          : ThrowTerminatingError
        DeclaringType : System.Management.Automation.MshCommandRuntime, System.Management.Automation, Version=7.2.4.500, Culture=neutral, PublicKeyToken=31bf3856ad364e35
        MemberType    : Method
        Module        : System.Management.Automation.dll
    Message    : Response status code does not indicate success: 401 (Unauthorized).
    Source     : System.Management.Automation
    HResult    : -2146233088
    StackTrace : 
   at System.Management.Automation.MshCommandRuntime.ThrowTerminatingError(ErrorRecord errorRecord)
TargetObject          : Method: POST, RequestUri: 'https://ruaxxxxxxxxxxxxxxx.azurewebsites.net/runtime/webhooks/durabletask/instances/ec869d2b-3f24-499e-98c2-0a9e339ebe7f/raiseEvent/London', Version: 1.1, Content: System.Net.Http.ByteArrayContent, Headers:
                        {
                        User-Agent: Mozilla/5.0
                        User-Agent: (Linux; Linux 5.10.102.2-microsoft-standard #1 SMP Mon Mar 7 17:36:34 UTC 2022; )
                        User-Agent: PowerShell/7.2.4
                        Content-Length: 4
                        Content-Type: application/json
                        }
CategoryInfo          : InvalidOperation: (Method: POST, Reque… application/json
                        }:HttpRequestMessage) [Invoke-RestMethod], HttpResponseException
FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeRestMethodCommand
InvocationInfo        : 
    MyCommand        : Invoke-RestMethod
    ScriptLineNumber : 289
    OffsetInLine     : 13
    HistoryId        : 1
    ScriptName       : /azure-functions-host/workers/powershell/7.2/Modules/Microsoft.Azure.Functions.PowerShellWorker/Microsoft.Azure.Functions.PowerShellWorker.psm1
    Line             : $null = Invoke-RestMethod -Uri $RequestUrl -Method 'POST' -ContentType 'application/json' -Body $Body

    PositionMessage  : At /azure-functions-host/workers/powershell/7.2/Modules/Microsoft.Azure.Functions.PowerShellWorker/Microsoft.Azure.Functions.PowerShellWorker.psm1:289 char:13
                       +     $null = Invoke-RestMethod -Uri $RequestUrl -Method 'POST' -Conten …
                       +             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    PSScriptRoot     : /azure-functions-host/workers/powershell/7.2/Modules/Microsoft.Azure.Functions.PowerShellWorker
    PSCommandPath    : /azure-functions-host/workers/powershell/7.2/Modules/Microsoft.Azure.Functions.PowerShellWorker/Microsoft.Azure.Functions.PowerShellWorker.psm1
    InvocationName   : Invoke-RestMethod
    CommandOrigin    : Internal
ScriptStackTrace      : at Send-DurableExternalEvent, /azure-functions-host/workers/powershell/7.2/Modules/Microsoft.Azure.Functions.PowerShellWorker/Microsoft.Azure.Functions.PowerShellWorker.psm1: line 289
                        at <ScriptBlock>, /home/site/wwwroot/Hello/run.ps1: line 7

Any insight on either of the above questions would be much appreciated.

davidmrdavid commented 2 years ago

Hi @ruankr,

Sorry for the inconvenience, we'll find you a solution.

If you're referring to these PRs (https://github.com/Azure/azure-functions-host/pull/8694 and https://github.com/Azure/azure-functions-host/pull/8693), those are for ensuring this regression does not manifest in PowerShell 7 (which had not released yet), so that users continue to have that workaround.

For PowerShell 7.2 - we have the fix, but we'll need to catch a different release date. I'm working on setting that up. I should have an update as soon as Tuesday next week.

So let me outline a few possible workarounds.

The first one is to temporarily use PS 7, if that's a possibility for your app. Please let me know if that's the case.

The second option is to roll your Azure Functions runtime (also called "Host") version back to 4.8.0, as that version should not have this regression. You can use the guidance here to pin the Host version to 4.8.0: https://docs.microsoft.com/en-us/azure/azure-functions/set-runtime-version?tabs=portal#automatic-and-manual-version-updates

I'm going to be testing the second guidance myself shortly. I'll report back on the details asap.

Regarding your second question, on the authorization bits: can you please provide me with a .zip'ed repro? Thanks.

I'll also be writing a regression guidance post on this shortly. I'll link it to this thread.

davidmrdavid commented 2 years ago

I'm still experimenting with PowerShell 7.2 workarounds, but I can confirm that PowerShell 7.0 seems to work as expected. This is rather strange. Will keep this thread posted.

@ruankr: can you please confirm if PowerShell 7.0 is a viable temporary workaround for you?

ruankrTs commented 2 years ago

@davidmrdavid Unfortunately I cannot use PowerShell 7 due to some dependencies that are only available in 7.2. I'm looking at using storage tables as a temporary workaround in the meantime; however, it's getting messy, so ideally I would like to stick to external events.

My test code I am using to play around with is here: https://github.com/ruankr/azure-durable-funcevents

I've added a workaround in Hello3 for sending the messages in Azure using the function key, which seems to work. However, it's not clean, as it depends on the key being either in Key Vault or added to the function itself (which is problematic when using Terraform). I can work around this, though; I just thought it would not require authentication, these being "internal" calls.
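For readers hitting the same 401, here is a minimal sketch of what a key-based workaround along these lines might look like. It assumes the raiseEvent URL shape shown in the error output above, and that the Durable HTTP endpoint accepts the key as a `code` query parameter. `New-RaiseEventUri` and the `DURABLE_EVENTS_KEY` app setting are hypothetical names, not part of the worker module, and this is not necessarily what Hello3 in the linked repo does.

```powershell
# Hypothetical helper: builds the raiseEvent URL seen in the 401 error above
# and appends a key as the 'code' query parameter (assumed to be accepted).
function New-RaiseEventUri {
    param(
        [Parameter(Mandatory)] [string] $BaseUrl,
        [Parameter(Mandatory)] [string] $InstanceId,
        [Parameter(Mandatory)] [string] $EventName,
        [Parameter(Mandatory)] [string] $Key
    )

    # Escape the event name and key so characters such as '+' or '=' survive the query string.
    $template = '{0}/runtime/webhooks/durabletask/instances/{1}/raiseEvent/{2}?code={3}'
    return $template -f $BaseUrl.TrimEnd('/'),
                        $InstanceId,
                        [uri]::EscapeDataString($EventName),
                        [uri]::EscapeDataString($Key)
}

# Example use inside an activity (commented out so the sketch has no side effects).
# WEBSITE_HOSTNAME is assumed to be set by the hosting platform;
# DURABLE_EVENTS_KEY is a hypothetical app setting holding the key.
# $uri = New-RaiseEventUri -BaseUrl "https://$env:WEBSITE_HOSTNAME" `
#                          -InstanceId $InstanceId -EventName 'London' `
#                          -Key $env:DURABLE_EVENTS_KEY
# $null = Invoke-RestMethod -Uri $uri -Method 'POST' -ContentType 'application/json' -Body 'null'
```

The POST itself mirrors the `Invoke-RestMethod` call visible in the stack trace; only the URL construction differs. Keeping the key in an app setting is what creates the Key Vault or Terraform dependency mentioned above.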

davidmrdavid commented 2 years ago

Hi @ruankr:

Does your authentication workaround for Hello3 also resolve the original issue of the non-determinism exception? From my analysis so far, I am fairly certain that the core faulty component is non-deterministic loading of external events data in 7.2. As a result, even with the authentication solved, I am not certain that external events will work properly.

The external storage tables sound promising, but I'd need more details to provide guidance.

Still pursuing workarounds, will keep you posted.

davidmrdavid commented 2 years ago

@ruankr: it would help me to understand when exactly this started failing for you. If you had an app that was working before, and then it started showing this error - can you please provide me the earliest timestamp of this issue in UTC?

ruankrTs commented 2 years ago

@davidmrdavid The authentication workaround works but the external events still do not get picked up.

It's a new issue for me, as I've only recently needed to report back from another long-running instance and was looking at how best to do it. External events seem to fit what I need, if they worked...

davidmrdavid commented 2 years ago

Understood, that fits what I expected would happen as well. The fact that this is a new requirement also explains why I'm having some trouble finding a recent Host version without this bug for 7.2.

My recommendation at this point would be to split your workflow into separate orchestrators, divided at the point where the external event call would have been. Then, you can have your long-running process simply fire an event when completed (an HTTP request, writing to a queue, writing to storage, or just any other Functions trigger) and use that event to start the next orchestrator in the sequence. It's rather manual, but I think this would work as a workaround.
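As a rough sketch of that split, assuming a queue is used as the hand-off: a queue-triggered function starts the second orchestrator once the long-running process has written its completion message. The function name `StageTwo`, the helper `Start-NextStage`, and the `QueueItem` binding name are hypothetical. The function's function.json would also need a queue trigger binding and a Durable client binding, which are not shown here.

```powershell
# run.ps1 for a hypothetical queue-triggered function that starts the second
# orchestrator ('StageTwo') when the long-running process reports completion.
param($QueueItem, $TriggerMetadata)

function Start-NextStage {
    param($Message)

    # Start-DurableOrchestration is the same cmdlet the HttpTrigger in this issue uses.
    $instanceId = Start-DurableOrchestration -FunctionName 'StageTwo' -Input $Message
    Write-Host "Started StageTwo with ID = '$instanceId'"
    return $instanceId
}

# Only start the next stage when a queue message was actually delivered.
if ($null -ne $QueueItem) {
    $startedId = Start-NextStage -Message $QueueItem
}
```

The first orchestrator then simply ends after its last activity instead of calling Wait-DurableTask, so no external event has to be loaded during replay.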

I'll continue looking for alternatives and pushing for a resolution of the issue on Azure in the meantime.

ruankrTs commented 2 years ago

Yep, that's what I am doing already by means of storage queues. It is messy, though, as I have to use custom PowerShell to add and remove messages etc., but it's a workaround for now.

Appreciate your help with the events though, will be nice to get a solution at some point ;)

davidmrdavid commented 1 year ago

Hey @ruankr:

I just tested this scenario in the new PowerShell Durable Functions SDK announced here and it seems to be working out of the box. Just wanted to notify you in case you wanted to give it a try, but please note it's still in a public preview stage. Thanks!

lilyjma commented 8 months ago

Hi @ruankrTs - thanks for dropping the issue! I'm a PM working on Durable Functions and would love to learn about your experience using DF PowerShell. If your time allows, please grab a meeting with us here. Understanding the problems you're trying to solve will help us target future development. Thanks!