Azure / azure-functions-host

The host/runtime that powers Azure Functions
https://functions.azure.com
MIT License

PowerShell worker crashes with intermittent Grpc.Core.RpcException: Status(StatusCode=Unknown, Detail="Stream removed") exception #9031

Open michaelpeng36 opened 1 year ago

michaelpeng36 commented 1 year ago

Repro steps

No specific repro steps. A basic console app adds items to the queue, and the issue occurs intermittently over a span of days.

Expected behavior

The gRPC stream between the worker and the host should not be removed abruptly, at least not without a graceful worker shutdown. Kusto logs suggest that the worker is not the side removing the stream:

   at Grpc.Core.Internal.ClientResponseStream`2.MoveNext(CancellationToken token)
   at Microsoft.Azure.Functions.PowerShellWorker.Messaging.MessagingStream.MoveNext() in /mnt/vss/_work/1/s/src/Messaging/MessagingStream.cs:line 42
   at Microsoft.Azure.Functions.PowerShellWorker.RequestProcessor.ProcessRequestLoop() in /mnt/vss/_work/1/s/src/RequestProcessor.cs:line 75
   at Microsoft.Azure.Functions.PowerShellWorker.Worker.Main(String[] args) in /mnt/vss/_work/1/s/src/Worker.cs:line 57
   at Microsoft.Azure.Functions.PowerShellWorker.Worker.<Main>(String[] args)

Actual behavior

Intermittent worker crashes in the middle of the request-processing loop.

Known workarounds

The issue resolves itself after some time.

Related information

run.ps1

```powershell
param([string] $QueueItem, $TriggerMetadata)

$ErrorActionPreference = "Stop"

Write-Host "Succeeded"
```

function.json

```
{
  "bindings": [
    {
      "name": "QueueItem",
      "type": "queueTrigger",
      "direction": "in",
      "queueName": "ps-zoom-meetings-queue-items",
      "connection": "AzureWebJobsStorage"
    }
  ],
  "retry": {
    "strategy": "fixedDelay",
    "maxRetryCount": 0,
    "delayInterval": "00:00:10"
  }
}
```

fabiocav commented 1 year ago

@michaelpeng36 assigning this for initial investigation, but curious to know if you have an app you can use to repro this. If so, we can have some additional instrumentation added to see if that helps us identify the root cause.

michaelpeng36 commented 1 year ago

Thanks for the response, @fabiocav. Yes, we have a repro app set up for this. I will share the details privately.

fabiocav commented 1 year ago

@brettsam / @michaelpeng36 have you had a chance to sync on this? I'll move this to sprint 141, but please update/close if there is more information about this issue.

fabiocav commented 1 year ago

Pushing to sprint 143, but @brettsam and @michaelpeng36 are actively looking at this.