Azure / azure-functions-host

The host/runtime that powers Azure Functions
https://functions.azure.com
MIT License
1.94k stars 441 forks source link

Intermittent TaskCanceledExceptions when F# function calls EventHubs GetRuntimeInformationAsync #5924

Open gzuber opened 4 years ago

gzuber commented 4 years ago

This originated with a CRI and is pretty easy to reproduce. The set up is to have an F# HttpTrigger function that uses keyvault to store an eventhubs connectionstring in the AppSettings and calls GetRuntimeInformationAsync() with an EventHubClient. The function endpoint is then called every 5 minutes by an Azure Availability Test. Often, maybe 50 - 100 time per day, it fails with this stack trace:

System.Threading.Tasks.TaskCanceledException : A task was canceled.
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at async Microsoft.Azure.WebJobs.Host.Executors.FunctionInvoker`2.InvokeAsync[TReflected,TReturnValue](Object instance,Object[] arguments) at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionInvoker.cs : 52
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at async Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.InvokeAsync(IFunctionInvoker invoker,ParameterHelper parameterHelper,CancellationTokenSource timeoutTokenSource,CancellationTokenSource functionCancellationTokenSource,Boolean throwOnTimeout,TimeSpan timerInterval,IFunctionInstance instance) at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs : 585
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at async Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithWatchersAsync(IFunctionInstanceEx instance,ParameterHelper parameterHelper,ILogger logger,CancellationTokenSource functionCancellationTokenSource) at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs : 532
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at async Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithLoggingAsync(??) at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs : 468
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at async Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithLoggingAsync(??) at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs : 278
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at async Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithLoggingAsync(??) at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs : 322
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at async Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.TryExecuteAsyncCore(IFunctionInstanceEx functionInstance,CancellationToken cancellationToken) at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs : 117

The Function:

module HttpTrigger =
    [<Literal>]
    let eventHubConnectionString = Environment.GetEnvironmentVariable(eventHubConnectionStringKey, EnvironmentVariableTarget.Process)
    let eventHubClient = lazy(EventHubClient.CreateFromConnectionString(eventHubConnectionString))

    [<FunctionName("CheckEventHubs")>]
    let checkEventHubs ([<HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = null)>]req: HttpRequest) (log: ILogger) =
        async {
            let! information = eventHubClient.Force().GetRuntimeInformationAsync() |> Async.AwaitTask
            printfn "%A" (System.String.Format("{0:D}", information))
            return OkObjectResult("…") :> IActionResult
        } |> Async.StartAsTask

The EventHubs team said they had looked into the logs from their side and they didn't see any canceled requests or time outs.

Things I've tried:

deyanp commented 4 years ago

@gzuber I tried the following C# version, and it seems to fail as well ... the C# snippet I got sent in SR 120012425000336 create a new EventHubClient in every function call (antipattern), whereas my F# and the below C# versions have the EventHubClient static ... I have the feeling that the issue is related to the static EventHubClient ...

using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.EventHubs;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

namespace ReproTaskCanceledException
{
    public static class HttpTrigger
    {
        private const string EventHubConnectionStringKey = "EventHubsConnectionString";

        private static readonly string EventHubConnectionString =
            Environment.GetEnvironmentVariable(EventHubConnectionStringKey, EnvironmentVariableTarget.Process);

        private static readonly EventHubClient EventHubClient = 
            EventHubClient.CreateFromConnectionString(EventHubConnectionString);

        [FunctionName("CheckEventHubs")]
        public static async Task<IActionResult> CheckEventHubs([HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = null)] HttpRequest req, ILogger log)
        {
            return new OkObjectResult(await EventHubClient.GetRuntimeInformationAsync());
        }
    }
}
deyanp commented 4 years ago

Yes, the following works without an error, so the problem is the static EventHubClient (and I dont think the programming language matters):

using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.EventHubs;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

namespace ReproTaskCanceledException
{
    public static class HttpTrigger
    {
        private const string EventHubConnectionStringKey = "EventHubsConnectionString";

        private static readonly string EventHubConnectionString =
            Environment.GetEnvironmentVariable(EventHubConnectionStringKey, EnvironmentVariableTarget.Process);

        [FunctionName("CheckEventHubs")]
        public static async Task<IActionResult> CheckEventHubs([HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = null)] HttpRequest req, ILogger log)
        {
            var eventHubClient = EventHubClient.CreateFromConnectionString(EventHubConnectionString);
            return new OkObjectResult(await eventHubClient.GetRuntimeInformationAsync());
        }
    }
}

How come a static EventHubClient (which is the recommendation) does not work well?

deyanp commented 4 years ago

Wondering if this has something to do with the FaultTolerantAmqpObject.OnCreateAsync, I see a timeout there ...

deyanp commented 4 years ago

Additionally, it appears to me (based on data from our running system) that function apps which use (SendAsync) less often the static eventhubclient have more TaskCanceledExceptions, which indicates in my opinion some kind of timeout ...

gzuber commented 4 years ago

@deyanp thanks for pointing this out, you're right I was quickly trying to get a repro going and accidentally created the client on every request. But it's interesting that that's the deciding factor between working and not.

I remember I tried creating a client every request in F# because at one point I was curious if that was the issue, but I still ran into exceptions. Granted that could have been due to my inexperience with the language, have you tried this in F#?

But this is a promising lead, I'll follow up with the EventHubs team and talk to them about why the static client leads to these issues, thanks!

deyanp commented 4 years ago

Yes, creating the client in every request works also in F# (I have deployed this "ugly" hack into our test and production environment), however this is against MS-own recommendation to have a static eventHubClient ... There is no difference between F# and C# when consuming EventHubs SDK ...