Azure / azure-webjobs-sdk

Azure WebJobs SDK

Race condition when retrying failed listeners #1691

Open brettsam opened 6 years ago

brettsam commented 6 years ago

A customer reported an issue, and when I looked into it I could see that we had been running a disposed host for several hours. Walking back through the logs, I noticed that we appear to be retrying a failed listener after the job host has been stopped. This causes the stopped, disposed host to keep running functions indefinitely, which produces constant failures until the host is restarted.

The change that implemented the listener retry was https://github.com/Azure/azure-webjobs-sdk/pull/1647

Relevant log lines:

2018-05-08 11:16:44.9554933    The listener for function 'Functions.functionname' was unable to start....
2018-05-08 11:18:55.9265356    Retrying to start listener for function 'Functions.functionname' (Attempt 1)
2018-05-08 11:19:11.4812976    Job host stopped
2018-05-08 11:19:14.9707997    Listener successfully started for function 'Functions.functionname' after 1 retries.
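
The timeline above (retry begins, host stops, retry then succeeds) can be modelled with a small sketch. This is a language-neutral Python/asyncio illustration, not the SDK's actual retry code: `SlowListener`, `retry_start`, and `host_stopping` are hypothetical names. It assumes the retry loop can observe a "host is stopping" signal, and shows one possible guard: re-check that signal after the start call returns and undo the start if the host has stopped in the meantime.

```python
import asyncio


class SlowListener:
    """Hypothetical listener whose start takes a while (e.g. a slow network call)."""

    def __init__(self, start_delay: float) -> None:
        self.start_delay = start_delay
        self.running = False

    async def start(self) -> None:
        await asyncio.sleep(self.start_delay)
        self.running = True

    async def stop(self) -> None:
        self.running = False


async def retry_start(listener, host_stopping: asyncio.Event,
                      max_attempts: int = 3, retry_delay: float = 0.01) -> bool:
    """Retry listener start, but never leave it running once the host is stopping."""
    for _ in range(max_attempts):
        if host_stopping.is_set():
            return False  # host already stopped: do not start at all
        try:
            await listener.start()
        except Exception:
            await asyncio.sleep(retry_delay)
            continue
        if host_stopping.is_set():
            # The host stopped while start() was in flight: undo the start.
            await listener.stop()
            return False
        return True
    return False


async def simulate() -> bool:
    """Host stops while the retry is mid-start, as in the log timeline."""
    listener = SlowListener(start_delay=0.05)
    host_stopping = asyncio.Event()
    retry = asyncio.create_task(retry_start(listener, host_stopping))
    await asyncio.sleep(0.01)  # retry is now inside listener.start()
    host_stopping.set()        # corresponds to "Job host stopped"
    await retry
    return listener.running
```

Without the second `host_stopping` check, `simulate()` would leave the listener running after the host has stopped, which matches the behavior seen in the logs.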

brettsam commented 6 years ago

I believe this may be a bug in the Event Hub trigger. It looks like it's not set up to handle StopAsync() being called while StartAsync() is executing. If StartAsync() is in the middle of registering a handler when StopAsync() is called, the stop looks like it will be a no-op and the listener will continue running:

https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Extensions.EventHubs/EventHubListener.cs#L48-L62
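
To make the suspected race concrete, here is a minimal Python/asyncio model. It is not the EventHubListener code; both classes and the `stop_during_start` helper are hypothetical. `NaiveListener` assumes stop only acts when start has already completed, which is the no-op behavior described above. `SerializedListener` sketches one possible mitigation: start and stop share a lock, so a stop issued mid-start waits for the start to finish and then undoes it.

```python
import asyncio


class NaiveListener:
    """Stop is a no-op unless start has already finished (models the suspected bug)."""

    def __init__(self) -> None:
        self.started = False

    async def start(self) -> None:
        await asyncio.sleep(0.02)  # stands in for registering the handler
        self.started = True

    async def stop(self) -> None:
        if self.started:  # still False while start() is in flight
            self.started = False


class SerializedListener:
    """Start and stop share one lock, so stop waits for an in-flight start."""

    def __init__(self) -> None:
        self.started = False
        self._stop_requested = False
        self._lock = asyncio.Lock()

    async def start(self) -> None:
        async with self._lock:
            if self._stop_requested:
                return  # stopped before we began: never start
            await asyncio.sleep(0.02)
            self.started = True

    async def stop(self) -> None:
        self._stop_requested = True
        async with self._lock:  # waits until any in-flight start completes
            self.started = False


async def stop_during_start(listener_cls) -> bool:
    """Call stop() while start() is mid-registration; report whether it is still running."""
    listener = listener_cls()
    start = asyncio.create_task(listener.start())
    await asyncio.sleep(0.005)  # start() is now mid-registration
    await listener.stop()
    await start
    return listener.started
```

In this model, `asyncio.run(stop_during_start(NaiveListener))` leaves the listener running after stop returns, while the serialized variant ends up stopped. Whether a lock, a cancellation token, or a state check after registration is the right fix for the real listener is a separate design decision.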

brettsam commented 5 years ago

Rather than create a new issue, I'll reactivate this one. It looks like we still have a problem here: I can see an EventHubListener getting started during a retry after the host is stopped. It then tries to process events on a disposed host for a while:

FunctionsLogs
| where PreciseTimeStamp > datetime(2019-02-13 07:20) and PreciseTimeStamp < datetime(2019-02-13 10:00)
| where AppName == "rdc-func-app"
| where HostInstanceId == "95ea2cfb-05cd-4952-8cb2-36177aed7912"
| project PreciseTimeStamp, Details, Summary, HostInstanceId, Level, HostVersion