Azure / azure-functions-host

The host/runtime that powers Azure Functions
https://functions.azure.com
MIT License

[Custom Handlers - Go] Function with Event Hub trigger stopped triggering #6790

Status: Open. ZachTB123 opened this issue 4 years ago.

ZachTB123 commented 4 years ago

I have a Function App that has one function with an Event Hub trigger. I noticed that this function stopped triggering around the end of the day on October 2nd/beginning of October 3rd:

[Screenshot, 2020-10-14 8:59:36 AM]

While the function was not triggering, messages were still arriving in the Event Hub, so I would expect my function to trigger:

[Screenshot, 2020-10-14 9:22:47 AM]

I noticed this early in the day on October 6th, at which point I went to the Function App in the portal to start investigating. Shortly after I visited the Function App in the portal, the function started triggering again; I did not restart it. I know that once the function started triggering again it began to hit errors because of the bug mentioned here, but we have not yet updated our functions to the latest version. With that in mind, I want to scope this issue to the function not triggering around October 2nd. I encountered a similar issue around a year ago, where functions with an Event Hub trigger would randomly stop triggering until we visited them in the portal.

Looking at App Insights, the last time I see logs for the Function App in question is 2020-10-03T01:20:01.670Z. I do not see anything in App Insights or the function detectors that would help me determine the cause.

I have also opened a support ticket (120100624002903), but I am filing an issue here as well.

Investigative information


Repro steps

N/A

Expected behavior

My function should be triggering on the Event Hub events.

Actual behavior

My function stopped triggering and did not resume until I visited it in the portal at the start of my investigation.

Known workarounds

Visiting the function in the portal made it start triggering again.

Related information

The function.json file for the function:

{
  "bindings": [
    {
      "type": "eventHubTrigger",
      "name": "events",
      "direction": "in",
      "eventHubName": "%EVENT_HUB_NAME%",
      "consumerGroup": "%CONSUMER_GROUP%",
      "cardinality": "many",
      "connection": "EVENT_HUB_CONNECTION_STRING",
      "dataType": "string"
    }
  ]
}

anthonychu commented 4 years ago

@pragnagopa @yojagad Can you please take a look?

pragnagopa commented 4 years ago

Opening a support ticket is the right approach. Assigned to @yojagad, who is currently on call.

yojagad commented 4 years ago

This seems to be an ongoing issue with the Event Hub trigger, not anything specific to custom handlers. I think the current recommended mitigation is to restart the function app when this happens. Looping in @wenhzha.

ZachTB123 commented 4 years ago

Yeah, I don't necessarily think this is related to custom handlers, since I experienced something similar in the past when we were using Java. I just wanted to call out in the title that we are currently using custom handlers.

I was just looking for more information on why the function stopped triggering. Having run into this issue, I don't feel comfortable expanding our use of Functions until the cause is determined.

ZachTB123 commented 3 years ago

@yojagad Following up on this since the meeting we had with @anthonychu. I noticed this again for a separate Function App. This one stopped triggering on Dec 29 around 8:49 PM CST:

[Screenshot, 2021-01-07 5:17:36 PM]

Note that messages were still arriving in the Event Hub, so I would expect the function to trigger.

As with the function in the original issue, it only started working again when I visited the resource in the portal at the start of my investigation. After I viewed it in the portal, it began triggering only sporadically. Looking at the log stream, I noticed AMQP errors:

[Screenshot, 2021-01-07 4:26:32 PM: log stream showing AMQP errors]

It only started working "fully" after I temporarily stopped it and then started it again.

I opened a ticket for this new issue (121010724005692).

For my previous ticket (120100624002903), I was told that an issue had been identified and that a fix would be deployed. Can you please let me know whether that deployment happened, and what caused my latest issue?