Azure / azure-functions-eventhubs-extension

Event Hubs extension for Azure Functions
MIT License
20 stars 26 forks source link

System.Net.Sockets.SocketException - Dropping Messages? #43

Closed thomasmichaelwallace closed 4 years ago

thomasmichaelwallace commented 4 years ago

(I'm still finding my way around Azure, so apologise if this is not the approrpiate place to raise this issue).

We have a Node.js Function App configured with an Azure Event Hubs trigger.

Because this is a fairly critical piece of infrastructure to us, we are alerting on the Application Insights query (exceptions where severityLevel > 1) > 0.

Last night we received 14 alerts of: Error processing event from Partition Id: '#', Owner: '#uuid#', EventHubPath: '#consumer-group#'

I'm having trouble deciphering the error messages- but does this mean that the extension dropped 14 messages?

I'm happy to provide any more information, the most pertenant for this project seems to be the parsedStack:

{
  "parsedStack": [
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 0,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 1,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
      "level": 2,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.EventHubs, Version=2.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c",
      "method": "Microsoft.Azure.EventHubs.Amqp.AmqpEventHubClient+<CreateConnectionAsync>d__32.MoveNext",
      "level": 3,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 4,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 5,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
      "level": 6,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
      "method": "Microsoft.Azure.Amqp.FaultTolerantAmqpObject`1+<OnCreateAsync>d__6.MoveNext",
      "level": 7,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 8,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 9,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
      "method": "Microsoft.Azure.Amqp.Singleton`1+<GetOrCreateAsync>d__13.MoveNext",
      "level": 10,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 11,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 12,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
      "method": "Microsoft.Azure.Amqp.Singleton`1+<GetOrCreateAsync>d__13.MoveNext",
      "level": 13,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 14,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 15,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
      "level": 16,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.EventHubs, Version=2.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c",
      "method": "Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver+<CreateLinkAsync>d__15.MoveNext",
      "level": 17,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 18,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 19,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
      "level": 20,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
      "method": "Microsoft.Azure.Amqp.FaultTolerantAmqpObject`1+<OnCreateAsync>d__6.MoveNext",
      "level": 21,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 22,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 23,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
      "method": "Microsoft.Azure.Amqp.Singleton`1+<GetOrCreateAsync>d__13.MoveNext",
      "level": 24,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 25,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 26,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
      "method": "Microsoft.Azure.Amqp.Singleton`1+<GetOrCreateAsync>d__13.MoveNext",
      "level": 27,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 28,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 29,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.EventHubs, Version=2.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c",
      "method": "Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver+<OnReceiveAsync>d__13.MoveNext",
      "level": 30,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 31,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.EventHubs, Version=2.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c",
      "method": "Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver+<OnReceiveAsync>d__13.MoveNext",
      "level": 32,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 33,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 34,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
      "level": 35,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.EventHubs, Version=2.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c",
      "method": "Microsoft.Azure.EventHubs.PartitionReceiver+<ReceiveAsync>d__31.MoveNext",
      "level": 36,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 37,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 38,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.EventHubs, Version=2.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c",
      "method": "Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver+<ReceivePumpAsync>d__18.MoveNext",
      "level": 39,
      "line": 0
    }
  ],
  "severityLevel": "Error",
  "outerId": "0",
  "message": "A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.",
  "type": "System.Net.Sockets.SocketException",
  "id": "61997365"
}
alrod commented 4 years ago

The error comes from: https://github.com/Azure/azure-functions-eventhubs-extension/blob/dev/src/Microsoft.Azure.WebJobs.Extensions.EventHubs/Listeners/EventHubListener.cs#L160

The method called when the underlying client experiences an error while receiving. EventProcessorHost will take care of recovering from the error and continuing to pump messages. https://docs.microsoft.com/en-us/dotnet/api/microsoft.azure.eventhubs.processor.ieventprocessor.processerrorasync?view=azure-dotnet

So nothing to worry about, your messages did not drop as there wasn't checkpoint operation.

thomasmichaelwallace commented 4 years ago

Thanks @alrod

So if I understand correctly this is a bug. Because an expected exception (System.Net.Sockets.SocketException) is being logged with a high severity?

What's the best place to report/fix this?

alrod commented 4 years ago

I am not sure if this a valid bug: The method called when the underlying client experiences an error while receiving. EventProcessorHost will take care of recovering from the error and continuing to pump messages. https://docs.microsoft.com/en-us/dotnet/api/microsoft.azure.eventhubs.processor.ieventprocessor.processerrorasync?view=azure-dotnet

Errors during pumping new messages are expected and EventHub guranties a message a will be delivered at least once.

How often do you get SocketExceptions ? only during a night?

thomasmichaelwallace commented 4 years ago

I guess the bug for me is the serverity (just like in #5).

Consider:

I am trying to monitor the event hub -> function app connection so that I can quickly react to any issues. For example, just today a company I'm working with revoked the shared access key, and but watching for errors I could quickly let them know and restore the string.

Having a couple of message with a high serveriy level where the only action we can take is to ignore them does not scalable. Especially when they happen at two in the morning!

thomasmichaelwallace commented 4 years ago

(To answer your question - we've had 16 over the last week)

alrod commented 4 years ago

we can't change severity level for specific exception types. we consider ReceiverDisconnectedException and LeaseLostException as information because these exceptions can happen as part of normal partition balancing across instances.

16 per week is fairly low amount, I only can recommend adjust your alerting logic:

  1. Filter by specific exception types.
  2. Send an alert if you have more then x erros during y period of time.
thomasmichaelwallace commented 4 years ago

Ok, I guess I'm just missing context here

The issue with only sending an alert if there are more than x a minute is that we risk missing actual exceptions. (For example, we get the odd storage exception that happens due to an error we need to resolve).

And if I filter by the socket exception, then it feels like we're all acknowledging that all socket errors expected in normal operation; and that there's no risk to ignoring them?

I guess I'm just missing the context as to why we can't alter the severity of the exception that we know is not exceptional?

Maybe there is an opportunity to document a list of exceptions, like these, which are thrown as part of the internals of the function app when discussing monitoring event hub integrations?

(As an aside, 16 might not sound like much, but there's only 10 devices pushing data; which is why it doesn't feel very scalable)

thomasmichaelwallace commented 4 years ago

Or to put it another way. It feels like you're saying it's reasonable to expect anyone who wants to monitor an event hubs integration to build the following query:

exceptions
| where severityLevel > 1 // only alert on errors and warnings
| where innermostMessage != "The lease ID specified did not match the lease ID for the blob." // known bug: https://github.com/Azure/azure-functions-eventhubs-extension/issues/42
| where problemId != "System.Net.Sockets.SocketException at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw" // unexceptional: https://github.com/Azure/azure-functions-eventhubs-extension/issues/43
| where not (problemId startswith "System.ArgumentOutOfRangeException at Microsoft.Azure.EventHubs.Processor.PartitionContext") // unexceptional: https://stackoverflow.com/questions/51602965/azure-eventhub-function-leaselostexception
thomasmichaelwallace commented 4 years ago

Closing due to inactivity.