Azure-Samples / iotedge-logging-and-monitoring-solution

IoT Edge Logging and Monitoring Solution (ELMS) is an architecture and sample cloud workflow that enables automated retrieval of logs and metrics from IoT Edge devices
MIT License
42 stars 21 forks source link

Collect Metrics function stops working from time to time #3

Closed MagdaPaj closed 2 years ago

MagdaPaj commented 2 years ago

This issue is for a: (mark with an x)

- [x ] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

From time to time the Collect Metrics function is not able to successfully finish the execution. It keeps receiving events, but it doesn't finish processing. The problem is not visible in Azure Portal under Function -> Monitor -> Invocation, Error count is 0. And it looks like the function gets stuck. This can take hours, last time lasted more than 12 hours, which means that during 12 hours we didn't have Insights Metrics. And then suddenly, without any intervention from our side, it starts working again. This behavior happened multiple times, and in multiple environments.

Any log messages given by the failure

In Application Insights traces we are able to see the following two errors:

 exception occurred : System.AggregateException: One or more errors occurred. (The SSL connection could not be established, see inner exception.)
 ---> System.Net.Http.HttpRequestException: The SSL connection could not be established, see inner exception.
 ---> System.ComponentModel.Win32Exception (0x8009030D): The credentials supplied to the package were not recognized
   at System.Net.SSPIWrapper.AcquireCredentialsHandle(SSPIInterface secModule, String package, CredentialUse intent, SCHANNEL_CRED scc)
   at System.Net.Security.SslStreamPal.AcquireCredentialsHandle(CredentialUse credUsage, SCHANNEL_CRED secureCredential)
   at System.Net.Security.SslStreamPal.AcquireCredentialsHandle(X509Certificate certificate, SslProtocols protocols, EncryptionPolicy policy, Boolean isServer)
   at System.Net.Security.SecureChannel.AcquireClientCredentials(Byte[]& thumbPrint)
   at System.Net.Security.SecureChannel.GenerateToken(Byte[] input, Int32 offset, Int32 count, Byte[]& output)
   at System.Net.Security.SecureChannel.NextMessage(Byte[] incoming, Int32 offset, Int32 count)
   at System.Net.Security.SslStream.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.StartReadFrame(Byte[] buffer, Int32 readBytes, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.PartialFrameCallback(AsyncProtocolRequest asyncRequest)
--- End of stack trace from previous location where exception was thrown ---
   at System.Net.Security.SslStream.ThrowIfExceptional()
   at System.Net.Security.SslStream.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
   at System.Net.Security.SslStream.EndProcessAuthentication(IAsyncResult result)
   at System.Net.Security.SslStream.EndAuthenticateAsClient(IAsyncResult asyncResult)
   at System.Net.Security.SslStream.<>c.<AuthenticateAsClientAsync>b__65_1(IAsyncResult iar)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at System.Net.Http.ConnectHelper.EstablishSslConnectionAsyncCore(Stream stream, SslClientAuthenticationOptions sslOptions, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.ConnectHelper.EstablishSslConnectionAsyncCore(Stream stream, SslClientAuthenticationOptions sslOptions, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean allowHttp2, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.DiagnosticsHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at System.Threading.Tasks.Task`1.get_Result()
   at FunctionApp.CertificateGenerator.CertGenerator.RegisterWithOms(X509Certificate2 cert, String AgentGuid, String logAnalyticsWorkspaceDomainPrefixOms) in /home/runner/work/iotedge-logging-and-monitoring-solution/iotedge-logging-and-monitoring-solution/FunctionApp/FunctionApp/CertificateGenerator/CertGenertor.cs:line 381
   at FunctionApp.CertificateGenerator.CertGenerator.RegisterWithOmsWithBasicRetryAsync(X509Certificate2 cert, String AgentGuid, String logAnalyticsWorkspaceDomainPrefixOms) in /home/runner/work/iotedge-logging-and-monitoring-solution/iotedge-logging-and-monitoring-solution/FunctionApp/FunctionApp/CertificateGenerator/CertGenertor.cs:line 404
Registering agent with OMS failed (are the Log Analytics Workspace ID and Key correct?) : System.AggregateException: One or more errors occurred. (The SSL connection could not be established, see inner exception.)
 ---> System.Net.Http.HttpRequestException: The SSL connection could not be established, see inner exception.
 ---> System.ComponentModel.Win32Exception (0x8009030D): The credentials supplied to the package were not recognized
   at System.Net.SSPIWrapper.AcquireCredentialsHandle(SSPIInterface secModule, String package, CredentialUse intent, SCHANNEL_CRED scc)
   at System.Net.Security.SslStreamPal.AcquireCredentialsHandle(CredentialUse credUsage, SCHANNEL_CRED secureCredential)
   at System.Net.Security.SslStreamPal.AcquireCredentialsHandle(X509Certificate certificate, SslProtocols protocols, EncryptionPolicy policy, Boolean isServer)
   at System.Net.Security.SecureChannel.AcquireClientCredentials(Byte[]& thumbPrint)
   at System.Net.Security.SecureChannel.GenerateToken(Byte[] input, Int32 offset, Int32 count, Byte[]& output)
   at System.Net.Security.SecureChannel.NextMessage(Byte[] incoming, Int32 offset, Int32 count)
   at System.Net.Security.SslStream.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.StartReadFrame(Byte[] buffer, Int32 readBytes, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslStream.PartialFrameCallback(AsyncProtocolRequest asyncRequest)
--- End of stack trace from previous location where exception was thrown ---
   at System.Net.Security.SslStream.ThrowIfExceptional()
   at System.Net.Security.SslStream.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
   at System.Net.Security.SslStream.EndProcessAuthentication(IAsyncResult result)
   at System.Net.Security.SslStream.EndAuthenticateAsClient(IAsyncResult asyncResult)
   at System.Net.Security.SslStream.<>c.<AuthenticateAsClientAsync>b__65_1(IAsyncResult iar)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at System.Net.Http.ConnectHelper.EstablishSslConnectionAsyncCore(Stream stream, SslClientAuthenticationOptions sslOptions, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.ConnectHelper.EstablishSslConnectionAsyncCore(Stream stream, SslClientAuthenticationOptions sslOptions, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean allowHttp2, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.DiagnosticsHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at System.Threading.Tasks.Task`1.get_Result()
   at FunctionApp.CertificateGenerator.CertGenerator.RegisterWithOms(X509Certificate2 cert, String AgentGuid, String logAnalyticsWorkspaceDomainPrefixOms) in /home/runner/work/iotedge-logging-and-monitoring-solution/iotedge-logging-and-monitoring-solution/FunctionApp/FunctionApp/CertificateGenerator/CertGenertor.cs:line 381
   at FunctionApp.CertificateGenerator.CertGenerator.RegisterWithOmsWithBasicRetryAsync(X509Certificate2 cert, String AgentGuid, String logAnalyticsWorkspaceDomainPrefixOms) in /home/runner/work/iotedge-logging-and-monitoring-solution/iotedge-logging-and-monitoring-solution/FunctionApp/FunctionApp/CertificateGenerator/CertGenertor.cs:line 404
   at FunctionApp.CertificateGenerator.CertGenerator.RegisterAgentWithOMS(String logAnalyticsWorkspaceDomainPrefixOms) in /home/runner/work/iotedge-logging-and-monitoring-solution/iotedge-logging-and-monitoring-solution/FunctionApp/FunctionApp/CertificateGenerator/CertGenertor.cs:line 462

The second message is misleading. Log Analytics Workspace ID and Key are correct, because eventually the function starts to work again, without any change.

Expected/desired behavior

OS and Version?

Mention any other details that might be useful

robertokcanale commented 2 years ago

Folloeing, I have a similar problem

marvin-garcia commented 2 years ago

We are aware of this issue and are investigating a solution that will solve it. More updates will be shared shortly.

MagdaPaj commented 2 years ago

Fix implemented here.