Azure / Industrial-IoT

Azure Industrial IoT Platform
MIT License

OPC publisher failures with IoT edge streaming service #1737

Closed daelsala closed 2 years ago

daelsala commented 2 years ago

Describe the bug: We have a customer with data flowing from OPC Publisher to Azure SQL Edge. The data stopped flowing on Fri, May 27, 2022, 12:00 AM (UTC-06:00) Central Time (US & Canada). After digging for a while, we found that the streaming status was stuck in "starting".

Looking through the support bundle logs with the IoT Edge team, we were unable to find any issues with edgeHub. However, the OPC Publisher module keeps logging an error (it looks like a sequence number skew, as if something got out of sync). Logs are attached.

Can you please help find the root cause, or suggest a possible fix or workaround?

hansgschossmann commented 2 years ago

From looking at the support bundle, it seems the customer is using mcr.microsoft.com/iotedge/opc-publisher:latest. This is not recommended; the customer should pin to a specific version. 2.8.2 is the most current one, and 2.8.3 will be released in the next few days.
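To make the pinning advice concrete, here is a minimal Python sketch that flags image references using `latest` or no tag at all. The helper names `image_tag` and `is_pinned` are hypothetical and not part of any Azure tooling, and the tag string `2.8.2` is assumed to match the version number mentioned above.

```python
from typing import Optional


def image_tag(image_ref: str) -> Optional[str]:
    """Return the tag of a container image reference, or None if it has no tag."""
    # Drop a digest suffix such as "@sha256:..." if present.
    name = image_ref.split("@", 1)[0]
    # Only look for the tag in the last path segment, so a registry port
    # like "localhost:5000/..." is not mistaken for a tag.
    last_segment = name.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return None
    return last_segment.rsplit(":", 1)[1]


def is_pinned(image_ref: str) -> bool:
    """True if the reference has a digest or an explicit tag other than 'latest'."""
    if "@" in image_ref:
        return True
    tag = image_tag(image_ref)
    return tag is not None and tag != "latest"


# The reference from the support bundle is not pinned.
print(is_pinned("mcr.microsoft.com/iotedge/opc-publisher:latest"))
# Assumed tag for the version named in this comment.
print(is_pinned("mcr.microsoft.com/iotedge/opc-publisher:2.8.2"))
```

A check like this could run against the module image strings in a deployment manifest before rollout, so an unpinned module is caught early.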

OPC Publisher does send some messages, as you can see in the logs. The value of "Outgress IoT message count" is increasing, but maybe not as much as expected.

Further, it looks like an issue in the OPC UA stack, indicated by the following messages, which needs more investigation:

[03:22:25 ERR Microsoft.Azure.IIoT.OpcUa.Protocol.Services.StackLogger] (Error) PUBLISH #7847 - Unhandled error during Publish. BadRequestTimeout 'BadRequestTimeout'

Opc.Ua.ServiceResultException: BadRequestTimeout
   at Opc.Ua.Bindings.ChannelAsyncOperation`1.End(Int32 timeout, Boolean throwOnError)
   at Opc.Ua.Bindings.UaSCUaBinaryClientChannel.EndSendRequest(IAsyncResult result)
   at Opc.Ua.SessionClient.EndPublish(IAsyncResult result, UInt32& subscriptionId, UInt32Collection& availableSequenceNumbers, Boolean& moreNotifications, NotificationMessage& notificationMessage, StatusCodeCollection& results, DiagnosticInfoCollection& diagnosticInfos)
   at Opc.Ua.Client.Session.OnPublishComplete(IAsyncResult result)
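To see how often these Publish errors occur in a support bundle, a small Python sketch like the following could tally them by status code. The regular expression is modeled only on the single error line quoted above, so it is an assumption that other Publish errors share the same layout. The function name `count_publish_errors` and the second sample line are hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the one error line quoted in this comment.
PUBLISH_ERROR = re.compile(
    r"\[(?P<time>\d{2}:\d{2}:\d{2}) ERR (?P<logger>[^\]]+)\] "
    r"\(Error\) PUBLISH #(?P<request>\d+) - Unhandled error during Publish\. "
    r"(?P<status>\w+)"
)


def count_publish_errors(lines):
    """Count Publish errors per OPC UA status code in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = PUBLISH_ERROR.search(line)
        if match:
            counts[match.group("status")] += 1
    return counts


sample = [
    # Copied from the log excerpt above.
    "[03:22:25 ERR Microsoft.Azure.IIoT.OpcUa.Protocol.Services.StackLogger] "
    "(Error) PUBLISH #7847 - Unhandled error during Publish. "
    "BadRequestTimeout 'BadRequestTimeout'",
    # Hypothetical unrelated line, only here to show that it is skipped.
    "[03:22:26 INF Example.Logger] unrelated message",
]

print(count_publish_errors(sample))
```

Comparing the timestamps of matching lines against the time the data stopped flowing may help show whether the timeouts and the stalled stream are related.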

hansgschossmann commented 2 years ago

@daelsala BadRequestTimeout happens when the OPC UA server does not respond to a request from the client (OPC Publisher) polling for data value changes. Most likely the system running the OPC UA server is too busy with higher-priority tasks. For example, if the OPC UA server runs on a PLC, controlling the connected system takes priority over responding to OPC UA client requests.

daelsala commented 2 years ago

The customer (cx) provided the following information. Please let me know if this helps or if you need further info:

The Pi is updating the server every second or so, maybe faster. So you believe the server is perhaps reading and updating too fast? Then how does that explain that the data is still getting to our data lake in the cloud? This must mean the data is being read from the connector and just not routed to SQL, right? Secondly, the AWS SiteWise OPC connector has no issue reading the servers generated by the library here, so why would yours be having issues?

daelsala commented 2 years ago

@hansgschossmann can you please assist?