Azure / Industrial-IoT

Azure Industrial IoT Platform
MIT License
524 stars 214 forks source link

Some errors in OPC publisher when trying to receive 3 messages per 1 sec consistently from Kepserver #667

Closed bjbang closed 4 years ago

bjbang commented 4 years ago

Describe the bug A customer is using OPC publisher to get data from Kepserver. And it is required to be able to receive 1 message per every 300ms. (It is a requirement of their project.) That is, they would like to receive 3 messages / 1 second consistently from Kepserver regardless of whether the message size is large or small.

To meet this requirement, they tried to use 2.7.171/2.7.183 version of publisher and used --BatchTriggerInterval=00:00:00.500 and –bs=10 or higher.

To Reproduce Steps to reproduce the behavior:

  1. When they used “mcr.microsoft.com/iotedge/opc-publisher:2.7.171-linux-amd64” and the below options, an error occurred.

    '--pf=/appdata/pn.json', '--aa', '--to', '--ih=Mqtt', '--si=1', '--ms=0', '--di=60', '--BatchTriggerInterval=00:00:00.500', '--oi=300', '--mq=163840'

    The error messages is "Failed to create session '5f5eb3672da238625099fe14cd1d152bc706536e' due to One or more errors occurred. (Unexpected error processing response.)" and the log file is opcpublisher-publisher_0806.log

  2. When they used “mcr.microsoft.com/iotedge/opc-publisher:2.7.183-linux-amd64” and without BatchTriggerInterval options below, an error occurred. But if they use “mcr.microsoft.com/iotedge/opc-publisher:linux-amd64”, the error doesn’t occur.

    '--pf=/appdata/pn.json', '--aa', '--to', '--ih=Mqtt', '--si=1', '--ms=262144', '--di=60', '--oi=500', '--mq=163840'

    The error messages is "Failed to create session '5f5eb3672da238625099fe14cd1d152bc706536e' due to One or more errors occurred. (Error establishing a connection: Error received from remote host: An error occurred verifying security.)" and the log file is opcpublisher-publisher_0810.log

  3. When they used “mcr.microsoft.com/iotedge/opc-publisher:2.7.183-linux-amd64” and with BatchTriggerInterval options below, another error occurred.

    '--pf=/appdata/pn.json', '--aa', '--to', '--ih=Mqtt', '--si=1', '--ms=262144', '--BatchTriggerInterval=00:00:00.500', '--di=60', '--oi=500', '--mq=163840'

    The error messages is "Error processing job Standalone_quest-dev-edge-dems-linux_OPCPublisher. System.ObjectDisposedException: Cannot access a disposed object. at System.Threading.TimerQueueTimer.Change(UInt32 dueTime, UInt32 period) at System.Threading.Timer.Change(Int64 dueTime, Int64 period) at System.Threading.Timer.Change(TimeSpan dueTime, TimeSpan period) at Microsoft.Azure.IIoT.Agent.Framework.Agent.Worker.JobProcess.ProcessAsync() in D:\a\1\s\common\src\Microsoft.Azure.IIoT.Agent.Framework\src\Agent\Default\Worker.cs:line 327" and "Failed to create session '5f5eb3672da238625099fe14cd1d152bc706536e' due to One or more errors occurred. (Error establishing a connection: Error received from remote host: An error occurred verifying security.)" and the log file is opcpublisher-publisher_0810_BatchTriggerInterval.log

Expected behavior The customer expects to receive 3 messages / 1 second consistently from Kepserver.

cristipogacean commented 4 years ago

Hi,

can you try to test non-secure connectivity to the Kepserver as a trial? enable security policy none for the KepServer. image

and to enable Anonymous user Login for the Kepware's OPC UA Server image

If that works, we can take it from there tuning the security options for publisher and reverting the changes in Kepserver.

bjbang commented 4 years ago

Hi,

Sorry for the delay. When the customer applied the guide that cristipogacean provided, they had the below error using mcr.microsoft.com/iotedge/opc-publisher:2.7.183-linux-amd64 with BatchTriggerInterval option.

Note: Without BatchTriggerInterval option, it worked normally with mcr.microsoft.com/iotedge/opc-publisher:linux-amd64.

Please refer the below configuration and the log file. "Cmd": [ "publisher", "--pf=/appdata/pn.json", "--aa", "--to", "--ih=Mqtt", "--si=1", "--ms=262144", "--di=60", "--oi=300", "--BatchTriggerInterval=00:00:00.300", (This option is used only for opc-publisher:2.7.183-linux-amd64) "--bs=10", "--mq=163840" ],

[03:00:40 INF Microsoft.Azure.IIoT.Module.Framework.Hosting.ModuleHost] Initialize device twin for quest-dev-edge-dems-linux - OPCPublisher [03:00:40 INF Microsoft.Azure.IIoT.Module.Framework.Hosting.ModuleHost] Applying initial desired state. [03:00:40 INF Microsoft.Azure.IIoT.Module.Framework.Hosting.ModuleHost] Reporting currently initial state. [03:00:41 INF Microsoft.Azure.IIoT.Module.Framework.Hosting.ModuleHost] Module Host started. [03:00:41 INF Microsoft.Azure.IIoT.Agent.Framework.Agent.WorkerSupervisor] Creating new worker... [03:00:41 INF Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Engine.LegacyJobOrchestrator] File /appdata/pn.json has changed, reloading... [03:00:41 INF Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Models.PublishedNodesJobConverter] Read 1 items from published nodes file in 00:00:00.0070180 [03:00:41 INF Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Models.PublishedNodesJobConverter] Converted items to jobs in 00:00:00.1536499 [03:00:41 INF Microsoft.Azure.IIoT.Agent.Framework.Agent.WorkerSupervisor] Starting worker 'StandalonePublisher_0'... [03:00:41 INF Microsoft.Azure.IIoT.Agent.Framework.Agent.Worker] Worker StandalonePublisher_0: {} [03:00:41 INF Microsoft.Azure.IIoT.Agent.Framework.Agent.Worker] Worker: StandalonePublisher_0 processing job: Standalone_quest-dev-edge-dems-linux_OPCPublisher, mode: Active [03:00:42 INF Microsoft.Azure.IIoT.OpcUa.Protocol.Services.DefaultSessionManager] Subscription 'opc.tcp://52.231.176.21:49320_e84bc70de26cdfe582609cac9b1448a38f58421a' registered/updated in session '5f5eb3672da238625099fe14cd1d152bc706536e' in state Init [03:00:42 INF Microsoft.Azure.IIoT.OpcUa.Protocol.Services.DefaultSessionManager] Subscription 'opc.tcp://52.231.176.21:49320_e84bc70de26cdfe582609cac9b1448a38f58421a' registered/updated in session '5f5eb3672da238625099fe14cd1d152bc706536e' in state Init [03:00:44 INF Microsoft.Azure.IIoT.OpcUa.Protocol.Services.DefaultSessionManager] Creating session '5f5eb3672da238625099fe14cd1d152bc706536e' for endpoint 'opc.tcp://52.231.176.21:49320'... [03:00:45 INF Microsoft.Azure.IIoT.OpcUa.Protocol.Services.DefaultSessionManager] Azure IIoT 5f5eb3672da238625099fe14cd1d152bc706536e took 00:00:01.1853067. [03:00:45 WRN Microsoft.Azure.IIoT.OpcUa.Protocol.Services.DefaultSessionManager] Failed to create session '5f5eb3672da238625099fe14cd1d152bc706536e' due to One or more errors occurred. (Unexpected error processing response.) [03:00:53 INF Microsoft.Azure.IIoT.OpcUa.Protocol.Services.DefaultSessionManager] Creating session '5f5eb3672da238625099fe14cd1d152bc706536e' for endpoint 'opc.tcp://52.231.176.21:49320'... [03:00:54 INF Microsoft.Azure.IIoT.OpcUa.Protocol.Services.DefaultSessionManager] Azure IIoT 5f5eb3672da238625099fe14cd1d152bc706536e took 00:00:01.0481937. [03:00:54 WRN Microsoft.Azure.IIoT.OpcUa.Protocol.Services.DefaultSessionManager] Failed to create session '5f5eb3672da238625099fe14cd1d152bc706536e' due to One or more errors occurred. (Unexpected error processing response.) [03:01:03 INF Microsoft.Azure.IIoT.OpcUa.Protocol.Services.DefaultSessionManager] Creating session '5f5eb3672da238625099fe14cd1d152bc706536e' for endpoint 'opc.tcp://52.231.176.21:49320'... [03:01:04 INF Microsoft.Azure.IIoT.OpcUa.Protocol.Services.DefaultSessionManager] Azure IIoT 5f5eb3672da238625099fe14cd1d152bc706536e took 00:00:01.3317814. [03:01:04 WRN Microsoft.Azure.IIoT.OpcUa.Protocol.Services.DefaultSessionManager] Failed to create session '5f5eb3672da238625099fe14cd1d152bc706536e' due to One or more errors occurred. (Unexpected error processing response.)

bjbang commented 4 years ago

Any updates?

bjbang commented 4 years ago

The customer performed the test that you suggested today, but the OPC publisher becomes to be failed after some minutes. The attached is the log using support-bundle --since 2m support_bundle.zip

image

They used the below pn.json. [ { "EndpointUrl": "opc.tcp://52.231.176.21:49320", "UseSecurity": false, "OpcNodes": [ { "Id": "ns=2;s=DA01.Panel-A.40002", "OpcSamplingInterval": 300, "OpcPublishingInterval": 300 }, { "Id": "ns=2;s=DA01.Panel-A.40003", "OpcSamplingInterval": 300, "OpcPublishingInterval": 300 }, … snip …

The below is the option of OPC Publisher.

image

Can you please check what the problem is?

cristipogacean commented 4 years ago

looks like the file pn.json file is corrupted:

An exception was thrown while activating Microsoft.Azure.IIoT.Agent.Framework.Agent.Worker -> Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Engine.LegacyJobOrchestrator.
 ---> Autofac.Core.DependencyResolutionException: An exception was thrown while invoking the constructor 'Void .ctor(Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Models.PublishedNodesJobConverter, Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Engine.ILegacyCliModelProvider, Microsoft.Azure.IIoT.Agent.Framework.IAgentConfigProvider, Microsoft.Azure.IIoT.Agent.Framework.IJobSerializer, Serilog.ILogger, Microsoft.Azure.IIoT.Module.IIdentity)' on type 'LegacyJobOrchestrator'.
 ---> Microsoft.Azure.IIoT.Exceptions.SerializerException: After parsing a value an unexpected character was encountered: {. Path '[0].OpcNodes[394].OpcPublishingInterval', line 1980, position 6.
 ---> Newtonsoft.Json.JsonReaderException: After parsing a value an unexpected character was encountered: {. Path '[0].OpcNodes[394].OpcPublishingInterval', line 1980, position 6.
   at Newtonsoft.Json.JsonTextReader.ParsePostValue(Boolean ignoreComments)
bjbang commented 4 years ago

After fixing pn.json, it looks to work normally.

In order to check if a message is delivered every 300 ms, the customer displayed the log in a callback function of TestScheduler module that receives the output from OPC publisher. In the screenshot, the log containing “Received message buffer at” is displayed from the callback function.

After analyzing the log, the customer found that the messages are occasionally delivered with 600 ms. Is it an expected behavior or not? Any reasons?

opc-test_0924.zip

image

cristipogacean commented 4 years ago

sounds good! yes, probably the value is not changed in the server every now and then within the exact sampling interval. As I explained in the call, only value changes are propagated. if the value is static it will not transported. This is the default behavior of standard opc ua publishing mechanism.

bjbang commented 4 years ago

According to the customer, the setting in Kepserver is set to generate new values for all tags continuously. And so the messages should be delivered every 300 ms. They considered that it was an abnormal situation.

The below is the simulation setting in Kepserver. For example, for 40031 tag, a random value between 1 and 100 is generated every 100 ms. And other tags are also generated every 100 ms. So, the customer think that it should be delivered every 300 ms. Do you think that there could be any other reasons?

image

markusstuhler commented 4 years ago

The latency of the OPC Publisher is not deterministic so we cannot guarantee an output of 3 messages per second. It was build for non-realtime cloud communication with eventual delivery but not for realtime scenarios like the one you described.