Azure / Industrial-IoT

Azure Industrial IoT Platform

Kepserver restart leads to ArgumentException: Timeout must be greater than zero. #2346

Closed filipzanpwccom closed 3 weeks ago

filipzanpwccom commented 1 month ago

We are running OPC Publisher version 2.9.9.3+926a31bdda (.NET 8.0.6/linux-x64/OPC Stack 1.5.374.70). It's connected to kepserver.

From time to time, Kepserver is restarted for maintenance. Once this happens, OPC Publisher is able to reconnect to the server, but it neither publishes nor receives any data (the ingress value changes count stays constant). Once we restart OPC Publisher, everything starts to work again.

Below are partial logs from just after the reconnection.

[24-09-15 07:07:54.2705] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaClient[0]
      --redacted--[state:Ready|refs:3]: Session --redacted--with --redacted-- changed from Connecting to Ready
[24-09-15 07:07:54.2707] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaClient[0]
      --redacted--[state:Ready|refs:3]: Client RECONNECTED!
[24-09-15 07:07:54.2722] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaClient[0]
      --redacted--[state:Ready|refs:3]: Connection manager exited unexpectedly...
      System.ArgumentException: Timeout must be greater than zero. (Parameter 'timeout')
         at Opc.Ua.Bindings.UaSCUaBinaryClientChannel.BeginSendRequest(IServiceRequest request, Int32 timeout, AsyncCallback callback, Object state)
         at Opc.Ua.Bindings.UaSCUaBinaryTransportChannel.SendRequestAsync(IServiceRequest request, CancellationToken ct)
         at Opc.Ua.SessionClient.ReadAsync(RequestHeader requestHeader, Double maxAge, TimestampsToReturn timestampsToReturn, ReadValueIdCollection nodesToRead, CancellationToken ct)
         at Opc.Ua.SessionClientBatched.ReadAsync(RequestHeader requestHeader, Double maxAge, TimestampsToReturn timestampsToReturn, ReadValueIdCollection nodesToRead, CancellationToken ct)
         at Opc.Ua.Client.Session.FetchNamespaceTablesAsync(CancellationToken ct)
         at Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaClient.<>c__DisplayClass102_0.<<ManageSessionStateMachineAsync>g__ApplySubscriptionAsync|1>d.MoveNext() in /__w/1/s/Industrial-IoT/src/Azure.IIoT.OpcUa.Publisher/src/Stack/Services/OpcUaClient.cs:line 1046
      --- End of stack trace from previous location ---
         at Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaClient.ManageSessionStateMachineAsync(CancellationToken ct) in /__w/1/s/Industrial-IoT/src/Azure.IIoT.OpcUa.Publisher/src/Stack/Services/OpcUaClient.cs:line 948
         at Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaClient.ManageSessionStateMachineAsync(CancellationToken ct) in /__w/1/s/Industrial-IoT/src/Azure.IIoT.OpcUa.Publisher/src/Stack/Services/OpcUaClient.cs:line 754
[24-09-15 07:07:59.2373] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaStack[0]
      Unexpected error sending publish request.
      System.ArgumentException: Timeout must be greater than zero. (Parameter 'timeout')
         at Opc.Ua.Bindings.UaSCUaBinaryClientChannel.BeginSendRequest(IServiceRequest request, Int32 timeout, AsyncCallback callback, Object state)
         at Opc.Ua.Client.Session.BeginPublish(Int32 timeout)

  DIAGNOSTICS INFORMATION for          : OPC-MX-QU-1 (b38f7c22877a02ce3b9ad4ef08c4d2cb3f0ab809)
  # OPC Publisher Version (Runtime)    : 2.9.9.3+926a31bdda (.NET 8.0.6/linux-x64/OPC Stack 1.5.374.70)
  # Ingest duration (dd:hh:mm:ss)/Time :    04:02:23:56 | 2024-09-15T07:08:01.9401092+00:00
  # Endpoints connected/disconnected   :              1 | 0 (Connected)
  # Connection retries                 :              1
  # Subscriptions count                :              3
  # Good/Bad Monitored Items (Late)    :             91 | 0 (0)
  # Queued/Minimum request count       :              2 | 3
  # Good/Bad Publish request count     :              2 | 0

[24-09-15 07:08:04.2212] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaStack[0]
      Could not send keep alive request: System.ArgumentException Timeout must be greater than zero. (Parameter 'timeout')
[24-09-15 07:08:06.2363] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaStack[0]
      Unexpected error sending publish request.
      System.ArgumentException: Timeout must be greater than zero. (Parameter 'timeout')
         at Opc.Ua.Bindings.UaSCUaBinaryClientChannel.BeginSendRequest(IServiceRequest request, Int32 timeout, AsyncCallback callback, Object state)
         at Opc.Ua.Client.Session.BeginPublish(Int32 timeout)
[24-09-15 07:08:14.2203] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaStack[0]
      Could not send keep alive request: System.ArgumentException Timeout must be greater than zero. (Parameter 'timeout')
[24-09-15 07:08:18.2360] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaSubscription[0]
      #1/100: Subscription b53a31e7c7a024c5e6f9d2cda2d52be6f6f7edc6_0:30 is missing keep alive.

After that we repeatedly get these:

[24-09-19 09:09:44.2210] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaClient[0]
      --redacted--[state:Ready|refs:3]: Got Keep Alive error: BadNoCommunication 'Server not responding to keep alive requests.' (09/19/2024 09:09:44:BadNoCommunication 'Server not responding to keep alive requests.'
[24-09-19 09:09:44.2211] info: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaClient[0]
      --redacted--[state:Ready|refs:3]: Got Keep Alive error: BadNoCommunication 'Server not responding to keep alive requests.' (09/19/2024 09:09:44:BadNoCommunication 'Server not responding to keep alive requests.'
[24-09-19 09:09:44.2212] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaStack[0]
      Could not send keep alive request: System.ArgumentException Timeout must be greater than zero. (Parameter 'timeout')
[24-09-19 09:09:47.2461] fail: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaStack[0]
      Unexpected error sending publish request.
      System.ArgumentException: Timeout must be greater than zero. (Parameter 'timeout')
         at Opc.Ua.Bindings.UaSCUaBinaryClientChannel.BeginSendRequest(IServiceRequest request, Int32 timeout, AsyncCallback callback, Object state)
         at Opc.Ua.Client.Session.BeginPublish(Int32 timeout)
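
The pattern in the logs above (a `Client RECONNECTED!` message followed by repeated `Timeout must be greater than zero` failures) could be detected automatically from captured log output. The following is a minimal Python sketch and not part of OPC Publisher; the function name and the idea of scanning log text are assumptions made for illustration only.

```python
# Hypothetical helper: scans captured OPC Publisher log text for the
# failure pattern shown in this issue. Not an OPC Publisher API.

RECONNECTED = "Client RECONNECTED!"
TIMEOUT_ERROR = "Timeout must be greater than zero"


def count_timeout_errors_after_reconnect(log_text: str) -> int:
    """Count log lines containing the timeout error that appear after the
    most recent 'Client RECONNECTED!' line. Returns 0 if no reconnect
    message is present in the text."""
    seen_reconnect = False
    count = 0
    for line in log_text.splitlines():
        if RECONNECTED in line:
            # A new reconnect starts a fresh observation window.
            seen_reconnect = True
            count = 0
        elif seen_reconnect and TIMEOUT_ERROR in line:
            count += 1
    return count
```

A non-zero count after a reconnect would match the symptom described here. How to react to it (for example, restarting the publisher) depends on the deployment and is not covered by this sketch.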

marcschier commented 1 month ago

@filipzanpwccom, could you please test with 2.9.11, which is the latest supported version?

filipzanpwccom commented 1 month ago

Unfortunately, rolling out a new version is quite a complex process for us. It will also take quite a while to test, as Kepserver maintenance happens rarely. All in all, it will take a long time to provide feedback. Can you pinpoint a possible fix for this issue in version 2.9.11?

Is it maybe possible to mitigate this via some kind of configuration setting?

marcschier commented 3 weeks ago

I cannot reproduce this on 2.9.11. Please update to 2.9.11 and if you hit the issue again, please update the issue with new logs, and I will reactivate and investigate.