Azure / Industrial-IoT

Azure Industrial IoT Platform
MIT License

BadNodeIdUnknown not re-evaluated despite short retry delays? #2188

Closed: NoTuxNoBux closed this issue 7 months ago

NoTuxNoBux commented 8 months ago

Describe the bug

I have the OPC Publisher configured with --badnoderetrydelay=5 --invalidnoderetrydelay=5, but after that many seconds it still doesn't appear to start monitoring the items that previously failed to be created.

Is there another option I'm missing, or do these options perhaps not do what I think they do?

To Reproduce

Steps to reproduce the behavior:

  1. Subscribe to a property that doesn't exist yet.
  2. Observe the following warning in the logs:
    [24-02-12 12:18:46.9059] warn: Azure.IIoT.OpcUa.Publisher.Stack.Services.OpcUaMonitoredItem.DataItem[0]
      Error adding monitored item Data Item 'ns=1;s=:Robot:Applications:IO:string:sJoints:sJoints[0]' with server id  - not created to subscription #2412385306 due to BadNodeIdUnknown.
  3. Make the property exist without restarting the server, or restart it quickly enough that no disconnection occurs.
  4. Observe that the property is still not being monitored after 5 seconds.

Expected behavior

The items are picked up and monitored after the specified interval, and their initial data is sent over the wire immediately, just as it is by default for other properties that are subscribed to for the first time.
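To make the expected behavior concrete, here is a minimal, hypothetical Python model of it. It assumes that a monitored item rejected with BadNodeIdUnknown is simply re-attempted once --badnoderetrydelay seconds have elapsed, and that a delay of 0 means "never retry". The class and method names (RetryingSubscription, add, tick) are illustrative only and are not OPC Publisher's API; the node id is the one from the log above.

```python
from typing import Callable, Dict, List, Set

# Hypothetical model of the behavior expected from --badnoderetrydelay:
# an item rejected with BadNodeIdUnknown is retried once the delay elapses.
# Illustrative names only; this is NOT OPC Publisher's implementation.


class RetryingSubscription:
    def __init__(self, bad_node_retry_delay: float, node_exists: Callable[[str], bool]):
        self.delay = bad_node_retry_delay
        self.node_exists = node_exists           # stands in for the server's address space
        self.monitored: Set[str] = set()         # items created successfully
        self.retry_due: Dict[str, float] = {}    # node id -> time of next attempt
        self.initial_values_sent: List[str] = [] # node ids whose first value went out

    def add(self, node_id: str, now: float) -> bool:
        """Try to create a monitored item; schedule a retry if the node is unknown."""
        if self.node_exists(node_id):
            self.monitored.add(node_id)
            self.retry_due.pop(node_id, None)
            self.initial_values_sent.append(node_id)  # initial data is sent right away
            return True
        # BadNodeIdUnknown: a delay of 0 (or less) disables retrying entirely
        if self.delay > 0:
            self.retry_due[node_id] = now + self.delay
        else:
            self.retry_due.pop(node_id, None)
        return False

    def tick(self, now: float) -> None:
        """Re-attempt every failed item whose retry delay has elapsed."""
        for node_id, due in list(self.retry_due.items()):
            if now >= due:
                self.add(node_id, now)


# Replay the reproduction steps with a 5-second delay.
existing: Set[str] = set()
sub = RetryingSubscription(5, lambda n: n in existing)
node = "ns=1;s=:Robot:Applications:IO:string:sJoints:sJoints[0]"

sub.add(node, now=0)   # fails: node unknown, retry scheduled for t=5
existing.add(node)     # the application is loaded at t=3
sub.tick(now=3)        # too early, nothing happens
sub.tick(now=5)        # retried, now monitored
print(node in sub.monitored)  # True
```

Constructing the same model with a delay of 0 leaves the item unmonitored no matter how often tick runs, which matches the "never tries again until a restart" symptom described in this report.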

Additional context

I can trigger this on my server because it has specific properties that only become available once an application is loaded on it; as long as I don't load it, they don't exist. This ties in to the old ticket #1890, which I expected to be fixed now that there is official support for these command line options.

This condition can occur in production for us: if the publisher happens to connect before an application is loaded, or while a different application is loaded, it won't monitor the necessary properties once the application is loaded, and it will never try again until it is restarted.

marcschier commented 8 months ago

Looks like a regression.

marcschier commented 7 months ago

The issue was caused by the configured retry delays being overridden with 0, and a value of 0 now disables retrying. This is now fixed.
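The maintainer's explanation can be illustrated with a minimal sketch. It assumes, hypothetically, that the publisher merges several configuration layers and that one layer carried explicit 0 values for both delays; the comment does not say how the override actually happened, and the key names and merge functions below are illustrative, not OPC Publisher's code. The sketch shows how a blind overwrite lets a 0 clobber the user's 5-second setting, and how treating "not set" (None) differently from 0 preserves it.

```python
from typing import Dict, Optional

# Hypothetical illustration of the regression: a later configuration layer
# overwrote the user's retry delays with 0, and 0 disables retries.
# Key names are illustrative, not OPC Publisher's actual settings.
Config = Dict[str, Optional[int]]


def merge_overwrite(base: Config, override: Config) -> Config:
    """Buggy merge: every key in `override` wins, including an unintended 0."""
    return {**base, **override}


def merge_set_values_only(base: Config, override: Config) -> Config:
    """Fixed merge: only values that were explicitly set (not None) replace the base."""
    merged = dict(base)
    for key, value in override.items():
        if value is not None:
            merged[key] = value
    return merged


def bad_node_retry_enabled(config: Config) -> bool:
    """A missing value or a delay of 0 means failed items are never retried."""
    delay = config.get("bad_node_retry_delay")
    return delay is not None and delay > 0


# Delays as passed on the command line: --badnoderetrydelay=5 --invalidnoderetrydelay=5
cli: Config = {"bad_node_retry_delay": 5, "invalid_node_retry_delay": 5}

buggy = merge_overwrite(cli, {"bad_node_retry_delay": 0, "invalid_node_retry_delay": 0})
fixed = merge_set_values_only(cli, {"bad_node_retry_delay": None, "invalid_node_retry_delay": None})

print(bad_node_retry_enabled(buggy))  # False: retries silently disabled
print(bad_node_retry_enabled(fixed))  # True: the user's 5-second delay survives
```

The design point is that "not configured" needs its own representation; once 0 carries the meaning "disabled", any code path that fills in 0 as a neutral default silently turns the feature off.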