Ability to specify cache age for cyclic reads (was Cyclic read response sync or async processed?)

b-enigma-con commented 2 months ago

In our case, we have lots of devices/PLCs connected to a single OPC UA server using various drivers (OPC UA, proprietary PLC drivers, etc.). We have configured approximately 400 nodes covering all devices to be read cyclically, all with the same interval of 5 seconds. The OPC UA server has more than enough resources (CPU, RAM, etc.). I would expect that reading 400 nodes every 5 seconds would be possible using the cyclic read functionality, but it is not working out in our case: we see lots of server queue overflows, and data is coming in irregularly. I did read that system limits are reached sooner with cyclic reads than with OPC UA subscription services. Another fact that might be related to our problem is that PLCs are often turned off. We tried to tweak the timeout settings for these devices in the drivers of the OPC UA server.

We noticed some strange behaviour which we could not explain.

The cyclic read feature batches all nodes to be read with the same interval in a single read request. Is the response to this request synchronous and does it wait until all values are read, or does it process intermediate results (before the whole read operation is finished)? Because we noticed that we receive values as expected (every 5 seconds) for some devices, while other devices are very irregular (queue overflow?), and other devices only get a value every couple of minutes. It seems as if individual results of this batched read operation are returned asynchronously as soon as they are available? Can someone explain the way a cyclic read response is processed?
In line with the previous question: If some devices in the OPC UA server are turned off and have a specific timeout period, would this interfere with the response times of devices that are turned on and respond quickly when using cyclic reads?

We're not able to grasp why we can't read 400 nodes every 5 seconds using the cyclic read. Could this be because of devices that are turned off, causing some delay in responding? We do see some devices responding quickly, implying that the read response is being processed asynchronously instead of an "all or nothing" approach...

Can anyone please shed some light on this issue and behaviour?

marcschier commented 2 months ago

You are correct that the read is batched. How the batch is processed is up to the server implementation. We set "maxAge" to 0 and thus request "un-cached" reads. If the server needs to do a lot of work to read without the cache, it will affect all values in the batch.

You can use registered reads (register node), which can improve performance on the server side, but ultimately it is a question for the server vendor. I am thinking of providing an option to relax the un-cached read to allow cached reads with age x < sampling interval. But then a value could be from any time within the sampling interval. But first, please check with the server vendor to see how reads can be improved and whether this scenario is even supported by them.

b-enigma-con commented 2 months ago

We did some more research, and we noticed lots of differences in how quickly values are read from the different devices. Some drivers in our OPC UA server even seem to read tags sequentially when the PLC is turned off. In other words, reading 10 tags of that device when it is turned off, with a configured timeout of 1 second, will take 10 seconds (each source timestamp increases by 1 second; quality is bad, of course). It seems to want to read each tag individually when the device is off. This way, we will see all kinds of behaviour, depending on the availability of the PLC.
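The arithmetic above can be sketched with a small model. This is only an illustration under stated assumptions: the function name, the device names, and the idea that each device in the batch is handled one after another are hypothetical, not taken from any server or driver.

```python
def batch_read_seconds(tags_per_device, offline, timeout_s=1.0, online_read_s=0.0):
    """Estimate how long one batched read takes.

    Assumption (hypothetical model): a driver reads every tag of an
    offline device one by one and waits the full timeout for each,
    while an online device answers all of its tags in online_read_s.
    """
    total = 0.0
    for device, tag_count in tags_per_device.items():
        if device in offline:
            total += tag_count * timeout_s  # one timeout per tag
        else:
            total += online_read_s
    return total


# 10 tags on a PLC that is turned off, 1 second timeout per tag
print(batch_read_seconds({"plc_off": 10, "plc_on": 10}, offline={"plc_off"}))
```

With these inputs the model yields 10 seconds for the batch, so a single offline device already exceeds a 5 second cyclic read interval.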

So I indeed think that reading the cache is our best option here. Our OPC UA server seems to support your suggestion above:

Resolution: The client application should establish the UA connection using Poll mode (direct reads) and Registered Nodes=Enabled. This tells the server that clients have an ongoing interest in those tags and will cause the server to maintain a cached value for those tags. The server will read from cache until the MaxAge parameter is met; otherwise it will then read from the device.

I'm very interested in being able to set the MaxAge parameter.

marcschier commented 2 months ago

"Registered read" has been available in preview for a while. You can set the RegisterNode: true property in the node entry in the JSON configuration to enable it with your OPC UA server and test. But please note: a) only a limited number of registered nodes is supported, b) if registering the node fails, OPC Publisher does not fail but uses the unregistered node id instead, and c) every time the subscription is re-synchronized (e.g., reconnect/change) the original node is re-registered, which the server might not like (found that in a TODO in that area). So, if you use it to test against your server, please test for a day and let me know if slowdowns happen later. I will tackle the TODO for 2.9.12.

The cache age can be set using the CyclicReadMaxAgeTimespan property in 2.9.12, as a duration in timespan format.
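A node entry combining both properties might look roughly like the output of the sketch below. Only RegisterNode and CyclicReadMaxAgeTimespan come from this thread. The other field names (EndpointUrl, OpcNodes, Id, UseCyclicRead, OpcSamplingInterval), the endpoint URL, the helper function name, and the "hh:mm:ss" value are assumptions for illustration and should be checked against the OPC Publisher configuration documentation.

```python
import json


def cyclic_read_entry(endpoint_url, node_ids, interval_ms=5000, max_age="00:00:04"):
    """Build a published-nodes style entry (hypothetical helper).

    max_age is assumed to be a timespan string shorter than the
    sampling interval, so cached values may be returned.
    """
    return [{
        "EndpointUrl": endpoint_url,                  # assumed field name
        "OpcNodes": [                                 # assumed field name
            {
                "Id": node_id,                        # assumed field name
                "UseCyclicRead": True,                # assumed field name
                "OpcSamplingInterval": interval_ms,   # assumed field name
                "RegisterNode": True,                 # from this thread
                "CyclicReadMaxAgeTimespan": max_age,  # from this thread
            }
            for node_id in node_ids
        ],
    }]


config = cyclic_read_entry(
    "opc.tcp://localhost:4840",  # placeholder endpoint
    ["ns=2;s=xxx.yyy.Precision.CH12_Precision"],
)
print(json.dumps(config, indent=2))
```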

b-enigma-con commented 2 months ago

I'm actually wondering whether this is going to work in the case where a specific PLC is turned off. For example, I have a PLC with 20 nodes whose values I want to get every 5 seconds. I noticed that when the PLC is turned off and I do a cyclic read (all nodes are batched in this case), the items seem to be read one by one by the driver with our configured timeout of 1 second. This leads to the following result (mind the "source timestamp"):

ns=2;s=xxx.yyy.Precision.CH12_Precision (value: , statuscode: 2147483648, server timestamp: 2-9-2024 10:01:28, source timestamp: 2-9-2024 10:01:28)
ns=2;s=xxx.yyy.Precision.CH11_Precision (value: , statuscode: 2147483648, server timestamp: 2-9-2024 10:01:29, source timestamp: 2-9-2024 10:01:29)
ns=2;s=xxx.yyy.Precision.CH10_Precision (value: , statuscode: 2147483648, server timestamp: 2-9-2024 10:01:30, source timestamp: 2-9-2024 10:01:30)
ns=2;s=xxx.yyy.Precision.CH09_Precision (value: , statuscode: 2147483648, server timestamp: 2-9-2024 10:01:31, source timestamp: 2-9-2024 10:01:31)
ns=2;s=xxx.yyy.Precision.CH08_Precision (value: , statuscode: 2147483648, server timestamp: 2-9-2024 10:01:32, source timestamp: 2-9-2024 10:01:32)
ns=2;s=xxx.yyy.Precision.CH07_Precision (value: , statuscode: 2147483648, server timestamp: 2-9-2024 10:01:33, source timestamp: 2-9-2024 10:01:33)
ns=2;s=xxx.yyy.Precision.CH06_Precision (value: , statuscode: 2147483648, server timestamp: 2-9-2024 10:01:34, source timestamp: 2-9-2024 10:01:34)
ns=2;s=xxx.yyy.Precision.CH05_Precision (value: , statuscode: 2147483648, server timestamp: 2-9-2024 10:01:35, source timestamp: 2-9-2024 10:01:35)
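For reference, the decimal status code in these results can be decoded with a few lines. The sketch relies on the OPC UA convention that the top two bits of a status code carry the severity. The function name is made up for this example.

```python
def severity(status_code: int) -> str:
    """Return the severity encoded in the top two bits of an OPC UA status code."""
    top_bits = (status_code >> 30) & 0b11
    return {0b00: "Good", 0b01: "Uncertain", 0b10: "Bad"}.get(top_bits, "Reserved")


code = 2147483648
print(hex(code), severity(code))  # prints "0x80000000 Bad"
```

So every value in the listing is a generic Bad result, consistent with the device being turned off.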

So with this driver, we have a 20-second delay for a batch read in which other items are read as well (it seems that this driver reads sequentially). Can we overcome this issue with registered reads and max age? To prevent the OPC UA server from actually reading the value from the device, I would have to configure quite a long MaxAge, I'm afraid?

At some point, the MaxAge will expire, and the next read will result in an actual read of the value from the device, causing a timeout that again gets in the way of other values...

So the trick is to never let a read result in a read on the physical device; the value should always come from the cache. That would solve our problem.
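The concern about MaxAge expiry can be illustrated with a toy cache. This is a simplified model, not the behaviour of any particular server: the class name and the rule "serve from cache while the cached value is no older than max_age, otherwise read the device" are assumptions based on the vendor note quoted earlier in this thread.

```python
class CachedTag:
    """Toy model of a server-side cached tag (hypothetical)."""

    def __init__(self, read_device):
        self._read_device = read_device
        self._value = None
        self._stamp = None      # time of the last device read
        self.device_reads = 0   # how often the device was actually hit

    def read(self, now, max_age):
        fresh = self._stamp is not None and (now - self._stamp) <= max_age
        if not fresh:
            # Cache is empty or too old: go to the device (may time out).
            self._value = self._read_device()
            self._stamp = now
            self.device_reads += 1
        return self._value


tag = CachedTag(lambda: 42)
for t in range(0, 30, 5):       # cyclic read every 5 s for 30 s
    tag.read(now=t, max_age=12)
print(tag.device_reads)         # prints 2
```

In this model a MaxAge of 12 seconds cuts six cyclic reads down to two device reads, but each expiry still reaches the device, which is exactly the case where an offline PLC would time out again. With max_age=0, every cyclic read goes to the device.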

marcschier commented 2 months ago

mcr.microsoft.com/iotedge/opc-publisher:2.9.12-preview1 is available to test the new cache age and the new periodic heartbeat behaviors.

Azure / Industrial-IoT

Ability to specify cache age for cyclic reads (was Cyclic read response sync or async processed?) #2328