Open eddybl opened 1 year ago
The crashes are definitely a bug that needs to be addressed. Do you have any more details that might help with reproducing this problem (e.g. does it only happen when a server is unavailable, or only when more than one connection is defined in the IOC)?
The slow startup is the result of the code trying to reconnect whenever there is no working connection. We could implement a mechanism that blocks reconnection attempts for a certain time after a failed connection attempt. This would probably improve startup times in this scenario. The downside is that it might take longer for a connection to be reestablished after the cause of the problem has been resolved.
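To make the idea concrete, here is a minimal sketch of such a hold-off mechanism. It is written in Python purely for illustration and is not the device support's actual code; `ReconnectGate`, `try_connect`, and the `holdoff` parameter are hypothetical names, and the hold-off time would need to be chosen (or made configurable) per connection.

```python
import time
from typing import Callable, Optional


class ReconnectGate:
    """Blocks reconnection attempts for `holdoff` seconds after a failure.

    Hypothetical helper for illustration only; not part of any real API.
    """

    def __init__(self, holdoff: float,
                 clock: Callable[[], float] = time.monotonic):
        self._holdoff = holdoff
        self._clock = clock  # injectable so the logic can be tested
        self._blocked_until: Optional[float] = None

    def may_attempt(self) -> bool:
        # Allowed if nothing has failed yet or the hold-off has expired.
        return (self._blocked_until is None
                or self._clock() >= self._blocked_until)

    def record_failure(self) -> None:
        # Start (or restart) the hold-off window from now.
        self._blocked_until = self._clock() + self._holdoff

    def record_success(self) -> None:
        self._blocked_until = None


def try_connect(gate: ReconnectGate, connect: Callable[[], None]) -> bool:
    """Attempt to connect unless the hold-off window is still active."""
    if not gate.may_attempt():
        return False  # fail fast instead of waiting for another timeout
    try:
        connect()
    except ConnectionError:
        gate.record_failure()
        return False
    gate.record_success()
    return True
```

With one gate per connection, only the first record that touches an unavailable server would pay the connection timeout; the remaining records on that connection would fail immediately until the hold-off expires. This is also where the downside shows up: a server that comes back during the hold-off window is not noticed until the window ends.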
I added a branch, "test-opcua-issues", to the IOC for the power monitoring PLCs. It contains only the PLC which is currently offline.
Without all the other channels, the IOC init seems to work reasonably quickly. However, it still tries (and fails) to connect for each channel individually, one after another, which takes 3-4 seconds per channel, so right now it takes around 10 minutes to loop through all records. Once all records were processed, further errors show up in non-EPICS lines:
2022/11/15 23:13:05.273562 non-EPICS_139649138407168 Could not connect to OPC UA server: BadConnectionClosed
Without the additional channels it does not seem to crash, whereas with all the channels it does (as evidenced by the cmk e-mails related to the Power Monitoring PLC IOC).
We use one IOC with multiple connections to several OPC UA servers.
Now it seems like one of these OPC UA servers is down. This seems to crash the IOC after a while and also makes the startup very slow while it tries to connect to each individual channel of the unavailable server. So the IOC init takes a couple of minutes while it is still working through the unavailable channels, and then it crashes suddenly:
Both issues (slow init and crash) seem less than ideal. Is there something that could improve this situation?