OPCFoundation / UA-.NETStandard

OPC Unified Architecture .NET Standard

Reconnect does not work when a subscription exists prior to a reconnect attempt (since 1.4.368.33) #1796

Closed htbmw closed 11 months ago

htbmw commented 2 years ago

Type of issue

Current Behavior

When a subscription exists on a session between the client and the server and the server is restarted, the client is unable to reconnect and nothing further happens on the client side (no monitored item notifications are received), while on the server many sessions are created.

So far this only seems to happen when a subscription exists. When no subscription exists, reconnect works correctly.

It was tested against a real hardware PLC: Siemens SIMATIC S7-1500, software version 2.9.4.

It was also tested with the OPCFoundation reference server and failed when the server references a version before 1.4.368.33. It works with the OPCFoundation reference server only when the server references 1.4.368.33 or a higher version. Updating the server references is not enough because this issue happens on hardware as well.

None of this happened before 1.4.368.33, and I suspect it is related to the TransferSubscriptions feature that was released with 1.4.368.33.

Expected Behavior

When a subscription exists on a session between the client and the server and the server is restarted, the client is able to reconnect and resume receiving monitored item notifications, while on the server only a single session is created.

Steps To Reproduce

  1. Start up any of the following two types of servers:
    • an OPCFoundation reference server that references a version before 1.4.368.33
    • Siemens SIMATIC S7-1500 PLC
  2. Start up a client that references 1.4.368.33 or a later version
  3. Connect the client to the reference server or the Siemens PLC
  4. Create a subscription
  5. Restart the server
  6. Observe that the client never reconnects
  7. On the server, observe that many sessions are created

Environment

- Component: Reconnect feature
- Server: 1.4.367.100 (any version before 1.4.368.33) or Siemens SIMATIC S7-1500 PLC
- Client: 1.4.368.33 or 1.4.368.53 (any version from 1.4.368.33 onward)

Anything else?

There seems to be a fix for session recreate in 1.4.368.53, according to the change log, but even with 1.4.368.53 on the client, the reconnect issue still persists as described in the Current Behavior:

Fixed session recreate TransferSubscriptions bug. #1734 and #1733 by @xky0007 in #1746

mregen commented 2 years ago

Thanks for the detailed report. The automatic TransferSubscriptions feature in the client needs more testing on a wider range of devices to get a better understanding of interoperability.

htbmw commented 2 years ago

I noticed that you made some changes over the weekend with regards to reconnect issues, so I pulled your branch to test and see if these changes made any differences.

I noticed the following 2 changes in behavior:

  1. For a reference server that references a version before 1.4.368.33, it now seems to reconnect in the scenario where a subscription existed before the disconnect, but the monitored item notifications only resume approximately 60 seconds later. I would expect notifications to resume shortly after the reconnect, not 60 seconds later.

  2. On a real Siemens PLC the reconnect is still not happening at all. The only difference this time is that I don't see the multiple sessions / subscriptions being created as before during the reconnect attempt (which is the incorrect behavior described in the original bug report).

htbmw commented 2 years ago

One of the drawbacks documented for the TransferSubscriptions feature is that there is no way to opt out of it. Would it be possible to add an opt-out setting so that we can bypass this feature while it is still being tested / investigated? In my opinion, the TransferSubscriptions feature should be marked as experimental until interoperability is better understood and can be properly fixed. There are mixed results with virtual OPC UA servers, but it is a fairly critical issue given that it doesn't work with a real Siemens PLC (as far as my test results are concerned).

htbmw commented 2 years ago

I think this is what you had in mind by adding this setting?

add a property to enable subscription transfer in reconnect, set Session.TransferSubscriptionsOnReconnect=true to enable, IOP issues may not allow to use it with some servers.

mregen commented 2 years ago

Hi @htbmw, the PR is in final testing. I think I found all the issues you documented. Please help test the upcoming release.

htbmw commented 2 years ago

Hi @mregen, thanks so much for the heads up and appreciate your speedy action on this. I will test this as soon as I can and provide feedback.

htbmw commented 2 years ago

@mregen, unfortunately I am not having any success. So far I am not seeing consistent results:

With a Siemens PLC the client is still not capable of resuming the connection. We tested this by connecting to the PLC, creating a subscription and then restarting the PLC, but the client never reconnects.

With the same client, when we connect to a reference server (1.4.367.100), and we create a subscription and then restart the reference server, the client reconnects, but the monitored item notifications only start after exactly 2 minutes (120 seconds / 120 000 ms).

It doesn't seem as if the setting Session.TransferSubscriptionsOnReconnect has any effect in any of the above scenarios, whether enabled or disabled.

I will keep testing and try to dig a bit deeper when I find some time.

mregen commented 2 years ago

@htbmw which client are you using?

htbmw commented 2 years ago

This is in a custom connector implementation that acts as an OPC UA client. I have added project references in my implementation to the required libraries on the release/1.4.368 branch.

My next plan is to test this with a sample client application from the OPCFoundation samples repo. I need to rule out the possibility that it could be something in my implementation.

I will keep you updated.

mregen commented 2 years ago

Hi @htbmw, there are many settings to tweak the client, such as session timeout and subscription lifetime. I also pulled the SessionReconnectHandler into the console UAClient sample; running it against your server with a few more subscriptions and logging may provide some insights. Cheers!

htbmw commented 2 years ago

Hi @mregen

I followed your advice and used the console client to test the reconnect. Reconnect seems to work for the virtual OPC UA servers where my client doesn't work. Unfortunately, I will have to wait until Monday before I can test it against a Siemens PLC.

I adjusted my reconnect logic to be more in line with the console client's implementation, and this now also works better with the virtual OPC UA servers where I had issues before.

Would it be possible for you to build me a prerelease so I can reference prerelease packages for testing? The project references are quite a mission to set up.

I am referring to the release/1.4.368 branch.

Thanks!

mregen commented 2 years ago

Hi @htbmw, thanks for the testing! We are currently testing a hotfix for 368 which should address a few issues that were found recently, including reconnect. Its version is 1.4.368.58 and it is so far only available on the preview feed. You can add the preview feed to your dev project like this: Nuget.Config. The fixes are also merged back into master.

htbmw commented 2 years ago

Hi @mregen, thanks so much for the tip about adding the preview / prerelease package source to Nuget.config. I tested 1.4.368.58 and so far it looks good for the virtual OPC UA servers I previously had reconnect issues with. I will test this against the Siemens PLC on Monday and hopefully there will be more good news. Have a great weekend!

htbmw commented 2 years ago

Hi @mregen,

After final testing with a real Siemens PLC, I am happy to report that the reconnect issue has been resolved. Thanks again for the quick action and assistance regarding this.

mregen commented 2 years ago

1.4.368.58 released!

raulpsj commented 1 year ago

Hello,

This problem has not yet been solved for the Siemens 1500 PLC with the latest published version 1.4.371.41.

I have checked that it works without problems with version 1.4.367.100.

I have been able to verify that the problem reproduces only when the connection loss lasts longer than the time after which the BeginReconnect method of the SessionReconnectHandler class invokes the callback passed as a parameter. In my case, if the communication recovers within approximately 10 seconds, it reconnects; after 15 seconds it never reconnects.

I attach the log generated by the Opc.Ua.Client library where there are two reconnection attempts:

=========================================
Id: BadServiceUnsupported
Description: BadServiceUnsupported

BadServiceUnsupported
--- at Opc.Ua.ClientBase.ValidateResponse(ResponseHeader header)
--- at Opc.Ua.SessionClient.TransferSubscriptions(RequestHeader requestHeader, UInt32Collection subscriptionIds, Boolean sendInitialValues, TransferResultCollection& results, DiagnosticInfoCollection& diagnosticInfos)
--- at Opc.Ua.Client.Session.TransferSubscriptions(SubscriptionCollection subscriptions, Boolean sendInitialValues)
--- at Opc.Ua.Client.Session.RecreateSubscriptions(IEnumerable`1 subscriptionsTemplate)

OPC_UA.log
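
Editorial illustration of the failure mode in the trace above: TransferSubscriptions is called from RecreateSubscriptions and the server answers BadServiceUnsupported. One possible client-side strategy, sketched here in Python independently of the library, is to treat that status as "transfer not available" and recreate the subscriptions instead. This is not the library's code; `ServiceError`, `FakeServer`, `restore_subscriptions` and the `try_transfer` flag are hypothetical names made up for this sketch.

```python
# Hypothetical, library-independent sketch: fall back from transferring
# subscriptions to recreating them when the server rejects the service.

BAD_SERVICE_UNSUPPORTED = "BadServiceUnsupported"


class ServiceError(Exception):
    """Stand-in for a failed service call that carries a status code name."""

    def __init__(self, status):
        super().__init__(status)
        self.status = status


class FakeServer:
    """Toy server; supports_transfer=False mimics a device lacking the service."""

    def __init__(self, supports_transfer):
        self.supports_transfer = supports_transfer
        self.created = []

    def transfer_subscriptions(self, subscription_ids):
        if not self.supports_transfer:
            raise ServiceError(BAD_SERVICE_UNSUPPORTED)
        return list(subscription_ids)

    def create_subscription(self, template):
        self.created.append(template)
        return len(self.created)  # new server-assigned id


def restore_subscriptions(server, old_ids, templates, try_transfer=True):
    """Return ("transferred" or "recreated", ids) once a new session exists."""
    if try_transfer:
        try:
            return "transferred", server.transfer_subscriptions(old_ids)
        except ServiceError as err:
            # Only "service not implemented" is treated as recoverable;
            # any other failure is re-raised so it is not silently hidden.
            if err.status != BAD_SERVICE_UNSUPPORTED:
                raise
    return "recreated", [server.create_subscription(t) for t in templates]
```

Calling `restore_subscriptions(FakeServer(supports_transfer=False), [7, 8], ["a", "b"])` takes the fallback path and recreates both subscriptions. The `try_transfer` flag plays a role loosely comparable to the opt-out setting requested earlier in this thread, but that comparison is only an analogy.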

mregen commented 1 year ago

@raulpsj thanks for the information. Please open a new issue next time and refer to the closed issue. I was just lucky to find the update in my inbox; otherwise I would have missed it.

raulpsj commented 1 year ago

Sorry, this is the first time I've reported a bug. thanks for the info @mregen

WouterVanderGucht commented 1 year ago

I am experiencing the same issues with the siemens 1500 PLC. If you need any debugging or testing done, I am happy to provide assistance.

raulpsj commented 1 year ago

hello,

As additional information, it has been verified that the same problem also occurs on the Siemens 1200 PLC.

Raúl

stefanluchian commented 1 year ago

Hello there!

I also have reconnect problems using version 1.4.371.41 against a Siemens SIMATIC S7-1500. The OPC server on the PLC restarts from time to time (it needs recompilation because we are in the development stage of the project). I have no influence on the OPC server or on the team that develops it, so I have to live with these restarts. The developer on the server side told me that they use no durable subscriptions, so I wouldn't need to transfer the subscriptions, only to reconnect, or rather recreate, the session.

I use the KeepAlive event from the Session and implemented our client following the example of SessionReconnectHandler. But, as we use async / await (for the moment on .NET Framework 4.8), I needed to get out of the thread that runs the callback from the KeepAliveTimer. If I stay in the calling thread, even with BeginInvoke (as InvokeRequired==true), I cannot reconnect. Only if I come from another thread (in my implementation from a System.Timers.Timer, which was created in the main thread) am I able to call Session.Reconnect successfully. I then receive the new KeepAlive(s) almost immediately (only two calls still come in the name of the old session; after that, they are obviously disposed by Session.Reconnect).

By the way, it would be wonderful to have a Session.ReconnectAsync in the future. I speculate that Session.Reconnect cannot be called within the same thread that made the callback through KeepAlive of the old session, because that would collide with the migration of all the events to the new session. I might be wrong, though.
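
Editorial illustration of the workaround described above: the keep-alive callback only records that a reconnect is needed, and a separate worker thread performs the blocking reconnect. The sketch is in Python and does not use the OPC UA library; `ReconnectWorker`, `on_keep_alive` and the `reconnect` callable are hypothetical names, and the sketch assumes that `on_keep_alive` is only ever called from one thread at a time.

```python
import queue
import threading


class ReconnectWorker:
    """Runs reconnects on its own thread, never on the caller's thread."""

    def __init__(self, reconnect):
        self._reconnect = reconnect        # hypothetical blocking reconnect call
        self._requests = queue.Queue()
        self._pending = threading.Event()  # collapses bursts of bad keep-alives
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def on_keep_alive(self, status_is_bad):
        """Called from the keep-alive callback thread; returns immediately."""
        if status_is_bad and not self._pending.is_set():
            self._pending.set()
            self._requests.put("reconnect")

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:               # shutdown sentinel
                break
            try:
                self._reconnect()
            finally:
                # Allow the next bad keep-alive to schedule another attempt.
                self._pending.clear()

    def close(self):
        """Stop the worker thread after queued requests have been handled."""
        self._requests.put(None)
        self._thread.join()
```

The point of the design is that the callback thread is never blocked by the reconnect and never tears down the session it is currently being called from; whether that is the actual cause of the behaviour described above is not confirmed in this thread.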

Another use case is when the server stays up but the client loses its connection. I still wonder why KeepAlive does not recover within the old session: I receive BadNoCommunication forever, even when the server is available again. In that case I also need to reconnect, and that too only succeeds on another thread.

I was tempted to update to the latest stable version, but I fear the behaviour with the exponential backoffs.

Does anyone else experience problems with KeepAlive, even when the server is available again?

Kind regards, Stefan Luchian

mregen commented 11 months ago

Hi @stefanluchian, Session.ReconnectAsync has been implemented and tested. Please also check the conversation here: #2391

stefanluchian commented 10 months ago

Thank you, @mregen . As soon as I go back to that project, I will integrate the ReconnectAsync.