OPCFoundation / UA-.NETStandard

OPC Unified Architecture .NET Standard
Other
1.9k stars 926 forks source link

Client will reopen secure channel with new ID without reactivating session #2538

Closed einarmo closed 1 month ago

einarmo commented 4 months ago

Type of issue

Current Behavior

If the TCP connection is unexpectedly killed, the client will open a new secure channel with SecureChannelId 0, but it will not follow that up with ActivateSession. From what I can tell this is in violation of the standard.

What will eventually happen is that the new conection will fail, and the client will open another new secure channel where it will attempt to reactivate the session, this will only happen after keep alive timed out on the half-opened session.

Expected Behavior

The client should always reactivate sessions when creating a new secure channel before sending normal OPC UA requests on the new channel.

Steps To Reproduce

This is reproducible with a local server and a local client, including the sample client/server.

  1. Connect to an OPC-UA server.
  2. Monitor traffic in a tool like wireshark.
  3. Use tcpkill or a similar tool to send RST packets on the open connection, using the outgoing port for the client as the target, so that you only kill one connection, and not any subsequent connections the client creates.
  4. Observe the OPC-UA packets sent immediately after.

Environment

- OS: Fedora 38 (6.6.14), the original issue happened on Windows.
- Environment: Command line
- Runtime: .NET 8.0.101
- Nuget Version: 1.5.373.121
- Component: Opc.Ua.Client
- Server: ConsoleReferenceServer
- Client: ConsoleReferenceClient

Anything else?

Here's a screenshot from wireshark:

image

Note how it takes several seconds after opening a new channel before the client discovers that the channel is bad and recreates it.

einarmo commented 4 months ago

It doesn't look to me like there is any mechanism to recover after a loss of connection at all? Presumably this happening hits here:

https://github.com/OPCFoundation/UA-.NETStandard/blob/0fd58b43f254fbae7d18aeebb8fb6dbc02d54f42/Stack/Opc.Ua.Core/Stack/Tcp/UaSCBinaryClientChannel.cs#L525

but there is no mechanism to communicate back to the client that the channel was reopened, from what I can tell, so the client would only be reactivated when using the SessionReconnectHandler later on. Should there be a mechanism to let the channel inform the client that it must reactivate the session? The current reconnect mechanism in the channel implementation seems to be somewhere between useless and actively harmful.

KircMax commented 4 months ago

I also added some Logs for the Client in the related #2544 that I created which is a duplicate of this one.

audunmidttunsystad commented 3 months ago

Hi @mregen, thank you for getting back to us on this! How do you see the way forward here – and would you have any estimates or updates?

KircMax commented 3 months ago

If you could maybe point out some things where I could do some pre-work for you I'd like to try to help @mregen - currently I am on vacation but afterwards I'd happily try.

mregen commented 3 months ago

Hi @einarmo, @KircMax, @audunmidttunsystad, I was able to reproduce the issue with the sysinternals tcpview tool, close the connection and I see it happening. Thanks for reporting this special case!

I think currently of multiple ways to fix this: i) when we go into reconnect and there is no valid service call in the beginning (e.g. CreateSession, ActivateSession, GetEndpoints etc.) fail the service calls which are currently queued with BadConnectionClosedetc. Proceed only to TCP BeginConnect once a compliant service call is made. ii) when we go into reconnect and there is no valid service call in the beginning (e.g. CreateSession, ActivateSession, GetEndpoints etc.) queue the service calls. If there is a valid service call, queue it in the beginning and start with TCP BeginConnect. If the connect attempt times out, all queued servcie calls time out.

I think the latter option might be smoother if there is really just a small network interruption. Problem in both cases is that the reconnect handler must trigger a reconnect to get the Activate message sent and queued before the TCP BeginConnect starts.

So better were the reconnect handler is immediately triggered when the socket closes, it may also need some additional wiring. Looking into the fix now...

KircMax commented 2 months ago

Hi @mregen Thanks a lot for tackling this issue - it will be really helpful for short network interruptions Or - to give you another direct use case - when e.g. a failover of a redundant system takes place ;).

Both ways seem like valid fixes to me - From my side I'd leave this up to you to decide, you clearly have more experience and knowledge of the codebase. Thanks again.