S7NetPlus / s7netplus

S7.NET+ -- A .NET library to connect to Siemens Step7 devices
MIT License
1.29k stars 577 forks source link

Cancellation sometimes closes connection to PLC #529

Open ltjax opened 3 months ago

ltjax commented 3 months ago

We were sometimes seeing the Plc connection being closed when cancelling a WriteBytesAsync/ReadBytesAsync which then lead to random ObjectDisposedExceptions as in https://github.com/S7NetPlus/s7netplus/pull/528

I traced this back to this line in NoLockRequestTsduAsync: using var closeOnCancellation = cancellationToken.Register(Close); I guess this is because the WriteAsync/ReadAsync query-response pair in NoLockRequestTsduAsync cannot easily be split, but I still found this behavior surprising. Right now, our workaround is not to pass in the CancellationToken at all and just hope that each query is fast enough. Is there a better way of handling this? Maybe only cancelling the task that the user is waiting on but not the query itself once it is running?

mycroes commented 3 months ago

Hi @ltjax,

The issue is one request corresponds to one response. As soon as the request has been started the only guarantee of keeping this one-request-one-response in sync is to wait for the response to arrive. Since cancellation happens somewhere midway the connection is now in a broken state, so there's no sense in keeping it open either.

The alternative way to implement this would be by having a single receiver that would always receive, regardless of the requests being sent and keeping track of request ID's there. That's certainly a viable solution, it would take some work to adapt the library though.

In all honesty I think the issue should very rarely happen unless the cancellation is happening really soon, because on average requests take only milliseconds to complete. Is this also the case with your usage?

ltjax commented 3 months ago

You could also sequence each query on some kind of event-reactor and delegate all queries there. If a user request is cancelled, that just sets the corresponding task to cancelled but the event-loop finishes the query. Each individual request can then be cancelled swiftly at the expense of a very slight delay of the next one. You can spin up an extra task for the reactor in OpenAsync and cancel that swiftly on Close or Dispose. But I agree, the duration of the requests will rarely be a problem, unless something goes wrong with the network during a request.

This solution can, of course, be build on top of the library, but it should probably be at least clearly documented that cancellation can close the connection.

Another option would be to try to recover from 'broken' connections by ignoring everything until the you find the matching request. Not entirely sure that that is possible in this case, since you need to be able to detect message boundaries in the stream.

mycroes commented 3 months ago

You could also sequence each query on some kind of event-reactor and delegate all queries there. If a user request is cancelled, that just sets the corresponding task to cancelled but the event-loop finishes the query. Each individual request can then be cancelled swiftly at the expense of a very slight delay of the next one. You can spin up an extra task for the reactor in OpenAsync and cancel that swiftly on Close or Dispose.

I think you're explaining what I was describing. Keep in mind that only a limited number of requests may be active concurrently, that's not something S7NetPlus can deal with at all right now (see Sally7 for a library that does support that).

But I agree, the duration of the requests will rarely be a problem, unless something goes wrong with the network during a request.

When it's a network failure cancellation has no influence either (unless the network failure has not been detected).

Another option would be to try to recover from 'broken' connections by ignoring everything until the you find the matching request. Not entirely sure that that is possible in this case, since you need to be able to detect message boundaries in the stream.

It's possible, the top level consist of TPKT messages. However with the current implementation it would already be an error condition if multiple messages are received on a single request.