Open smalik007 opened 1 year ago
IMO this is desired behavior. You can use the published diagnostics for monitoring the health of the driver.
@reinzor, Maybe I was not able to describe the problem correctly. The issue is if Scanners gets disconnected and reconnects back, the node becomes silent about it. It does not auto recover own its own. I get your point that we can use diagnostics for checking the health status, but my argument for it is that the node is not doing anything if the Lidar disconnect, It would be rather good if the node exit rather than being just idle. I can see there's been some discussion about this in one of the closed issue : https://github.com/SICKAG/sick_safetyscanners/pull/78, but it looks like it never made to the release because the implementation was a quick fix rather than proper handling as @puck-fzi mentions it can be handled properly with some error exceptions and multiple retries. Likewise, I agree with this approach and could spend some time on fixing that.
Hi thanks for raising, as stated this is currently desired behavior. The branch with the quick fix is nothing which will be integrated into the main branch. This was just a quick way of exiting the node if the connection failed. If you are willing to spend time on this, the best scenario I reckon would be configurable amount of retries before the node shuts down with an error message.
Hi
I want to start working on the issue and would like to discuss the approach before implementing and adding a PR here. The problem actually has two parts in it, One is TCP connection and second is the UDP connection. Currently, if on start of the node if Lidar is let say disconnected then the current behavior of the driver is to print TCP error as in the image below and wait indefinitely in sendTelegramAndListenForAnswer
API.
We can fix this by simply changing the doConnect
API and other send and receive APIs for TCP to returns booleans when they fail/print error of unsuccessful TCP connection, and using boolean return to execute further code snippet, thus making the node non-blocking on the above API and retry thedoConnect
again instead.
For the second part of the problem where if we are using only UDP connection for data and no more TCP request after initialization we can put a watchdog monitoring on received packets on UDP handler, then If for a configurable amount of time no UDP packet has been received we change the node state to reconnection and publish reconnection state to a topic or existing driver health, where we try a TCP request with multiple or infinite retries, once we get a response after the Lidar is connected back, we re-establish the UDP connection part to start receiving the UDP data again. This will allow the node to self recover own its own instead of shutting down and restarting again.
Please feel free for the feedback in above approach so that I can start working on it.
PR submitted : https://github.com/SICKAG/sick_safetyscanners/pull/131
Thanks for submitting the PR. I will check it and if everything seems to be working will merge it into the driver
Hi, The package does not support the socket connection failures, it simply just logs it and silently do nothing about it. It would be nice to have