PepperlFuchs / pf_lidar_ros_driver

ROS driver for Pepperl+Fuchs R2000 and R2300 laser scanners
https://www.pepperl-fuchs.com/global/en/23097.htm
Apache License 2.0
37 stars, 37 forks

Persistent protocol errors, probably not terminated socket #124

Open · sarguez opened this issue 4 months ago

sarguez commented 4 months ago

Describe the bug
Hello, we have now run into this several times: a socket apparently gets stuck, and the device cannot be initialized because of persistent protocol errors. The log shows:

    [/r2000_node 1708101306.555321]: Device found: R2000
    [/r2000_node 1708101306.559998]: protocol error: 120 Invalid handle or no handle provided
    [/r2000_node 1708101306.568282]: protocol error: 333 Socket couldn't be created: Invalid argument
    [/r2000_node 1708101306.569719]: Connection refused
    [/r2000_node 1708101306.569787]: Unable to establish TCP connection
    [/r2000_node 1708101306.569838]: Unable to initialize device

Remarks:

Since power-cycling the computer resolves it, this looks to me like some kind of lingering socket problem. One thing we have not tried is waiting more than 2 minutes in the hope that the kernel cleans up the socket by itself.

Another finding: we have encountered this issue several times. In some of these cases, scrolling up in the logs to where the problem started shows a "Recv failure" error. After this error has been logged repeatedly for some time, restarting the node leads to the state described above. Maybe this gives a hint about the root cause:

    [/r2000_node 1708101240.116282]: HTTP ERROR: Empty reply from server
    [/r2000_node 1708101240.118622]: HTTP ERROR: Recv failure: Connection reset by peer
    [/r2000_node 1708101240.119891]: HTTP ERROR: Recv failure: Connection reset by peer
    [/r2000_node 1708101240.121579]: HTTP ERROR: Recv failure: Connection reset by peer
    [/r2000_node 1708101240.123527]: HTTP ERROR: Recv failure: Connection reset by peer

Environment (please complete the following information):
- OS: Ubuntu 20.04.06 LTS
- ROS Version: ROS Noetic

Sensor:
- Device: R2000
- FW Version: "1.62"
- HW Version: "1.72"

Additional context
We built commit 682a1fb965ee16a0b3f2646fed288c1f969ad7ba.

sarguez commented 4 months ago

I am not sure, but I think you need to set the SO_REUSEADDR option on your sockets to deal with this.

Currently, we have to power-cycle the entire robot (which requires physical access) to fix this problem. It would be a huge improvement if power-cycling just the scanner worked, since we can do that remotely. It seems to me that the connection is still refused after the scanner restarts because the socket on the computer side does not have this reuse-address option set (or perhaps another similar option).

With Boost.Asio sockets, it can be set with something like this:

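    #include <boost/asio.hpp>
    // Enable SO_REUSEADDR; the socket must already be open and not yet bound.
    boost::asio::socket_base::reuse_address option(true);
    socket.set_option(option);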