PepperlFuchs / pf_lidar_ros_driver

ROS driver for Pepperl+Fuchs R2000 and R2300 laser scanners
https://www.pepperl-fuchs.com/global/en/23097.htm
Apache License 2.0
36 stars 37 forks source link

UDP bind error: Address family not supported by protocol #126

Open andreucm opened 1 month ago

andreucm commented 1 month ago

Hi ,

Context

Host Machine:

Device:

Ping

ISSUE

From the host machine, I do:

  1. Edit the pf_lidar_ros_driver/src/pf_driver/config/r2000_params.yaml as:
    R2000:
    transport: "udp"
    scanner_ip: "192.168.2.3"
    port: "3000" # optional
    device: "R2000"
    start_angle: -1800000
    max_num_points_scan: 0
    packet_type: "C"
    scan_topic: "/scan"
    frame_id: "scanner_link"
    watchdogtimeout: 60000
    watchdog: true
  2. roslaunch pf_driver r2000.launch
  3. and I get:
    
    ... logging to /home/beta/.ros/log/67512ddc-1357-11ef-962f-9cb6d0e16137/roslaunch-maupas-237492.log
    Checking log directory for disk usage. This may take a while.
    Press Ctrl-C to interrupt
    Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://127.0.0.1:42357/

SUMMARY

PARAMETERS

NODES / r2000_node (pf_driver/ros_main) robot_state_publisher (robot_state_publisher/robot_state_publisher) rviz (rviz/rviz)

ROS_MASTER_URI=http://127.0.0.1:11311

process[robot_state_publisher-1]: started with pid [237508] process[r2000_node-2]: started with pid [237509] process[rviz-3]: started with pid [237510] QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-beta' [ INFO] [/r2000_node]: Device found: R2000 terminate called after throwing an instance of 'boost::wrapexcept' what(): bind: Address family not supported by protocol [r2000_node-2] process has died [pid 237509, exit code -6, cmd /home/beta/ros_ws/install/lib/pf_driver/ros_main __name:=r2000_node __log:=/home/beta/.ros/log/67512ddc-1357-11ef-962f-9cb6d0e16137/r2000_node-2.log]. log file: /home/beta/.ros/log/67512ddc-1357-11ef-962f-9cb6d0e16137/r2000_node-2*.log



It seems there is a problem at socket binding, but the provided IP address seems to be correct v4, so I do not understand the message "Address family not supported by protocol".

Any help will be highly appreciated, 

Thanks, 

Andreu
ptruka commented 1 month ago

Hi,

I just tested it on my machine with nearly the same setup:

The r2000 yaml is also set to the scanners IP, transport protocol is set to udp. I tested 2 scenarios:

In both cases the driver started successfully.


I got the following Questions:

  1. Are there multiple network adapters on the Host machine? (If yes, are they in the same subnet?)
  2. Which Lidar model is used?

In order to get a better understanding what is happening on your side, I would add a print statement to find out what is going on:

In the file: src/pf_driver/src/pf/http_helpers/curl_resource.cpp

add the following into line #37 std::cout << "Request URL: " << url_ << std::endl; // Print the request URL

Like this:

void CurlResource::get(Json::Value& json_resp)
{
  std::cout << "Request URL: " << url_ << std::endl;  // Print the request URL
  request_.setOpt(curlpp::options::Url(url_));
  request_.perform();

  Json::Reader reader;
  reader.parse(response_, json_resp);
}

Then the you would need to build the driver again (catkin build). After the driver is launched, we will be able to see all the requests that are sent from the host to the scanner.

andreucm commented 1 month ago

Hi Patrick,

thanks for the support.

I answer your questions:

  1. About network adapters, I copy the output of ifconfig. You will see that all the adapters are in separate subnets, and that I'm working on docker. Please recall I have correct ping with the lidar at 192.168.2.3
    
    docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:d0:a7:1e:66  txqueuelen 0  (Ethernet)

enx44ed57000019: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 192.168.2.100 netmask 255.255.255.0 broadcast 192.168.2.255 inet6 fe80::8c7e:3819:699a:9886 prefixlen 64 scopeid 0x20 ether 44:ed:57:00:00:19 txqueuelen 1000 (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536 inet 127.0.0.1 netmask 255.0.0.0 inet6 ::1 prefixlen 128 scopeid 0x10 loop txqueuelen 1000 (Local Loopback)

tun0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST> mtu 1500 inet 10.8.0.14 netmask 255.255.255.255 destination 10.8.0.13 inet6 fe80::262c:7e33:f1d7:f61c prefixlen 64 scopeid 0x20 unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 100 (UNSPEC)

wlp58s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 192.168.1.71 netmask 255.255.255.0 broadcast 192.168.1.255 inet6 fe80::ba5a:e301:fc5d:f3e4 prefixlen 64 scopeid 0x20 ether 9c:b6:d0:e1:61:37 txqueuelen 1000 (Ethernet)


2. The lidar model is: OMD30M-R2000-B23-V1V1D-HD-1L (edited the first entry of this issue to also add this info)

3. Print at line 37 of curl_resource.cpp: 
Nothing appears... According other print statements I added,  I think the error appears before in run time, at line:  https://github.com/PepperlFuchs/pf_lidar_ros_driver/blob/88bebcc7a4686caa6354ea6154057bce60945de1/src/pf_driver/src/communication/udp_transport.cpp#L12. 

Just before of that line, I have a print statement like 

std::cout << "address, port: " << address << ", " << port << std::endl;

which results in runtime: 

address, port: 192.168.2.3, 3000 terminate called after throwing an instance of 'boost::wrapexcept' what(): bind: Address family not supported by protocol [r2000_node-2] process has died [pid 33311, exit code -6, cmd /home/beta/ros_ws/install/lib/pf_driver/ros_main __name:=r2000_node __log:=/home/beta/.ros/log/4677694e-1755-11ef-b1b9-9cb6d0e16137/r2000_node-2.log]. log file: /home/beta/.ros/log/4677694e-1755-11ef-b1b9-9cb6d0e16137/r2000_node-2*.log


So the error seems to appear before stablishing any connection, even if the statement 

[ INFO] [/r2000_node]: Device found: R2000


appears  just before the crash. 
andreucm commented 1 month ago

Hi Patrick!

Setting transport to "tcp" works ok ...

Since I guess data transfer through tcp adds some extra traffic. please, let me know why "udp" is the default transport setting and which implications may have to set it to "tcp"

THanks

Andreu

ptruka commented 1 month ago

Hi,

I just added your print statement to compare the output.

This is my yaml:

R2000:
  transport: "udp"
  scanner_ip: "192.168.50.153"
  port: "3000"
  device: "R2000"
  start_angle: -1800000
  max_num_points_scan: 0
  packet_type: "C"
  scan_topic: "/scan"
  frame_id: "scanner_link"
  watchdogtimeout: 60000
  watchdog: true

My host IP is 192.168.50.174

Besides your print statement, I also added one for the local_endpoint:

UDPTransport::UDPTransport(std::string address, std::string port) : Transport(address, transport_type::udp)
{
  std::cout << "address, port " << address << ", " << port << std::endl;

  io_service_ = std::make_shared<boost::asio::io_service>();
  udp::endpoint local_endpoint = local_endpoint = udp::endpoint(udp::v4(), stoi(port));

  std::cout << "Local Endpoint IP, Port: " << local_endpoint.address().to_string() << ", " << local_endpoint.port() << std::endl;

  socket_ = std::make_unique<udp::socket>(*io_service_, local_endpoint);
  timer_ = std::make_shared<boost::asio::deadline_timer>(*io_service_.get());
}

The driver is starting, this is the log:


roslaunch pf_driver r2000.launch 
... logging to /home/ubuntu/.ros/log/5b76cdb4-175d-11ef-b750-c52d5417674f/roslaunch-ubuntu-7597.log
Checking log directory for disk usage. This may take a while.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://ubuntu:36695/

SUMMARY
========

PARAMETERS
 * /r2000_node/R2000/device: R2000
 * /r2000_node/R2000/frame_id: scanner_link
 * /r2000_node/R2000/max_num_points_scan: 0
 * /r2000_node/R2000/packet_type: C
 * /r2000_node/R2000/port: 3000
 * /r2000_node/R2000/scan_topic: /scan
 * /r2000_node/R2000/scanner_ip: 192.168.50.153
 * /r2000_node/R2000/start_angle: -1800000
 * /r2000_node/R2000/transport: udp
 * /r2000_node/R2000/watchdog: True
 * /r2000_node/R2000/watchdogtimeout: 60000
 * /r2000_node/device: R2000
 * /robot_description: <?xml version="1....
 * /rosdistro: noetic
 * /rosversion: 1.16.0

NODES
  /
    r2000_node (pf_driver/ros_main)
    robot_state_publisher (robot_state_publisher/robot_state_publisher)
    rviz (rviz/rviz)

auto-starting new master
process[master]: started with pid [7607]
ROS_MASTER_URI=http://localhost:11311

setting /run_id to 5b76cdb4-175d-11ef-b750-c52d5417674f
process[rosout-1]: started with pid [7617]
started core service [/rosout]
process[robot_state_publisher-2]: started with pid [7624]
process[r2000_node-3]: started with pid [7625]
process[rviz-4]: started with pid [7626]
Request URL: http://192.168.50.153/cmd/get_protocol_info
Request URL: http://192.168.50.153/cmd/get_parameter?list=device_family
Request URL: http://192.168.50.153/cmd/get_parameter?list=product
[ INFO] [1716287521.480984789]: Device found: R2000
address, port 192.168.50.153, 3000
Local Endpoint IP, Port: 0.0.0.0, 3000
Request URL: http://192.168.50.153/cmd/request_handle_udp?address=192.168.50.174&packet_type=C&port=3000
Request URL: http://192.168.50.153/cmd/set_scanoutput_config?handle=s39961980&max_num_points_scan=0&packet_type=C&skip_scans=0&start_angle=-1800000&watchdog=on&watchdogtimeout=60000
Request URL: http://192.168.50.153/cmd/get_parameter?list=angular_fov;radial_range_min;radial_range_max;scan_frequency
[ INFO] [1716287521.552085818]: Device state changed to Initialized
Request URL: http://192.168.50.153/cmd/start_scanoutput?handle=s39961980
[ INFO] [1716287521.571733776]: Device state changed to Running

If you could also post the whole log from console, we could compare the what is happening. Remark: I see 3 "Requested URL" prints before the "address, port" print. This should be the same on your side.

ptruka commented 1 month ago

Hi Patrick!

Setting transport to "tcp" works ok ...

Since I guess data transfer through tcp adds some extra traffic. please, let me know why "udp" is the default transport setting and which implications may have to set it to "tcp"

THanks

Andreu

Nice to hear that tcp is working. I would still like to find the issue that you are facing when using udp as protocol. Of cause the tcp will add a little bit more traffic due to the protocol itself. The protocol selected depends on the users application. If you want the data as quickly as possible and you are only interested in live data, then udp is the better choice. If the priority is more on the completeness of the data, tcp should be used, due to the completeness checking mechanisms etc. used in the protocol.

andreucm commented 1 month ago

Hi
yes it is good to have tcp working because we can unblock our development, but as you point, we'd prefer to work with udp, so let's try to fix the thing...

The print statement of the Local Endpoint seems very suspicious:

[ INFO] [/r2000_node]: Device found: R2000
address, port: 192.168.2.3, 3000
Local Endpoint IP, Port: 19:9cbf:759:ed4f:800c:e41c:a57f:0%3180489729, 11801
terminate called after throwing an instance of 'boost::wrapexcept<boost::system::system_error>'
  what():  bind: Address family not supported by protocol
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-beta'
[r2000_node-2] process has died [pid 75951, exit code -6, cmd /home/beta/ros_ws/install/lib/pf_driver/ros_main __name:=r2000_node __log:=/home/beta/.ros/log/4677694e-1755-11ef-b1b9-9cb6d0e16137/r2000_node-2.log].
log file: /home/beta/.ros/log/4677694e-1755-11ef-b1b9-9cb6d0e16137/r2000_node-2*.log
ptruka commented 1 month ago

it is really strange that the local enpoint is set to an IPv6 address. could you try the following:

replace this line in udp_transport.cpp: udp::endpoint local_endpoint = local_endpoint = udp::endpoint(udp::v4(), stoi(port)); with udp::endpoint local_endpoint = local_endpoint = udp::endpoint(boost::asio::ip::address_v4::any(), stoi(port));

andreucm commented 1 month ago

Similar output with boost::asio::ip::address_v4::any() :

Request URL: http://192.168.2.3/cmd/get_protocol_info
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-beta'
Request URL: http://192.168.2.3/cmd/get_parameter?list=device_family
Request URL: http://192.168.2.3/cmd/get_parameter?list=product
[ INFO] [/r2000_node]: Device found: R2000
address, port: 192.168.2.3, 3000
Local Endpoint IP, Port: e6:bfb5:cfc5:66ec:803c:bc95:747f:0%892741889, 1682
terminate called after throwing an instance of 'boost::wrapexcept<boost::system::system_error>'
  what():  bind: Address family not supported by protocol
[r2000_node-2] process has died [pid 132670, exit code -6, cmd /home/beta/ros_ws/install/lib/pf_driver/ros_main __name:=r2000_node __log:=/home/beta/.ros/log/4677694e-1755-11ef-b1b9-9cb6d0e16137/r2000_node-2.log].
log file: /home/beta/.ros/log/4677694e-1755-11ef-b1b9-9cb6d0e16137/r2000_node-2*.log
ptruka commented 1 month ago

Is it possible that the host machine is preferring IPv6 or using dualstack?

I would check the IPv6 settings and try to disable it if it is not

sysctl net.ipv6.conf.all.disable_ipv6
sysctl net.ipv6.conf.default.disable_ipv6
sysctl net.ipv6.conf.all.forwarding
andreucm commented 1 month ago

My output:

beta@maupas:~$ $sysctl net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 0
beta@maupas:~$ $sysctl net.ipv6.conf.default.disable_ipv6
net.ipv6.conf.default.disable_ipv6 = 0
beta@maupas:~$ $sysctl net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 0

Seems everything related to ipv6 is set to false

ptruka commented 1 month ago

I think this shows that it is enabled. Could you run those commands to disable ipv6? Afterwards I would reconnect the sensor and check the driver again. (To disable it persistently, refer to https://askubuntu.com/questions/1300253/disable-ipv6-on-ubuntu-20-04)

sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.all.forwarding=1
andreucm commented 1 month ago
  1. Setting the three ipv6.conf params to 1
  2. Switch off/on interface
  3. run the pf_driver with the same error
  4. Check after running the pf_driver that ipv6.conf params are still to 1 (just in case)
ptruka commented 1 month ago

Thank you for testing those settings.

From what we have seen the issue is coming from the local_endpoint. I will discuss this with @ipa-vsp to track down why you are receiving IPv6 address for your local endpoint.

andreucm commented 1 month ago

ok! thanks, let me know if you need some testing from our side. Meanwhile we are working with tcp transport type.

thanks!

ptruka commented 1 month ago

As @ipa-vsp is not available this week I would try this last idea:

UDPTransport::UDPTransport(std::string address, std::string port) : Transport(address, transport_type::udp)
{
  std::cout << "address, port " << address << ", " << port << std::endl;

  io_service_ = std::make_shared<boost::asio::io_service>();
  //udp::endpoint local_endpoint = local_endpoint = udp::endpoint(udp::v4(), stoi(port));

  boost::asio::ip::address_v4 ipv4_any = boost::asio::ip::address_v4::any();
  // alternatively force to use the 0.0.0.0 address instead of using the any() function:
  // boost::asio::ip::address_v4 ipv4_any(0x00000000); 
  std::cout << "IPv4 any address: " << ipv4_any.to_string() << std::endl;

  udp::endpoint local_endpoint = local_endpoint = udp::endpoint(ipv4_any, stoi(port));
  std::cout << "Local Endpoint IP, Port: " << local_endpoint.address().to_string() << ", " << local_endpoint.port() << std::endl;

  socket_ = std::make_unique<udp::socket>(*io_service_, local_endpoint);
  timer_ = std::make_shared<boost::asio::deadline_timer>(*io_service_.get());
}

With the definition of ipv4_any we should receive the 0.0.0.0 address for your local endpoint. I tested the code on my side, which is still working:

[...]
started core service [/rosout]
process[robot_state_publisher-2]: started with pid [4765]
process[r2000_node-3]: started with pid [4766]
process[rviz-4]: started with pid [4770]
Request URL: http://192.168.50.153/cmd/get_protocol_info
Request URL: http://192.168.50.153/cmd/get_parameter?list=device_family
Request URL: http://192.168.50.153/cmd/get_parameter?list=product
[ INFO] [1716461862.107834259]: Device found: R2000
address, port 192.168.50.153, 3000
IPv4 any address: 0.0.0.0
Local Endpoint IP, Port: 0.0.0.0, 3000
Request URL: http://192.168.50.153/cmd/request_handle_udp?address=192.168.50.174&packet_type=C&port=3000
Request URL: http://192.168.50.153/cmd/set_scanoutput_config?handle=s84661416&max_num_points_scan=0&packet_type=C&skip_scans=0&start_angle=-1800000&watchdog=on&watchdogtimeout=60000

If this still does not fix the issue, it would be helpful to provide the full log of the ros console.

Thanks in advance

andreucm commented 1 month ago

Hi! One bad news and one good news:

  1. The bad is your piece of code does not work. the reported output was similar.

  2. the good is that I've found the bug. The problem is in the line:

    udp::endpoint local_endpoint = local_endpoint = udp::endpoint(ipv4_any, stoi(port));

    which is something I felt weird ... and effectively it is risky ,since depending on what is executing first, things can be wrong. It is like : int a = a = int(67);

Split it better as:

udp::endpoint local_endpoint;
local_endpoint = udp::endpoint(ipv4_any, stoi(port));

or , second option, in one line:

udp::endpoint local_endpoint(ipv4_any, stoi(port));

and everything works !

You may come back to the initial version and just fix that line.

Please let me know when you fix it to the main branch.

Thanks for the support