PepperlFuchs / pf_lidar_ros_driver

ROS driver for Pepperl+Fuchs R2000 and R2300 laser scanners
https://www.pepperl-fuchs.com/global/en/23097.htm
Apache License 2.0

Segfault running the latest ROS2 driver #118

Closed jncfa closed 9 months ago

jncfa commented 9 months ago

Using the latest code from the porting-ros2 branch, the driver crashes shortly after connecting to the R2000 lidar, regardless of what parameters are used:

> ros2 launch pf_driver r2000.launch.py
[INFO] [launch]: All log files can be found below /home/jncfa/.ros/log/2023-10-10-16-01-26-028856-dev-74501
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [ros_main-1]: process started with pid [74503]
[ros_main-1] 1696946486.237305274 [ros_main] [INFO]: device name: R2000
[ros_main-1] 1696946486.237408694 [ros_main] [INFO]: transport_str: udp
[ros_main-1] 1696946486.237436509 [ros_main] [INFO]: scanner_ip: 192.168.123.60
[ros_main-1] 1696946486.237461629 [ros_main] [INFO]: port: 
[ros_main-1] 1696946486.237482980 [ros_main] [INFO]: start_angle: -1800000
[ros_main-1] 1696946486.237503971 [ros_main] [INFO]: max_num_points_scan: 0
[ros_main-1] 1696946486.237524327 [ros_main] [INFO]: watchdogtimeout: 60000
[ros_main-1] 1696946486.237543790 [ros_main] [INFO]: watchdog: 1
[ros_main-1] 1696946486.237566533 [ros_main] [INFO]: num_layers: 0
[ros_main-1] 1696946486.237586791 [ros_main] [INFO]: topic: /scan
[ros_main-1] 1696946486.237606490 [ros_main] [INFO]: frame_id: scanner_link
[ros_main-1] 1696946486.237626227 [ros_main] [INFO]: packet_type: C
[ros_main-1] 1696946486.237646509 [ros_main] [INFO]: apply_correction: 0
[ros_main-1] 1696946486.237757514 [ros_main] [INFO]: start_angle: -1800000
[ros_main-1] 1696946486.269807890 [ros_main] [INFO]: Device found: R2000
[ros_main-1] 1696946486.303548547 [ros_main] [INFO]: Device state changed to Initialized
[ros_main-1] 1696946486.314785333 [ros_main] [INFO]: Device state changed to Running
[ERROR] [ros_main-1]: process has died [pid 74503, exit code -11, cmd '/home/jncfa/ros2_ws/install/pf_driver/lib/pf_driver/ros_main --ros-args -r __node:=ros_main --params-file /home/jncfa/ros2_ws/install/pf_driver/share/pf_driver/config/r2000_params.yaml'].

The stack trace indicates the segfault is here: https://github.com/PepperlFuchs/pf_lidar_ros_driver/blob/056e9c183fd45c20d9d8039e33509dd1cc50a4ae/src/pf_driver/src/pf/pipeline.cpp#L113

#0  0x00007ffff7d838eb in ?? () from target:/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
#1  0x00007ffff7d0c606 in ?? () from target:/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
#2  0x00007ffff7d88ce2 in operator delete(void*, unsigned long) () from target:/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
#3  0x00005555555a57d9 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fffe9e00000) at /usr/include/c++/9/bits/shared_ptr_base.h:148
#4  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fffe9e00000) at /usr/include/c++/9/bits/shared_ptr_base.h:148
#5  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fffeb7fcb48, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
#6  std::__shared_ptr<PFPacket, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fffeb7fcb40, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#7  std::shared_ptr<PFPacket>::~shared_ptr (this=0x7fffeb7fcb40, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
#8  Pipeline::run_reader (this=0x7ffff5b9ec00) at pf_r2000_driver/pf_driver/src/pf/pipeline.cpp:113
#9  0x00007ffff75eadf4 in ?? () from target:/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007ffff77b4609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#11 0x00007ffff72d7133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

hsd-dev commented 9 months ago

@jncfa thanks for the issue. This is surprising, since both @ptruka and I have tested it. I don't have access to the device right now, but I will go to the office this week and test again. In the meantime, could you please tell us your Ubuntu and ROS distros, and also the firmware version of the device you are using?

jncfa commented 9 months ago

Info from the LiDAR:

"product":"OMD30M-R2000-B23-V1V1D-HD-1L",
"revision_fw":"1.60",
"revision_hw":"1.62",

We had this issue on ROS2 Galactic using Ubuntu 20.04.6 LTS. In the meantime, I can give you a few more pointers for this issue:

hsd-dev commented 9 months ago

Thanks a lot for the pointers @jncfa! Tomorrow I will get hold of the device and test it.

jncfa commented 9 months ago

Hi @ipa-hsd, do you have any updates about this issue?

hsd-dev commented 9 months ago

@jncfa I have the exact same configuration as you:

I ran the driver for the whole day yesterday without problems. Without reproducing the error, it is difficult to debug. I will spend some time today thinking about what might be causing this. In the meantime, if you have more ideas, please let me know.

hsd-dev commented 9 months ago

I doubt anything is going wrong in run_reader. The packet is read in this function https://github.com/PepperlFuchs/pf_lidar_ros_driver/blob/df8047e2b4f075032a42fa07074d09e13a9e089c/src/pf_driver/src/ros/pf_data_publisher.cpp#L54 and, as you can see, it is never explicitly freed.

Maybe something is going wrong while parsing the packets? Here is where it happens: https://github.com/PepperlFuchs/pf_lidar_ros_driver/blob/df8047e2b4f075032a42fa07074d09e13a9e089c/src/pf_driver/include/pf_driver/pf/pf_parser.h#L14-L46

jncfa commented 9 months ago

I'll try to create a minimal Dockerfile that can emulate our system for this case. Testing with gcc-10 also produces the segfault in the same place.

The problem doesn't seem to be in run_reader itself; it is more likely in the packet parsing or in the queue handling. If you remove the reader->read() call and replace it with just a packet.reset(), it still segfaults.

jncfa commented 9 months ago

I think I've found the problem: because PFPacket doesn't have a virtual destructor, you get undefined behaviour when destroying any derived packet through a base-class pointer. The packets are created in PFParser::parse() depending on the packet type, but the destructor that always seems to be called is PFPacket::~PFPacket().
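
One plausible mechanism, sketched below with hypothetical stand-in types (BasePacket and DerivedPacket are not the driver's real definitions): frame #2 of the stack trace is the sized `operator delete(void*, unsigned long)`. If the object is deleted through a pointer to a base class without a virtual destructor, the behaviour is undefined; in practice the derived destructor is skipped and the deallocator is typically handed the size of the base type, not of the object that was actually allocated. The sketch only compares the two sizes and never performs the invalid delete.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the packet hierarchy (assumed layout, for illustration only).
struct BasePacket {
  std::uint16_t packet_type = 0;
  // No virtual destructor: `delete` through a BasePacket* never reaches
  // ~DerivedPacket(), which is undefined behaviour.
};

struct DerivedPacket : BasePacket {
  std::vector<std::uint32_t> distances;  // extra state the base knows nothing about
};

// Size a sized operator delete is typically told about when the static type is BasePacket.
constexpr std::size_t size_seen_by_base_delete() { return sizeof(BasePacket); }

// Size that `new DerivedPacket` actually allocated.
constexpr std::size_t size_actually_allocated() { return sizeof(DerivedPacket); }
```

An allocator that relies on the size argument can corrupt its bookkeeping when the two values differ, which would be consistent with a crash inside libjemalloc, though the trace alone does not prove it.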

hsd-dev commented 9 months ago

Could you please implement the destructor for the packet type you are using and test whether it solves the problem? If it does, I can implement it for the remaining packet types. Thanks for looking into this!

jncfa commented 9 months ago

You can just declare a virtual destructor in PFPacket like this: virtual ~PFPacket() = default;

Because all derived classes will inherit the virtualness of the destructor regardless, I don't think you need to explicitly define a destructor in each of them; you can let the compiler write it for you.

This at least fixed the segfault for me
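
A minimal sketch of why the one-line change is enough, using hypothetical Packet and ScanPacket types instead of the driver's classes. The object is deliberately handed to the shared_ptr as a base-class pointer, so destruction goes through `delete` on a `Packet*`; with the defaulted virtual destructor in the base, that call dispatches to the derived destructor.

```cpp
#include <memory>

// Hypothetical stand-ins; these names do not come from the driver.
struct Packet {
  virtual ~Packet() = default;  // the one-line fix described above
};

struct ScanPacket : Packet {
  explicit ScanPacket(int& counter) : destroyed(counter) {}
  ~ScanPacket() override { ++destroyed; }  // virtual because the base destructor is
  int& destroyed;
};

// Owns a derived packet through a base-class pointer, destroys it, and
// returns how many times the derived destructor ran.
int derived_destructor_runs() {
  int counter = 0;
  {
    Packet* raw = new ScanPacket(counter);  // static type is the base class
    std::shared_ptr<Packet> packet(raw);    // deleter will call `delete` on a Packet*
  }  // packet goes out of scope; ~ScanPacket() runs via virtual dispatch
  return counter;
}
```

Calling `derived_destructor_runs()` returns 1. Without `virtual` on `~Packet()`, the same code would be undefined behaviour.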

hsd-dev commented 9 months ago

That's great to hear! Could you please make a PR against the porting-ros2 branch (https://github.com/PepperlFuchs/pf_lidar_ros_driver/tree/porting-ros2)?

hsd-dev commented 9 months ago

@jncfa could you please confirm if this is what you meant? https://github.com/PepperlFuchs/pf_lidar_ros_driver/pull/83/commits/f96eba624beeba9120644e20b234e7266e26f8ae

jncfa commented 9 months ago

> @jncfa could you please confirm if this is what you meant? f96eba6

Yes, that should be it!

I also ended up doing that for all the other base classes used in the driver (Reader, Writer, etc.), among some other things. I'll try to see if I can get these changes pushed here as well.
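
One way to keep such base classes from regressing is a compile-time check with the standard `std::has_virtual_destructor` trait. The sketch below uses hypothetical ReaderBase and WriterBase interfaces; the driver's actual Reader and Writer classes may look different.

```cpp
#include <type_traits>

// Hypothetical interfaces standing in for the driver's base classes.
struct ReaderBase {
  virtual ~ReaderBase() = default;
  virtual void read() = 0;
};

struct WriterBase {
  virtual ~WriterBase() = default;
  virtual void write() = 0;
};

// A base that forgot the virtual destructor, kept only to show what the trait reports.
struct PlainBase {
  int value = 0;
};

// Compilation fails here if someone later removes a virtual destructor.
static_assert(std::has_virtual_destructor<ReaderBase>::value,
              "ReaderBase is deleted through base pointers and needs a virtual destructor");
static_assert(std::has_virtual_destructor<WriterBase>::value,
              "WriterBase is deleted through base pointers and needs a virtual destructor");
```

GCC and Clang also offer the `-Wnon-virtual-dtor` warning, which flags classes that have virtual functions but a non-virtual destructor; whether it would have caught this particular case depends on how PFPacket is declared.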

hsd-dev commented 9 months ago

> I'll try to see if I can get these changes also pushed here

That would be great! Thanks for your contribution.

hsd-dev commented 9 months ago

@jncfa I will close this issue now since you confirmed that your fix solves the issue.

> I also ended up doing that for all other base classes that are used on the driver (Reader, Writer, etc..),

Feel free to make a PR against the porting-ros2 branch whenever you can.