Livox-SDK / Livox-SDK2

Drivers for receiving LiDAR data and controlling lidar, support Lidar HAP and Mid-360.

livox_lidar_quick_start crashes on Raspbian Bullseye 32 #33

Open michalpelka opened 1 year ago

michalpelka commented 1 year ago

I experience random crashes due to UB in the SDK example. The crash occurs every time I use livox_lidar_quick_start (it is a little more stable when I use the SDK directly). Here is the stack trace for the crashed livox_lidar_quick_start:

livox::lidar::GeneralCommandHandler::Init general_command_handler.cpp:86
livox::lidar::DeviceManager::Init device_manager.cpp:126
LivoxLidarSdkInit livox_lidar_sdk.cpp:95
main basic_string.h:186
__libc_start_main 0x00000000f7c89740
_start 0x00000000000397dc

Here is my configuration:

robot@raspberrypi:~ $ uname -a
Linux raspberrypi 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr  3 17:24:16 BST 2023 aarch64 GNU/Linux

Here is the standard output:

/tmp/tmp.XFmfND2dUl/cmake-build-debug-remote-host/3rd/Livox-SDK2/samples/livox_lidar_quick_start/livox_lidar_quick_start /home/robot/mandeye_controller/3rd/Livox-SDK2/samples/livox_lidar_quick_start/mid360_config.json
Signal: SIGBUS (Bus error)

michalpelka commented 1 year ago

Any update on this issue, @Livox-Infra? It seems to be quite a serious issue.

neolu commented 1 year ago

Raspberry Pi OS (buster) is ok.

michalpelka commented 1 year ago

For me, it also worked for some time. After updating the system and recompiling, it stopped working. It is UB triggered by padding of non-trivial classes. As I said, it is a serious bug that makes the SDK unusable for me as a Livox client. I do not understand why it was not addressed or why my PR was not reviewed. The Livox-SDK2 repository is not maintained, IMHO.

michalpelka commented 1 year ago

@Livox-Infra @Livox-SDK

JanuszBedkowski commented 1 year ago

I confirm the same issue on Raspbian. This is a serious blocker.

HViktorTsoi commented 10 months ago

This commit https://github.com/Livox-SDK/Livox-SDK2/pull/34 fixed my bus error issue on Raspberry Pi OS 10 (buster).

Thanks @michalpelka

In short, just modify the following two lines of code:

https://github.com/Livox-SDK/Livox-SDK2/blob/5c1f7660b6daae2420fc4b8132c1c3f3c12c0756/sdk_core/logger_handler/logger_manager.h#L44 and https://github.com/Livox-SDK/Livox-SDK2/blob/5c1f7660b6daae2420fc4b8132c1c3f3c12c0756/sdk_core/comm/define.h#L40

both to

#pragma pack(4)
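
To make the effect of that change concrete, here is a minimal sketch of how the same struct is laid out under `#pragma pack(1)` and `#pragma pack(4)`. The struct names and fields are hypothetical and are not taken from the SDK headers; the sizes in the assertions assume a mainstream GCC, Clang, or MSVC target where `std::uint32_t` has 4-byte alignment.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical header type, for illustration only (not an SDK type).
#pragma pack(1)
struct HeaderPack1 {
  std::uint8_t  version;
  std::uint32_t length;   // placed right after `version`, at offset 1
  std::uint16_t seq;
};

#pragma pack(4)
struct HeaderPack4 {
  std::uint8_t  version;
  std::uint32_t length;   // padded so that it starts at offset 4
  std::uint16_t seq;
};
#pragma pack()  // restore the compiler's default packing

// With pack(1) there is no padding and the struct has alignment 1.
static_assert(sizeof(HeaderPack1) == 7, "no padding under pack(1)");
static_assert(offsetof(HeaderPack1, length) == 1, "length is misaligned");
static_assert(alignof(HeaderPack1) == 1, "struct alignment drops to 1");

// With pack(4) every field keeps an alignment of up to 4 bytes.
static_assert(sizeof(HeaderPack4) == 12, "padding is inserted under pack(4)");
static_assert(offsetof(HeaderPack4, length) == 4, "length is 4-byte aligned");
static_assert(alignof(HeaderPack4) == 4, "struct alignment is 4");
```

Note that changing the packing also changes the byte layout, so a struct that is meant to mirror an on-wire format would no longer match that format.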

michalpelka commented 10 months ago

@Livox-Infra @Livox-SDK could you approve either #34 or the solution proposed by @HViktorTsoi?

RomanStadlhuber commented 2 weeks ago

@michalpelka thank you for this fix. I am experiencing the same problem while trying to get livox_ros_driver2 to work. Simply changing the #pragma pack(x) statements did not do the trick. May I ask what the reasoning behind this fix is, so I can better understand what is going on in the ROS driver?

michalpelka commented 1 week ago

OK, @RomanStadlhuber, I spent quite a long time on that, and I am more than unhappy that it was closed by the maintainers without a fix.

I was experiencing the crash only on Raspberry Pi, with both 32-bit and 64-bit systems. It worked as expected on x86. The crash was quite strange: instead of a segfault we got a bus error, and the stack trace was not usable. The difference between a segfault and a bus error is explained here: https://stackoverflow.com/questions/212466/what-is-a-bus-error-is-it-different-from-a-segmentation-fault

I guess that it is not a bug in Livox-SDK, but rather some bug in the compiler or an advanced example of misuse of alignment. ARM by design requires instructions to be aligned to 4 bytes (https://stackoverflow.com/questions/1237963/alignment-along-4-byte-boundaries); it will stop execution if we ask it to run instructions at a misaligned address. I did not analyze the assembly, but I am almost sure that pragma pack(1) was also applied globally to classes. Note that using pack(4) aligns everything to the 4-byte alignment that ARM accepts.

I think that directives that change alignment should be applied to the smallest possible scope, but, as I said, @Livox-SDK closed my PR without any note (https://github.com/Livox-SDK/Livox-SDK2/pull/34), so I could be wrong here. Nevertheless, I am working happily on my fork.
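
One common way to keep a packing directive in the smallest possible scope is the `push`/`pop` form of the pragma, which GCC, Clang, and MSVC all support. The following is a minimal sketch with hypothetical types (`WireHeader` and `Handler` are not SDK names): only the struct between `push` and `pop` is packed, and anything declared afterwards keeps its natural alignment.

```cpp
#include <cstddef>
#include <cstdint>

// Packing applies only between push and pop.
#pragma pack(push, 1)
struct WireHeader {        // hypothetical type that mirrors a byte layout
  std::uint8_t  type;
  std::uint64_t timestamp; // starts at offset 1 because of pack(1)
};
#pragma pack(pop)          // the previous packing setting is restored here

// Declared after the pop, so the compiler's default alignment rules apply.
struct Handler {           // hypothetical ordinary class
  std::uint8_t  flag;
  std::uint64_t counter;   // padded to its natural alignment
};

static_assert(sizeof(WireHeader) == 9, "packed: no padding");
static_assert(alignof(WireHeader) == 1, "packed: alignment 1");
static_assert(alignof(Handler) == alignof(std::uint64_t),
              "unpacked: natural alignment is kept");
static_assert(offsetof(Handler, counter) % alignof(std::uint64_t) == 0,
              "unpacked: counter is naturally aligned");
```

If a bare `#pragma pack(1)` in a header is never reset, it also applies to every type declared later in any file that includes that header, which matches the kind of global effect suspected above.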

michalpelka commented 1 week ago

Also, what was very strange: it was compile-time undefined behavior. It worked fine (it did not crash randomly), but adding anything (e.g. another variable) in another place in the program that used the SDK caused persistent crashes.

michalpelka commented 1 week ago

Debugging this issue was quite an adventure and learning experience for me.

RomanStadlhuber commented 5 days ago

@michalpelka Thank you for your answer. I fixed it, and it was, in fact, just a single #pragma pack statement instead of both (see Issue for reference). This fix would not have been possible without the problem you discovered, many thanks!

michalpelka commented 5 days ago

Glad my investigation helped your case.