autowarefoundation / autoware.universe

https://autowarefoundation.github.io/autoware.universe/
Apache License 2.0

ndt_scan_matcher sometimes crashes when performing dynamic_map_loading #5973

Closed: SakodaShintaro closed this issue 4 months ago

SakodaShintaro commented 6 months ago

Checklist

Description

The ndt_scan_matcher occasionally crashes when performing dynamic_map_loading. This seems to be related to memory corruption caused by a data race, leading to intermittent and varied error messages.

Typical Error Log:

1703491838.3898816 [ndt_scan_matcher-36] malloc(): corrupted top size
1703491838.3900959 [ndt_scan_matcher-36] *** Aborted at 1703491838 (unix time) try "date -d @1703491838" if you are using GNU date ***
1703491838.3910542 [ndt_scan_matcher-36] PC: @                0x0 (unknown)
1703491838.3993685 [ndt_scan_matcher-36] *** SIGABRT (@0x3e8000c92b1) received by PID 823985 (TID 0x7ff4caffd640) from PID 823985; stack trace: ***
1703491838.4001207 [ndt_scan_matcher-36]     @     0x7ff505764046 (unknown)
1703491838.4007185 [ndt_scan_matcher-36]     @     0x7ff504850520 (unknown)
1703491838.4013429 [ndt_scan_matcher-36]     @     0x7ff5048a49fc pthread_kill
1703491838.4018614 [ndt_scan_matcher-36]     @     0x7ff504850476 raise
1703491838.4024291 [ndt_scan_matcher-36]     @     0x7ff5048367f3 abort
1703491838.4029987 [ndt_scan_matcher-36]     @     0x7ff504897676 (unknown)
1703491838.4035871 [ndt_scan_matcher-36]     @     0x7ff5048aecfc (unknown)
1703491838.4041619 [ndt_scan_matcher-36]     @     0x7ff5048b26f2 (unknown)
1703491838.4046745 [ndt_scan_matcher-36]     @     0x7ff5048b3139 malloc
1703491838.4053230 [ndt_scan_matcher-36]     @     0x7ff504bcb98c operator new()
1703491838.4060459 [ndt_scan_matcher-36]     @     0x563c800d01bc pcl::fromROSMsg<>()
1703491838.4064028 [ndt_scan_matcher-36]     @     0x563c800e49ad MapUpdateModule::update_ndt()
1703491838.4064648 [ndt_scan_matcher-36]     @     0x563c800e772b MapUpdateModule::update_map()
1703491838.4067962 [ndt_scan_matcher-36]     @     0x563c80063203 NDTScanMatcher::callback_timer()
1703491838.4068553 [ndt_scan_matcher-36]     @     0x563c8006b6d5 rclcpp::GenericTimer<>::execute_callback()
1703491838.4074447 [ndt_scan_matcher-36]     @     0x7ff5059cff61 rclcpp::Executor::execute_any_executable()
1703491838.4081287 [ndt_scan_matcher-36]     @     0x7ff5059d727a rclcpp::executors::MultiThreadedExecutor::run()
1703491838.4088094 [ndt_scan_matcher-36]     @     0x7ff504bf9253 (unknown)
1703491838.4094012 [ndt_scan_matcher-36]     @     0x7ff5048a2ac3 (unknown)
1703491838.4100635 [ndt_scan_matcher-36]     @     0x7ff504934660 (unknown)
1703491838.4975879 [ERROR] [ndt_scan_matcher-36]: process has died [pid 823985, exit code -6, cmd '/home/shintarosakoda/autoware/install/ndt_scan_matcher/lib/ndt_scan_matcher/ndt_scan_matcher --ros-args -r __node:=ndt_scan_matcher -r __ns:=/localization/pose_estimator -p use_sim_time:=True -p wheel_radius:=0.383 -p wheel_width:=0.235 -p wheel_base:=2.79 -p wheel_tread:=1.64 -p front_overhang:=1.0 -p rear_overhang:=1.1 -p left_overhang:=0.128 -p right_overhang:=0.128 -p vehicle_height:=2.5 -p max_steer_angle:=0.7 --params-file /home/shintarosakoda/autoware/install/autoware_launch/share/autoware_launch/config/localization/ndt_scan_matcher.param.yaml -r points_raw:=/localization/util/downsample/pointcloud -r ekf_pose_with_covariance:=/localization/pose_twist_fusion_filter/biased_pose_with_covariance -r pointcloud_map:=/map/pointcloud_map -r ndt_pose:=/localization/pose_estimator/pose -r ndt_pose_with_covariance:=/localization/pose_estimator/pose_with_covariance -r regularization_pose_with_covariance:=/sensing/gnss/pose_with_covariance -r trigger_node_srv:=trigger_node -r pcd_loader_service:=/map/get_differential_pointcloud_map'].

Because the crash is caused by memory corruption from a data race, it may surface with a different error message from run to run.
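The stack trace above shows the map update running inside a timer callback under a MultiThreadedExecutor, so another callback can touch the same data concurrently. Here is a minimal sketch of guarding such shared state with a mutex. It uses plain `std::thread` rather than rclcpp, and `SharedMap`, `replace_points`, `sum`, and `run_concurrent_access` are hypothetical names for illustration, not code from ndt_scan_matcher or from the fix.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for map data shared between two callbacks.
class SharedMap {
public:
  // Writer side: plays the role of the map update (timer callback in the trace).
  void replace_points(std::vector<float> new_points) {
    std::lock_guard<std::mutex> lock(mutex_);
    points_ = std::move(new_points);
  }

  // Reader side: plays the role of scan matching; takes the same mutex.
  double sum() const {
    std::lock_guard<std::mutex> lock(mutex_);
    double total = 0.0;
    for (float p : points_) {
      total += p;
    }
    return total;
  }

private:
  mutable std::mutex mutex_;
  std::vector<float> points_;
};

// Runs a writer and a reader concurrently.
// Returns true if every read observed a complete map (1000 or 2000 points of 1.0).
bool run_concurrent_access(int iterations) {
  SharedMap map;
  map.replace_points(std::vector<float>(1000, 1.0f));
  bool consistent = true;  // written only by the reader thread, read after join

  std::thread updater([&map, iterations] {
    for (int i = 0; i < iterations; ++i) {
      // Alternate between two map sizes to force reallocation.
      map.replace_points(std::vector<float>(1000 + (i % 2) * 1000, 1.0f));
    }
  });
  std::thread reader([&map, &consistent, iterations] {
    for (int i = 0; i < iterations; ++i) {
      const double s = map.sum();
      if (s != 1000.0 && s != 2000.0) {
        consistent = false;
      }
    }
  });

  updater.join();
  reader.join();
  return consistent;
}
```

Calling `run_concurrent_access(1000)` returns `true` with the locks in place. Removing the `std::lock_guard` lines turns this into undefined behavior of the same kind as the crash: the reader may iterate over a buffer the writer is freeing, and the failure can look different on each run.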

Expected behavior

The ndt_scan_matcher should operate without crashing.

Actual behavior

The ndt_scan_matcher crashes intermittently, with low probability.

Steps to reproduce

This problem is more likely to occur when dynamic map loading and NDT pose estimation run at the same time. An example configuration for reproducing it:

  1. Prepare a divided map: https://github.com/MapIV/pointcloud_divider
  2. Set use_dynamic_map_loading to true (the default): https://github.com/autowarefoundation/autoware_launch/blob/805756256ce392c1a7ef300251acae85a7f4d76a/autoware_launch/config/localization/ndt_scan_matcher.param.yaml#L4
  3. Change parameters to increase the NDT load

These settings make the crash easier to reproduce.

Versions

Possible causes

See https://github.com/autowarefoundation/autoware.universe/pull/5951

Additional context

This issue has been fixed by the following pull request: https://github.com/autowarefoundation/autoware.universe/pull/5951

However, the mutex is held for a long time (about 20 to 40 ms) while a dynamic map is being loaded, so we plan to improve this in the future.
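One common way to shorten a critical section like this is to build the updated map outside the lock and swap in only a pointer while holding the mutex. The sketch below illustrates that general pattern under simplifying assumptions (a single updater thread, a map small enough to copy). `PointMap`, `SwappableMap`, and `demo_swap` are hypothetical names, and this is not necessarily the approach taken in Autoware.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for the map contents.
using PointMap = std::vector<float>;

class SwappableMap {
public:
  SwappableMap() : current_(std::make_shared<const PointMap>()) {}

  // Readers hold the lock only long enough to copy a shared_ptr.
  std::shared_ptr<const PointMap> snapshot() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return current_;
  }

  // Assumes a single updater thread; concurrent updaters could lose additions.
  void update(const PointMap & added) {
    // Slow part (copy and extend) happens without holding the mutex.
    std::shared_ptr<const PointMap> old_map = snapshot();
    auto next = std::make_shared<PointMap>(*old_map);
    next->insert(next->end(), added.begin(), added.end());

    // Fast part: only the pointer swap is inside the critical section.
    std::lock_guard<std::mutex> lock(mutex_);
    current_ = std::move(next);
  }

private:
  mutable std::mutex mutex_;
  std::shared_ptr<const PointMap> current_;
};

// Shows that an old snapshot stays valid after an update.
// Returns the size of the current map after two updates.
std::size_t demo_swap() {
  SwappableMap map;
  map.update({1.0f, 2.0f});
  std::shared_ptr<const PointMap> before = map.snapshot();
  map.update({3.0f});
  assert(before->size() == 2);  // the earlier snapshot is untouched
  return map.snapshot()->size();
}
```

The trade-off is extra memory and copy time for the second map in exchange for a lock that is held only for a pointer assignment, so a reader working on an older snapshot is never blocked by a load in progress.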

stale[bot] commented 4 months ago

This pull request has been automatically marked as stale because it has not had recent activity.

SakodaShintaro commented 4 months ago

The long locking time has also been resolved by the following pull requests.