artofnothingness / mppic

MIT License
84 stars 19 forks source link

Segmentation fault with Controller Server caused by Xtensor(?) #71

Closed padhupradheep closed 1 year ago

padhupradheep commented 2 years ago
Thread 16 "controller_serv" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc2ffd700 (LWP 23325)]
0x00007fffe43a4997 in auto xt::reduce_immediate<xt::xreducer_functors<xt::norm_sq<xt::xtensor_container<xt::uvector<double, xsimd::aligned_allocator<double, 32ul> >, 2ul, (xt::layout_type)1, xt::xtensor_expression_tag>&, std::array<unsigned long, 1ul>, std::tuple<xt::evaluation_strategy::immediate_type>, 0>(xt::xtensor_container<xt::uvector<double, xsimd::aligned_allocator<double, 32ul> >, 2ul, (xt::layout_type)1, xt::xtensor_expression_tag>&, std::array<unsigned long, 1ul>&&, std::tuple<xt::evaluation_strategy::immediate_type>)::{lambda(double const&, double const&)#1}, xt::const_value<double>, std::plus<double> >, xt::xtensor_container<xt::uvector<double, xsimd::aligned_allocator<double, 32ul> >, 2ul, (xt::layout_type)1, xt::xtensor_expression_tag>&, std::array<unsigned long, 1ul>, std::tuple<xt::evaluation_strategy::immediate_type>&>(xt::xtensor_container<xt::uvector<double, xsimd::aligned_allocator<double, 32ul> >, 2ul, (xt::layout_type)1, xt::xtensor_expression_tag>&, std::array<unsigned long, 1ul>&&, std::array<unsigned long, 1ul>&&, std::tuple<xt::evaluation_strategy::immediate_type>&) ()
   from /home/neobotix/mp_400_workspace/install/mppic/lib/libcritics.so

This happens when the controller tries to reach the goal location at a goal angle roughly around >90 degrees from the start orientation. Note that, I have an high weight cost for prefer_forward_critic ...

Parameters are as follows:

FollowPath:
      plugin: "mppi::Controller"
      time_steps: 15
      model_dt: 0.1
      batch_size: 300
      vx_std: 0.2
      vy_std: 0.2
      wz_std: 1.0
      vx_max: 0.5
      vy_max: 0.5
      wz_max: 0.7
      iteration_count: 1
      prune_distance: 1.2
      transform_tolerance: 0.1
      temperature: 0.25
      motion_model: "DiffDrive"
      visualize: false
      AckermannConstrains:
        min_turning_r: 0.
      critics: ["ObstaclesCritic", "GoalCritic", "GoalAngleCritic", "PathAlignCritic", "PathFollowCritic", "PathAngleCritic", "PreferForwardCritic" ]
      GoalCritic:
        enabled: true
        cost_power: 1
        cost_weight: 4.0
      GoalAngleCritic:
        enabled: true
        cost_power: 1
        cost_weight: 3.0
        threshold_to_consider_goal_angle: 0.20
      ObstaclesCritic:
        enabled: true
        cost_power: 2
        cost_weight: 1.25
        consider_footprint: false
        collision_cost: 2000.0
      PathAlignCritic:
        enabled: true
        cost_power: 1
        cost_weight: 2.0
        path_point_step: 1
        trajectory_point_step: 2
      PathFollowCritic:
        enabled: true
        cost_power: 1
        cost_weight: 2.0
        offset_from_furthest: 6
        max_path_ratio: 0.40
      PathAngleCritic:
        enabled: true
        cost_power: 1
        cost_weight: 2.0
        offset_from_furthest: 4
      PreferForwardCritic:
        enabled: true
        cost_power: 1
        cost_weight: 20.0

let me know, if you need more details!

artofnothingness commented 2 years ago

Does this appear only with high PreferForwardCritic weight ?

padhupradheep commented 2 years ago

Yepp!

artofnothingness commented 2 years ago

Probably overflow on costs. On my machine with the same cost on prefer forward it was ok. Need to watch over max cost at each controller iteration

padhupradheep commented 2 years ago

what is the specs of your machine?

We use a standard i5 processor with 4 cores!

artofnothingness commented 2 years ago

Almost the same i5-4200H intel.

I added min velocity parameter in https://github.com/artofnothingness/mppic/pull/74 So you don't need huge weights on prefer forward if you need only forward motion

padhupradheep commented 2 years ago

Alright then, I'll update here, once I give it a swing.

I'll leave it to you, if you want to keep the issue opened (until I test)..

SteveMacenski commented 2 years ago

Are we sure this isn't actually a NaN or Inf issue? I'd like to know specifically what line is segmentation faulting.

I tested it on my i7-8565U without an issue with a gain of 20 and then 100. Interestingly when I set this to a silly degree high, it deviates off the path to beep going forward because it doesn't want to turn around and instead drives forward until a time it can turn itself around using the forward direction elsewhere in the map. If its only high but not insanely high, it'll turn in spot but never reverse. Neat.

But never a segmentation fault

padhupradheep commented 2 years ago

I'm facing this once again and occurs consistently. I didn't do anything with the prefer forward critic this time.

So, I was doing this test as mentioned in #77. When I set the batch size to 1000 and controller frequency to 50 Hz, this error occurs once again. This is a bit longer path compared to the results that I had posted in #77

I'd like to know specifically what line is segmentation faulting.

Here is the ful backtrace

from /home/neobotix/mpo_500_workspace/install/mppic/lib/libcritics.so
#1  0x00007fffe4a5ccee in mppi::critics::PathAlignCritic::score(mppi::CriticData&) () from /home/neobotix/mpo_500_workspace/install/mppic/lib/libcritics.so
#2  0x00007fffe4b30b10 in mppi::CriticManager::evalTrajectoriesScores(mppi::CriticData&) const ()
   from /home/neobotix/mpo_500_workspace/install/mppic/lib/libmppic.so
#3  0x00007fffe4b15673 in mppi::Optimizer::optimize() ()
   from /home/neobotix/mpo_500_workspace/install/mppic/lib/libmppic.so
#4  0x00007fffe4b156e8 in mppi::Optimizer::evalControl(geometry_msgs::msg::PoseS--Type <RET> for more, q to quit, c to continue without paging--
tamped_<std::allocator<void> > const&, geometry_msgs::msg::Twist_<std::allocator<void> > const&, nav_msgs::msg::Path_<std::allocator<void> > const&, nav2_core::GoalChecker*) ()
   from /home/neobotix/mpo_500_workspace/install/mppic/lib/libmppic.so
#5  0x00007fffe4adb66e in mppi::Controller::computeVelocityCommands(geometry_msgs::msg::PoseStamped_<std::allocator<void> > const&, geometry_msgs::msg::Twist_<std::allocator<void> > const&, nav2_core::GoalChecker*) ()
   from /home/neobotix/mpo_500_workspace/install/mppic/lib/libmppic.so
#6  0x00007ffff7cb6148 in nav2_controller::ControllerServer::computeAndPublishVelocity (this=0x555555587e30) at ./src/controller_server.cpp:462
#7  0x00007ffff7cb76e0 in nav2_controller::ControllerServer::computeControl (
    this=0x555555587e30) at ./src/controller_server.cpp:395
#8  0x00007ffff7cce267 in std::function<void ()>::operator()() const (
    this=0x555555671f60) at /usr/include/c++/11/bits/std_function.h:590
#9  nav2_util::SimpleActionServer<nav2_msgs::action::FollowPath>::work (
    this=0x555555671f00)
    at /opt/ros/humble/include/nav2_util/simple_action_server.hpp:217
#10 0x00007ffff7ceb07c in nav2_util::SimpleActionServer<nav2_msgs::action::FollowPath>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::FollowPath> >)::{lambda()#1}::operator()() const (
    __closure=<optimized out>)
    at /opt/ros/humble/include/nav2_util/simple_action_server.hpp:205
#11 std::__invoke_impl<void, nav2_util::SimpleActionServer<nav2_msgs::action::Fo--Type <RET> for more, q to quit, c to continue without paging--
llowPath>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::FollowPath> >)::{lambda()#1}>(std::__invoke_other, nav2_util::SimpleActionServer<nav2_msgs::action::FollowPath>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::FollowPath> >)::{lambda()#1}&&) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#12 std::__invoke<nav2_util::SimpleActionServer<nav2_msgs::action::FollowPath>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::FollowPath> >)::{lambda()#1}>(nav2_util::SimpleActionServer<nav2_msgs::action::FollowPath>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::FollowPath> >)::{lambda()#1}&&) (__fn=...)
    at /usr/include/c++/11/bits/invoke.h:96
#13 std::thread::_Invoker<std::tuple<nav2_util::SimpleActionServer<nav2_msgs::action::FollowPath>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::FollowPath> >)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=<optimized out>)
    at /usr/include/c++/11/bits/std_thread.h:253
#14 std::thread::_Invoker<std::tuple<nav2_util::SimpleActionServer<nav2_msgs::action::FollowPath>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::FollowPath> >)::{lambda()#1}> >::operator()() (
    this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:260
#15 std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<nav2_util::SimpleActionServer<nav2_msgs::action::FollowPath>::handle_accep--Type <RET> for more, q to quit, c to continue without paging--
ted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::FollowPath> >)::{lambda()#1}> >, void>::operator()() const (this=<optimized out>) at /usr/include/c++/11/future:1409
#16 std::__invoke_impl<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<nav2_util::SimpleActionServer<nav2_msgs::action::FollowPath>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::FollowPath> >)::{lambda()#1}> >, void>&>(std::__invoke_other, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<nav2_util::SimpleActionServer<nav2_msgs::action::FollowPath>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::FollowPath> >)::{lambda()#1}> >, void>&) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#17 std::__invoke_r<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<nav2_util::SimpleActionServer<nav2_msgs::action::FollowPath>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::FollowPath> >)::{lambda()#1}> >, void>&>(std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<nav2_util::SimpleActionServer<nav2_msgs::action::FollowPath>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::FollowPath> >)::{lambda()#1}> >, void>&) (__fn=...) at /usr/include/c++/11/bits/invoke.h:116
#18 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<nav2_util::SimpleActionServer<nav2_msgs::action::FollowPath>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::FollowPath> >)::{lambda()#1}> >, void> >::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/11/bits/std_function.h:291
#19 0x00007ffff7eb6e2d in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) () from /opt/ros/humble/lib/librclcpp.so
#20 0x00007ffff7825f68 in __pthread_once_slow (once_control=0x7fffc0007f68, init_routine=0x7ffff7aaedc0 <__once_proxy>) at ./nptl/pthread_once.c:116
#21 0x00007ffff7ceb6f1 in __gthread_once (__func=<optimized out>, __once=0x7fffc0007f68) at /usr/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:700
#22 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) (
    __f=@0x7fffcaffc4e0: (void (std::__future_base::_State_baseV2::*)(std::__future_base::_State_baseV2 * const, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>()> *, bool *)) 0x7ffff7eb6e00 <std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)>, __once=...) at /usr/include/c++/11/mutex:783
#23 std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool) (__ignore_failure=false, __res=..., this=0x7fffc0007f50) at /usr/include/c++/11/future:411
#24 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<nav2_util::SimpleActionServer<nav2_msgs::action::FollowPath>::handle_accepted(std::shared_ptr<rclcpp_action::ServerGoalHandle<nav2_msgs::action::FollowPath> >)::{lambda()#1}> >, void>::_M_run() (this=0x7fffc0007f50)
    at /usr/include/c++/11/future:1737
padhupradheep commented 2 years ago

Also able to reproduce it in simulation.

https://user-images.githubusercontent.com/20242192/176691317-b14a3ee4-befb-415d-9a77-5cd89a7521fd.mp4

SteveMacenski commented 2 years ago

The screen record looks black?

Also, that didn't tell us the line number :cry: We know its somewhere in score! :wink:

padhupradheep commented 2 years ago

I could have messed it up while uploading. I’ll upload it once again tomorrow.

I didn’t get any clue or traces of the line number. That’s why gave in the full back trace.

artofnothingness commented 2 years ago

I can't reproduce error. Can you print some messages on the start / end of critic work in every critic you use, so after the error you can deduce what critic caused that, after that make more prints in the critic that failed, so you can deduce where it came from

SteveMacenski commented 2 years ago

PathAlignCritic::score we know its that from the traceback, I just want to know where in the function that is happening. Prints might just be easiest as @artofnothingness suggests

padhupradheep commented 2 years ago

Okay, will get back to you

padhupradheep commented 2 years ago

simplescreenrecorder-2022-06-30_15.33.41.zip

open with VLC

SteveMacenski commented 2 years ago

Nice video! But is there something in particular there you wanted to point out?

padhupradheep commented 2 years ago

Seg fault at the end of the video

SteveMacenski commented 2 years ago

I don't see a seg fault/terminal/prints, I only see an rviz capture. I can try to create a similar experiment to that to reproduce on my side though. Prints would be helpful to know where in that critic is crashing.

padhupradheep commented 2 years ago

Yes, yes! I’ll put it here soon 😃 just wanted to show you the scenario in which it is happening

SteveMacenski commented 2 years ago

Yeah I haven't been testing with situations requiring alot of turning in place at the end of trajectories, so that might be why I haven't seen in my testing

padhupradheep commented 2 years ago

https://github.com/artofnothingness/mppic/blob/2e7e99b55f7ac17e9adcdde75a420268673a5ad0/src/critics/path_align_critic.cpp#L47

This is the line we get the seg fault.

artofnothingness commented 2 years ago

try this https://github.com/artofnothingness/mppic/pull/78

padhupradheep commented 2 years ago

let me try it and get back.

padhupradheep commented 2 years ago

Quick update: I'm really not sure, for some reason there is a weird behavior with the planar_controller (gazebo omni plugin). During the point, when the robot tries to reach the goal, there is no seg-faults but the robot respawns itself to origin in Gazebo. I know this is a super strange issue.

I'll try it on the real robot and get back https://user-images.githubusercontent.com/20242192/177565627-3eb4323f-6d73-4963-92b7-9e1197ea284d.mp4 magic1-2022-07-06_15.40.21.zip

use vlc

padhupradheep commented 1 year ago

I cannot reproduce the issue on the real robot anymore. Good to close!