autowarefoundation / autoware.universe

https://autowarefoundation.github.io/autoware.universe/
Apache License 2.0

`controller_node_exe` makes an error and the vehicle stops forever #5701

Open Kim-mins opened 7 months ago

Kim-mins commented 7 months ago

Checklist

Description

Hi team, I'm currently running Autoware on Carla, and I found that controller_node_exe reports an error and the car stops forever. Here's a video from RViz: [rviz]

After some investigation, I found the log below in [launch.log], and I guess the emergency stop is what makes the car stop.

1701161486.9303267 [component_container-66] [ERROR] [1701161486.930013582] [control.trajectory_follower.controller_node_exe]: [Emergency stop] vel: 0.003, acc: 0.051
1701161492.9390600 [component_container-66] [ERROR] [1701161492.938741224] [control.trajectory_follower.controller_node_exe]: [Emergency stop] vel: 0.000, acc: -5.000
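
To pull only these events out of a long launch.log, a small parser can help. This is a minimal sketch that assumes every emergency line follows exactly the layout of the two lines above; the function name `parse_emergency_stops` is hypothetical and not part of Autoware.

```python
import re

# Matches "[<ros stamp>] [<node name>]: [Emergency stop] vel: <v>, acc: <a>"
# as it appears in the two log lines quoted above (layout is an assumption).
EMERGENCY_RE = re.compile(
    r"\[(?P<stamp>\d+\.\d+)\] \[(?P<node>[^\]]+)\]: "
    r"\[Emergency stop\] vel: (?P<vel>-?\d+\.\d+), acc: (?P<acc>-?\d+\.\d+)"
)


def parse_emergency_stops(lines):
    """Return (stamp, node, vel, acc) tuples for every emergency-stop line."""
    events = []
    for line in lines:
        match = EMERGENCY_RE.search(line)
        if match:
            events.append(
                (
                    float(match["stamp"]),
                    match["node"],
                    float(match["vel"]),
                    float(match["acc"]),
                )
            )
    return events
```

Calling `parse_emergency_stops(open("launch.log"))` would then list each emergency stop with the velocity and acceleration that were logged.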

I tried to debug it with gdb, following an approach similar to [here], but Autoware did not run at all. Is there a proper debugging process for this issue?

Expected behavior

The car drives well and reaches the goal.

Actual behavior

But it stops in the middle of the road, far from the goal.

Steps to reproduce

From docker image [ghcr.io/autowarefoundation/autoware-universe:humble-20230715-prebuilt-cuda-amd64], I ran Autoware and replayed the situation with the [rosbag file]

Versions

Possible causes

For now, I cannot localize the fault, but it seems that controller_node_exe reports an error.

Additional context

No response

VRichardJP commented 7 months ago

I think the problem does not come from the controller node itself: something triggers an emergency and Autoware makes the vehicle stop.

A good place to start would be to check Autoware diagnostics with the RQT diagnostic aggregator plugin. I am pretty sure the root cause of the emergency will be reported there (such as: lidar frequency too low, X module has died)
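
If RQT is not convenient (for example inside a container), the same information can be logged from the aggregated topic. This is a minimal sketch, assuming the aggregator publishes `diagnostic_msgs/DiagnosticArray` on `/diagnostics_agg` and that `level` arrives in Python as a one-byte `bytes` value; the node name and the helper `error_statuses` are hypothetical.

```python
ERROR = b"\x02"  # assumed to equal diagnostic_msgs/DiagnosticStatus.ERROR


def error_statuses(statuses):
    """Return only the statuses whose level is ERROR."""
    return [status for status in statuses if status.level == ERROR]


def main():
    # ROS 2 imports are local so the helper above can be used without ROS.
    import rclpy
    from diagnostic_msgs.msg import DiagnosticArray

    rclpy.init()
    node = rclpy.create_node("diag_error_logger")  # hypothetical node name

    def on_diagnostics(msg):
        # Log every aggregated entry that currently reports an error.
        for status in error_statuses(msg.status):
            node.get_logger().error(f"{status.name}: {status.message}")

    node.create_subscription(DiagnosticArray, "/diagnostics_agg", on_diagnostics, 10)
    try:
        rclpy.spin(node)
    finally:
        node.destroy_node()
        rclpy.shutdown()
```

Running `main()` in a sourced ROS 2 environment while the scenario replays should print each entry that is in the error state, together with its message.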

Kim-mins commented 7 months ago

Thank you for the response @VRichardJP!

I followed your debugging suggestion (logging /diagnostics_agg with ROS), and I found the message change below:

With the message "emergency occurred due to ", I found the related line of code: https://github.com/autowarefoundation/autoware.universe/blob/ebba4f778b79eb6149b9e1b43894d2e855c030ba/control/pid_longitudinal_controller/src/pid_longitudinal_controller.cpp#L1057

According to the code, the variable msg is initialized with "emergency occurred due to " when the control state changes to the emergency state, and the cause of the error is appended when either of the following two conditions holds. However, in my case it seems that neither condition was met and no string was added to msg, so I could not investigate further.
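
The behaviour described above can be paraphrased as follows. This is a Python sketch of the logic as described in this comment, not a copy of the C++ source; the function name, the argument names, and the strict comparison are assumptions.

```python
def emergency_message(trans_dev, rot_dev, trans_threshold, rot_threshold):
    """Build the diagnostic message the way the comment above describes it."""
    msg = "emergency occurred due to "
    # First condition: translation deviation above its threshold (assumed strict).
    if trans_threshold < trans_dev:
        msg += "translation deviation"
    # Second condition: rotation deviation above its threshold (assumed strict).
    if rot_threshold < rot_dev:
        msg += "rotation deviation"
    return msg
```

When both deviations are below their thresholds at the moment the message is built, the result is the bare prefix with no cause attached, which matches what was observed here.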

VRichardJP commented 7 months ago

Hmm.. it is difficult to judge like this. Is there any other diagnostic reporting an emergency?

Kim-mins commented 6 months ago

After receiving some messages like the ones above, I got the message below:

- level: "\x02"
  name: /autoware/control/autonomous_driving/performance_monitoring/control_state
  message: Error
  hardware_id: ''
  values:
  - key: 'controller_node_exe: control_state'
    value: emergency occurred due to translation deviation
- level: "\x02"
  name: '/autoware/control/autonomous_driving/performance_monitoring/control_state/controller_node_exe: control_state'
  message: emergency occurred due to translation deviation
  hardware_id: pid_longitudinal_controller
  values:
  - key: control_state
    value: '3'
  - key: translation deviation threshold
    value: '3.000000'
  - key: translation deviation
    value: '3.362426'
  - key: rotation deviation threshold
    value: '0.785400'
  - key: rotation deviation
    value: '0.002119'

and it seems the translation deviation condition is what triggers the emergency state.

Here's the related code block: https://github.com/autowarefoundation/autoware.universe/blob/ebba4f778b79eb6149b9e1b43894d2e855c030ba/control/pid_longitudinal_controller/src/pid_longitudinal_controller.cpp#L1061

At that time, according to the message above, m_state_transition_params.emergency_state_traj_trans_dev was 3 and m_diagnostic_data.trans_deviation was 3.362426, so the deviation exceeded the threshold.
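
The same check can be run directly on the key/value pairs of a diagnostic entry. This is a minimal sketch, assuming each measured value is paired with a key named "<name> threshold" as in the message above; the helper `exceeded_deviations` is hypothetical.

```python
def exceeded_deviations(values):
    """Map each quantity that exceeds its threshold to the amount of excess.

    `values` is a dict of strings, as found in a diagnostic entry's `values`.
    """
    exceeded = {}
    for key, raw_threshold in values.items():
        if not key.endswith(" threshold"):
            continue
        name = key[: -len(" threshold")]
        if name not in values:
            continue  # threshold without a matching measurement
        threshold = float(raw_threshold)
        measured = float(values[name])
        if measured > threshold:
            exceeded[name] = measured - threshold
    return exceeded


# Values copied from the diagnostic message above.
reported = {
    "control_state": "3",
    "translation deviation threshold": "3.000000",
    "translation deviation": "3.362426",
    "rotation deviation threshold": "0.785400",
    "rotation deviation": "0.002119",
}
print(exceeded_deviations(reported))
```

For the reported values only the translation deviation is flagged, exceeding its threshold by about 0.36, while the rotation deviation stays far below its limit.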

stale[bot] commented 4 months ago

This pull request has been automatically marked as stale because it has not had recent activity.

mehmetdogru commented 4 months ago

@Kim-mins Could you please provide the videos and log files again, since the links don't work for me? Also, could you please check whether the issue is reproducible with the planning simulator?

@maxime-clem to assign this to someone who can reproduce the issue using Carla Sim.

Kim-mins commented 4 months ago

Hi @mehmetdogru!

Sorry for the inconvenience. I should have noticed that the links were no longer working. I have just restored every link in this issue.

Please let me know if you need more information during the debugging.

Thank you!

stale[bot] commented 2 months ago

This pull request has been automatically marked as stale because it has not had recent activity.

maxime-clem commented 2 weeks ago

@Kim-mins did you encounter this error again? I suspect the issue has been solved in more recent versions of Autoware, but it is difficult to confirm. If you encounter a crash of the controller node again, please post about it here. Otherwise, let us close the issue if that is okay.