autowarefoundation / autoware.universe

https://autowarefoundation.github.io/autoware.universe/
Apache License 2.0
893 stars 582 forks source link

vehicle_cmd_gate has flaky test #6612

Closed xmfcx closed 3 months ago

xmfcx commented 4 months ago

Checklist

Description

Expected: (std::abs(lon_jerk)) < (max_lon_jerk_lim * threshold_scale), actual: 1.55397 vs 1.54

This test failed. But non-cuda passed for the same instance.

It passed for many other tests too, this is one instance I've found that has failed.

The related code is here:

https://github.com/autowarefoundation/autoware.universe/blob/59c504b2205a5a6a19c4bdef338557d518b09918/control/vehicle_cmd_gate/test/src/test_filter_in_vehicle_cmd_gate_node.cpp#L261

Expected behavior

Test always passes.

Actual behavior

Test failed this time.

Steps to reproduce

Versions

Possible causes

No response

Additional context

No response

xmfcx commented 4 months ago

@TakaHoribe -san, you seem to be the test author.

The test uses a dt variable, so this is a time dependent test. I didn't analyze it very deeply maybe it fails on very low end hardware very occasionally? I don't know.

cc. @takayuki5168

xmfcx commented 4 months ago

https://github.com/autowarefoundation/autoware.universe/blob/59c504b2205a5a6a19c4bdef338557d518b09918/control/vehicle_cmd_gate/test/src/test_filter_in_vehicle_cmd_gate_node.cpp#L390-L400

Looking into it more, here there are 2 i variables where one shadows the other. This alone shouldn't be the source.

I'm assuming this loop was expected to run every 10ms. But if spin_some takes more time, this can fluctuate.

It might be a good idea to replace it with something like:

for (size_t i = 0; i < 100; ++i) {
  auto start_time = std::chrono::steady_clock::now();

  const bool reset_clock = (i == 0);
  const auto cmd = cmd_generator_.calcSinWaveCommand(reset_clock);
  pub_sub_node_.publishControlCommand(cmd);
  pub_sub_node_.publishDefaultTopicsNoSpin();
  for (int j = 0; j < 20; ++j) {
    rclcpp::spin_some(pub_sub_node_.get_node_base_interface());
    rclcpp::spin_some(vehicle_cmd_gate_node_->get_node_base_interface());
  }

  auto end_time = std::chrono::steady_clock::now();
  std::chrono::milliseconds elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end_time - start_time);

  std::chrono::milliseconds sleep_duration = std::chrono::milliseconds{10} - elapsed;
  if (sleep_duration.count() > 0) {
    std::this_thread::sleep_for(sleep_duration);
  }
}
xmfcx commented 4 months ago

Or we could just increase the threshold_scale from 1.1 to 1.2 😅

https://github.com/autowarefoundation/autoware.universe/blob/59c504b2205a5a6a19c4bdef338557d518b09918/control/vehicle_cmd_gate/test/src/test_filter_in_vehicle_cmd_gate_node.cpp#L258

xmfcx commented 4 months ago

Happened again:

takayuki5168 commented 4 months ago

@xmfcx Thank you for the analysis and suggestion. I agree with them. I created a PR for the change, so let's see what will happen with this PR. https://github.com/autowarefoundation/autoware.universe/pull/6661

xmfcx commented 3 months ago

Fixed with:

Things have been green for a while, thanks @takayuki5168 -san!

Screenshot from 2024-04-01 16-05-10