Open xmfcx opened 1 week ago
Ok I can reproduce it too:
sudo apt install stress
stress -c 24 -m 200
(I have 12c24t CPU and 64GB RAM, adjust to your system)
colcon test --event-handlers console_cohesion+ --mixin coverage-pytest --packages-select fault_injection
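The load parameters above are tied to one machine (12c24t, 64GB). As a rough sketch of how to "adjust to your system", the helper below derives `stress` arguments from the CPU count and total RAM. The function names and the 0.75 RAM fraction are hypothetical choices for illustration, and the 256 MB per `-m` worker is assumed from the default of `--vm-bytes` in `stress`.

```python
import os


def stress_args(logical_cpus: int, total_ram_mb: int, ram_fraction: float = 0.75) -> list[str]:
    """Build a `stress` command line scaled to the given machine (hypothetical helper)."""
    # Assumption: each -m (--vm) worker allocates 256 MB, the default of --vm-bytes.
    vm_worker_mb = 256
    # Use at least one memory worker, and target roughly ram_fraction of total RAM.
    vm_workers = max(1, int(total_ram_mb * ram_fraction) // vm_worker_mb)
    return ["stress", "-c", str(logical_cpus), "-m", str(vm_workers)]


def detect_machine() -> tuple[int, int]:
    """Return (logical CPUs, total RAM in MB). The sysconf names are Linux-oriented."""
    cpus = os.cpu_count() or 1
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return cpus, ram_bytes // (1024 * 1024)


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(stress_args(*detect_machine())))
```

Run the printed command in one terminal, then run the `colcon test` command in another while the load is active.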
@xmfcx Thanks for your cooperation :+1: In the short term, extending the evaluation time is a good idea. However, the root cause of the problem is the lack of resources in the testing environment. I have always struggled with that. Whether it is a flaky test or a flaky testing environment, our work is not done until we solve it.
@KeisukeShima the build-and-test jobs (not differential) are running on:
Either on:
leo-copper machine with 4c8t CPU and 32GB RAM
On this failure instance, it was running on the leo-copper machine.
And if we look into the colcon test command, we can see it is being called with sequential flag: https://github.com/autowarefoundation/autoware-github-actions/blob/a1960b03b1d3f5c4320ec1d4e7916ffac437f4f1/colcon-test/action.yaml#L89-L93
colcon test --event-handlers console_cohesion+ \
  --mixin coverage-pytest \
  --packages-above ${{ inputs.target-packages }} \
  --executor sequential \
  --return-code-on-test-failure
> However, the root cause of the problem is the lack of resources in the testing environment.

What would be your suggestion for improvements? Do you think, for example, that leo-copper is not strong enough?
We already run these sequentially, and I think the machines are strong enough. Also, by design of github-actions-runner, they can only perform one job at a time.
@xmfcx Thanks for the explanation. I would like to investigate as the situation seems to have changed from before.
Checklist
Description
https://github.com/autowarefoundation/autoware.universe/actions/runs/9743179438/job/26886066688#step:16:11812
It has been passing successfully for a long while but it failed randomly here.
Failure line: https://github.com/autowarefoundation/autoware.universe/blob/c1dbd5bf938fbdd9e91505cc5344020fbf4ea752/simulator/fault_injection/test/test_fault_injection_node.test.py#L175
History: ![image](https://github.com/autowarefoundation/autoware.universe/assets/10751153/713229c5-2841-4171-8c2a-f6e9d2c0a6aa)
cc. @KeisukeShima @TomohitoAndo
Expected behavior
Test should have passed.
Actual behavior
But it failed.
Steps to reproduce
colcon test --event-handlers console_cohesion+ --mixin coverage-pytest --packages-select fault_injection
This command passed on my machine and also passed in CI many times.
Update: See this comment: https://github.com/autowarefoundation/autoware.universe/issues/7772#issuecomment-2200020894
But it failed in this one instance here.
Versions
No response
Possible causes
Maybe we can increase evaluation_time from 0.5s to 2s to compensate for the slow test environment: https://github.com/autowarefoundation/autoware.universe/blob/c1dbd5bf938fbdd9e91505cc5344020fbf4ea752/simulator/fault_injection/test/test_fault_injection_node.test.py#L70
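A longer fixed wait makes every run slower, including the ones on a fast machine. One possible alternative, sketched below, is to poll for the expected condition and treat the longer value only as an upper bound. This is not the project's test code: `wait_until` and its parameters are hypothetical, and in a ROS test the predicate would also have to let the node process its callbacks.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float, interval: float = 0.05) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds have passed."""
    # Use a monotonic clock so wall-clock adjustments cannot shorten or extend the wait.
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    start = time.monotonic()
    # Hypothetical condition that becomes true after about 0.2 s.
    ok = wait_until(lambda: time.monotonic() - start > 0.2, timeout=2.0)
    print("condition met:", ok)
```

With this shape, a 2s timeout costs the full 2s only when the condition is never met. A passing run returns as soon as the condition holds.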
Additional context
No response