DeepBlueRobotics / DeepBlueSim

MIT License
CI sometimes hangs #85

Closed brettle closed 2 months ago

brettle commented 3 months ago

See 8f0f4e9.

brettle commented 3 months ago

Draft PR #86 currently hangs on my local machine and might be useful for isolating the problem.

brettle commented 3 months ago

There appears to be a race condition when switching between :example:systemTest and :tests:systemTest in the same build run.

brettle commented 3 months ago

Further investigation reveals that the problem with PR #86 is that GradleRIO is running :example:systemTest in parallel with :tests:systemTest, and one of them will fail because it can't access the TCP ports used by the NT and halsim WebSockets servers. I'm planning to use a file lock to ensure that only one test process tries to run a set of tests at a time. Let me know if you have a better idea.
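
For illustration, the file-lock idea could be sketched roughly as follows. This is only a minimal sketch, not the code that ended up in the repository: the class name `SystemTestLock`, the method `runExclusively`, and the choice of lock file are all hypothetical. It relies on the standard `java.nio.channels.FileChannel.lock()` call, which blocks until the exclusive lock is available.

```java
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of DeepBlueSim. */
final class SystemTestLock {
    private SystemTestLock() {}

    /**
     * Runs body while holding an exclusive lock on lockFile, so that
     * separate test processes sharing that file run one at a time.
     */
    public static <T> T runExclusively(Path lockFile, Callable<T> body) throws Exception {
        try (FileChannel channel = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             // Blocks until no other process holds the lock.
             FileLock lock = channel.lock()) {
            return body.call();
        } // The lock is released and the channel closed here, even on failure.
    }
}
```

Each test process would wrap the part that binds the NT and halsim WebSockets ports in `runExclusively`, pointing at a lock file both Gradle subprojects can see (for example, somewhere under the root project's build directory, which is an assumption here). Note that Java file locks are held on behalf of the whole JVM, so this serializes separate test processes but not threads inside one process.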

brettle commented 3 months ago

Just some notes:

The log for the macOS CI run for 8f0f4e9 seems to indicate that the first test failed because Webots was not ready in time (it was in the process of reloading the world when the timeout occurred). The second test then hung after the world reload. Perhaps the first test failure left something in a bad state that caused the second test to hang. Worth trying to reproduce.

brettle commented 3 months ago

The log for Ubuntu indicates that the second test timed out because Webots didn't load the world. It doesn't look like Webots even attempted to load it, since there is no indication that the controller disconnected from NT. Not sure why. The NT server restarts between tests. Perhaps the controller hadn't reconnected in time to get the reload request.

brettle commented 3 months ago

This is not fully fixed. See this comment.

brettle commented 3 months ago

Perhaps restart the NT server every time we remind the user to load the world? That should work around any race condition that causes the server to miss the messages indicating that the reload completed.

brettle commented 2 months ago

I believe these CI hangs are fixed in PR #98. The remaining CI hangs are covered by issue #113.