brettle closed this 2 months ago.
Draft PR #86 currently hangs on my local machine and might be useful for isolating the problem.
There appears to be a race condition when switching between :example:systemTest and :tests:systemTest in the same build run.
Further investigation reveals that the problem with PR #86 is that GradleRIO is running :example:systemTest in parallel with :tests:systemTest, and one of them fails because it can't access the TCP ports used by the NT server and the halsim WebSocket server. I'm planning to use a file lock to ensure that only one test process runs a set of tests at a time. Let me know if you have a better idea.
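A minimal sketch of the file-lock idea in Java, assuming each systemTest task runs its tests in its own JVM and that both tasks agree on the same lock-file path (for example one under `java.io.tmpdir`). `SystemTestLock`, `runExclusively`, and the lock-file name are hypothetical, not part of GradleRIO or this project. It relies on `FileChannel.lock()`, which blocks until no other process holds a lock on the file:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Supplier;

/**
 * Hypothetical helper that runs work while holding an exclusive OS-level
 * lock on a shared lock file, so that only one test process at a time
 * uses the NT server and halsim WebSocket server ports.
 */
final class SystemTestLock {
    private SystemTestLock() {}

    static <T> T runExclusively(Path lockFile, Supplier<T> work) {
        // Every test process must open the same path for the lock to serialize them.
        // try-with-resources releases the lock, then closes the channel,
        // whether the work returns normally or throws.
        try (FileChannel channel = FileChannel.open(
                 lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             // lock() blocks until no other process holds a lock on this file.
             FileLock lock = channel.lock()) {
            return work.get();
        } catch (IOException e) {
            throw new UncheckedIOException("could not lock " + lockFile, e);
        }
    }
}
```

Note that a `FileLock` is held on behalf of the whole JVM: another thread in the same JVM that tries to lock the same file gets an `OverlappingFileLockException` instead of blocking. So this serializes separate test processes, not threads within one process.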
Just some notes:
The log for the macOS CI run for 8f0f4e9 seems to indicate that the first test failed because Webots was not ready in time (it was in the process of reloading the world when the timeout occurred). The second test then hung after the world reload. Perhaps the first test failure left something in a bad state that caused the second test to hang. Worth trying to reproduce.
The log for Ubuntu indicates that the second test timed out because Webots didn't load the world. It doesn't look like Webots attempted to load it, because there is no indication that the controller disconnected from NT. Not sure why. The NT server restarts between tests, so perhaps the controller hadn't reconnected in time to receive the reload request.
This is not fully fixed. See this comment.
Perhaps restart the NT server every time we remind the user to load the world? That should work around any race condition that causes the server to miss the messages indicating that the reload completed.
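A minimal sketch of that workaround in Java, written as a bounded retry loop: each attempt restarts the NT server, reminds the user to load the world, and waits a limited time for the reload-completed message. `WorldReloadHooks`, `WorldReloader`, and all of their methods are hypothetical stand-ins for the test harness, not existing NT or Webots APIs:

```java
import java.time.Duration;

/**
 * Hypothetical hooks into the system-test harness; the real NT server
 * and world-reload signaling in this project will look different.
 */
interface WorldReloadHooks {
    void restartNtServer();
    void remindUserToLoadWorld();
    /** Waits up to {@code timeout} for the reload-completed message; returns true if it arrived. */
    boolean awaitReloadCompleted(Duration timeout) throws InterruptedException;
}

final class WorldReloader {
    private WorldReloader() {}

    /**
     * Restarts the NT server before every reminder, so a reload-completed
     * message missed during one server session leads to another attempt
     * instead of a hang. Returns the attempt number that succeeded, or -1
     * if every attempt timed out.
     */
    static int reloadWithRestarts(WorldReloadHooks hooks, Duration perAttempt, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            hooks.restartNtServer();        // fresh server session for this attempt
            hooks.remindUserToLoadWorld();  // ask for (another) world reload
            try {
                if (hooks.awaitReloadCompleted(perAttempt)) {
                    return attempt;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                throw new IllegalStateException("interrupted while waiting for world reload", e);
            }
        }
        return -1;
    }
}
```

Bounding both the per-attempt wait and the number of attempts means a missed message turns into a retry or a clear failure rather than an indefinite CI hang.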
I believe these CI hangs are fixed in PR #98. The remaining CI hangs are covered by issue #113.
See 8f0f4e9.