Closed kostyantyn closed 5 years ago
I run this test zillion times and didn't reproduce. My best guess is a clock glitch happened during that failed try, but it is extremely unlikely. Anyway I fixed possible glitch in #942 and extended test coverage.
This test usually runs fine when executed with test_unite
, but fails for me when run under contrib/devtools/run-unit-tests.sh
.
Makes sense somehow, run unit tests runs with crazy parallelism ;-)
Closing the issue as it seems to be fixed but very difficult to reproduce. According to @cmihai, who is facing with this occasionally, it doesn't happen anymore: https://github.com/dtr-org/unit-e/pull/942#pullrequestreview-225584586.
Happened again.
Failure: waiter_tests from test/waiter_tests.cpp
Running 1 test case...
Test cases order is shuffled using seed: 1108117134
Entering test module "Unit-e Test Suite"
test/waiter_tests.cpp(14): Entering test suite "waiter_tests"
test/waiter_tests.cpp(29): Entering test case "wait_and_wake_test"
test/waiter_tests.cpp(22): fatal error: in "waiter_tests/wait_and_wake_test": critical check now < wait_until has failed
terminate called without an active exception
unknown location(0): fatal error: in "waiter_tests/wait_and_wake_test": signal: SIGABRT (application abort requested)
test/waiter_tests.cpp(22): last checkpoint
test/waiter_tests.cpp(29): Leaving test case "wait_and_wake_test"; testing time: 20075710us
test/waiter_tests.cpp(14): Leaving test suite "waiter_tests"; testing time: 20075723us
According to https://en.cppreference.com/w/cpp/thread/condition_variable/notify_all :
The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s); in fact doing so is a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock.
This is the case in our code: https://github.com/dtr-org/unit-e/blob/ca68df2878645c9f968d2520a5459e5718e48721/src/proposer/waiter.cpp#L20 . Could this be the root of the issue?
It happened again to me:
Failure: waiter_tests from test/waiter_tests.cpp
Running 1 test case...
Entering test module "Unit-e Test Suite"
test/waiter_tests.cpp(14): Entering test suite "waiter_tests"
test/waiter_tests.cpp(29): Entering test case "wait_and_wake_test"
test/waiter_tests.cpp(22): fatal error: in "waiter_tests/wait_and_wake_test": critical check now < wait_until has failed
terminate called without an active exception
unknown location(0): fatal error: in "waiter_tests/wait_and_wake_test": signal: SIGABRT (application abort requested)
test/waiter_tests.cpp(22): last checkpoint
test/waiter_tests.cpp(29): Leaving test case "wait_and_wake_test"; testing time: 28876094us
test/waiter_tests.cpp(14): Leaving test suite "waiter_tests"; testing time: 28876108us
Leaving test module "Unit-e Test Suite"; testing time: 28876198us
*** 2 failures are detected in the test module "Unit-e Test Suite"
make[3]: *** [Makefile:11901: test/waiter_tests.cpp.test] Error 1
Fixed by erasing :)
This test seems to fail randomly. Here is the log: waiter_test.log