Open tli2 opened 6 years ago
Here's an example of how it fails, and on rerun it typically passes. Maybe some nondeterminism in how stuff is being scheduled:
Start 46: serializable_transaction_test
46: Test command: /Users/travis/build/cmu-db/peloton/build/test/serializable_transaction_test "--gtest_color=yes" "--gtest_output=xml:/Users/travis/build/cmu-db/peloton/build/test/serializable_transaction_test.xml"
46: Environment variables:
46: LSAN_OPTIONS=suppressions=/Users/travis/build/cmu-db/peloton/test/leak_suppr.txt
46: Test timeout computed to be: 9.99988e+06
46: Running main() from gmock_main.cc
46: [0;32m[==========] [mRunning 6 tests from 1 test case.
46: [0;32m[----------] [mGlobal test environment set-up.
46: [0;32m[----------] [m6 tests from SerializableTransactionTests
46: [0;32m[ RUN ] [mSerializableTransactionTests.TransactionTest
46: [0;32m[ OK ] [mSerializableTransactionTests.TransactionTest (36 ms)
46: [0;32m[ RUN ] [mSerializableTransactionTests.ReadOnlyTransactionTest
46: 2018-06-25 20:34:00 [test/concurrency/testing_transaction_util.cpp:192:CreateTable] INFO - the database_id is 16777216
46: 2018-06-25 20:34:00 [test/concurrency/serializable_transaction_test.cpp:330:ExecuteNext] INFO - Txn 0 commits: Success
46: /Users/travis/build/cmu-db/peloton/test/concurrency/serializable_transaction_test.cpp:90: Failure
46: Value of: scheduler.schedules[0].results.size()
46: Actual: 0
46: Expected: 10
46: [0;31m[ FAILED ] [mSerializableTransactionTests.ReadOnlyTransactionTest (366 ms)
Also worth noting, on both occasions observed, this happens on mac builds only
I’ll spend some time on it today.
Haven't been able to reproduce it locally all day :(
I just tried running it 10-20 times and got it to fail:
[build] ASAN_OPTIONS=detect_container_overflow=0 ./test/serializable_transaction_test
Running main() from gmock_main.cc
[==========] Running 6 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 6 tests from SerializableTransactionTests
[ RUN ] SerializableTransactionTests.TransactionTest
[ OK ] SerializableTransactionTests.TransactionTest (28 ms)
[ RUN ] SerializableTransactionTests.ReadOnlyTransactionTest
2018-06-26 13:46:19 [test/concurrency/testing_transaction_util.cpp:192:CreateTable] INFO - the database_id is 16777216
2018-06-26 13:46:19 [test/concurrency/serializable_transaction_test.cpp:314:ExecuteNext] INFO - Txn 0 commits: Success
/Users/wanshenl/Documents/pavlo/peloton/test/concurrency/serializable_transaction_test.cpp:90: Failure
Value of: scheduler.schedules[0].results.size()
Actual: 0
Expected: 10
[ FAILED ] SerializableTransactionTests.ReadOnlyTransactionTest (217 ms)
Same failure. Didn't have to rebuild between runs.
Can you get a TRACE level log of it to get a better idea what's happening? I still can't get it to reproduce.
@lmwnshn helpfully ran it 1000 times on this commit just before I did all the TransactionContext refactoring in the last couple of weeks. It still reproduced. My suspicion right now is that in rare cases the bootstrapping process (which seems slower in the last month) takes longer than we wait in the test (1 epoch) and it triggers an abort on the scan. We have 2 experiments to try today:
I should add that we've also seen it on the Ubuntu boxes, so this is no longer a Mac-exclusive issue.
The test we occasionally fail was modified in #1322. I'll have to spend more time looking at it tomorrow to understand why we seem to have a race.
Several complaints about this failing from time to time on Travis.
Might be a Mac issue, might be that the test is broken, might be that we broke something in CC.Investigation would be appreciated.