dtr-org / unit-e

A digital currency for a new era of decentralized trust
https://unit-e.io
MIT License
45 stars 15 forks source link

feature_fork_choice_forked_finalize_epoch is floating #978

Closed thothd closed 5 years ago

thothd commented 5 years ago

Travis log feature_fork_choice_forked_finalize_epoch.txt

frolosofsky commented 5 years ago

Reproduced locally. Finalizer2 has crashed on assertion. There's a stacktrace:

4   unit-e                              0x00000001042a6a89 esperanza::WalletExtension::VoteIfNeeded(esperanza::FinalizationState const&, unsigned int) + 3097 (walletextension.cpp:596)
5   unit-e                              0x00000001042ab212 esperanza::WalletExtension::BlockConnected(std::__1::shared_ptr<CBlock const> const&, CBlockIndex const&) + 3826 (walletextension.cpp:847)
6   unit-e                              0x000000010447f9fe CWallet::BlockConnected(std::__1::shared_ptr<CBlock const> const&, CBlockIndex const*, std::__1::vector<std::__1::shared_ptr<CTransaction const>, std::__1::allocator<std::__1::shared_ptr<CTransaction const> > > const&) + 606 (wallet.cpp:1379)
frolosofsky commented 5 years ago

The finalizer crashed right after entering in voting epoch after block 25 being connected (that means finalizer doesn't finish full sync yet which must be block 45 but I'm not it's an issue). Last records:

2019-04-15 15:41:43.787596 [            ] UpdateTip: new best=64e467ec86e053e43fb38ac8c874a50945ecb4ed06d869b3fad83ec33b10960b height=25 version=
0x20000000 log2_work=5.7004397 tx=27 date='2019-04-15 15:41:31' progress=1.000000 cache=0.0MiB(3txo)
2019-04-15 15:41:43.795462 [       bench]   - Connect postprocess: 23.66ms [0.46s (18.23ms/blk)]
2019-04-15 15:41:43.802034 [       bench] - Connect block: 105.73ms [1.91s (76.32ms/blk)]
2019-04-15 15:41:43.810358 [     mempool] Checking mempool with 0 transactions and 0 inputs
2019-04-15 15:41:43.818850 [         net] Commits full sync asked for 1 blocks to download
2019-04-15 15:41:43.824808 [         net] Requesting block 6896837e388bd8fcabe49d06f4d8458fd2241258a81faf424c71af5357f917b6 (41) peer=1
2019-04-15 15:41:43.825446 [         net] sending getdata (37 bytes) peer=1
2019-04-15 15:41:43.825784 [         net] received: block (444 bytes) peer=1
2019-04-15 15:41:43.826121 [         net] received block 5febb9b833a98ccce77c5aa8272353ba5b4d3d6f6e05638d48519b3b531e0417 peer=1
2019-04-15 15:41:43.827265 [       bench]   - Load block from disk: 0.01ms [0.00s]
2019-04-15 15:41:43.827707 [  validation] Kernel hash does not meet target coin=Coin(txid=6641ead88ab0ffbfbcc5ef95550f1d1423d6f15898d9faed175bff1daa70ec4f index=1 amount=100000000000 height=16) kernel=9d9bf53043c07a1911624bc3cc3eda65a03610f89574e6265628fbb0ea03d890 target=545259519
2019-04-15 15:41:43.827849 [  validation] Letting artificial block generation succeed nevertheless (mine_blocks_on_demand=true)
2019-04-15 15:41:43.828008 [       bench]     - Sanity checks: 0.56ms [0.10s (3.73ms/blk)]
2019-04-15 15:41:43.828121 [       bench]     - Fork checks: 0.12ms [0.08s (2.90ms/blk)]
2019-04-15 15:41:43.828588 [       bench]       - Connect 1 transactions: 0.46ms (0.458ms/tx, 0.458ms/txin) [0.06s (2.37ms/blk)]
2019-04-15 15:41:43.828833 [       bench]     - Verify 1 txins: 0.70ms (0.696ms/txin) [0.15s (5.70ms/blk)]
2019-04-15 15:41:43.829164 [       bench]     - Index writing: 0.34ms [0.15s (5.95ms/blk)]
2019-04-15 15:41:43.829303 [       bench]     - Callbacks: 0.14ms [0.07s (2.60ms/blk)]
2019-04-15 15:41:43.829450 [       bench]   - Connect total: 2.19ms [0.68s (26.31ms/blk)]
2019-04-15 15:41:43.829581 [       bench]   - Flush: 0.13ms [0.06s (2.41ms/blk)]
2019-04-15 15:41:43.829698 [finalization] Process tip block_hash=5febb9b833a98ccce77c5aa8272353ba5b4d3d6f6e05638d48519b3b531e0417 height=26
2019-04-15 15:41:43.829811 [finalization] State for block_hash=5febb9b833a98ccce77c5aa8272353ba5b4d3d6f6e05638d48519b3b531e0417 heigh=26 has been processed from commits, confirming...
2019-04-15 15:41:43.830022 [finalization] IncrementDynasty: New current dynasty=3
2019-04-15 15:41:43.830131 [finalization] InitializeEpoch: new_epoch=6 starts at height=26
2019-04-15 15:41:43.831939 [finalization] InitializeEpoch: new_epoch=6 current_dynasty=3 last_justified_epoch=4 last_finalized_epoch=3
2019-04-15 15:41:43.832092 [finalization] State for block_hash=5febb9b833a98ccce77c5aa8272353ba5b4d3d6f6e05638d48519b3b531e0417 height=26 confirmed
2019-04-15 15:41:43.832236 [       bench]   - Writing chainstate: 2.66ms [0.71s (27.32ms/blk)]
2019-04-15 15:41:43.833860 [ estimatefee] Blockpolicy estimates updated by 0 of 0 block txs, since last block 0 of 0 tracked, mempool map size 0, max target 12 from current
2019-04-15 15:41:43.834167 [            ] UpdateTip: new best=5febb9b833a98ccce77c5aa8272353ba5b4d3d6f6e05638d48519b3b531e0417 height=26 version=0x20000000 log2_work=5.7548875 tx=28 date='2019-04-15 15:41:32' progress=1.000000 cache=0.0MiB(6txo)
2019-04-15 15:41:43.834281 [       bench]   - Connect postprocess: 2.04ms [0.46s (17.60ms/blk)]
2019-04-15 15:41:43.834366 [       bench] - Connect block: 7.03ms [1.92s (73.65ms/blk)]
2019-04-15 15:41:43.834475 [     mempool] Checking mempool with 0 transactions and 0 inputs
2019-04-15 15:41:43.834978 [         net] received: block (810 bytes) peer=1
2019-04-15 15:41:43.835174 [         net] received block 37b336eaa481a70507992728a14cd5cf2200e76d8a8e321bbab0c77da1ce1097 peer=1
2019-04-15 15:41:43.836129 [       bench]   - Load block from disk: 0.01ms [0.00s]
2019-04-15 15:41:43.836375 [       bench]     - Sanity checks: 0.13ms [0.10s (3.60ms/blk)]
2019-04-15 15:41:43.836463 [       bench]     - Fork checks: 0.09ms [0.08s (2.80ms/blk)]
2019-04-15 15:41:43.836557 [finalization] Checking VOTE with id 74de3b913e7c9043cc0d075064284301c468dd875a7911bfee68315e888349d1
2019-04-15 15:41:43.836915 [finalization] ValidateVote: valid. validator=d955275145d49d6efa80c46c28e36f19505830ca target=64e467ec86e053e43fb38ac8c874a50945ecb4ed06d869b3fad83ec33b10960b source_epoch=4 target_epoch=5
2019-04-15 15:41:43.837226 [finalization] ValidateVote: valid. validator=d955275145d49d6efa80c46c28e36f19505830ca target=64e467ec86e053e43fb38ac8c874a50945ecb4ed06d869b3fad83ec33b10960b source_epoch=4 target_epoch=5
2019-04-15 15:41:43.837980 [       bench]       - Connect 2 transactions: 1.51ms (0.754ms/tx, 0.754ms/txin) [0.06s (2.34ms/blk)]
2019-04-15 15:41:43.838198 [       bench]     - Verify 2 txins: 1.73ms (0.863ms/txin) [0.15s (5.55ms/blk)]
2019-04-15 15:41:43.838463 [       bench]     - Index writing: 0.27ms [0.15s (5.74ms/blk)]
2019-04-15 15:41:43.838545 [       bench]     - Callbacks: 0.09ms [0.07s (2.51ms/blk)]
2019-04-15 15:41:43.838670 [       bench]   - Connect total: 2.54ms [0.69s (25.43ms/blk)]
2019-04-15 15:41:43.838767 [       bench]   - Flush: 0.10ms [0.06s (2.32ms/blk)]
2019-04-15 15:41:43.838849 [finalization] Process tip block_hash=37b336eaa481a70507992728a14cd5cf2200e76d8a8e321bbab0c77da1ce1097 height=27
2019-04-15 15:41:43.838917 [finalization] State for block_hash=37b336eaa481a70507992728a14cd5cf2200e76d8a8e321bbab0c77da1ce1097 heigh=27 has been processed from commits, confirming...
2019-04-15 15:41:43.839074 [finalization] ProcessVote: validator=d955275145d49d6efa80c46c28e36f19505830ca voted successfully. target=64e467ec86e053e43fb38ac8c874a50945ecb4ed06d869b3fad83ec33b10960b source_epoch=4 target_epoch=5.
2019-04-15 15:41:43.839163 [finalization] ProcessVote: epoch=5 justified.
2019-04-15 15:41:43.839219 [finalization] ProcessVote: epoch=4 finalized.
2019-04-15 15:41:43.839270 [finalization] ProcessVote: vote from validator=d955275145d49d6efa80c46c28e36f19505830ca processed.
2019-04-15 15:41:43.839404 [finalization] State for block_hash=37b336eaa481a70507992728a14cd5cf2200e76d8a8e321bbab0c77da1ce1097 height=27 confirmed
2019-04-15 15:41:43.839506 [finalization] Trimming state repository for height < 25
2019-04-15 15:41:43.839713 [       bench]   - Writing chainstate: 0.95ms [0.71s (26.34ms/blk)]
2019-04-15 15:41:43.839785 [    snapshot] Saved snapshot index
2019-04-15 15:41:43.841135 [ estimatefee] Blockpolicy estimates updated by 0 of 0 block txs, since last block 0 of 0 tracked, mempool map size 0, max target 12 from current
2019-04-15 15:41:43.841452 [            ] UpdateTip: new best=37b336eaa481a70507992728a14cd5cf2200e76d8a8e321bbab0c77da1ce1097 height=27 version=0x20000000 log2_work=5.8073549 tx=30 date='2019-04-15 15:41:36' progress=1.000000 cache=0.0MiB(11txo)
2019-04-15 15:41:43.841540 [       bench]   - Connect postprocess: 1.82ms [0.46s (17.02ms/blk)]
2019-04-15 15:41:43.841608 [       bench] - Connect block: 5.42ms [1.92s (71.13ms/blk)]
2019-04-15 15:41:43.841735 [     mempool] Checking mempool with 0 transactions and 0 inputs
2019-04-15 15:41:43.862194 [finalization] finalizer=d955275145d49d6efa80c46c28e36f19505830ca phase changed to IS_VALIDATING because start_dynasty=3 began
2019-04-15 15:41:43.863314 [finalization] VoteIfNeeded: Validator voting for epoch 5 and dynasty 3.
2019-04-15 15:41:43.863517 [finalization] GetRecommendedVote: source_epoch=4 target_epoch=5 dynasty=3 target_hash=64e467ec86e053e43fb38ac8c874a50945ecb4ed06d869b3fad83ec33b10960b.
frolosofsky commented 5 years ago

Yet another issue I've found by running this test in a loop:

************************
EXCEPTION: NSt3__112system_errorE
recursive_mutex lock failed: Invalid argument
unite in scheduler

libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: recursive_mutex lock failed: Invalid argument

It happened when self.nodes[0] tried to stop after successfully finishing first test suite. log.txt

Staketrace is not useful at a glance.

Thread 1 Crashed:: unite-scheduler
0   libsystem_kernel.dylib              0x00007fff5bf2f2c6 __pthread_kill + 10
1   libsystem_pthread.dylib             0x00007fff5bfe4bf1 pthread_kill + 284
2   libsystem_c.dylib                   0x00007fff5be996a6 abort + 127
3   libc++abi.dylib                     0x00007fff58e52641 abort_message + 231
4   libc++abi.dylib                     0x00007fff58e527c7 default_terminate_handler() + 243
5   libobjc.A.dylib                     0x00007fff5a5c8eeb _objc_terminate() + 105
6   libc++abi.dylib                     0x00007fff58e5e19e std::__terminate(void (*)()) + 8
7   libc++abi.dylib                     0x00007fff58e51c7e __cxa_rethrow + 99
8   unit-e                              0x000000010b5b664a void TraceThread<std::__1::function<void ()> >(char const*, std::__1::function<void ()>) + 1642 (util.h:360)
9   unit-e                              0x000000010b6157b2 void boost::_bi::list2<boost::_bi::value<char const*>, boost::_bi::value<std::__1::function<void ()> > >::operator()<void (*)(char const*, std::__1::function<void ()>), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(char const*, std::__1::function<void ()>), boost::_bi::list0&, int) + 162 (bind.hpp:319)
10  unit-e                              0x000000010b6156e5 boost::_bi::bind_t<void, void (*)(char const*, std::__1::function<void ()>), boost::_bi::list2<boost::_bi::value<char const*>, boost::_bi::value<std::__1::function<void ()> > > >::operator()() + 69 (bind.hpp:1294)
11  unit-e                              0x000000010b61527c boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(char const*, std::__1::function<void ()>), boost::_bi::list2<boost::_bi::value<char const*>, boost::_bi::value<std::__1::function<void ()> > > > >::run() + 44 (thread.hpp:117)
12  libboost_thread-mt.dylib            0x000000010eaa1268 boost::(anonymous namespace)::thread_proxy(void*) + 120
13  libsystem_pthread.dylib             0x00007fff5bfe22eb _pthread_body + 126
14  libsystem_pthread.dylib             0x00007fff5bfe5249 _pthread_start + 66
15  libsystem_pthread.dylib             0x00007fff5bfe140d thread_start + 13
frolosofsky commented 5 years ago

It's not "my" issue but I was interesting what is going on, and I think I finished investigation.

The issue happens because test creates two finalizers which act like being one. Once finalizer2 processes block which includes vote from finalizer1 and at the same moment tries to vote itself, it cannot find corresponding prev transaction in its wallet. This started to happen once #937 is merged, that PR makes validator, when voting, using global validator state instead of local one, and because of two finalizer are actually the same, it made a glitch between two finalizers which started sharing the same validator state.

I think it's not an issue for production code, we can simply rewrite test to use one validator, but make votes on different branches via rpc, like we double vote in #906.