EOSIO / eos

An open source smart contract platform
https://developers.eos.io/manuals/eos
MIT License

Hard Replay Issue with blocks backup #5205

Closed pete001 closed 6 years ago

pete001 commented 6 years ago

I have been testing backups and replays a lot these past weeks. I take a daily backup of the blocks directory from API (full) nodes and offer the backups via https://eosnode.tools/blocks

What has been interesting about starting this community site is that I have had a lot of feedback from other BPs/devs about issues they are having with recovering nodes.

By far the most common recovery issue reported to me (and I finally experienced it myself today) is the following failure during a --hard-replay --wasm-runtime wavm resync from a blocks backup:

2018-08-13T21:56:12.610 thread-0   wasm_interface.cpp:929        eosio_assert         ] message: stopped.
2018-08-13T21:56:12.667 thread-0   controller.cpp:889            apply_block          ] e.to_detail_string(): 3030000 block_validate_exception: Block exception
receipt does not match
    {"producer_receipt":{"status":"hard_fail","cpu_usage_us":0,"net_usage_words":0,"trx":[0,"47d8833083322f53544aff08bc984abd85bee737e34c5376a8b2016fa595167a"]},"validator_receipt":{"status":"executed","cpu_usage_us":37355,"net_usage_words":0,"trx":[0,"47d8833083322f53544aff08bc984abd85bee737e34c5376a8b2016fa595167a"]}}
    thread-0  controller.cpp:876 apply_block
2018-08-13T21:56:12.669 thread-0   controller.cpp:913            push_block           ] 3030000 block_validate_exception: Block exception
receipt does not match
    {"producer_receipt":{"status":"hard_fail","cpu_usage_us":0,"net_usage_words":0,"trx":[0,"47d8833083322f53544aff08bc984abd85bee737e34c5376a8b2016fa595167a"]},"validator_receipt":{"status":"executed","cpu_usage_us":37355,"net_usage_words":0,"trx":[0,"47d8833083322f53544aff08bc984abd85bee737e34c5376a8b2016fa595167a"]}}
    thread-0  controller.cpp:876 apply_block

    {}
    thread-0  controller.cpp:893 apply_block
2018-08-13T21:57:26.185 thread-0   main.cpp:125                  main                 ] 3030000 block_validate_exception: Block exception
receipt does not match
    {"producer_receipt":{"status":"hard_fail","cpu_usage_us":0,"net_usage_words":0,"trx":[0,"47d8833083322f53544aff08bc984abd85bee737e34c5376a8b2016fa595167a"]},"validator_receipt":{"status":"executed","cpu_usage_us":37355,"net_usage_words":0,"trx":[0,"47d8833083322f53544aff08bc984abd85bee737e34c5376a8b2016fa595167a"]}}
    thread-0  controller.cpp:876 apply_block

    {}
    thread-0  controller.cpp:893 apply_block
rethrow
    {}
    thread-0  controller.cpp:913 push_block

    {}
    thread-0  chain_plugin.cpp:603 plugin_startup

Encountering this during the replay makes the process fail hard: the nodeos process is killed and it is not possible to restart from this point.
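To see exactly which fields disagree in the "receipt does not match" entry, the JSON payload from the log can be diffed field by field. This is only a minimal sketch: `diff_receipts` is a hypothetical helper written for illustration, not part of nodeos, and the payload is copied verbatim from the log above. It shows what differs, not why.

```python
import json

# Payload copied from the "receipt does not match" log entry above.
LOG_PAYLOAD = (
    '{"producer_receipt":{"status":"hard_fail","cpu_usage_us":0,'
    '"net_usage_words":0,"trx":[0,"47d8833083322f53544aff08bc984abd85bee737e34c5376a8b2016fa595167a"]},'
    '"validator_receipt":{"status":"executed","cpu_usage_us":37355,'
    '"net_usage_words":0,"trx":[0,"47d8833083322f53544aff08bc984abd85bee737e34c5376a8b2016fa595167a"]}}'
)


def diff_receipts(payload: str) -> dict:
    """Return {field: (producer_value, validator_value)} for differing fields.

    Hypothetical helper for reading the log; not a nodeos API.
    """
    data = json.loads(payload)
    producer = data["producer_receipt"]
    validator = data["validator_receipt"]
    return {
        key: (producer.get(key), validator.get(key))
        for key in sorted(set(producer) | set(validator))
        if producer.get(key) != validator.get(key)
    }


if __name__ == "__main__":
    for field, (produced, validated) in diff_receipts(LOG_PAYLOAD).items():
        print(f"{field}: producer={produced!r} validator={validated!r}")
```

For this payload only `status` and `cpu_usage_us` differ: the block producer recorded the transaction as `hard_fail` with zero CPU usage, while the replaying node executed it and billed 37355 µs. The transaction id and `net_usage_words` are identical on both sides.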

The failure occurs at around the 9m block mark. You can test this for yourself by downloading the backup to an Ubuntu 16.04 box with:

wget https://s3-eu-west-1.amazonaws.com/block-matrix-backup-bucket-068704141744/public-blocks-backup/blocks_2018-08-10-08-32.tar.gz -O blocks_backup.tar.gz
tar xvzf blocks_backup.tar.gz

Then run the same replay command as above.

Is there any way around this? It is quite concerning, because I am taking these backups from a healthy, working API node, yet the blocks backup now appears to be useless. Recovery would therefore have to start from a much older backup, which increases the recovery time.

I have tried deleting the reversible directory and attempting the resync from just the blocks.log, with no difference in the end result.

pete001 commented 6 years ago

This test machine was running 1.1.1. When I updated to 1.1.4, the same blocks dir replayed with no issues.