I have been testing backups and replays a lot these past weeks. From API (full) nodes I take a daily backup of the blocks directory and offer them out via https://eosnode.tools/blocks
What has been interesting about starting this community site is that I have had a lot of feedback from other BPs/devs about issues they are having with recovering nodes.
By far the most common recovery issue I have been told about (and I finally experienced it myself today) is the following, during a --hard-replay --wasm-runtime wavm resync from a blocks backup:
2018-08-13T21:56:12.610 thread-0 wasm_interface.cpp:929 eosio_assert ] message: stopped.
2018-08-13T21:56:12.667 thread-0 controller.cpp:889 apply_block ] e.to_detail_string(): 3030000 block_validate_exception: Block exception
receipt does not match
{"producer_receipt":{"status":"hard_fail","cpu_usage_us":0,"net_usage_words":0,"trx":[0,"47d8833083322f53544aff08bc984abd85bee737e34c5376a8b2016fa595167a"]},"validator_receipt":{"status":"executed","cpu_usage_us":37355,"net_usage_words":0,"trx":[0,"47d8833083322f53544aff08bc984abd85bee737e34c5376a8b2016fa595167a"]}}
thread-0 controller.cpp:876 apply_block
2018-08-13T21:56:12.669 thread-0 controller.cpp:913 push_block ] 3030000 block_validate_exception: Block exception
receipt does not match
{"producer_receipt":{"status":"hard_fail","cpu_usage_us":0,"net_usage_words":0,"trx":[0,"47d8833083322f53544aff08bc984abd85bee737e34c5376a8b2016fa595167a"]},"validator_receipt":{"status":"executed","cpu_usage_us":37355,"net_usage_words":0,"trx":[0,"47d8833083322f53544aff08bc984abd85bee737e34c5376a8b2016fa595167a"]}}
thread-0 controller.cpp:876 apply_block
{}
thread-0 controller.cpp:893 apply_block
2018-08-13T21:57:26.185 thread-0 main.cpp:125 main ] 3030000 block_validate_exception: Block exception
receipt does not match
{"producer_receipt":{"status":"hard_fail","cpu_usage_us":0,"net_usage_words":0,"trx":[0,"47d8833083322f53544aff08bc984abd85bee737e34c5376a8b2016fa595167a"]},"validator_receipt":{"status":"executed","cpu_usage_us":37355,"net_usage_words":0,"trx":[0,"47d8833083322f53544aff08bc984abd85bee737e34c5376a8b2016fa595167a"]}}
thread-0 controller.cpp:876 apply_block
{}
thread-0 controller.cpp:893 apply_block
rethrow
{}
thread-0 controller.cpp:913 push_block
{}
thread-0 chain_plugin.cpp:603 plugin_startup
Encountering this during the replay results in a hard failure: the nodeos process is killed and it is not possible to restart from this point.
This occurs at around the 9m block mark. You can test this for yourself by downloading the backup to an Ubuntu 16.04 box with:
wget https://s3-eu-west-1.amazonaws.com/block-matrix-backup-bucket-068704141744/public-blocks-backup/blocks_2018-08-10-08-32.tar.gz -O blocks_backup.tar.gz
tar xvzf blocks_backup.tar.gz
And then using the same replay command above.
Is there any way around this? It is quite concerning: I am taking these backups from a healthy, working API node, yet the blocks backup now seems to be useless, so recovery has to start from a much older backup, which increases the recovery time.
I have tried deleting reversible and attempting the resync from just the blocks.log, with no difference in the end result.
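For anyone who wants to sanity-check the backup/restore mechanics themselves before pointing nodeos at real data, the cycle I follow is just tar in, tar out. A minimal sketch with dummy data and example /tmp paths (not my production layout):

```shell
# Minimal local sketch of the blocks-directory backup/restore cycle.
# Dummy data and /tmp paths are placeholders, not a real nodeos data dir.
set -e
DATA=/tmp/eos-demo/data
mkdir -p "$DATA/blocks"
echo "dummy" > "$DATA/blocks/blocks.log"        # stand-in for the real blocks.log

# Daily backup step: archive the blocks directory
tar czf /tmp/eos-demo/blocks_backup.tar.gz -C "$DATA" blocks

# Simulate losing the node's state, then restore from the backup
rm -rf "$DATA/blocks"
tar xzf /tmp/eos-demo/blocks_backup.tar.gz -C "$DATA"

[ -f "$DATA/blocks/blocks.log" ] && echo "restore ok"
```

The failure above happens after this point, once nodeos starts replaying the restored blocks.log with --hard-replay, so the archive itself extracts cleanly; it is the validation during replay that breaks.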