AntelopeIO / leap

C++ implementation of the Antelope protocol
Other
116 stars 68 forks source link

Incorrect production scheduling after error: "Not producing block because "defproducerd" signed a block at the next block time or later" #2380

Closed greg7mdp closed 3 months ago

greg7mdp commented 7 months ago

Sometimes, the test trx_finality_status_forked_if_test, and likely others, show in the log of node_03 a volley of messages:

Waiting till another block is received and scheduling Speculative/Production Change

These are emitted in producer_plugin_impl::schedule_production_loop(), and are likely caused by incorrect parameters to, or calculations in block_timing_util::calculate_producer_wake_up_time().

We should investigate whether there is indeed a miscalculation here (maybe an off-by-one issue) and address it.

Below is a larger section of the log showing the repeated error messages.

error 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:1853      start_block          ] Not producing block because "defproducerd" signed a block at the next block time or later (2024-03-13T19:23:57.500) than the pending block time (2024-03-13T19:23:57.500)
debug 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:2515      schedule_production_ ] Waiting till another block is received and scheduling Speculative/Production Change
debug 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:2589      schedule_delayed_pro ] Scheduling Speculative/Production Change at 2024-03-13T19:23:57.200
error 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:1853      start_block          ] Not producing block because "defproducerd" signed a block at the next block time or later (2024-03-13T19:23:57.500) than the pending block time (2024-03-13T19:23:57.500)
debug 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:2515      schedule_production_ ] Waiting till another block is received and scheduling Speculative/Production Change
debug 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:2589      schedule_delayed_pro ] Scheduling Speculative/Production Change at 2024-03-13T19:23:57.200
error 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:1853      start_block          ] Not producing block because "defproducerd" signed a block at the next block time or later (2024-03-13T19:23:57.500) than the pending block time (2024-03-13T19:23:57.500)
debug 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:2515      schedule_production_ ] Waiting till another block is received and scheduling Speculative/Production Change
debug 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:2589      schedule_delayed_pro ] Scheduling Speculative/Production Change at 2024-03-13T19:23:57.200
error 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:1853      start_block          ] Not producing block because "defproducerd" signed a block at the next block time or later (2024-03-13T19:23:57.500) than the pending block time (2024-03-13T19:23:57.500)
debug 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:2515      schedule_production_ ] Waiting till another block is received and scheduling Speculative/Production Change
debug 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:2589      schedule_delayed_pro ] Scheduling Speculative/Production Change at 2024-03-13T19:23:57.200
error 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:1853      start_block          ] Not producing block because "defproducerd" signed a block at the next block time or later (2024-03-13T19:23:57.500) than the pending block time (2024-03-13T19:23:57.500)
debug 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:2515      schedule_production_ ] Waiting till another block is received and scheduling Speculative/Production Change
debug 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:2589      schedule_delayed_pro ] Scheduling Speculative/Production Change at 2024-03-13T19:23:57.200
error 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:1853      start_block          ] Not producing block because "defproducerd" signed a block at the next block time or later (2024-03-13T19:23:57.500) than the pending block time (2024-03-13T19:23:57.500)
debug 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:2515      schedule_production_ ] Waiting till another block is received and scheduling Speculative/Production Change
debug 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:2589      schedule_delayed_pro ] Scheduling Speculative/Production Change at 2024-03-13T19:23:57.200
error 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:1853      start_block          ] Not producing block because "defproducerd" signed a block at the next block time or later (2024-03-13T19:23:57.500) than the pending block time (2024-03-13T19:23:57.500)
debug 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:2515      schedule_production_ ] Waiting till another block is received and scheduling Speculative/Production Change
debug 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:2589      schedule_delayed_pro ] Scheduling Speculative/Production Change at 2024-03-13T19:23:57.200
error 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:1853      start_block          ] Not producing block because "defproducerd" signed a block at the next block time or later (2024-03-13T19:23:57.500) than the pending block time (2024-03-13T19:23:57.500)
debug 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:2515      schedule_production_ ] Waiting till another block is received and scheduling Speculative/Production Change
debug 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:2589      schedule_delayed_pro ] Scheduling Speculative/Production Change at 2024-03-13T19:23:57.200
error 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:1853      start_block          ] Not producing block because "defproducerd" signed a block at the next block time or later (2024-03-13T19:23:57.500) than the pending block time (2024-03-13T19:23:57.500)
debug 2024-03-13T19:23:57.378 nodeos    producer_plugin.cpp:2515      schedule_production_ ] Waiting till another block is received and scheduling Speculative/Production Change
debug 2024-03-13T19:23:57.379 nodeos    producer_plugin.cpp:2589      schedule_delayed_pro ] Scheduling Speculative/Production Change at 2024-03-13T19:23:57.200
greg7mdp commented 7 months ago

I looked at the issue some, but this is not an area I'm familiar with, it is hard for me to pinpoint exactly what is wrong. In addition, making changes to calculate_next_block_slot could have serious consequences for block producers (possibly missing slots), so I feel this should be investigated by someone with more knowledge in this area.

greg7mdp commented 7 months ago

I have verified that the same behavior is seen in the test which doesn't activate IF, so this is not a regression dues to the Savanna consensus.

Running the test locally as: ./tests/trx_finality_status_forked_test.py "-v" "--keep-logs"

One odd observation is that I see the test failing randomly. Seems to fail about every other run.

This doesn't seem to happen when running the test of the main branch instead of hotstuff_integration, but the test has changed in hotstuff_integration, for example using 4 producers instead of tw.

heifner commented 3 months ago

Resolved by: https://github.com/AntelopeIO/spring/pull/473