Bug: Block digest exceeds target difficulty / Own mined block did not have valid PoW Discarding.

dan-da commented 3 months ago

On present master (015e4dbf961733672f7a784de6a9d6bd41c3fefa), I am seeing errors like this:

2024-04-07T00:26:42.303501752Z  WARN ThreadId(05) neptune_core::models::blockchain::block: Block digest exceeds target difficulty
2024-04-07T00:26:42.30359635Z ERROR ThreadId(05) neptune_core::mine_loop: Own mined block did not have valid PoW Discarding.
Running consensus program with input: 5652635907697347186,11752900863602063836,16998321362642325654,268261271021766996,7619209565187437062,4272906177519395165,5440351008852116440,6529846807105067338,3094116146312299119,17345176469723076687,12538662492950623376,11774793938361832615,6445610371475286342,14262065177127252459,7827535130222013148

Actually I've been seeing these for a while, but haven't tried to pinpoint the specific commit that introduced it yet.

This may be one for @aszepieniec.

Sword-Smith commented 3 months ago

This is just a race condition, and I'm not sure it qualifies as a bug. The log message should probably be warn and not error.

Multiple race conditions can happen in relation to mining. A block can be received from a peer shortly before it is received from the miner's worker thread. Then this block of code is reached where latest_block could e.g. be set to block with height 1 (the one after genesis block), and the received block from the miner's worker-thread also has a block height of 1:

mine_loop.rs, line 401:

new_block_res = worker_thread_rx => {
                let new_block_info = match new_block_res {
                    Ok(res) => res,
                    Err(err) => {
                        warn!("Mining thread was cancelled prematurely. Got: {}", err);
                        continue;
                    }
                };

In this case, the newly locally mined block might not have a sufficient PoW value, as the PoW value is updated after each block and in this case is calculated from block 1 received from a peer and not from the genesis block, which is the problem the miner's worker thread was tasked with.

Perhaps it would be better to check that the block received from the worker thread is the child of latest_block and not check its PoW value.

This PoW check was most likely added after this sanity check was added to guard against false positives:

assert!(new_block_info.block.is_valid(&latest_block, now), "Own mined block must be valid. Failed validity check after successful PoW check.");

Previously these kind of race conditions made the entire client panic but now the miner (and in some cases the main loop) just discards the found block. It's important that the code continues to be able to handle these race conditions without panicking but its of course even more important the the block chain state stays in a correct and consistent state.

This check in main_loop.rs which is performed under a write-lock on the global state is also very important:

                let block_is_new = tip_proof_of_work_family
                    < new_block.kernel.header.proof_of_work_family
                    && new_block.kernel.header.prev_block_digest == tip_hash;
                if !block_is_new {
                    warn!("Got new block from miner thread that was not child of tip. Discarding.");
                    return Ok(());
                }

This above check protects against the race condition that the miner (miner-worker and miner-master) finds a valid block of height n but another valid block of height n is received from a peer and handled in main_loop before the miner thread handles the information that it needs to mine on a new block.

Sword-Smith commented 2 months ago

I suggest to fix this issue by downgrading the log message from Error to Warn, as that is the level we use other places for race conditions.

Neptune-Crypto / neptune-core

Bug: Block digest exceeds target difficulty / Own mined block did not have valid PoW Discarding. #131