mining thread panics due to incorrect difficulty

I noticed that the block explorer neptune-core instance (running regtest) stops generating new blocks at times and must be restarted to resume.

The log shows a warning that the new difficulty is incorrect, and then a panic, presumably from a failed assertion.

needs review.

2024-05-17T15:06:28.378643326Z  INFO ThreadId(60) neptune_core::mine_loop: Found valid block with nonce: (17615783623759500440, 14208631580706320865, 11881036064546879572).
2024-05-17T15:06:28.378750458Z  INFO ThreadId(60) neptune_core::mine_loop: Newly mined block details:
              Height: 1649
              Time:   2024-05-17T15:06:28.378+00:00 (1715958388378)
        Digest (Hex): 5e5a5a4b20a46b38fa72b6139e9e3b6d7d550e406734ddf899d562439ea6b35f1f712bac8fd00200
        Digest (Raw): 04065523547242125918,07871059175532819194,17932546909320140157,06896038653099955609,00000792265440850207
Difficulty threshold: 08183520607973278836,17399026898151569383,09772401002831604792,08450955921959333700,00003146297811600645

2024-05-17T15:06:28.378914267Z  WARN ThreadId(02) neptune_core::models::blockchain::block: Value for new difficulty is incorrect.
thread 'tokio-runtime-worker' panicked at /home/neptune-explorer/neptune-core/src/mine_loop.rs:471:17:
Own mined block must be valid. Failed validity check after successful PoW check.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

note: neptune-core was built at 5ef2423677e387387ba6b3cfb7c0ffc58f57c913

Some things that may or may not be related:

it's been reported that difficulty and block creation times displayed in the block explorer are highly variable. quote: "Is the difficulty so drastically different & the block creation times so variably spaced because the regtest is only mined on when you tell your machine to do so?"
144.

Ok, I wanted more concrete evidence that something is wrong (or not) with the difficulty adjustment and block times, so I wrote a couple of quick php scripts that (a) download info for all blocks from explorer.neptune.cash and (b) statistically summarize the information in terms of: 1) difficulty, 2) duration between blocks, and 3) expected blocks for the entire time period.

I then wanted to compare the results with bitcoin for a similar number of blocks, so I modified the scripts to do the same for bitcoin.

In summary, the neptune block times are both too high and too low and there are difficulty anomalies. Here are the stats for neptune and bitcoin, side by side.

/// NEPTUNE ///                               /// BITCOIN ///

-- Expected blocks For Time Period --         -- Expected blocks For Time Period --
    elapsed time for all blocks: 1947688          elapsed time for all blocks: 949393

    expected_blocks: 3246.1466666667              expected_blocks: 1582.3216666667
    actual_blocks: 1645                           actual_blocks: 1653
        difference: 1601.1466666667 blocks            difference: 70.678333333333 blocks

    avg_time_per_block: 1184.0048632219           avg_time_per_block: 574.34543254688
    expected_time_per_block: 600                  expected_time_per_block: 600
        difference: 584.00486322188 seconds           difference: 25.654567453116 seconds

-- Duration Stats --                          -- Duration Stats --
{                                             {
    "mean": 827.8842169408897,                    "mean": 574.6930992736078,
    "median": 410,                                "median": 385,
    "mode": [                                     "mode": [
        0                                             51,
                                                      31,
                                                      287
    ],                                            ],
    "min": 0,                                     "min": -137,
    "max": 9492,                                  "max": 4209,
    "count": 1641,                                "count": 1652,
    "unique": 964,                                "unique": 955,
    "sum": 1358558,                               "sum": 949393,
    "range": 9492,                                "range": 4346,
    "variance": 1796444.1332507045,               "variance": 462256.04706113663,
    "std_dev": 1340.3149380838463,                "std_dev": 679.8941440115046,
    "quartiles": [                                "quartiles": [
        0,                                            -137,
        35,                                           170,
        410,                                          385,
        1158,                                         795,
        9492                                          4209
    ],                                            ],
    "interquartile_range": 1123                   "interquartile_range": 625
}                                             }

-- Difficulty Stats --                        -- Difficulty Stats --
{                                             {
    "mean": 8251.9726443769,                      "mean": 83907017870294.7,
    "median": 8500,                               "median": 84381461788831,
    "mode": [                                     "mode": [
        2                                             84381461788831
    ],                                            ],
    "min": 2,                                     "min": 83148355189239,
    "max": 29576,                                 "max": 84381461788831,
    "count": 1645,                                "count": 1653,
    "unique": 1106,                               "unique": 2,
    "sum": 13574495,                              "sum": 138698300539597131,
    "range": 29574,                               "range": 1233106599592,
    "variance": 36043355.13309846,                "variance": 8.006660954627039e+23,
    "std_dev": 6003.6118406421365,                "std_dev": 894799472207.4348,
    "quartiles": [                                "quartiles": [
        2,                                            83148355189239,
        2315,                                         83148355189239,
        8500,                                         84381461788831,
        12253.5,                                      84381461788831,
        29576                                         84381461788831
    ],                                            ],
    "interquartile_range": 9938.5                 "interquartile_range": 1233106599592
}                                             }

There are several things to note here:

The theoretical mean duration for both neptune and bitcoin should be 600 seconds. Bitcoin is nearish at 574 while neptune is further off at 827.
Block generation is a Poisson process and resulting block durations are an exponential distribution [1]. In theory, the std_deviation for an exponential distribution should equal the mean. For the bitcoin results, the stddev is nearish to mean, but for neptune it is more than double.
The duration range is more than twice that of bitcoin and the sum is much higher for a similar number of blocks.
The mode for neptune duration is 0, meaning that the most common duration between blocks is 0 seconds. I put in some debug println about this, and found that this is almost always occurring when block A has a difficulty of 2, and is then followed right away by blocks B and C, with timestamps within 1 second of eachother.
likewise the mode for neptune difficulty is 2, which is super low. Why is it the most common difficulty?

regarding methodology:

This is a single neptune-core node running regtest and mining all by itself, with no peers.
neptune-core is running in the default "restricted mining" mode the entire time period. This means that it sleeps between each mining loop iteration. I don't believe that should cause these kinds of behaviors but have not tested with unrestricted-mining flag to rule it out. If that should turn out to be the cause, I'd like to understand why/how.
neptune-core was restarted a few times during the time period.
I started with block 3 for neptune because 1 and 2 typically are found immediately.
I omitted some blocks after mining was known to pause/freeze due to bad-difficulty error.
I omitted a few super-high outliers from the durations as they may have been due to the bad-difficulty bug, followed by a restart.
I ran the stats against a 300 block subset of the data and got a similar duration mean, and a lower (but still too high), std_dev

Here are the php scripts used to generate the stats. This also includes raw data for the neptune blocks.
difficulty_analysis.tar.gz

[1] https://bitcoin.stackexchange.com/questions/25293/probablity-distribution-of-mining

Thanks for the elaborate research. Instead of trying to formulate a well rounded conclusion, let me just put words to my consciousness as it rapidly jumps around.

Bitcoin mining is a Poisson process resulting in an exponential distribution. Neptune is supposed to be a Poisson process resulting in an exponential distribution as well. Any deviation from this objective is worthy of investigation and fixing.

That said, the mechanism by which the difficulty is updated differs dramatically. Bitcoin, if memory serves, calculates the average block time of the last 2016 blocks and adjusts the difficulty (within clamping bounds) based on that. Neptune only looks at the previous block, and computes the new difficulty based on a linear process control mechanism (in fact, it is the dumbest possible control mechanism: PID with I=D=0), with the exception that the new difficulty is lower-bounded by 2. The rate at which the difficulty adjusts to the hash power of the network is parameter that can be tuned to fit the desired results.

You mention that 2 is the most common difficulty for Neptune. That's because the blocks are found too far apart and the difficulty control mechanism wants to set it even lower, but the lower bound clamps it up to 2.

I started with block 3 for neptune because 1 and 2 typically are found immediately.

Meaning that, the difficulty has adjusted to the hash power after essentially two blocks? The relevant counterpart is the 2016 blocks it takes for Bitcoin's difficulty to adjust, and even then it might not have caught up to the hash power.

I wonder whether running the Neptune regtest on a single machine skews the results. In principle, even a single-threaded miner is a Poisson process, so intuitively the answer is "no". That if the node crashes from time to time, that is a behavior that deviates from the Poisson process, and that has the potential to bias the results. But if you run the network on many machines that can each crash independently, then you even out the disturbances.

It should be possible to rapidly generate these statistics for Neptune based on the current set of control parameters. I say "rapidly" because you would be sampling the next block time based on the current exponential distribution -- as opposed to running the proof-of-work algorithm. I think that's a worthwhile unit test and will provide valuable information to update the control parameters.

This test case could also serve as a basis for analyzing ahead of time the robustness of the network against mining attacks. What if the hash power increases 100x and then decreases to where it started from in the span of one week? How does the network react? Under which conditions is selfish mining profitable?

neptune-core is running in the default "restricted mining" mode the entire time period. This means that it sleeps between each mining loop iteration. I don't believe that should cause these kinds of behaviors but have not tested with unrestricted-mining flag to rule it out. If that should turn out to be the cause, I'd like to understand why/how.

This does skew the results technically speaking, but by a negligible amount as far as I can tell.

many machines that can each crash independently, then you even out the disturbances

right. or just a single machine, sampled for a period of time that we are certain there was no crash or restart. In lieue of certainty I sampled a few smaller subsets and found similar mean durations. Another thing I've thought of doing is to set the target time very small, eg 1 second and run for 10 minutes or so with unrestricted mining. It seems that should generate a pretty good data set quickly. I might give it a try today.

It should be possible to rapidly generate these statistics for Neptune based on the current set of control parameters. I say "rapidly" because you would be sampling the next block time based on the current exponential distribution -- as opposed to running the proof-of-work algorithm. I think that's a worthwhile unit test and will provide valuable information to update the control parameters.

thoughts:

I'm not entirely certain how to go about that, perhaps you can provide a sketch.
iiuc, this would be testing our impl of "the math", without testing the mining loop itself.
intuitively it seems like setting the target time very low, eg 10ms and actually performing the proof-of-work should provide similar results, with benefit that it exercises the mining loop as well. So perhaps we ought to make that a test instead-of, or in addition to.

This test case could also serve as a basis for analyzing ahead of time the robustness of the network against mining attacks. What if the hash power increases 100x and then decreases to where it started from in the span of one week? How does the network react? Under which conditions is selfish mining profitable?

yes I think its important to test and a have a better understanding of these kinds of things.

Well by accident I discovered a way to reliably generate the difficulty error/panic.

I just tried setting the TARGET_BLOCK_INTERVAL in block_header.rs to 1000, ie 1 second.

When I try to mine with regtest from genesis block I repeatedly (3 tries in a row) get the error on the very first block:

2024-05-30T17:28:46.595842743Z  WARN ThreadId(02) neptune_core::models::blockchain::block: Value for new difficulty is incorrect.
thread 'tokio-runtime-worker' panicked at /home/danda/dev/neptune/neptune-core/src/mine_loop.rs:471:17:
Own mined block must be valid. Failed validity check after successful PoW check.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

That's a clear indicator that the error is related to timing/interval calcs.

edit: Thinking the genesis timestamp might play a role, I put in a quick hack to skip difficulty validation for block height == 1. It then mined 4 blocks right away, and hit the error on height == 5. So it seems the 1 second target doesn't trigger it for every block, but probably is somehow a contributing factor.

Hallelujah! I finally found the cause of this panic and have a fix.

It was because of the earlier change that updates the header timestamp in the mining loop in order to have the timestamp reflect the time block is found.

It turns out that the difficulty value depends on the timestamp value, but was not being updated, so it was based on the initial timestamp when mining started. After block is mined, the validation checks the difficulty using the newer timestamp and there is (only sometimes) a mis-match, eg off-by-one.

With the fix in place, I was finally able to mine hundreds of blocks quickly by setting target interval to 1 second.

I will commit the fix for this tmw.

I had made the fix mentioned in https://github.com/Neptune-Crypto/neptune-core/issues/154#issuecomment-2141184650 which ended the panics and seemed to work well, but I wanted to check if the block intervals seemed right or not.

So I added a test, mine_ten_blocks_in_ten_seconds, that sets the block target interval to 1 second, mines ten blocks, and then compares the elapsed time to expected time.

Initially I got these results, running the test case 10 times, indicating blocks are taking longer than expected:

target_interval = 1000
num_blocks = 10

allowed_variance = 1.3
max_duration = 13000
min_duration = 7692

total_time_expected = 10000

total_time_actual, for 10 runs:

1.  14849   fail
2.  15108   fail
3.  17010   fail
4.  20232   fail
5.  15715   fail
6.  11358   pass
7.  11000   pass
8.  24858   fail
9.  17416   fail
10. 13488   fail

After more digging I discovered that the "fix" was incomplete because the mining loop calculates a threshold value from the difficulty and that is what each digest is compared against. This threshold was not being recalculated along with the timestamp and difficulty, so I fixed that.

After this change, the values were better, but now a bit lower than expected.

1.  8548    pass
2.  8787    pass
3.  7317    fail
4.  8716    pass
5.  8675    pass
6.  8956    pass
7.  8300    pass
8.  9330    pass
9.  8090    pass
10. 8757    pass

I modified the test case to ignore blocks 1 and 2 which are normally found very fast. The results now look pretty much as expected:

1.  8598    pass
2.  10919   pass
3.  10378   pass
4.  10088   pass
5.  10112   pass
6.  8608    pass
7.  10311   pass
8.  10611   pass
9.  10248   pass
10. 9501    pass

I ran the test case one final time using the standard target block interval of 9.8 minutes. This also passed within the allowed variance limit of 1.3. The mean interval was 9.19 minutes.

Found block 12 in 109233 milliseconds
actual duration: 5514874
expected duration: 5880000
min_duration: 4523076
max_duration: 7644000
allowed_variance: 1.3

The sample size is small but it looks like we may now be a little bit on the "too fast" side. It might be that block-height 3 is too early to start sampling at and is skewing results a bit.

Anyway, this seems a solid improvement, so I've closed the issue. I will plan to generate stats from the block explorer after it has run for a few days with the new code.

edit: I should mention that in order to make this test work I also modified some functions to accept an optional target_block_interval parameter which enables simulating mining and block validation with custom target intervals, eg 1 second. See 0458d77 and 7fa9c69.

Well this is very encouraging. I just ran the test for longer (nearly 3 hours) and the actual result matched expected to 0.9999 accuracy.

Parameters were:

        let num_blocks = 1000;
        let target_block_interval = 10000;    // 10 seconds

Results

actual duration: 9999376
expected duration: 10000000
min_duration: 7692307
max_duration: 13000000
allowed_variance: 1.3

Neptune-Crypto / neptune-core

mining thread panics due to incorrect difficulty #154

144.