dtr-org / unit-e

A digital currency for a new era of decentralized trust
https://unit-e.io
MIT License

PoSv3 Time-Drift related problems #817

Open castarco opened 5 years ago

castarco commented 5 years ago

First, some well known facts:

Now, the issues:

Some considerations:

These issues are not just theoretical:

This is how the block timestamps are shifted relative to the "real" time with 600 s time windows, 16 s between blocks, and a 4 s time mask. This is an extreme case: the greedy nodes started with a relatively huge stake, and the naive nodes were "kicked out" because their timestamps also fell behind the past median time. Take it with a grain of salt (I just recovered the image from my first experiments). [image: shifted_timestmps]

The next picture shows a more realistic case (extracted from a 48 h simulation) using our settings, but with a 10-minute time window. In this plot, negative values mean that the timestamp is shifted to the future. [image: shifted_timestmps2]

Proposed mitigations:

  1. Use small time windows (on the scale of seconds or minutes). This is the most obvious mitigation; other colleagues have also proposed it, and other projects already apply it.

  2. Change kernel hash calculation:

    • All the nodes are greedy by default, so they perform a limited version of PoW.
    • We recover the concept of a nonce, with the following considerations:
        • We don't use the timestamp for the kernel hash.
        • We use the nonce for the kernel hash.
        • We apply to the nonce the restrictions that we were applying to the timestamp (the nonce is constrained by the upper time boundary and by the past median block time, so the nonce would be similar to a timestamp, but noisier).
    • In summary: the basic idea is to remove the incentive to tweak the block time. We could still have some "drift" on the nonce, but there is no rational reason for nodes to tweak the timestamps: the next kernel hash won't depend on the past block time but on the median of many block times, and, luckily, the median is quite insensitive to what happens at the boundaries of the distribution.
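To make the second mitigation more concrete, here is a minimal Python sketch of a kernel hash that takes a nonce instead of the block timestamp, with the timestamp bounds moved onto the nonce. Everything in it is an assumption for illustration: the names (`kernel_hash`, `nonce_is_valid`, `find_nonce`, `MASK_STEP`), the payload layout, and the double SHA-256 are hypothetical and do not reflect unit-e's actual serialization or consensus code.

```python
import hashlib

MASK_STEP = 4  # seconds; hypothetical granularity, matching a 4 s time mask


def kernel_hash(stake_modifier: bytes, prevout: bytes, nonce: int) -> int:
    """Kernel hash over the nonce; the block timestamp is deliberately not an input."""
    # Illustrative payload layout, not the real serialization.
    payload = stake_modifier + prevout + nonce.to_bytes(8, "little")
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return int.from_bytes(digest, "little")


def nonce_is_valid(nonce, median_time_past, now, max_future_drift):
    """The bounds formerly applied to the timestamp, applied to the nonce."""
    return (
        nonce % MASK_STEP == 0
        and median_time_past < nonce <= now + max_future_drift
    )


def find_nonce(stake_modifier, prevout, amount, target_per_coin,
               median_time_past, now, max_future_drift):
    """Greedy search: try every admissible nonce, return the first winner."""
    first = (median_time_past // MASK_STEP + 1) * MASK_STEP
    last = now + max_future_drift
    for nonce in range(first, last + 1, MASK_STEP):
        if kernel_hash(stake_modifier, prevout, nonce) < target_per_coin * amount:
            return nonce
    return None  # no admissible nonce wins right now
```

Since the timestamp is not an argument of `kernel_hash`, shifting it gains the proposer nothing in this sketch; the search space is bounded only by the past median time and the upper time boundary.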

Note that these mitigations are not a complete solution. Changing how the kernel hash is computed helps with the noisy timestamps, but it still leaves a "probability peak" centered at the block creation time plus the average block propagation time: a new block is created and propagated, then all the proposers immediately compute a ton of hashes, and after that they compute just 1 hash per unit of time.
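The shape of that peak can be sketched by counting how many fresh candidates (timestamps today, nonces under the proposal) a greedy proposer can hash at each mask tick after a block reaches it. This is a rough illustration under assumed values; `attempts_per_tick` and its parameters are hypothetical names, and it assumes the proposer tries every masked value between the past median time and the upper boundary.

```python
def attempts_per_tick(median_time_past, arrival_time, window, mask_step, ticks):
    """Fresh kernel-hash candidates available at each mask tick after a block arrives."""

    def slots_up_to(t):
        # Number of multiples of mask_step in (median_time_past, t].
        return t // mask_step - median_time_past // mask_step

    counts = []
    seen = 0
    for i in range(ticks):
        now = arrival_time + i * mask_step
        total = slots_up_to(now + window)  # everything up to the upper boundary
        counts.append(total - seen)        # only candidates not hashed before
        seen = total
    return counts


# Assumed values: 600 s window, 4 s mask, block arrives 80 s after the past median time.
print(attempts_per_tick(0, 80, 600, 4, 4))  # [170, 1, 1, 1]
```

The first tick carries the whole backlog of admissible candidates, and every later tick adds only one, which is the burst described above.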

There are also potential mitigations for this last problem, but they wouldn't be part of the consensus rules. Basically, the proposers could compute the kernel hashes for a subset of their competitors, and if they see that their competitors can't propose this round, they can wait a little before relaying their new block without much risk of losing the race. The competitor subset could be obtained from the last N rewards and/or combined stake outputs, where N should be relatively large.
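A minimal sketch of that competitor check, assuming each competitor is represented by a `(prevout, amount)` pair collected from recent rewards and that all proposers share the same stake modifier for the round. The function names, the payload layout, and the hash construction are hypothetical and only meant to show the idea.

```python
import hashlib


def kernel_hash(stake_modifier: bytes, prevout: bytes, slot: int) -> int:
    # Illustrative layout only; the real kernel serialization differs.
    payload = stake_modifier + prevout + slot.to_bytes(8, "little")
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return int.from_bytes(digest, "little")


def can_propose(stake_modifier, prevout, amount, slot, target_per_coin):
    """True if this stake output wins the given slot."""
    return kernel_hash(stake_modifier, prevout, slot) < target_per_coin * amount


def safe_to_delay(stake_modifier, competitors, slots, target_per_coin):
    """True if no known competitor can propose in any of the given slots.

    competitors: iterable of (prevout, amount) pairs, e.g. taken from
    the last N rewards (a local heuristic, not a consensus rule).
    """
    competitors = list(competitors)
    return not any(
        can_propose(stake_modifier, prevout, amount, slot, target_per_coin)
        for slot in slots
        for prevout, amount in competitors
    )
```

A proposer that already holds a valid block could call `safe_to_delay` for the slots it intends to wait through and relay immediately whenever it returns `False`. The check only covers the competitors it knows about, so the result is a risk estimate rather than a guarantee.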

thothd commented 5 years ago

In general, for security issues, I suggest creating a branch with a functional test in an active PoSv3 coin repository (Qtum is a good candidate, maybe Particl), as @amiller usually does. That is mostly because it requires orchestrating a few things, bypassing validations, etc.; otherwise it's very hard to assess. It's probably even better to create such an issue on the other running repositories first and then maybe refer to it, since then you can show the test attack results and specific times, e.g. 2 hours, which I couldn't find in Particl, by the way?

castarco commented 5 years ago

since then you can show the test attack results and specific times - e.g 2 hours which I couldn't find by the way in Particl ?

I probably misread some parameter, looked in the wrong place, or looked at old versions. Anyway, the problem is clearly related to how much drift we allow; decreasing the allowed drift is a very effective mitigation.
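As a rough back-of-the-envelope illustration of why the allowed drift matters, the sketch below compares a naive node that hashes only the current slot with a greedy node that also hashes every masked slot inside the allowed future drift. It assumes kernel hashes behave like independent trials with a fixed per-slot success probability `p`; the function name and the numbers are hypothetical.

```python
def success_probability(p, drift, mask_step):
    """Chance that at least one admissible slot wins, for a greedy node.

    p: assumed per-slot success probability for this node's stake.
    drift: allowed future drift in seconds (0 means only the current slot).
    """
    slots = drift // mask_step + 1  # current slot plus the slots inside the drift
    return 1.0 - (1.0 - p) ** slots


naive = success_probability(0.001, 0, 4)     # one slot only
greedy = success_probability(0.001, 600, 4)  # 151 slots with a 600 s drift
print(naive, greedy)
```

Under these assumptions the greedy node's immediate advantage grows roughly linearly with the drift while `p` is small, so shrinking the drift shrinks the advantage accordingly.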