Closed vbuterin closed 5 years ago
What are the security properties that are needed?
The main bottleneck in the current Fisher-Yates
shuffling is the repeated modulo
in the loop as it's an expensive operation. In the new proposed shuffling there are gcd
, pow
and modulo
.
In my opinion the main advantage of Fisher-Yates is the ability to do it in-place, however I suppose to be able to rollback state, most clients will not implement it in-place but in a temporary location instead.
In case we need a temporary location, I think reservoir sampling will be superior in terms of performance while preserving equi-probabilities of all sequences.
The algorithm is very simple and doesn't involve expensive operation, from Wikipedia (array indexing starts at one)
ReservoirSample(S[1..n], R[1..k])
// fill the reservoir array
for i = 1 to k
R[i] := S[i]
// replace elements with gradually decreasing probability
for i = k+1 to n
j := random(1, i) // important: inclusive range
if j <= k
R[j] := S[i]
We only need to couple it with a prng. For example XorShift128+ is used by all major browsers (and could be a eWASM primitive):
#include <stdint.h>
/* The state must be seeded so that it is not all zero */
uint64_t s[2];
uint64_t xorshift128plus(void) {
uint64_t x = s[0];
uint64_t const y = s[1];
s[0] = y;
x ^= x << 23; // a
s[1] = x ^ y ^ (x >> 17) ^ (y >> 26); // b, c
return s[1] + y;
}
Or the latest Xoroshiro128+ by the same author, which is significantly faster and with much better statistical quality.
Implementation in Nim (disclaimer this is the default Nim PRNG):
type
Rand* = object ## State of the random number generator.
## The procs that use the default state
## are **not** thread-safe!
a0, a1: ui
var state = Rand( # <---- We can use our own seed instead
a0: 0x69B4C98CB8530805u64,
a1: 0xFED1DD3004688D67CAu64)
proc rotl(x, k: ui): ui =
result = (x shl k) or (x shr (ui(64) - k))
proc next*(r: var Rand): uint64 =
## Uses the state to compute a new ``uint64`` random number.
let s0 = r.a0
var s1 = r.a1
result = s0 + s1
s1 = s1 xor s0
r.a0 = rotl(s0, 55) xor s1 xor (s1 shl 14) # a, b
r.a1 = rotl(s1, 36) # c
This is much more hardware friendly.
Like the current Fisher-Yates, you need to keep a state.
I've been thinking over this and unfortunately reservoir sampling wouldn't work as is.
We need a reservoir that hold the final selection, so it must be the size of the list to shuffle, i.e. we would start with the list and then we remove (~sample) elements randomly from it, so basically it just degrade into random sampling without replacement. However when doing random sampling the data structure used is quite important and a list/vector is not the most efficient one (in my benchmarks a Fenwick tree is very efficient) but if we need to reorganize into a new data structure we might as well do copy+Fisher Yates.
An alternative scheme would be to have a smaller reservoir (say 8 items) and add items ejected from the reservoir to the final list. Unfortunately even if we start from a random point in the original list, the order would be correlated to the original order.
So in short I think reservoir sampling is not suitable.
The main desired property is being maximally close to a random permutation. Specifically, it should be maximally computationally difficult to find seeds that ensure that in a shuffling of 1...N
, any specific subset x1, x2, x3.... x[k]
are in a specific range [v ..... v+k]
.
The reason why the number-theoretic shuffle above is interesting is that you don't need to compute the whole output to compute a small portion of the output. This makes it much easier to use in light clients, and to verify blocks generally.
I just optimized the code a bit; it can shuffle 100000 values in one second (vs 1 million in one second for the Fisher-Yates shuffle). That said, the hash complexity is smaller, so I suspect in an optimized C implementation the number-theoretic shuffle will not be much smaller than out current Fisher-Yates shuffle, as in python the hashes are optimized but the rest of the code is not, leading to hashing costs being understated.
Here is a hash-based alternative. Here's the core:
def numhash(x, i, seed, modulus):
assert 0 <= i < 4
return (int.from_bytes(hash(x.to_bytes(32, 'big') + seed), 'big') // modulus**i) % modulus
def feistel(x, modulus, seed):
assert is_perfect_square(modulus) and modulus < 2**50
h = int(modulus ** 0.5)
L, R = x//h, x%h
for i in range(4):
new_R = (L + numhash(R, i, seed, h)) % h
L = R
R = new_R
return L * h + R
Now for the actual function, let modulus
be the smallest perfect square greater than or equal to n
. To permute some value
, set value = feistel(value, modulus, seed)
and if needed repeat until the result is less than n
, just as above. This takes 4 hashes to compute for a single number, but to permute an entire list you only need ~sqrt(n)
hashes.
I implemented both options and the status quo here: https://github.com/ethereum/research/tree/master/shuffling
Here's the timing test output, testing both computing a full shuffle of 100000 and computing just a specific sub-committee of 500 out of the 100000:
Testing prime shuffle
[40388, 24854, 44555, 69180, 37292, 27818, 85124, 51675, 75163, 16592]
[40388, 24854, 44555, 69180, 37292, 27818, 85124, 51675, 75163, 16592]
Total runtime: 1.0141525268554688
Runtime to compute committee: 0.022953271865844727
Testing feistel shuffle
[82855, 3100, 89704, 87662, 7830, 16014, 57626, 95313, 53632, 97853]
[82855, 3100, 89704, 87662, 7830, 16014, 57626, 95313, 53632, 97853]
Total runtime: 0.10488557815551758
Runtime to compute committee: 0.00408935546875
Testing Fisher-Yates shuffle (status quo)
[93044, 39644, 88989, 78137, 17662, 17187, 41433, 85069, 64061, 6647]
Total runtime: 0.08116364479064941
Feistel shuffling is impressively fast.
I've transformed the current state-of-the-art sampling technique from natural language processing into a shuffling algorithm in https://github.com/ethereum/research/pull/98.
Originally it was for weighted sampling hence why it is rearranging input to efficiently check cumulative probability distributions.
There is an initialisation cost but further sampling are 6.5x cheaper.
Testing prime shuffle
[40388, 24854, 44555, 69180, 37292, 27818, 85124, 51675, 75163, 16592]
[40388, 24854, 44555, 69180, 37292, 27818, 85124, 51675, 75163, 16592]
Total runtime: 1.651440143585205
Runtime to compute committee: 0.02932882308959961
Testing feistel shuffle
[82855, 3100, 89704, 87662, 7830, 16014, 57626, 95313, 53632, 97853]
[82855, 3100, 89704, 87662, 7830, 16014, 57626, 95313, 53632, 97853]
Total runtime: 0.15117597579956055
Runtime to compute committee: 0.005218982696533203
Testing Fisher-Yates shuffle
[93044, 39644, 88989, 78137, 17662, 17187, 41433, 85069, 64061, 6647]
Total runtime: 0.1256880760192871
Testing F+tree sampling
[93044, 39644, 88989, 78137, 17662, 17187, 41433, 85069, 64061, 6647]
[50405, 51879, 54080, 30514, 76290, 60097, 83697, 19454, 68194, 85273]
[75628, 50172, 45023, 21349, 80036, 81698, 2829, 994, 88511, 20356]
Total runtime: 1.5932817459106445
Runtime to compute first committee: 0.053829193115234375
Runtime to compute next committee: 0.008832931518554688
Benedikt Bunz points to this paper as a potential candidate.
Is it certain that the present shuffling approach is going to be ditched?
@drozdziak1 I'd say 80% :) Light-client friendly shuffles seem like a big win, and they are possible. It's now an engineering/cryptography question to find a really efficient one, maybe even a SNARK/STARK-friendly one.
It sounds like a CSPRNG is a requirement. Anything less just seems sketchy when deliberate attacks are expected. As they say, attacks only get better
@vbuterin have you run some random number tests (e.g. dieharder) against the primenumber implementation or is this algorithm some well-known CSPRNG?
closed by adding "swap or not" #563
Motivation
Construct a shuffling algorithm where you can compute the value in the shuffled list at any specific position relatively cheaply without computing all of the other values at the same time. This could be used to replace the current
shuffle
.Helpers
Algorithm
Note that the above is a maximally simple definition, not an optimal implementation. An optimal implementation would calculate the prime, the exponent, and the
a
values once, and pass them as inputs tovalue_at_position
, which would then need to do much less work per value.The main weaknesses are: (i) the computation of the total shuffle is slower even with optimizations, and (ii) the security argument is more heuristic ("it repeats
x -> x**3 + k[i]
so it's like MIMC") than the current Fisher-Yates shuffle. However, the strengths may be worth it, especially if we wish to cease storing the shuffling in the state.