ethereum / research

MIT License
1.82k stars 591 forks source link

Shuffling: add F+tree sampler #98

Closed mratsim closed 5 years ago

mratsim commented 5 years ago

This adds another shuffling alternative that was taken from natural language processing.

Papers:

A New Data Structure for Cumulative Frequency Tables

The algorithm is also described in this blog post Heaps for incremental computation with an alternative Python implementation using 1-based indexing (and 2x more memory if n is a power of 2).

Bench on my machine:

Testing prime shuffle
[40388, 24854, 44555, 69180, 37292, 27818, 85124, 51675, 75163, 16592]
[40388, 24854, 44555, 69180, 37292, 27818, 85124, 51675, 75163, 16592]
Total runtime:  1.651440143585205
Runtime to compute committee:  0.02932882308959961

Testing feistel shuffle
[82855, 3100, 89704, 87662, 7830, 16014, 57626, 95313, 53632, 97853]
[82855, 3100, 89704, 87662, 7830, 16014, 57626, 95313, 53632, 97853]
Total runtime:  0.15117597579956055
Runtime to compute committee:  0.005218982696533203

Testing Fisher-Yates shuffle
[93044, 39644, 88989, 78137, 17662, 17187, 41433, 85069, 64061, 6647]
Total runtime:  0.1256880760192871

Testing F+tree sampling
[93044, 39644, 88989, 78137, 17662, 17187, 41433, 85069, 64061, 6647]
[50405, 51879, 54080, 30514, 76290, 60097, 83697, 19454, 68194, 85273]
[75628, 50172, 45023, 21349, 80036, 81698, 2829, 994, 88511, 20356]
Total runtime:  1.5932817459106445
Runtime to compute first committee:  0.053829193115234375
Runtime to compute next committee:  0.008832931518554688
vbuterin commented 5 years ago

Not sure I understand; how does this accomplish the property that you can compute any arbitrary value in the shuffling in O(1) time? This seems like an incremental approach that requires computing the 1...n'th values before you can compute the n+1'th.

mratsim commented 5 years ago

That was actually sparked when you were looking into the prime shuffle but it was quite slow. So I researched alternatives. However the Feistel shuffle that was proposed within the same day definitely supersedes this one.