Very interesting problem; I would love to think about it. How final are these ideas? Could there be other schemes out there, and would we be willing to consider them at this point?
One question about section 7.1 of https://arxiv.org/pdf/1809.09044.pdf (is this a good reference?): I am currently thinking about polynomial and functional commitment schemes. Could such a scheme be used there, rather than a full zk-SNARK (which is currently prohibitively expensive)?
I am currently looking at some polynomial commitment schemes that could potentially replace the proof of custody with a non-interactive version. They aren't quite ready for prime time yet (there are still some problems with efficiency and outsourceability), but I'm planning to write up my thoughts soon to get some outside input.
In terms of how much is final and how much can be changed, I'd say there are good arguments that erasure coding, and committing to the erasure-coded data in some form, is optimal.
As far as choice of scheme goes, I think the biggest pragmatic reason to stick with hash-based constructions here, aside from simplicity of implementation, future-proofness and all the other usual reasons, is that erasure codes and Merkle trees are really fast to generate (e.g. I can do it within an epoch for 512 MB of data in Python), and I don't see any more complicated scheme having the same property (as precedent, think of how proving for STARKs is 10x faster than for SNARKs). Additionally, purely hash-based schemes can work over binary fields, and one reason binary fields are especially nice is that they can represent 32-byte chunks with no overhead, whereas any prime field runs the risk that even if a block is entirely composed of values < 2**256, the extended chunks will not be (the same goes for any byte size less than 32).
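To make that last point concrete, here is a quick toy sketch (mine, not part of any spec): the chunks all fit in 4 bits, GF(17) stands in for a prime field > 2**256 (it's the smallest prime that can hold every 4-bit value), and the Reed-Solomon-extended evaluations overflow the 4-bit range. The chunk values and the `lagrange_eval` helper are just for illustration.

```python
# Toy illustration of the field-overflow issue: 4-bit chunks, prime field GF(17).
# With 32-byte chunks the analogous field would be any prime > 2**256, and the
# extended chunks would then risk needing 33 bytes.
MODULUS = 17  # smallest prime that can represent every 4-bit chunk

def lagrange_eval(xs, ys, x, modulus=MODULUS):
    """Evaluate the interpolant through the points (xs, ys) at x, mod `modulus`."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % modulus
                den = den * (xi - xj) % modulus
        total = (total + yi * num * pow(den, -1, modulus)) % modulus
    return total

chunks = [3, 15, 7, 13]                    # original chunks, all < 2**4
xs = list(range(len(chunks)))              # data polynomial evaluated at 0..3
extension = [lagrange_eval(xs, chunks, x) for x in range(4, 8)]
print(extension)                           # [16, 16, 13, 7]
print([v for v in extension if v >= 16])   # extended chunks that no longer fit in 4 bits
```

In a binary field like GF(2**256) every extended chunk is again exactly a 256-bit string, which is the "no overhead" property above.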
Though it does seem to me that polynomial commitment schemes are a superior alternative to SNARKs over Merkle trees: they're more "direct", rather than one construction wrapping a completely foreign one and hence incurring arithmetization inefficiencies. And the assumptions in polynomial commitment schemes are definitely milder than the assumptions in SNARKs.
Block layout in the commitment data is still very flexible, and my intuition is that whatever inefficiencies remain would be the same regardless of the scheme used.
Idea from a conversation with Justin yesterday: if there are large segments of contiguous zero bytes, fill them with any of:
All of these objects are larger than 64 kB, so we can either randomly choose the indices from which to include data each time, or store indices and cycle through the data; a rough sketch of the random-index variant is below.
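A rough sketch of the random-index variant (the 64 kB segment size and the helper name are just illustrative choices, not anything from the spec):

```python
import os

SEGMENT = 64 * 1024  # illustrative granularity for detecting zero runs

def fill_zero_segments(data: bytes, filler: bytes) -> bytes:
    """Replace every all-zero SEGMENT-sized slice of `data` with a slice of
    `filler` taken from a randomly chosen offset. Assumes len(filler) >= SEGMENT,
    as with the objects discussed above."""
    out = bytearray(data)
    for start in range(0, len(out) - SEGMENT + 1, SEGMENT):
        if not any(out[start:start + SEGMENT]):  # contiguous zero bytes
            offset = int.from_bytes(os.urandom(8), "big") % (len(filler) - SEGMENT + 1)
            out[start:start + SEGMENT] = filler[offset:offset + SEGMENT]
    return bytes(out)
```

The cycling variant would just replace the random offset with a stored counter that advances through the filler object each time.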
I actually like this. There's definitely something unclean about "grab as many bytes of this thing as possible", but it allows us to preserve the existing (quite clean!) structure of the crosslink data.
@vbuterin @dankrad Can we close this issue for staleness? (Trying to address old Github issues, and data availability research has progressed significantly since June 2019.)
I'd say it can be closed. The data availability proof will now mostly follow from whatever we use for shard data commitments.
There are three major sources of inefficiency in the current construction for data availability proofs:

- The data must fit into a 2^k * 2^k square, so it needs to be padded up to the next power of 4. This may on average double the amount of data that needs to be erasure-coded and stored by the network.

Possible ideas to improve efficiency:

- Drop the 2^k * 2^k requirement and replace it with an arbitrary n * n (not sure this is actually an efficiency improvement; FFTs are easiest to use when handling exact powers of two).
- Allow 2^k * 2^(k-1) rectangles.

This is still in "think mode"; better ideas may come up. So far (3) seems least bad, plus possibly (7). The issues are far from fatal, but hopefully there are easy ways to remove at least some of the current theoretical ~8x inefficiency.
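For a rough sense of where the ~8x comes from, here's a quick back-of-the-envelope script (assuming the 2x extension in each direction from the 2D scheme, i.e. 4x on top of the padding; the uniform block-size distribution is just an illustrative choice):

```python
def padded_chunks(n: int) -> int:
    """Smallest 2^k * 2^k square (a power of 4) that can hold n chunks."""
    size = 1
    while size < n:
        size *= 4
    return size

def total_blowup(n: int) -> float:
    """Coded chunks per original chunk: pad to the power-of-4 square,
    then extend 2x in each direction (4x the padded area)."""
    return 4 * padded_chunks(n) / n

sizes = range(1, 1 << 16)
avg_padding = sum(padded_chunks(n) / n for n in sizes) / len(sizes)
avg_total = sum(total_blowup(n) for n in sizes) / len(sizes)
print(f"average padding overhead: {avg_padding:.2f}x")  # close to the ~2x noted above
print(f"average total blowup:     {avg_total:.2f}x")    # in the ballpark of the ~8x figure
```

The exact numbers depend on the block-size distribution, but the 4x from the two-dimensional extension times roughly 2x average padding is where the ~8x estimate comes from.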