Closed norswap closed 1 year ago
Still potentially relevant: optimized shuffling approaches.
Some preliminary discovery: so I played around with hashing the whole array instead of computing merkle root (though I haven't tried byte packing). For the DrawHand
circuit, it reduces the constraints from 232k to 122k (yay!), but for Draw
circuit, it actually increases the constraints from 75k to 233k (sadz). The reason for that is that we remove the CheckMerkleRoot
template from the DrawHand
circuit, which greatly reduces the number of hashes needed. However, for the Draw
circuit, initially it is already quite optimized since we only need to do one pass of hashes from selected leaf to the merkle root to check correctness, whereas now we actually need to do more hashes.
There are still some area for optimization for the circuits, including trying out the byte packing approach, just writing my findings here to show that hashing array instead of computing merkle root might be more expensive for Draw
and Play
circuits, though greatly optimized the DrawHand
circuit.
Code changes: https://github.com/norswap/0xFable/compare/master...byte-packing
Nice!
I'm not super surprised by the numbers: if you think about it, the DrawHand circuit did ~63 hashes of two field elements each, and now we're doing one hash of 64 elements, so about 50% reduction seems about right. Packing the bytes means we need to do one hash of only 4 elements (salt + 3 elements to pack 64 cards), so looking forward to that :p
Hopefully, the same logic applies to the Draw circuit where again instead of checking log2(64)=6 hashes of two fields elements for every Merkel branch, we get to do one hash of 4 elements for the each of (oldHand, newHand, oldDeck, newDeck) — (I think we had to check more than one Merkle proof for each of those on average, so hopefully we end up with a win!).
Our previous approach to circuits works, but has really high proving time when compiled to wasm (proving a Merkle root computed over a 64 field element arrays takes 60+s and requires downloading a 3GB zkey to the client).
Fortunately, we think it's possible to do much better:
However, doing this means we need to change the circuits to do a combination of indexing and shifts instead of straightforward indexing.
We're also investigating some other optimizations.
Assigning this to Kai Jun as he's likely to rewrite this in Circom, but the investigation here will be relevant for the direction of https://github.com/norswap/0xFable/issues/48 and https://github.com/norswap/0xFable/issues/49