**HarryR** opened this issue 6 years ago
After some testing it seems the disk load of the proving key is nearly instant; however, when using the standard `ostream` and `istream` serialisation it takes a significant amount of time to read from the stringstream into the object.

Part of the problem is the `BINARY_OUTPUT` flag in `bigint.cc`, which makes the difference between loading from a string representation versus a binary representation. However, this flag already seems to be enabled by default!
I think if this could be sped up then the prove / verify time would be reduced significantly, and more importantly it would scale better with the number of circuits. How can it take 1 minute (after loading all the data from disk) to parse 2.5m lines? With a larger number of circuits this will take even longer...
So:

- `alt_bn128_q_limbs` is `(alt_bn128_q_bitcount+GMP_NUMB_BITS-1)/GMP_NUMB_BITS`
- `bigint<n>` has `mp_limb_t data[n]`
- `Fp_model` has `bigint<n> mont_repr`, where `n` is `alt_bn128_q_limbs` (for alt_bn128)
- `alt_bn128_Fq` is `Fp_model<alt_bn128_q_limbs, alt_bn128_modulus_q>`
- `alt_bn128_g1` has `X`, `Y` and `Z`, which are `alt_bn128_Fq` types.

None of the classes above have any member variables other than those noted.
So, assuming tight packing, an `alt_bn128_g1` point should take `sizeof(mp_limb_t) * n * 3` bytes of memory. Similarly, `alt_bn128_g2` should take `sizeof(mp_limb_t) * n * 6`. This means that a `reinterpret_cast<alt_bn128_g1*>` of an `mp_limb_t[n*3]` array should be usable as an `alt_bn128_g1` object without problems.
If these assumptions are true, then it should be possible to dump the proving key in a way which can be `mmap`'d. One problem is that, when doing this, the compiler may adjust for alignment (assuming `mp_limb_t` is the native pointer size, it shouldn't); another is that `{A,B,C}_query` are of `Boost::sparse_vector` type, and `{H,K}_query` are of `std::vector` type.
Need to investigate the density of the `{A,B,C}_query` sparse vectors.
With the LongsightF gadget this seems like less of a priority as the serialised circuit is quite small compared to the full SHA256 merkle tree proof gadget.
See:
There are optimisations which can be made to reduce the amount of in-memory data required for the proving key:
1) Don't use Jacobian coordinates for the proving key while it's in-memory (this removes the `Z` coord); allocate the `Z` coord on the stack when performing calculations.
2) Use a directly accessible format which allows the proving key to be `mmap`'d, and let the kernel figure it out.
3) Anything which is used independently and sequentially means the rest can be discarded: load each section one at a time, rather than all of them at once.
Either way, need to create a benchmark for `prove` which tracks the peak memory usage. Any changes should avoid modifying `libsnark` directly; we still want to use it as a submodule.

With the `mmap` approach, if the offsets & sizes of the sections for A, B, C, K and H are known ahead of time, then `madvise` can be used depending on the access pattern. If access is purely sequential then `MADV_SEQUENTIAL` can be used; however, if it's random then doing a pre-emptive sequential read into memory will offer a performance improvement. Either way, it's the disk reads vs memory usage trade-off again.