[Perf] Faster Poseidon permutation and Field multiplication

ljedrz commented 6 months ago

This was something that I found while heap-profiling a large deployment and execution; Poseidon::hash_many is prominent, and responsible for many allocations. Most of these are caused by the pow operations involved, and by adjusting one of their internals (switching from Mul to MulAssign specifically), we can avoid some of them. The preexisting Mul operation is also adjusted for a small improvement.

In addition, the new_state allocation used in apply_mds can be reused, and its contents can be swapped with - instead of cloned to - the state.

These changes result in a decrease in allocs/s by ~6% (likely more if adjusted for node startup) in a --dev node during a large deployment+execution. In addition, the new mul_assign operation is ~36% faster than the mul one, which will positively impact node performance.

Note: we currently don't seem to have a benchmark for Field multiplication (I tested it by introducing an ad-hoc benchmark for MulAssign<&Field<E>> for Field<E>); I can add one or more of those if we'd like.

ljedrz commented 6 months ago

I returned the original Mul code for now to reduce the scope of changes; the most important change is the fact that we avoid cloning the LinearCombination.

ljedrz commented 6 months ago

As requested, I added a benchmark for Field::mul_assign, which is the operation affected by this PR; these are the results, and they are quite consistent:

Field::mul_assign       time:   [43.078 ns 43.184 ns 43.301 ns]
                        change: [-33.164% -32.969% -32.787%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

howardwu commented 6 months ago

@ljedrz does this change any circuit constructions?

Can you please open this on AleoHQ/snarkVM first for thorough review prior to introducing it on AleoNet?

vicsn commented 6 months ago

@ljedrz does this change any circuit constructions?

I believe this does not change circuit constructions, and also think our tests will catch it if it does. Mike and Alessandro should independently review, we should burn-in.

@ljedrz given how understandably concerned this makes everyone for "just" a ~6% improvement, I'd argue let's stall this PR until post-mainnet.

ljedrz commented 6 months ago

Closing this PR in order to have it reviewed in AleoHQ first :+1:.

AleoNet / snarkVM

[Perf] Faster Poseidon permutation and Field multiplication #2460