Closed ljedrz closed 6 months ago
I returned the original Mul
code for now to reduce the scope of changes; the most important change is the fact that we avoid cloning the LinearCombination
.
As requested, I added a benchmark for Field::mul_assign
, which is the operation affected by this PR; these are the results, and they are quite consistent:
Field::mul_assign time: [43.078 ns 43.184 ns 43.301 ns]
change: [-33.164% -32.969% -32.787%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
@ljedrz does this change any circuit constructions?
Can you please open this on AleoHQ/snarkVM
first for thorough review prior to introducing it on AleoNet
?
@ljedrz does this change any circuit constructions?
I believe this does not change circuit constructions, and also think our tests will catch it if it does. Mike and Alessandro should independently review, we should burn-in.
@ljedrz given how understandably concerned this makes everyone for "just" a ~6% improvement, I'd argue let's stall this PR until post-mainnet.
Closing this PR in order to have it reviewed in AleoHQ
first :+1:.
This was something that I found while heap-profiling a large deployment and execution;
Poseidon::hash_many
is prominent, and responsible for many allocations. Most of these are caused by thepow
operations involved, and by adjusting one of their internals (switching fromMul
toMulAssign
specifically), we can avoid some of them. The preexistingMul
operation is also adjusted for a small improvement.In addition, the
new_state
allocation used inapply_mds
can be reused, and its contents can be swapped with - instead of cloned to - thestate
.These changes result in a decrease in allocs/s by ~6% (likely more if adjusted for node startup) in a
--dev
node during a large deployment+execution. In addition, the newmul_assign
operation is ~36% faster than themul
one, which will positively impact node performance.Note: we currently don't seem to have a benchmark for
Field
multiplication (I tested it by introducing an ad-hoc benchmark forMulAssign<&Field<E>> for Field<E>
); I can add one or more of those if we'd like.