Closed bigspider closed 2 weeks ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 84.76%. Comparing base (
5499c3e
) to head (5f6afe9
). Report is 4 commits behind head on develop.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Failed conditions
24.6% Coverage on New Code (required ≥ 80%)
This PR improves signing performance in two ways.
The total time of the pre-signing transaction checks is dominated by the
secp256k1_point
, which is used inbip32_CKDpub
(BIP-32 unhardened pubkey derivation). Thecx_ecfp_scalar_mult_no_throw
function is hardened against side channels, but this has a large performance impact. Commit ab344cdf588880ece140c33e65ca465f7f3230e1 implements the unsafe counterpartcx_ecfp_scalar_mult_unsafe
, which is not hardened. This would be problematic in cryptographic code using secrets, but it's safe in functions that only work with public data likebip32_CKDpub
andcrypto_tr_tweak_pubkey
.All the inputs of a transaction (and change outputs, as well) spending from a BIP-388 wallet policy have keys derived from the root keys of the policy with two derivation steps; the first is typically 0 or 1, while the second is arbitrary (the address_index). Therefore, the
root_key/0
or theroot_key/1
step ends up being derived viabip32_CKDpub
multiple times. Commit 711436955f3ad135daf2629dd0626c981f66cfd9 adds a cache for the result of the first derivation step, avoiding recomputing it over and over. For a transaction with a large number of inputs, this potentially cuts the number of calls tobip32_CKDpub
by (almost) a half. Caching is only used for the first 16 key expressions in the descriptor template of the wallet policy - probably more than would ever be needed in practice. This bounds the size of the cache to about 2.5 kb of stack. It's unlikely that transactions not fitting the cache would be used in practice, but anyway the result would only be reduced performance for those extra keys, not benefiting from the cache.Benchmarks below show the result of the improvements on the transactions from the performance test suite. All transactions have an improvement of at least 0.8s, and some very large transactions with many inputs and complex scripts have the total running time cut by over 50%.
Initial exploration suggests that significant improvements are still possible by using an implementation of
secp256k1_point_unsafe
that takes advantage that the point is fixed (the generator of the secp256k1 curve), instead of using the generic scalar multiplication. Precomputed tables and windowed scalar multiplication are probably the lower hanging fruit with the highest ROI. This is left for a future PR.Benchmarks
All benchmarks are executed on a Nano S+.
Baseline
Benchmarks before the changes.
Using
secp256k1_point_unsafe
Adding the cache for repeated derivations
Negligible impact for small transactions, but very substantial for transactions with many inputs.