Performance optimizations

bigspider commented 3 weeks ago

This PR improves signing performance in two ways.

The total time of the pre-signing transaction checks is dominated by the secp256k1_point, which is used in bip32_CKDpub (BIP-32 unhardened pubkey derivation). The cx_ecfp_scalar_mult_no_throw function is hardened against side channels, but this has a large performance impact. Commit ab344cdf588880ece140c33e65ca465f7f3230e1 implements the unsafe counterpart cx_ecfp_scalar_mult_unsafe, which is not hardened. This would be problematic in cryptographic code using secrets, but it's safe in functions that only work with public data like bip32_CKDpub and crypto_tr_tweak_pubkey.
All the inputs of a transaction (and change outputs, as well) spending from a BIP-388 wallet policy have keys derived from the root keys of the policy with two derivation steps; the first is typically 0 or 1, while the second is arbitrary (the address_index). Therefore, the root_key/0 or the root_key/1 step ends up being derived via bip32_CKDpub multiple times. Commit 711436955f3ad135daf2629dd0626c981f66cfd9 adds a cache for the result of the first derivation step, avoiding recomputing it over and over. For a transaction with a large number of inputs, this potentially cuts the number of calls to bip32_CKDpub by (almost) a half. Caching is only used for the first 16 key expressions in the descriptor template of the wallet policy - probably more than would ever be needed in practice. This bounds the size of the cache to about 2.5 kb of stack. It's unlikely that transactions not fitting the cache would be used in practice, but anyway the result would only be reduced performance for those extra keys, not benefiting from the cache.

Benchmarks below show the result of the improvements on the transactions from the performance test suite. All transactions have an improvement of at least 0.8s, and some very large transactions with many inputs and complex scripts have the total running time cut by over 50%.

Initial exploration suggests that significant improvements are still possible by using an implementation of secp256k1_point_unsafe that takes advantage that the point is fixed (the generator of the secp256k1 curve), instead of using the generic scalar multiplication. Precomputed tables and windowed scalar multiplication are probably the lower hanging fruit with the highest ROI. This is left for a future PR.

Benchmarks

All benchmarks are executed on a Nano S+.

Baseline

Benchmarks before the changes.

test_perf_sign_psbt_singlesig_wpkh[1]              5.7687 (1.0)        5.7687 (1.0)        5.7687 (1.0)      0.0000 (1.0)        5.7687 (1.0)      0.0000 (1.0)           0;0  0.1734 (1.0)           1           1
test_perf_sign_psbt_singlesig_pkh[1]               6.0501 (1.05)       6.0501 (1.05)       6.0501 (1.05)     0.0000 (1.0)        6.0501 (1.05)     0.0000 (1.0)           0;0  0.1653 (0.95)          1           1
test_perf_sign_psbt_singlesig_tr[1]                6.3956 (1.11)       6.3956 (1.11)       6.3956 (1.11)     0.0000 (1.0)        6.3956 (1.11)     0.0000 (1.0)           0;0  0.1564 (0.90)          1           1
test_perf_sign_psbt_multisig2of3_wsh[1]           10.1345 (1.76)      10.1345 (1.76)      10.1345 (1.76)     0.0000 (1.0)       10.1345 (1.76)     0.0000 (1.0)           0;0  0.0987 (0.57)          1           1
test_perf_sign_psbt_singlesig_wpkh[3]             11.4041 (1.98)      11.4041 (1.98)      11.4041 (1.98)     0.0000 (1.0)       11.4041 (1.98)     0.0000 (1.0)           0;0  0.0877 (0.51)          1           1
test_perf_sign_psbt_singlesig_tr[3]               12.5451 (2.17)      12.5451 (2.17)      12.5451 (2.17)     0.0000 (1.0)       12.5451 (2.17)     0.0000 (1.0)           0;0  0.0797 (0.46)          1           1
test_perf_sign_psbt_singlesig_pkh[3]              13.0201 (2.26)      13.0201 (2.26)      13.0201 (2.26)     0.0000 (1.0)       13.0201 (2.26)     0.0000 (1.0)           0;0  0.0768 (0.44)          1           1
test_perf_sign_psbt_tapminiscript_2paths[1]       14.7126 (2.55)      14.7126 (2.55)      14.7126 (2.55)     0.0000 (1.0)       14.7126 (2.55)     0.0000 (1.0)           0;0  0.0680 (0.39)          1           1
test_perf_sign_psbt_multisig3of5_wsh[1]           18.4035 (3.19)      18.4035 (3.19)      18.4035 (3.19)     0.0000 (1.0)       18.4035 (3.19)     0.0000 (1.0)           0;0  0.0543 (0.31)          1           1
test_perf_sign_psbt_multisig2of3_wsh[3]           20.4315 (3.54)      20.4315 (3.54)      20.4315 (3.54)     0.0000 (1.0)       20.4315 (3.54)     0.0000 (1.0)           0;0  0.0489 (0.28)          1           1
test_perf_sign_psbt_singlesig_wpkh[10]            30.6784 (5.32)      30.6784 (5.32)      30.6784 (5.32)     0.0000 (1.0)       30.6784 (5.32)     0.0000 (1.0)           0;0  0.0326 (0.19)          1           1
test_perf_sign_psbt_tapminiscript_2paths[3]       31.7105 (5.50)      31.7105 (5.50)      31.7105 (5.50)     0.0000 (1.0)       31.7105 (5.50)     0.0000 (1.0)           0;0  0.0315 (0.18)          1           1
test_perf_sign_psbt_singlesig_tr[10]              33.9459 (5.88)      33.9459 (5.88)      33.9459 (5.88)     0.0000 (1.0)       33.9459 (5.88)     0.0000 (1.0)           0;0  0.0295 (0.17)          1           1
test_perf_sign_psbt_multisig3of5_wsh[3]           36.9728 (6.41)      36.9728 (6.41)      36.9728 (6.41)     0.0000 (1.0)       36.9728 (6.41)     0.0000 (1.0)           0;0  0.0270 (0.16)          1           1
test_perf_sign_psbt_singlesig_pkh[10]             47.7810 (8.28)      47.7810 (8.28)      47.7810 (8.28)     0.0000 (1.0)       47.7810 (8.28)     0.0000 (1.0)           0;0  0.0209 (0.12)          1           1
test_perf_sign_psbt_multisig2of3_wsh[10]          56.5258 (9.80)      56.5258 (9.80)      56.5258 (9.80)     0.0000 (1.0)       56.5258 (9.80)     0.0000 (1.0)           0;0  0.0177 (0.10)          1           1
test_perf_sign_psbt_tapminiscript_2paths[10]      88.7755 (15.39)     88.7755 (15.39)     88.7755 (15.39)    0.0000 (1.0)       88.7755 (15.39)    0.0000 (1.0)           0;0  0.0113 (0.06)          1           1
test_perf_sign_psbt_multisig3of5_wsh[10]         101.9806 (17.68)    101.9806 (17.68)    101.9806 (17.68)    0.0000 (1.0)      101.9806 (17.68)    0.0000 (1.0)           0;0  0.0098 (0.06)          1           1

Using `secp256k1_point_unsafe`

test_perf_sign_psbt_singlesig_wpkh[1]             4.8987 (1.0)       4.8987 (1.0)       4.8987 (1.0)      0.0000 (1.0)       4.8987 (1.0)      0.0000 (1.0)           0;0  0.2041 (1.0)           1           1
test_perf_sign_psbt_singlesig_pkh[1]              5.1970 (1.06)      5.1970 (1.06)      5.1970 (1.06)     0.0000 (1.0)       5.1970 (1.06)     0.0000 (1.0)           0;0  0.1924 (0.94)          1           1
test_perf_sign_psbt_singlesig_tr[1]               5.2505 (1.07)      5.2505 (1.07)      5.2505 (1.07)     0.0000 (1.0)       5.2505 (1.07)     0.0000 (1.0)           0;0  0.1905 (0.93)          1           1
test_perf_sign_psbt_multisig2of3_wsh[1]           7.3487 (1.50)      7.3487 (1.50)      7.3487 (1.50)     0.0000 (1.0)       7.3487 (1.50)     0.0000 (1.0)           0;0  0.1361 (0.67)          1           1
test_perf_sign_psbt_singlesig_wpkh[3]             9.4191 (1.92)      9.4191 (1.92)      9.4191 (1.92)     0.0000 (1.0)       9.4191 (1.92)     0.0000 (1.0)           0;0  0.1062 (0.52)          1           1
test_perf_sign_psbt_singlesig_tr[3]              10.1040 (2.06)     10.1040 (2.06)     10.1040 (2.06)     0.0000 (1.0)      10.1040 (2.06)     0.0000 (1.0)           0;0  0.0990 (0.48)          1           1
test_perf_sign_psbt_tapminiscript_2paths[1]      10.7560 (2.20)     10.7560 (2.20)     10.7560 (2.20)     0.0000 (1.0)      10.7560 (2.20)     0.0000 (1.0)           0;0  0.0930 (0.46)          1           1
test_perf_sign_psbt_singlesig_pkh[3]             11.0288 (2.25)     11.0288 (2.25)     11.0288 (2.25)     0.0000 (1.0)      11.0288 (2.25)     0.0000 (1.0)           0;0  0.0907 (0.44)          1           1
test_perf_sign_psbt_multisig3of5_wsh[1]          12.3571 (2.52)     12.3571 (2.52)     12.3571 (2.52)     0.0000 (1.0)      12.3571 (2.52)     0.0000 (1.0)           0;0  0.0809 (0.40)          1           1
test_perf_sign_psbt_multisig2of3_wsh[3]          14.6858 (3.00)     14.6858 (3.00)     14.6858 (3.00)     0.0000 (1.0)      14.6858 (3.00)     0.0000 (1.0)           0;0  0.0681 (0.33)          1           1
test_perf_sign_psbt_tapminiscript_2paths[3]      22.8977 (4.67)     22.8977 (4.67)     22.8977 (4.67)     0.0000 (1.0)      22.8977 (4.67)     0.0000 (1.0)           0;0  0.0437 (0.21)          1           1
test_perf_sign_psbt_multisig3of5_wsh[3]          24.7458 (5.05)     24.7458 (5.05)     24.7458 (5.05)     0.0000 (1.0)      24.7458 (5.05)     0.0000 (1.0)           0;0  0.0404 (0.20)          1           1
test_perf_sign_psbt_singlesig_wpkh[10]           24.8556 (5.07)     24.8556 (5.07)     24.8556 (5.07)     0.0000 (1.0)      24.8556 (5.07)     0.0000 (1.0)           0;0  0.0402 (0.20)          1           1
test_perf_sign_psbt_singlesig_tr[10]             27.0827 (5.53)     27.0827 (5.53)     27.0827 (5.53)     0.0000 (1.0)      27.0827 (5.53)     0.0000 (1.0)           0;0  0.0369 (0.18)          1           1
test_perf_sign_psbt_multisig2of3_wsh[10]         40.7628 (8.32)     40.7628 (8.32)     40.7628 (8.32)     0.0000 (1.0)      40.7628 (8.32)     0.0000 (1.0)           0;0  0.0245 (0.12)          1           1
test_perf_sign_psbt_singlesig_pkh[10]            42.1003 (8.59)     42.1003 (8.59)     42.1003 (8.59)     0.0000 (1.0)      42.1003 (8.59)     0.0000 (1.0)           0;0  0.0238 (0.12)          1           1
test_perf_sign_psbt_tapminiscript_2paths[10]     64.3309 (13.13)    64.3309 (13.13)    64.3309 (13.13)    0.0000 (1.0)      64.3309 (13.13)    0.0000 (1.0)           0;0  0.0155 (0.08)          1           1
test_perf_sign_psbt_multisig3of5_wsh[10]         67.7685 (13.83)    67.7685 (13.83)    67.7685 (13.83)    0.0000 (1.0)      67.7685 (13.83)    0.0000 (1.0)           0;0  0.0148 (0.07)          1           1

Adding the cache for repeated derivations

Negligible impact for small transactions, but very substantial for transactions with many inputs.

test_perf_sign_psbt_singlesig_wpkh[1]             4.7133 (1.0)       4.7133 (1.0)       4.7133 (1.0)      0.0000 (1.0)       4.7133 (1.0)      0.0000 (1.0)           0;0  0.2122 (1.0)           1           1
test_perf_sign_psbt_singlesig_pkh[1]              4.8031 (1.02)      4.8031 (1.02)      4.8031 (1.02)     0.0000 (1.0)       4.8031 (1.02)     0.0000 (1.0)           0;0  0.2082 (0.98)          1           1
test_perf_sign_psbt_singlesig_tr[1]               4.9164 (1.04)      4.9164 (1.04)      4.9164 (1.04)     0.0000 (1.0)       4.9164 (1.04)     0.0000 (1.0)           0;0  0.2034 (0.96)          1           1
test_perf_sign_psbt_multisig2of3_wsh[1]           5.9457 (1.26)      5.9457 (1.26)      5.9457 (1.26)     0.0000 (1.0)       5.9457 (1.26)     0.0000 (1.0)           0;0  0.1682 (0.79)          1           1
test_perf_sign_psbt_singlesig_wpkh[3]             8.5114 (1.81)      8.5114 (1.81)      8.5114 (1.81)     0.0000 (1.0)       8.5114 (1.81)     0.0000 (1.0)           0;0  0.1175 (0.55)          1           1
test_perf_sign_psbt_tapminiscript_2paths[1]       8.9499 (1.90)      8.9499 (1.90)      8.9499 (1.90)     0.0000 (1.0)       8.9499 (1.90)     0.0000 (1.0)           0;0  0.1117 (0.53)          1           1
test_perf_sign_psbt_multisig3of5_wsh[1]           8.9516 (1.90)      8.9516 (1.90)      8.9516 (1.90)     0.0000 (1.0)       8.9516 (1.90)     0.0000 (1.0)           0;0  0.1117 (0.53)          1           1
test_perf_sign_psbt_singlesig_tr[3]               9.3474 (1.98)      9.3474 (1.98)      9.3474 (1.98)     0.0000 (1.0)       9.3474 (1.98)     0.0000 (1.0)           0;0  0.1070 (0.50)          1           1
test_perf_sign_psbt_singlesig_pkh[3]             10.3556 (2.20)     10.3556 (2.20)     10.3556 (2.20)     0.0000 (1.0)      10.3556 (2.20)     0.0000 (1.0)           0;0  0.0966 (0.46)          1           1
test_perf_sign_psbt_multisig2of3_wsh[3]          11.9158 (2.53)     11.9158 (2.53)     11.9158 (2.53)     0.0000 (1.0)      11.9158 (2.53)     0.0000 (1.0)           0;0  0.0839 (0.40)          1           1
test_perf_sign_psbt_multisig3of5_wsh[3]          17.2955 (3.67)     17.2955 (3.67)     17.2955 (3.67)     0.0000 (1.0)      17.2955 (3.67)     0.0000 (1.0)           0;0  0.0578 (0.27)          1           1
test_perf_sign_psbt_tapminiscript_2paths[3]      18.9440 (4.02)     18.9440 (4.02)     18.9440 (4.02)     0.0000 (1.0)      18.9440 (4.02)     0.0000 (1.0)           0;0  0.0528 (0.25)          1           1
test_perf_sign_psbt_singlesig_wpkh[10]           22.4683 (4.77)     22.4683 (4.77)     22.4683 (4.77)     0.0000 (1.0)      22.4683 (4.77)     0.0000 (1.0)           0;0  0.0445 (0.21)          1           1
test_perf_sign_psbt_singlesig_tr[10]             24.7001 (5.24)     24.7001 (5.24)     24.7001 (5.24)     0.0000 (1.0)      24.7001 (5.24)     0.0000 (1.0)           0;0  0.0405 (0.19)          1           1
test_perf_sign_psbt_multisig2of3_wsh[10]         31.8933 (6.77)     31.8933 (6.77)     31.8933 (6.77)     0.0000 (1.0)      31.8933 (6.77)     0.0000 (1.0)           0;0  0.0314 (0.15)          1           1
test_perf_sign_psbt_singlesig_pkh[10]            39.4850 (8.38)     39.4850 (8.38)     39.4850 (8.38)     0.0000 (1.0)      39.4850 (8.38)     0.0000 (1.0)           0;0  0.0253 (0.12)          1           1
test_perf_sign_psbt_multisig3of5_wsh[10]         47.0507 (9.98)     47.0507 (9.98)     47.0507 (9.98)     0.0000 (1.0)      47.0507 (9.98)     0.0000 (1.0)           0;0  0.0213 (0.10)          1           1
test_perf_sign_psbt_tapminiscript_2paths[10]     50.0241 (10.61)    50.0241 (10.61)    50.0241 (10.61)    0.0000 (1.0)      50.0241 (10.61)    0.0000 (1.0)           0;0  0.0200 (0.09)          1           1

codecov-commenter commented 3 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 84.76%. Comparing base (5499c3e) to head (5f6afe9). Report is 4 commits behind head on develop.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## develop #276 +/- ## =========================================== + Coverage 84.73% 84.76% +0.03% =========================================== Files 17 17 Lines 2181 2186 +5 =========================================== + Hits 1848 1853 +5 Misses 333 333 ``` | [Flag](https://app.codecov.io/gh/LedgerHQ/app-bitcoin-new/pull/276/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=LedgerHQ) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/LedgerHQ/app-bitcoin-new/pull/276/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=LedgerHQ) | `84.76% <100.00%> (+0.03%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=LedgerHQ#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.