cockroachdb / apd

Arbitrary-precision decimals for Go
https://pkg.go.dev/github.com/cockroachdb/apd/v2
Apache License 2.0
664 stars 36 forks source link

[DNM] apd: embed small coefficient values in Decimal struct #101

Closed nvanbenschoten closed 2 years ago

nvanbenschoten commented 2 years ago

Replaces https://github.com/cockroachdb/cockroach/pull/74369.

This commit introduces a performance optimization that embeds small coefficient values directly in their Decimal struct, instead of storing these values in a separate heap allocation.

Each Decimal maintains (through big.Int) an internal reference to a variable-length coefficient, which is represented by a []big.Word. The coeffInline field and the lazyInit method combine to allow us to inline this coefficient within the Decimal struct when its value is sufficiently small. In lazyInit, we point the Coeff field's backing slice at the coeffInline array. big.Int will avoid re-allocating this array until it is provided with a value that exceeds the initial capacity. We set this initial capacity to accommodate any coefficient that would fit in a 64-bit integer (i.e. values up to 2^64).

This is an alternative to an optimization that many other arbitrary precision decimal libraries have where small coefficient values are stored as numeric fields in their data type's struct. Only when this coefficient value gets large enough do these libraries fall back to a variable-length coefficient with internal indirection. We can see the optimization in practice in the ericlagergren/decimal library, where each struct contains a compact uint64 and an unscaled big.Int. Prior concern from the authors of cockroachdb/apd regarding this form of compact representation optimization was that it traded performance for complexity. The optimization fractured control flow, leaking out across the library and leading to more complex, error-prone code.

The approach taken in this commit does not have the same issue. All arithmetic on the decimal's coefficient is still deferred to bit.Int. In fact, the entire optimization is best-effort, and bugs that lead to missed calls to lazyInit are merely missed opportunities to avoid a heap allocation, and nothing more serious. The internal reference within Decimal is vulnerable to memory aliasing bugs when a Decimal is copied by value (and not through the Decimal.Set method), but this is not a new concern.

Impact on benchmarks:

name                   old time/op    new time/op    delta
GDA/compare-10           61.5µs ±20%    54.5µs ±18%  -11.34%  (p=0.035 n=10+10)
GDA/reduce-10            18.8µs ± 7%    17.5µs ± 6%   -7.16%  (p=0.001 n=10+10)
GDA/remainder-10         67.0µs ±11%    62.2µs ± 6%   -7.09%  (p=0.023 n=10+10)
Exp/P5/S-4/D-2-10        2.56µs ± 4%    2.51µs ± 4%     ~     (p=0.060 n=10+10)
Exp/P5/S-4/D2-10         2.65µs ± 4%    2.60µs ± 4%     ~     (p=0.060 n=10+10)
Exp/P5/S-1/D-2-10        6.44µs ± 5%    6.31µs ± 4%     ~     (p=0.060 n=10+10)
Exp/P5/S-1/D2-10         6.27µs ± 4%    6.15µs ± 4%     ~     (p=0.063 n=10+10)
Exp/P5/S2/D-2-10         19.3µs ± 4%    19.1µs ± 4%     ~     (p=0.063 n=10+10)
Exp/P5/S2/D2-10          19.1µs ± 4%    18.9µs ± 4%     ~     (p=0.063 n=10+10)
Exp/P10/S-4/D-10-10      10.5µs ± 4%    10.4µs ± 4%     ~     (p=0.063 n=10+10)
Exp/P10/S-4/D-2-10       5.59µs ± 4%    5.49µs ± 5%     ~     (p=0.138 n=10+10)
Exp/P10/S-4/D2-10        4.68µs ± 4%    4.62µs ± 5%     ~     (p=0.138 n=10+10)
Exp/P10/S-4/D10-10       10.5µs ± 4%    10.4µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P10/S-1/D-10-10      22.2µs ± 4%    22.1µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P10/S-1/D-2-10       12.2µs ± 4%    12.1µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P10/S-1/D2-10        12.2µs ± 4%    12.0µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P10/S-1/D10-10       21.8µs ± 4%    21.7µs ± 4%     ~     (p=0.247 n=10+10)
Exp/P10/S2/D-10-10       44.0µs ± 4%    44.0µs ± 4%     ~     (p=0.165 n=10+10)
Exp/P10/S2/D-2-10        31.8µs ± 4%    31.7µs ± 4%     ~     (p=0.190 n=10+10)
Exp/P10/S2/D2-10         28.7µs ± 4%    28.6µs ± 4%     ~     (p=0.139 n=10+10)
Exp/P10/S2/D10-10        44.0µs ± 4%    44.0µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P100/S-4/D-100-10     561µs ± 4%     565µs ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S-4/D-10-10      297µs ± 4%     299µs ± 4%     ~     (p=0.912 n=10+10)
Exp/P100/S-4/D-2-10       286µs ± 4%     289µs ± 4%     ~     (p=0.315 n=10+10)
Exp/P100/S-4/D2-10        287µs ± 4%     289µs ± 4%     ~     (p=0.684 n=10+10)
Exp/P100/S-4/D10-10       292µs ± 4%     294µs ± 4%     ~     (p=0.529 n=10+10)
Exp/P100/S-4/D100-10      550µs ± 4%     554µs ± 4%     ~     (p=0.739 n=10+10)
Exp/P100/S-1/D-100-10    1.05ms ± 4%    1.05ms ± 4%     ~     (p=0.631 n=10+10)
Exp/P100/S-1/D-10-10      637µs ± 4%     641µs ± 4%     ~     (p=0.739 n=10+10)
Exp/P100/S-1/D-2-10       636µs ± 4%     641µs ± 4%     ~     (p=0.436 n=10+10)
Exp/P100/S-1/D2-10        612µs ± 4%     616µs ± 4%     ~     (p=0.631 n=10+10)
Exp/P100/S-1/D10-10       621µs ± 4%     625µs ± 4%     ~     (p=0.912 n=10+10)
Exp/P100/S-1/D100-10     1.07ms ± 3%    1.08ms ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S2/D-100-10     1.54ms ± 3%    1.55ms ± 4%     ~     (p=0.684 n=10+10)
Exp/P100/S2/D-10-10      1.06ms ± 4%    1.06ms ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S2/D-2-10        918µs ± 4%     924µs ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S2/D2-10        1.01ms ± 4%    1.02ms ± 4%     ~     (p=0.853 n=10+10)
Exp/P100/S2/D10-10       1.05ms ± 4%    1.06ms ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S2/D100-10      1.58ms ± 4%    1.59ms ± 4%     ~     (p=0.853 n=10+10)
Ln/P2/S-100/D2-10        25.1µs ± 4%    25.0µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P2/S-10/D2-10         22.6µs ± 4%    22.6µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P2/S-2/D2-10          24.9µs ± 4%    24.8µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P2/S2/D2-10           28.1µs ± 4%    27.9µs ± 4%     ~     (p=0.139 n=10+10)
Ln/P2/S10/D2-10          27.0µs ± 4%    26.9µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P2/S100/D2-10         28.7µs ± 4%    28.5µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P10/S-100/D2-10       78.1µs ± 4%    78.1µs ± 4%     ~     (p=0.218 n=10+10)
Ln/P10/S-100/D10-10      74.3µs ± 4%    74.3µs ± 4%     ~     (p=0.190 n=10+10)
Ln/P10/S-10/D2-10        83.4µs ± 4%    83.4µs ± 4%     ~     (p=0.315 n=10+10)
Ln/P10/S-10/D10-10       80.9µs ± 4%    80.9µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P10/S-2/D2-10         82.1µs ± 4%    82.2µs ± 5%     ~     (p=0.280 n=10+10)
Ln/P10/S-2/D10-10        90.6µs ± 4%    90.6µs ± 5%     ~     (p=0.190 n=10+10)
Ln/P10/S2/D2-10          80.6µs ± 4%    80.6µs ± 4%     ~     (p=0.184 n=10+10)
Ln/P10/S2/D10-10         79.7µs ± 4%    79.5µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P10/S10/D2-10         85.2µs ± 4%    85.2µs ± 5%     ~     (p=0.218 n=10+10)
Ln/P10/S10/D10-10        74.8µs ± 4%    74.7µs ± 4%     ~     (p=0.247 n=10+10)
Ln/P10/S100/D2-10        83.8µs ± 4%    83.6µs ± 4%     ~     (p=0.165 n=10+10)
Ln/P10/S100/D10-10       66.1µs ± 4%    65.9µs ± 5%     ~     (p=0.143 n=10+10)
Ln/P100/S-100/D2-10      3.87ms ± 4%    3.90ms ± 5%     ~     (p=0.853 n=10+10)
Ln/P100/S-100/D10-10     3.34ms ± 4%    3.35ms ± 5%     ~     (p=0.529 n=10+10)
Ln/P100/S-100/D100-10    3.46ms ± 4%    3.48ms ± 5%     ~     (p=0.971 n=10+10)
Ln/P100/S-10/D2-10       3.63ms ± 4%    3.65ms ± 4%     ~     (p=0.739 n=10+10)
Ln/P100/S-10/D10-10      3.43ms ± 4%    3.44ms ± 4%     ~     (p=0.796 n=10+10)
Ln/P100/S-10/D100-10     3.42ms ± 4%    3.45ms ± 4%     ~     (p=0.631 n=10+10)
Ln/P100/S-2/D2-10        3.51ms ± 4%    3.53ms ± 4%     ~     (p=0.853 n=10+10)
Ln/P100/S-2/D10-10       3.21ms ± 4%    3.24ms ± 4%     ~     (p=0.631 n=10+10)
Ln/P100/S-2/D100-10      3.47ms ± 4%    3.50ms ± 4%     ~     (p=0.631 n=10+10)
Ln/P100/S2/D2-10         3.56ms ± 4%    3.59ms ± 4%     ~     (p=0.796 n=10+10)
Ln/P100/S2/D10-10        3.82ms ± 4%    3.84ms ± 4%     ~     (p=1.000 n=10+10)
Ln/P100/S2/D100-10       3.72ms ± 4%    3.75ms ± 5%     ~     (p=0.353 n=10+10)
Ln/P100/S10/D2-10        3.78ms ± 4%    3.81ms ± 4%     ~     (p=0.631 n=10+10)
Ln/P100/S10/D10-10       3.50ms ± 4%    3.52ms ± 4%     ~     (p=0.529 n=10+10)
Ln/P100/S10/D100-10      3.49ms ± 4%    3.52ms ± 4%     ~     (p=0.684 n=10+10)
Ln/P100/S100/D2-10       3.82ms ± 4%    3.85ms ± 4%     ~     (p=0.393 n=10+10)
Ln/P100/S100/D10-10      3.72ms ± 4%    3.75ms ± 4%     ~     (p=0.393 n=10+10)
Ln/P100/S100/D100-10     3.65ms ± 4%    3.68ms ± 4%     ~     (p=0.315 n=10+10)
GDA/abs-10               10.4µs ± 6%    10.3µs ± 6%     ~     (p=0.564 n=10+10)
GDA/base-10               110µs ± 4%     111µs ± 5%     ~     (p=0.869 n=10+10)
GDA/comparetotal-10      30.2µs ± 5%    30.3µs ± 5%     ~     (p=0.631 n=10+10)
GDA/divide-10             391µs ± 7%     390µs ± 3%     ~     (p=0.853 n=10+10)
GDA/exp-10                127ms ± 4%     127ms ± 4%     ~     (p=0.796 n=10+10)
GDA/ln-10                83.6ms ± 3%    83.6ms ± 4%     ~     (p=0.218 n=10+10)
GDA/log10-10              106ms ± 4%     105ms ± 4%     ~     (p=0.218 n=10+10)
GDA/minus-10             11.7µs ± 4%    11.7µs ± 7%     ~     (p=0.579 n=10+10)
GDA/multiply-10          77.3µs ± 9%    80.8µs ± 9%     ~     (p=0.089 n=10+10)
GDA/plus-10              42.9µs ± 5%    44.2µs ± 4%     ~     (p=0.052 n=10+10)
GDA/power-10              218ms ± 3%     218ms ± 3%     ~     (p=0.631 n=10+10)
GDA/powersqrt-10          450ms ± 2%     448ms ± 2%     ~     (p=0.280 n=10+10)
GDA/quantize-10           143µs ±12%     128µs ± 8%     ~     (p=0.052 n=10+10)
GDA/randoms-10           3.05ms ± 5%    3.02ms ± 5%     ~     (p=0.143 n=10+10)
GDA/rounding-10           659µs ± 4%     658µs ± 4%     ~     (p=0.579 n=10+10)
GDA/squareroot-10        33.1ms ± 3%    32.7ms ± 3%     ~     (p=0.143 n=10+10)
GDA/subtract-10           172µs ±11%     170µs ± 8%     ~     (p=0.912 n=10+10)
GDA/tointegral-10        32.9µs ± 6%    32.2µs ± 5%     ~     (p=0.190 n=10+10)
GDA/tointegralx-10       33.6µs ± 6%    32.7µs ± 4%     ~     (p=0.190 n=10+10)
GDA/cuberoot-apd-10      2.08ms ± 3%    2.08ms ± 4%     ~     (p=0.165 n=10+10)
GDA/add-10                904µs ± 5%     919µs ± 5%   +1.65%  (p=0.043 n=10+10)
GDA/divideint-10         28.7µs ± 4%    29.7µs ± 1%   +3.51%  (p=0.026 n=9+6)

name                   old alloc/op   new alloc/op   delta
GDA/compare-10           10.1kB ± 0%     7.0kB ± 0%  -30.72%  (p=0.000 n=10+10)
GDA/quantize-10          42.3kB ± 0%    36.5kB ± 0%  -13.63%  (p=0.000 n=9+10)
GDA/divideint-10         6.51kB ± 0%    5.70kB ± 0%  -12.41%  (p=0.000 n=10+10)
GDA/remainder-10         22.4kB ± 0%    20.0kB ± 0%  -10.60%  (p=0.000 n=10+10)
GDA/tointegralx-10       19.2kB ± 0%    17.4kB ± 0%   -9.28%  (p=0.000 n=10+10)
GDA/tointegral-10        19.1kB ± 0%    17.4kB ± 0%   -8.83%  (p=0.000 n=10+10)
GDA/reduce-10            2.22kB ± 0%    2.02kB ± 0%   -8.66%  (p=0.000 n=10+10)
GDA/divide-10            74.3kB ± 0%    70.1kB ± 0%   -5.62%  (p=0.000 n=10+10)
GDA/squareroot-10        8.30MB ± 0%    7.96MB ± 0%   -4.08%  (p=0.000 n=10+10)
Exp/P5/S-4/D-2-10        1.23kB ± 0%    1.18kB ± 0%   -3.92%  (p=0.000 n=10+10)
Exp/P10/S-1/D-2-10       4.50kB ± 0%    4.33kB ± 0%   -3.75%  (p=0.000 n=10+10)
Exp/P5/S-1/D-2-10        2.83kB ± 0%    2.72kB ± 0%   -3.75%  (p=0.000 n=10+10)
Exp/P5/S-4/D2-10         1.29kB ± 0%    1.24kB ± 0%   -3.72%  (p=0.000 n=10+10)
Exp/P10/S-4/D2-10        2.02kB ± 0%    1.94kB ± 0%   -3.71%  (p=0.000 n=10+10)
Exp/P10/S2/D-2-10        9.59kB ± 0%    9.24kB ± 0%   -3.66%  (p=0.000 n=10+10)
Exp/P5/S2/D-2-10         7.53kB ± 0%    7.26kB ± 0%   -3.65%  (p=0.000 n=10+10)
Exp/P5/S2/D2-10          7.76kB ± 0%    7.48kB ± 0%   -3.65%  (p=0.000 n=10+10)
Exp/P10/S2/D2-10         9.75kB ± 0%    9.40kB ± 0%   -3.60%  (p=0.000 n=10+10)
Exp/P10/S-4/D-2-10       2.32kB ± 0%    2.24kB ± 0%   -3.58%  (p=0.000 n=10+10)
GDA/rounding-10           246kB ± 0%     237kB ± 0%   -3.56%  (p=0.000 n=10+10)
GDA/randoms-10           1.16MB ± 0%    1.12MB ± 0%   -3.55%  (p=0.000 n=10+10)
Exp/P10/S2/D-10-10       9.79kB ± 0%    9.44kB ± 0%   -3.54%  (p=0.000 n=10+10)
Exp/P10/S2/D10-10        10.7kB ± 0%    10.3kB ± 0%   -3.53%  (p=0.000 n=10+10)
Exp/P5/S-1/D2-10         2.87kB ± 0%    2.77kB ± 0%   -3.52%  (p=0.000 n=10+10)
Exp/P10/S-1/D-10-10      4.95kB ± 0%    4.78kB ± 0%   -3.47%  (p=0.000 n=10+10)
Exp/P10/S-1/D2-10        4.64kB ± 0%    4.49kB ± 0%   -3.40%  (p=0.000 n=10+10)
GDA/cuberoot-apd-10       261kB ± 0%     252kB ± 0%   -3.36%  (p=0.000 n=10+10)
Exp/P10/S-1/D10-10       5.17kB ± 0%    5.00kB ± 0%   -3.33%  (p=0.000 n=10+10)
GDA/powersqrt-10         77.6MB ± 0%    75.1MB ± 0%   -3.29%  (p=0.000 n=10+10)
Ln/P10/S-2/D2-10         17.1kB ± 0%    16.5kB ± 0%   -3.26%  (p=0.000 n=9+10)
Ln/P10/S-2/D10-10        18.6kB ± 0%    18.0kB ± 0%   -3.24%  (p=0.000 n=9+8)
Exp/P10/S-4/D-10-10      2.56kB ± 0%    2.48kB ± 0%   -3.24%  (p=0.000 n=10+10)
Ln/P10/S2/D2-10          17.0kB ± 0%    16.5kB ± 0%   -3.23%  (p=0.000 n=7+10)
Ln/P10/S10/D10-10        15.8kB ± 0%    15.3kB ± 0%   -3.23%  (p=0.000 n=10+10)
Ln/P10/S10/D2-10         18.0kB ± 0%    17.4kB ± 0%   -3.22%  (p=0.000 n=10+10)
Ln/P10/S-10/D2-10        17.0kB ± 0%    16.4kB ± 0%   -3.22%  (p=0.000 n=8+10)
Ln/P10/S2/D10-10         16.5kB ± 0%    16.0kB ± 0%   -3.21%  (p=0.000 n=10+10)
Ln/P10/S-10/D10-10       17.1kB ± 0%    16.5kB ± 0%   -3.18%  (p=0.000 n=10+7)
Exp/P10/S-4/D10-10       2.61kB ± 0%    2.52kB ± 0%   -3.18%  (p=0.000 n=10+10)
Ln/P2/S10/D2-10          10.3kB ± 0%     9.9kB ± 0%   -3.15%  (p=0.000 n=10+10)
Ln/P10/S-100/D2-10       17.1kB ± 0%    16.6kB ± 0%   -3.11%  (p=0.000 n=10+10)
Ln/P10/S100/D2-10        17.8kB ± 0%    17.2kB ± 0%   -3.11%  (p=0.000 n=8+10)
Ln/P2/S2/D2-10           10.3kB ± 0%     9.9kB ± 0%   -3.10%  (p=0.000 n=10+9)
Ln/P2/S-2/D2-10          9.62kB ± 0%    9.32kB ± 0%   -3.09%  (p=0.000 n=10+10)
Ln/P10/S100/D10-10       15.0kB ± 0%    14.5kB ± 0%   -3.09%  (p=0.000 n=9+10)
Ln/P10/S-100/D10-10      16.3kB ± 0%    15.8kB ± 0%   -3.08%  (p=0.000 n=10+10)
Ln/P2/S-10/D2-10         8.91kB ± 0%    8.64kB ± 0%   -3.07%  (p=0.000 n=10+9)
GDA/subtract-10           100kB ± 0%      97kB ± 0%   -2.89%  (p=0.000 n=10+10)
Ln/P2/S100/D2-10         10.9kB ± 0%    10.6kB ± 0%   -2.86%  (p=0.000 n=10+10)
Ln/P2/S-100/D2-10        9.81kB ± 0%    9.53kB ± 0%   -2.83%  (p=0.000 n=10+10)
GDA/ln-10                10.2MB ± 0%    10.0MB ± 0%   -2.09%  (p=0.000 n=10+10)
GDA/log10-10             12.5MB ± 0%    12.3MB ± 0%   -1.87%  (p=0.000 n=10+10)
GDA/power-10             27.0MB ± 0%    26.5MB ± 0%   -1.78%  (p=0.000 n=10+10)
Exp/P100/S-4/D10-10      31.9kB ± 0%    31.4kB ± 0%   -1.46%  (p=0.000 n=10+10)
Exp/P100/S-4/D-10-10     33.5kB ± 0%    33.0kB ± 0%   -1.41%  (p=0.000 n=10+10)
Exp/P100/S-4/D2-10       31.8kB ± 0%    31.4kB ± 0%   -1.40%  (p=0.000 n=10+9)
Exp/P100/S-4/D-2-10      34.6kB ± 0%    34.1kB ± 0%   -1.36%  (p=0.000 n=10+9)
Exp/P100/S-1/D10-10      65.0kB ± 0%    64.2kB ± 0%   -1.27%  (p=0.000 n=8+10)
Exp/P100/S-1/D2-10       64.9kB ± 0%    64.1kB ± 0%   -1.24%  (p=0.000 n=10+10)
Exp/P100/S-1/D-10-10     70.1kB ± 0%    69.3kB ± 0%   -1.19%  (p=0.000 n=10+10)
Exp/P100/S-1/D-2-10      72.6kB ± 0%    71.8kB ± 0%   -1.14%  (p=0.000 n=9+9)
Exp/P100/S2/D10-10        115kB ± 0%     114kB ± 0%   -1.05%  (p=0.000 n=10+10)
Exp/P100/S2/D2-10         113kB ± 0%     112kB ± 0%   -1.01%  (p=0.000 n=10+10)
Exp/P100/S2/D-2-10        107kB ± 0%     106kB ± 0%   -1.00%  (p=0.000 n=10+10)
Exp/P100/S2/D-10-10       117kB ± 0%     116kB ± 0%   -0.96%  (p=0.000 n=10+10)
Ln/P100/S-10/D10-10       318kB ± 0%     315kB ± 0%   -0.83%  (p=0.000 n=10+8)
Ln/P100/S-100/D100-10     328kB ± 0%     326kB ± 0%   -0.79%  (p=0.000 n=10+10)
Ln/P100/S-10/D2-10        336kB ± 0%     334kB ± 0%   -0.79%  (p=0.000 n=10+9)
Ln/P100/S-2/D2-10         326kB ± 0%     324kB ± 0%   -0.79%  (p=0.000 n=10+10)
Ln/P100/S100/D2-10        353kB ± 0%     350kB ± 0%   -0.79%  (p=0.000 n=10+9)
Ln/P100/S10/D10-10        326kB ± 0%     324kB ± 0%   -0.78%  (p=0.000 n=10+10)
Ln/P100/S100/D100-10      340kB ± 0%     338kB ± 0%   -0.77%  (p=0.000 n=10+10)
Ln/P100/S2/D10-10         351kB ± 0%     348kB ± 0%   -0.77%  (p=0.000 n=10+10)
Ln/P100/S-10/D100-10      323kB ± 0%     320kB ± 0%   -0.77%  (p=0.000 n=10+10)
Ln/P100/S2/D100-10        344kB ± 0%     341kB ± 0%   -0.77%  (p=0.000 n=9+6)
Ln/P100/S-100/D10-10      311kB ± 0%     309kB ± 0%   -0.76%  (p=0.000 n=10+7)
Ln/P100/S10/D100-10       323kB ± 0%     321kB ± 0%   -0.76%  (p=0.000 n=8+9)
Ln/P100/S-100/D2-10       362kB ± 0%     360kB ± 0%   -0.75%  (p=0.000 n=10+10)
Ln/P100/S100/D10-10       345kB ± 0%     342kB ± 0%   -0.74%  (p=0.000 n=9+8)
Exp/P100/S-1/D100-10     94.0kB ± 0%    93.3kB ± 0%   -0.74%  (p=0.000 n=8+10)
Exp/P100/S-4/D100-10     47.9kB ± 0%    47.5kB ± 0%   -0.74%  (p=0.000 n=6+9)
Ln/P100/S2/D2-10          333kB ± 0%     330kB ± 0%   -0.74%  (p=0.000 n=10+10)
Exp/P100/S-1/D-100-10    97.5kB ± 0%    96.8kB ± 0%   -0.70%  (p=0.000 n=10+10)
Exp/P100/S-4/D-100-10    51.5kB ± 0%    51.2kB ± 0%   -0.70%  (p=0.000 n=10+10)
Ln/P100/S10/D2-10         346kB ± 0%     343kB ± 0%   -0.69%  (p=0.000 n=8+10)
Ln/P100/S-2/D10-10        306kB ± 0%     304kB ± 0%   -0.68%  (p=0.000 n=10+10)
Ln/P100/S-2/D100-10       329kB ± 0%     326kB ± 0%   -0.67%  (p=0.000 n=10+8)
Exp/P100/S2/D100-10       150kB ± 0%     149kB ± 0%   -0.67%  (p=0.000 n=10+10)
Exp/P100/S2/D-100-10      148kB ± 0%     147kB ± 0%   -0.62%  (p=0.000 n=9+10)
GDA/exp-10               61.3MB ± 0%    61.1MB ± 0%   -0.18%  (p=0.000 n=10+10)
GDA/base-10              24.4kB ± 0%    24.4kB ± 0%     ~     (all equal)
GDA/comparetotal-10      7.11kB ± 0%    7.11kB ± 0%     ~     (all equal)
GDA/add-10                712kB ± 0%     720kB ± 0%   +1.18%  (p=0.000 n=10+8)
GDA/plus-10              46.6kB ± 0%    47.5kB ± 0%   +2.04%  (p=0.000 n=10+9)
GDA/multiply-10          35.5kB ± 0%    39.2kB ± 0%  +10.32%  (p=0.000 n=10+10)
GDA/minus-10             2.43kB ± 0%    2.82kB ± 0%  +16.12%  (p=0.000 n=10+10)
GDA/abs-10               2.33kB ± 0%    2.82kB ± 0%  +21.31%  (p=0.000 n=10+10)

name                   old allocs/op  new allocs/op  delta
GDA/reduce-10               187 ± 0%        43 ± 0%  -77.01%  (p=0.000 n=10+10)
GDA/compare-10              638 ± 0%       250 ± 0%  -60.82%  (p=0.000 n=10+10)
GDA/minus-10                164 ± 0%        73 ± 0%  -55.49%  (p=0.000 n=10+10)
GDA/abs-10                  151 ± 0%        73 ± 0%  -51.66%  (p=0.000 n=10+10)
GDA/quantize-10           1.70k ± 0%     0.98k ± 0%  -42.30%  (p=0.000 n=10+10)
GDA/tointegralx-10          677 ± 0%       454 ± 0%  -32.94%  (p=0.000 n=10+10)
GDA/remainder-10            923 ± 0%       626 ± 0%  -32.18%  (p=0.000 n=10+10)
GDA/tointegral-10           665 ± 0%       454 ± 0%  -31.73%  (p=0.000 n=10+10)
GDA/divideint-10            357 ± 0%       256 ± 0%  -28.29%  (p=0.000 n=10+10)
GDA/subtract-10           2.85k ± 0%     2.14k ± 0%  -24.89%  (p=0.000 n=10+10)
GDA/divide-10             2.51k ± 0%     1.91k ± 0%  -24.01%  (p=0.000 n=10+10)
Exp/P5/S-4/D-2-10          47.0 ± 0%      37.0 ± 0%  -21.28%  (p=0.000 n=10+10)
Exp/P5/S-4/D2-10           50.0 ± 0%      40.0 ± 0%  -20.00%  (p=0.000 n=10+10)
Exp/P10/S-4/D2-10          76.0 ± 0%      62.0 ± 0%  -18.42%  (p=0.000 n=10+10)
GDA/rounding-10           9.20k ± 0%     7.59k ± 0%  -17.52%  (p=0.000 n=10+10)
Exp/P10/S-4/D-2-10         87.0 ± 0%      72.0 ± 0%  -17.24%  (p=0.000 n=10+10)
Exp/P10/S-4/D10-10         95.0 ± 0%      79.0 ± 0%  -16.84%  (p=0.000 n=10+10)
GDA/randoms-10            40.4k ± 0%     33.6k ± 0%  -16.82%  (p=0.000 n=10+10)
Exp/P5/S-1/D-2-10           104 ± 0%        87 ± 0%  -16.35%  (p=0.000 n=10+10)
Exp/P5/S-1/D2-10            105 ± 0%        88 ± 0%  -16.19%  (p=0.000 n=10+10)
Exp/P10/S-4/D-10-10        93.0 ± 0%      78.0 ± 0%  -16.13%  (p=0.000 n=10+10)
GDA/squareroot-10          275k ± 0%      230k ± 0%  -16.08%  (p=0.000 n=10+10)
Exp/P10/S-1/D-10-10         174 ± 0%       147 ± 0%  -15.52%  (p=0.000 n=10+10)
GDA/plus-10                 587 ± 0%       496 ± 0%  -15.50%  (p=0.000 n=10+10)
Exp/P10/S-1/D-2-10          163 ± 0%       138 ± 0%  -15.34%  (p=0.000 n=10+10)
Exp/P10/S-1/D2-10           166 ± 0%       141 ± 0%  -15.06%  (p=0.000 n=10+10)
Exp/P10/S-1/D10-10          182 ± 0%       155 ± 0%  -14.84%  (p=0.000 n=10+10)
Exp/P10/S2/D-10-10          335 ± 0%       286 ± 0%  -14.63%  (p=0.000 n=10+10)
Ln/P2/S-2/D2-10             342 ± 0%       292 ± 0%  -14.62%  (p=0.000 n=10+10)
Ln/P2/S2/D2-10              363 ± 0%       310 ± 0%  -14.60%  (p=0.000 n=10+10)
Ln/P2/S10/D2-10             366 ± 0%       313 ± 0%  -14.48%  (p=0.000 n=10+10)
Exp/P10/S2/D-2-10           335 ± 0%       287 ± 0%  -14.33%  (p=0.000 n=10+10)
GDA/cuberoot-apd-10       7.67k ± 0%     6.58k ± 0%  -14.27%  (p=0.000 n=10+10)
Ln/P10/S-2/D2-10            582 ± 0%       499 ± 0%  -14.26%  (p=0.000 n=10+10)
Ln/P10/S2/D2-10             582 ± 0%       499 ± 0%  -14.26%  (p=0.000 n=10+10)
Ln/P10/S10/D10-10           541 ± 0%       464 ± 0%  -14.23%  (p=0.000 n=10+10)
Ln/P10/S2/D10-10            563 ± 0%       483 ± 0%  -14.21%  (p=0.000 n=10+10)
Ln/P2/S-10/D2-10            317 ± 0%       272 ± 0%  -14.20%  (p=0.000 n=10+10)
GDA/powersqrt-10          2.63M ± 0%     2.26M ± 0%  -14.17%  (p=0.000 n=10+10)
Ln/P10/S-2/D10-10           637 ± 0%       547 ± 0%  -14.13%  (p=0.000 n=10+10)
Ln/P10/S-10/D2-10           581 ± 0%       499 ± 0%  -14.11%  (p=0.000 n=10+10)
Exp/P10/S2/D10-10           369 ± 0%       317 ± 0%  -14.09%  (p=0.000 n=10+10)
Exp/P10/S2/D2-10            342 ± 0%       294 ± 0%  -14.04%  (p=0.000 n=10+10)
Ln/P10/S10/D2-10            613 ± 0%       527 ± 0%  -14.03%  (p=0.000 n=9+10)
Ln/P10/S-10/D10-10          581 ± 0%       500 ± 0%  -13.99%  (p=0.000 n=10+10)
Exp/P5/S2/D-2-10            272 ± 0%       234 ± 0%  -13.97%  (p=0.000 n=10+10)
Ln/P10/S-100/D2-10          575 ± 0%       495 ± 0%  -13.91%  (p=0.000 n=10+10)
Ln/P10/S100/D2-10           597 ± 0%       514 ± 0%  -13.90%  (p=0.000 n=10+10)
Ln/P10/S100/D10-10          504 ± 0%       434 ± 0%  -13.89%  (p=0.000 n=10+10)
Exp/P5/S2/D2-10             281 ± 0%       242 ± 0%  -13.88%  (p=0.000 n=10+10)
Ln/P10/S-100/D10-10         549 ± 0%       473 ± 0%  -13.84%  (p=0.000 n=10+10)
Ln/P2/S-100/D2-10           336 ± 0%       290 ± 0%  -13.69%  (p=0.000 n=10+10)
Ln/P2/S100/D2-10            376 ± 0%       325 ± 0%  -13.56%  (p=0.000 n=10+10)
GDA/ln-10                  269k ± 0%      239k ± 0%  -10.88%  (p=0.000 n=10+10)
GDA/log10-10               326k ± 0%      292k ± 0%  -10.44%  (p=0.000 n=10+10)
GDA/power-10               699k ± 0%      627k ± 0%  -10.30%  (p=0.000 n=10+10)
GDA/add-10                13.2k ± 0%     11.9k ± 0%   -9.68%  (p=0.000 n=10+10)
Exp/P100/S-4/D10-10         701 ± 0%       638 ± 0%   -8.99%  (p=0.000 n=10+10)
Exp/P100/S-4/D-10-10        715 ± 0%       651 ± 0%   -8.95%  (p=0.000 n=10+10)
Exp/P100/S-4/D2-10          686 ± 0%       626 ± 0%   -8.75%  (p=0.000 n=10+10)
Exp/P100/S-4/D-2-10         723 ± 0%       661 ± 0%   -8.58%  (p=0.000 n=10+10)
Exp/P100/S-1/D10-10       1.36k ± 0%     1.26k ± 0%   -7.99%  (p=0.000 n=10+10)
Exp/P100/S-1/D2-10        1.35k ± 0%     1.24k ± 0%   -7.72%  (p=0.000 n=9+10)
Exp/P100/S-1/D-10-10      1.40k ± 0%     1.29k ± 0%   -7.70%  (p=0.000 n=10+10)
Exp/P100/S-1/D-2-10       1.43k ± 0%     1.32k ± 0%   -7.48%  (p=0.000 n=9+10)
Exp/P100/S2/D-2-10        2.03k ± 0%     1.89k ± 0%   -6.82%  (p=0.000 n=9+10)
Exp/P100/S2/D10-10        2.28k ± 0%     2.13k ± 0%   -6.81%  (p=0.000 n=10+9)
Exp/P100/S2/D-10-10       2.19k ± 0%     2.05k ± 0%   -6.66%  (p=0.001 n=8+9)
Exp/P100/S2/D2-10         2.21k ± 0%     2.06k ± 0%   -6.61%  (p=0.000 n=9+10)
Ln/P100/S-10/D2-10        6.00k ± 0%     5.65k ± 0%   -5.87%  (p=0.000 n=10+9)
Ln/P100/S100/D2-10        6.25k ± 0%     5.89k ± 0%   -5.80%  (p=0.000 n=10+9)
Ln/P100/S-2/D2-10         5.79k ± 0%     5.46k ± 0%   -5.78%  (p=0.000 n=10+10)
Ln/P100/S100/D100-10      6.04k ± 0%     5.69k ± 0%   -5.76%  (p=0.000 n=10+10)
Ln/P100/S2/D10-10         6.21k ± 0%     5.86k ± 0%   -5.76%  (p=0.000 n=10+10)
Ln/P100/S2/D100-10        6.09k ± 0%     5.74k ± 0%   -5.73%  (p=0.000 n=10+8)
Exp/P100/S-4/D100-10        855 ± 0%       806 ± 0%   -5.73%  (p=0.000 n=10+10)
Ln/P100/S-10/D10-10       5.62k ± 0%     5.30k ± 0%   -5.72%  (p=0.000 n=10+8)
Ln/P100/S-100/D2-10       6.34k ± 0%     5.98k ± 0%   -5.68%  (p=0.000 n=10+10)
Ln/P100/S10/D100-10       5.73k ± 0%     5.41k ± 0%   -5.67%  (p=0.000 n=10+10)
Ln/P100/S100/D10-10       6.08k ± 0%     5.74k ± 0%   -5.66%  (p=0.000 n=9+8)
Ln/P100/S10/D2-10         6.13k ± 0%     5.78k ± 0%   -5.66%  (p=0.000 n=8+10)
Ln/P100/S-10/D100-10      5.69k ± 0%     5.37k ± 0%   -5.64%  (p=0.000 n=10+10)
Ln/P100/S10/D10-10        5.74k ± 0%     5.41k ± 0%   -5.64%  (p=0.000 n=10+10)
Ln/P100/S-100/D10-10      5.55k ± 0%     5.23k ± 0%   -5.63%  (p=0.000 n=10+7)
Ln/P100/S2/D2-10          5.86k ± 0%     5.53k ± 0%   -5.62%  (p=0.000 n=10+10)
Exp/P100/S-4/D-100-10       893 ± 0%       843 ± 0%   -5.60%  (p=0.000 n=10+10)
Ln/P100/S-100/D100-10     5.73k ± 0%     5.41k ± 0%   -5.54%  (p=0.000 n=10+10)
Exp/P100/S-1/D100-10      1.68k ± 0%     1.58k ± 0%   -5.52%  (p=0.000 n=8+10)
Ln/P100/S-2/D10-10        5.40k ± 0%     5.10k ± 0%   -5.49%  (p=0.000 n=10+10)
Ln/P100/S-2/D100-10       5.77k ± 0%     5.45k ± 0%   -5.49%  (p=0.000 n=10+10)
Exp/P100/S-1/D-100-10     1.68k ± 0%     1.59k ± 0%   -5.41%  (p=0.000 n=10+10)
Exp/P100/S2/D100-10       2.61k ± 0%     2.48k ± 0%   -5.03%  (p=0.000 n=8+10)
Exp/P100/S2/D-100-10      2.47k ± 0%     2.35k ± 0%   -4.90%  (p=0.000 n=10+10)
GDA/multiply-10             931 ± 0%       889 ± 0%   -4.51%  (p=0.000 n=10+10)
GDA/exp-10                 911k ± 0%      895k ± 0%   -1.75%  (p=0.000 n=10+10)
GDA/base-10               2.11k ± 0%     2.11k ± 0%     ~     (all equal)
GDA/comparetotal-10         254 ± 0%       254 ± 0%     ~     (all equal)

cc. @mjibson

nvanbenschoten commented 2 years ago

I think we can simplify this even further by creating a wrapper around big.Int and pushing the inline value optimization into that wrapper. Something along the lines of:

const inlineWords = 1 // or maybe 2

type BigInt struct {
    inner  big.Int
    inline [inlineWords]big.Word
}

func (b *BigInt) lazyInit() {
    if b.inner.Bits() == nil {
        b.inline = [inlineWords]big.Word{}
        b.inner.SetBits(b.inline[:0])
    }
}

func (b *BigInt) Add(x, y *BigInt) *BigInt {
    b.lazyInit()
    b.inner.Add(&x.inner, &y.inner)
    return b
}

...

type Decimal struct {
    Form     Form
    Negative bool
    Exponent int32
    Coeff    BigInt
}

We could then re-implement the big.Int API on this wrapper and add calls to (*BigInt).lazyInit where appropriate. That would probably be more lines of code, but it would mostly be boilerplate and would keep the optimization confined to a small layer below Decimal and above big.Int.

This BigInt wrapper may then be useful in other contexts. For instance, it would provide the inline optimization to intermediate results in this library, which are currently allocation heavy (see benchmark results above). It may also be useful in CRDB code.

nvanbenschoten commented 2 years ago

I'm closing this in favor of https://github.com/cockroachdb/apd/pull/102. It turns out that the self-referential pointer in this commit was leading to all stack-allocated Decimals escaping to the heap. In #102, we address this while also speeding up most intermediate computation in the library.