chromium / subspace

A concept-centered standard library for C++20, enabling safer and more reliable products and a more modern feel for C++ code. Also home of Subdoc, the code-documentation generator.
https://suslib.cc
Apache License 2.0

Benchmark and improve Vec and vector collect #337

Closed: danakj closed this issue 1 year ago.

danakj commented 1 year ago

Apple M1 results:

[ RUN      ] BenchSimdChunks.common_prefix

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|              184.31 |        5,425,692.47 |    0.2% |      0.11 | `common_prefix_unsafe_array_len_pairs`
|              266.37 |        3,754,229.28 |    0.2% |      0.15 | `common_prefix_naive`
|              396.99 |        2,518,978.67 |    0.1% |      0.23 | `common_prefix_zip`
|              678.55 |        1,473,732.62 |    0.3% |      0.39 | `common_prefix_chunks_exact`
|              608.80 |        1,642,578.92 |    0.6% |      0.35 | `common_prefix_no_shortcircuit`
|              127.94 |        7,816,010.69 |    0.3% |      0.07 | `common_prefix_take_while`
[       OK ] BenchSimdChunks.common_prefix (1321 ms)
[----------] 1 test from BenchSimdChunks (1321 ms total)

[----------] 9 tests from BenchVecMap
[ RUN      ] BenchVecMap.CopyAndMultiplyInts_1000
|              426.00 |        2,347,401.78 |    0.2% |      1.02 | `std::vector::push_back, n = 1000`
|              452.67 |        2,209,136.59 |    0.6% |      1.08 | `std::vector collect, n = 1000`
|              733.03 |        1,364,205.29 |    0.2% |      1.75 | `sus::Vec::push, n = 1000`
|               98.45 |       10,157,595.04 |    0.1% |      0.24 | `sus::Vec collect, n = 1000`
[       OK ] BenchVecMap.CopyAndMultiplyInts_1000 (4094 ms)
[ RUN      ] BenchVecMap.CopyAndMultiplyInts_100_000
|           39,144.54 |           25,546.35 |    0.1% |     25.37 | `std::vector::push_back, n = 100000`
|           37,945.10 |           26,353.86 |    0.1% |     24.60 | `std::vector collect, n = 100000`
|           69,985.32 |           14,288.71 |    0.0% |     45.37 | `sus::Vec::push, n = 100000`
|            8,749.08 |          114,297.69 |    4.7% |      5.56 | `sus::Vec collect, n = 100000`
[       OK ] BenchVecMap.CopyAndMultiplyInts_100_000 (100905 ms)
[ RUN      ] BenchVecMap.CopyAndMultiplyInts_10_000_000
|        6,139,073.90 |              162.89 |    0.6% |      2.22 | `std::vector::push_back, n = 10000000`
|        6,333,447.29 |              157.89 |    0.5% |      2.29 | `std::vector collect, n = 10000000`
|        9,638,581.93 |              103.75 |    0.2% |      3.48 | `sus::Vec::push, n = 10000000`
|        3,671,226.56 |              272.39 |    0.9% |      1.33 | `sus::Vec collect, n = 10000000`
[       OK ] BenchVecMap.CopyAndMultiplyInts_10_000_000 (9318 ms)
[ RUN      ] BenchVecMap.TransformToIndices_1000
|              351.07 |        2,848,415.93 |    0.1% |      0.84 | `std::vector::push_back, n = 1000`
|              350.30 |        2,854,681.48 |    0.1% |      0.84 | `std::vector collect, n = 1000`
|              729.21 |        1,371,341.39 |    0.2% |      1.75 | `sus::Vec::push, n = 1000`
|              176.97 |        5,650,610.77 |    0.7% |      0.43 | `sus::Vec collect, n = 1000`
[       OK ] BenchVecMap.TransformToIndices_1000 (3849 ms)
[ RUN      ] BenchVecMap.TransformToIndices_100_000
|           31,204.90 |           32,046.24 |    0.1% |     20.24 | `std::vector::push_back, n = 100000`
|           31,199.00 |           32,052.31 |    0.1% |     20.23 | `std::vector collect, n = 100000`
|          183,823.28 |            5,440.01 |    0.1% |    118.92 | `sus::Vec::push, n = 100000`
|           20,561.85 |           48,633.76 |    4.3% |     13.16 | `sus::Vec collect, n = 100000`
[       OK ] BenchVecMap.TransformToIndices_100_000 (172545 ms)
[ RUN      ] BenchVecMap.TransformToIndices_10_000_000
|        7,833,283.09 |              127.66 |    0.4% |      2.82 | `std::vector::push_back, n = 10000000`
|        8,192,676.47 |              122.06 |    0.7% |      2.94 | `std::vector collect, n = 10000000`
|       17,979,895.16 |               55.62 |    0.2% |      6.47 | `sus::Vec::push, n = 10000000`
|        7,650,575.53 |              130.71 |    0.5% |      2.75 | `sus::Vec collect, n = 10000000`
[       OK ] BenchVecMap.TransformToIndices_10_000_000 (15001 ms)
[ RUN      ] BenchVecMap.MoreExpensiveIntTransformation_1000
|            3,437.08 |          290,944.59 |    0.0% |      0.01 | `std::vector::push_back, n = 1000`
|            3,427.08 |          291,793.31 |    0.0% |      0.01 | `std::vector collect, n = 1000`
|            3,784.28 |          264,250.94 |    0.0% |      0.01 | `sus::Vec::push, n = 1000`
|            3,261.01 |          306,653.20 |    0.0% |      0.01 | `sus::Vec collect, n = 1000`
[       OK ] BenchVecMap.MoreExpensiveIntTransformation_1000 (50 ms)
[ RUN      ] BenchVecMap.MoreExpensiveIntTransformation_100_000
|          343,083.33 |            2,914.74 |    0.2% |      0.01 | `std::vector::push_back, n = 100000`
|          342,139.00 |            2,922.79 |    0.1% |      0.01 | `std::vector collect, n = 100000`
|          376,541.67 |            2,655.75 |    0.0% |      0.01 | `sus::Vec::push, n = 100000`
|          325,041.67 |            3,076.53 |    0.0% |      0.01 | `sus::Vec collect, n = 100000`
[       OK ] BenchVecMap.MoreExpensiveIntTransformation_100_000 (48 ms)
[ RUN      ] BenchVecMap.MoreExpensiveIntTransformation_10_000_000
|       62,922,166.00 |               15.89 |    0.2% |      0.69 | `std::vector::push_back, n = 10000000`
|       62,676,375.00 |               15.95 |    0.1% |      0.69 | `std::vector collect, n = 10000000`
|       69,998,042.00 |               14.29 |    0.1% |      0.77 | `sus::Vec::push, n = 10000000`
|       59,237,167.00 |               16.88 |    0.1% |      0.65 | `sus::Vec collect, n = 10000000`
[       OK ] BenchVecMap.MoreExpensiveIntTransformation_10_000_000 (2815 ms)
[----------] 9 tests from BenchVecMap (308628 ms total)
danakj commented 1 year ago

Linux Intel results:


|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
[ RUN      ] BenchVecMap.CopyAndMultiplyInts_1000
|              813.17 |        1,229,760.77 |    1.2% |      1.95 | `std::vector::push_back, n = 1000`
|              720.52 |        1,387,891.30 |    0.7% |      1.72 | `std::vector collect, n = 1000`
|            1,410.21 |          709,113.27 |    0.8% |      3.39 | `sus::Vec::push, n = 1000`
|              191.53 |        5,221,035.75 |    1.7% |      0.46 | `sus::Vec collect, n = 1000`
[       OK ] BenchVecMap.CopyAndMultiplyInts_1000 (7516 ms)
[ RUN      ] BenchVecMap.CopyAndMultiplyInts_100_000
|           79,179.06 |           12,629.60 |    0.3% |     51.14 | `std::vector::push_back, n = 100000`
|           71,074.21 |           14,069.80 |    0.4% |     46.03 | `std::vector collect, n = 100000`
|          136,507.21 |            7,325.62 |    0.1% |     88.39 | `sus::Vec::push, n = 100000`
|           34,552.37 |           28,941.58 |    0.4% |     22.43 | `sus::Vec collect, n = 100000`
[       OK ] BenchVecMap.CopyAndMultiplyInts_100_000 (207986 ms)
[ RUN      ] BenchVecMap.CopyAndMultiplyInts_10_000_000
|       18,403,985.25 |               54.34 |    0.7% |      6.63 | `std::vector::push_back, n = 10000000`
|       17,599,240.32 |               56.82 |    1.2% |      6.33 | `std::vector collect, n = 10000000`
|       24,142,655.09 |               41.42 |    0.4% |      8.69 | `sus::Vec::push, n = 10000000`
|       15,998,039.23 |               62.51 |    1.1% |      5.75 | `sus::Vec collect, n = 10000000`
[       OK ] BenchVecMap.CopyAndMultiplyInts_10_000_000 (27422 ms)
[ RUN      ] BenchVecMap.TransformToIndices_1000
|              657.54 |        1,520,823.11 |    0.7% |      1.57 | `std::vector::push_back, n = 1000`
|              652.53 |        1,532,502.56 |    1.7% |      1.56 | `std::vector collect, n = 1000`
|            1,212.27 |          824,901.72 |    0.4% |      2.89 | `sus::Vec::push, n = 1000`
|              353.19 |        2,831,338.80 |    1.2% |      0.84 | `sus::Vec collect, n = 1000`
[       OK ] BenchVecMap.TransformToIndices_1000 (6856 ms)
[ RUN      ] BenchVecMap.TransformToIndices_100_000
|           67,182.40 |           14,884.85 |    0.4% |     43.77 | `std::vector::push_back, n = 100000`
|           67,273.65 |           14,864.66 |    0.4% |     43.53 | `std::vector collect, n = 100000`
|          170,414.69 |            5,868.04 |    0.3% |    110.39 | `sus::Vec::push, n = 100000`
|           57,919.65 |           17,265.30 |    0.7% |     37.50 | `sus::Vec collect, n = 100000`
[       OK ] BenchVecMap.TransformToIndices_100_000 (235189 ms)
[ RUN      ] BenchVecMap.TransformToIndices_10_000_000
|       28,772,225.53 |               34.76 |    0.9% |     10.35 | `std::vector::push_back, n = 10000000`
|       28,611,047.73 |               34.95 |    0.9% |     10.28 | `std::vector collect, n = 10000000`
|       37,878,064.61 |               26.40 |    1.2% |     13.64 | `sus::Vec::push, n = 10000000`
|       28,274,931.94 |               35.37 |    0.4% |     10.17 | `sus::Vec collect, n = 10000000`
[       OK ] BenchVecMap.TransformToIndices_10_000_000 (44469 ms)
[ RUN      ] BenchVecMap.MoreExpensiveIntTransformation_1000
|           20,342.91 |           49,157.17 |    3.7% |      0.01 | `std::vector::push_back, n = 1000`
|           21,017.31 |           47,579.83 |    6.0% |      0.01 | :wavy_dash: `std::vector collect, n = 1000` (Unstable with ~51.4 iters. Increase `minEpochIterations` to e.g. 514)
|           21,336.73 |           46,867.53 |    3.2% |      0.01 | `sus::Vec::push, n = 1000`
|           20,000.06 |           49,999.85 |    1.2% |      0.01 | `sus::Vec collect, n = 1000`
[       OK ] BenchVecMap.MoreExpensiveIntTransformation_1000 (51 ms)
[ RUN      ] BenchVecMap.MoreExpensiveIntTransformation_100_000
|        2,071,106.00 |              482.83 |    2.7% |      0.02 | `std::vector::push_back, n = 100000`
|        2,012,205.00 |              496.97 |    2.4% |      0.02 | `std::vector collect, n = 100000`
|        2,142,206.00 |              466.81 |    2.3% |      0.02 | `sus::Vec::push, n = 100000`
|        2,047,406.00 |              488.42 |    4.9% |      0.02 | `sus::Vec collect, n = 100000`
[       OK ] BenchVecMap.MoreExpensiveIntTransformation_100_000 (92 ms)
[ RUN      ] BenchVecMap.MoreExpensiveIntTransformation_10_000_000
|      216,995,117.00 |                4.61 |    0.3% |      2.38 | `std::vector::push_back, n = 10000000`
|      218,283,387.00 |                4.58 |    1.5% |      2.42 | `std::vector collect, n = 10000000`
|      234,275,067.00 |                4.27 |    0.5% |      2.58 | `sus::Vec::push, n = 10000000`
|      213,848,110.00 |                4.68 |    0.6% |      2.37 | `sus::Vec collect, n = 10000000`
[       OK ] BenchVecMap.MoreExpensiveIntTransformation_10_000_000 (9780 ms)
[----------] 9 tests from BenchVecMap (539365 ms total)