cessen / str_indices

Count and convert between various ways of indexing utf8 string slices.
Apache License 2.0

Optimize lines_lf #20

Closed. CeleritasCelery closed this 1 year ago.

CeleritasCelery commented 1 year ago

This applies many of the same optimizations we have already applied to the other methods to `lines_lf`. Big speedups here as well.
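For context on what the benchmarks below measure, here is a minimal scalar sketch of the three `lines_lf` operations, plus one word-at-a-time counting variant. This is not the code in this PR: the function names with the `_naive` and `_chunked` suffixes are hypothetical, the edge-case behavior (past-the-end indices clamp to the end) is an assumption about the intended semantics, and the 8-byte SWAR trick is only one common way to speed up byte counting, not necessarily the technique used here.

```rust
/// Hypothetical reference: number of LF (`\n`) line breaks in `text`.
fn count_breaks_naive(text: &str) -> usize {
    text.as_bytes().iter().filter(|&&b| b == b'\n').count()
}

/// Hypothetical reference: index of the line containing `byte_idx`,
/// i.e. the number of LF bytes before that position.
/// Assumption: a past-the-end index is clamped to the end of the text.
fn from_byte_idx_naive(text: &str, byte_idx: usize) -> usize {
    let end = byte_idx.min(text.len());
    text.as_bytes()[..end].iter().filter(|&&b| b == b'\n').count()
}

/// Hypothetical reference: byte index where line `line_idx` starts.
/// Assumption: a past-the-end line index returns `text.len()`.
fn to_byte_idx_naive(text: &str, line_idx: usize) -> usize {
    if line_idx == 0 {
        return 0;
    }
    let mut seen = 0;
    for (i, &b) in text.as_bytes().iter().enumerate() {
        if b == b'\n' {
            seen += 1;
            if seen == line_idx {
                // The line starts right after the break that ends the previous one.
                return i + 1;
            }
        }
    }
    text.len()
}

const LF_MASK: u64 = 0x0A0A_0A0A_0A0A_0A0A;
const LOW7: u64 = 0x7F7F_7F7F_7F7F_7F7F;

/// Illustrative word-at-a-time (SWAR) count: handles 8 bytes per step,
/// then falls back to a scalar loop for the remaining tail bytes.
fn count_breaks_chunked(text: &str) -> usize {
    let mut chunks = text.as_bytes().chunks_exact(8);
    let mut count = 0usize;
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        // Bytes equal to LF become zero after the XOR.
        let x = u64::from_ne_bytes(buf) ^ LF_MASK;
        // Sets the high bit of each byte of `x` that is exactly zero.
        let zero_flags = !(((x & LOW7) + LOW7) | x | LOW7);
        count += zero_flags.count_ones() as usize;
    }
    let tail = chunks.remainder().iter().filter(|&&b| b == b'\n').count();
    count + tail
}
```

The chunked variant should agree with the naive one on any input; the naive versions are handy as an oracle when testing an optimized implementation.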

Results

Aarch64 benchmarks

```
lines_lf::count_breaks/lines_100
                        time:   [2.5976 ns 2.5983 ns 2.5992 ns]
                        thrpt:  [40.489 GiB/s 40.504 GiB/s 40.514 GiB/s]
                 change:
                        time:   [-35.681% -35.578% -35.479%] (p = 0.00 < 0.05)
                        thrpt:  [+54.988% +55.227% +55.474%]
                        Performance has improved.
lines_lf::count_breaks/lines_1000
                        time:   [17.024 ns 17.028 ns 17.033 ns]
                        thrpt:  [61.784 GiB/s 61.804 GiB/s 61.818 GiB/s]
                 change:
                        time:   [-40.426% -40.328% -40.231%] (p = 0.00 < 0.05)
                        thrpt:  [+67.312% +67.583% +67.858%]
                        Performance has improved.
lines_lf::count_breaks/lines_10000
                        time:   [143.12 ns 143.17 ns 143.25 ns]
                        thrpt:  [73.468 GiB/s 73.504 GiB/s 73.531 GiB/s]
                 change:
                        time:   [-57.386% -57.306% -57.226%] (p = 0.00 < 0.05)
                        thrpt:  [+133.79% +134.23% +134.66%]
                        Performance has improved.
lines_lf::from_byte_idx/lines_100
                        time:   [2.5978 ns 2.5985 ns 2.5994 ns]
                        thrpt:  [40.486 GiB/s 40.500 GiB/s 40.510 GiB/s]
                 change:
                        time:   [-35.696% -35.580% -35.474%] (p = 0.00 < 0.05)
                        thrpt:  [+54.975% +55.231% +55.512%]
                        Performance has improved.
lines_lf::from_byte_idx/lines_1000
                        time:   [17.024 ns 17.028 ns 17.034 ns]
                        thrpt:  [61.783 GiB/s 61.804 GiB/s 61.818 GiB/s]
                 change:
                        time:   [-40.442% -40.330% -40.212%] (p = 0.00 < 0.05)
                        thrpt:  [+67.259% +67.588% +67.904%]
                        Performance has improved.
lines_lf::from_byte_idx/lines_10000
                        time:   [143.12 ns 143.15 ns 143.19 ns]
                        thrpt:  [73.498 GiB/s 73.519 GiB/s 73.532 GiB/s]
                 change:
                        time:   [-57.468% -57.392% -57.312%] (p = 0.00 < 0.05)
                        thrpt:  [+134.26% +134.70% +135.12%]
                        Performance has improved.
lines_lf::to_byte_idx/lines_100
                        time:   [3.7837 ns 3.7846 ns 3.7858 ns]
                        thrpt:  [27.798 GiB/s 27.808 GiB/s 27.814 GiB/s]
                 change:
                        time:   [-26.168% -26.051% -25.931%] (p = 0.00 < 0.05)
                        thrpt:  [+35.010% +35.228% +35.443%]
                        Performance has improved.
lines_lf::to_byte_idx/lines_1000
                        time:   [21.464 ns 21.469 ns 21.476 ns]
                        thrpt:  [49.004 GiB/s 49.020 GiB/s 49.030 GiB/s]
                 change:
                        time:   [-55.387% -55.120% -54.864%] (p = 0.00 < 0.05)
                        thrpt:  [+121.55% +122.82% +124.15%]
                        Performance has improved.
lines_lf::to_byte_idx/lines_10000
                        time:   [171.83 ns 171.86 ns 171.89 ns]
                        thrpt:  [61.225 GiB/s 61.236 GiB/s 61.245 GiB/s]
                 change:
                        time:   [-56.113% -56.040% -55.962%] (p = 0.00 < 0.05)
                        thrpt:  [+127.08% +127.48% +127.86%]
                        Performance has improved.
```

x86_64 benchmarks

```
lines_lf::count_breaks/lines_100
                        time:   [5.6826 ns 5.6863 ns 5.6908 ns]
                        thrpt:  [18.493 GiB/s 18.507 GiB/s 18.520 GiB/s]
                 change:
                        time:   [-19.977% -19.545% -19.097%] (p = 0.00 < 0.05)
                        thrpt:  [+23.604% +24.293% +24.964%]
                        Performance has improved.
lines_lf::count_breaks/lines_1000
                        time:   [32.146 ns 32.395 ns 32.664 ns]
                        thrpt:  [32.218 GiB/s 32.486 GiB/s 32.737 GiB/s]
                 change:
                        time:   [-22.832% -22.224% -21.627%] (p = 0.00 < 0.05)
                        thrpt:  [+27.596% +28.575% +29.588%]
                        Performance has improved.
lines_lf::count_breaks/lines_10000
                        time:   [263.53 ns 263.87 ns 264.39 ns]
                        thrpt:  [39.805 GiB/s 39.882 GiB/s 39.934 GiB/s]
                 change:
                        time:   [-2.6589% -1.8193% -1.1992%] (p = 0.00 < 0.05)
                        thrpt:  [+1.2138% +1.8530% +2.7315%]
                        Performance has improved.
lines_lf::from_byte_idx/lines_100
                        time:   [5.9435 ns 5.9478 ns 5.9535 ns]
                        thrpt:  [17.677 GiB/s 17.694 GiB/s 17.707 GiB/s]
                 change:
                        time:   [-22.920% -22.213% -21.630%] (p = 0.00 < 0.05)
                        thrpt:  [+27.600% +28.556% +29.735%]
                        Performance has improved.
lines_lf::from_byte_idx/lines_1000
                        time:   [31.678 ns 31.760 ns 31.869 ns]
                        thrpt:  [33.022 GiB/s 33.136 GiB/s 33.222 GiB/s]
                 change:
                        time:   [-31.219% -28.742% -25.887%] (p = 0.00 < 0.05)
                        thrpt:  [+34.929% +40.335% +45.390%]
                        Performance has improved.
lines_lf::from_byte_idx/lines_10000
                        time:   [263.78 ns 263.95 ns 264.16 ns]
                        thrpt:  [39.839 GiB/s 39.870 GiB/s 39.896 GiB/s]
                 change:
                        time:   [-5.6641% -4.5534% -3.7274%] (p = 0.00 < 0.05)
                        thrpt:  [+3.8717% +4.7706% +6.0042%]
                        Performance has improved.
lines_lf::to_byte_idx/lines_100
                        time:   [7.1705 ns 7.1796 ns 7.1890 ns]
                        thrpt:  [14.639 GiB/s 14.658 GiB/s 14.677 GiB/s]
                 change:
                        time:   [-18.953% -18.667% -18.415%] (p = 0.00 < 0.05)
                        thrpt:  [+22.572% +22.952% +23.385%]
                        Performance has improved.
lines_lf::to_byte_idx/lines_1000
                        time:   [41.566 ns 41.615 ns 41.671 ns]
                        thrpt:  [25.255 GiB/s 25.289 GiB/s 25.318 GiB/s]
                 change:
                        time:   [-51.008% -50.796% -50.592%] (p = 0.00 < 0.05)
                        thrpt:  [+102.40% +103.24% +104.11%]
                        Performance has improved.
lines_lf::to_byte_idx/lines_10000
                        time:   [314.40 ns 314.53 ns 314.69 ns]
                        thrpt:  [33.443 GiB/s 33.459 GiB/s 33.473 GiB/s]
                 change:
                        time:   [-56.125% -55.976% -55.837%] (p = 0.00 < 0.05)
                        thrpt:  [+126.43% +127.15% +127.92%]
                        Performance has improved.
```

cessen commented 1 year ago

Ah, thanks a bunch for this! I'll try to review this soon.

cessen commented 1 year ago

Thanks a bunch!