ekmett / machines

Networks of composable stream transducers
Other
340 stars 46 forks source link

Process performance improvements #82

Closed HuwCampbell closed 7 years ago

HuwCampbell commented 8 years ago

This one might be a little controversial, but I did some experiments rewriting Process and Source Machines explicitly instead of using a Plan and got performance improvements essentially across the board. I also believe there were some bugs in buffered, as there were situations where it would lead to empty list outputs.

This does make these machines less readable and harder to fathom however.

Here's a handful of before and afters

Edit: Further improvements below with inlining.

HuwCampbell commented 7 years ago

So I added some INLINABLE pragmas, which brings the perfomance more in line with pipes and conduit (and sometimes faster).

@acowley @treeowl

Running 1 benchmarks...
Benchmark benchmarks: RUNNING...
benchmarking map/machines
time                 28.92 ms   (24.94 ms .. 33.10 ms)
                     0.918 R²   (0.845 R² .. 0.967 R²)
mean                 35.69 ms   (32.39 ms .. 40.78 ms)
std dev              8.662 ms   (5.867 ms .. 11.04 ms)
variance introduced by outliers: 80% (severely inflated)

benchmarking map/pipes
time                 26.05 ms   (24.77 ms .. 27.38 ms)
                     0.992 R²   (0.987 R² .. 0.998 R²)
mean                 25.84 ms   (25.31 ms .. 26.44 ms)
std dev              1.232 ms   (953.7 μs .. 1.554 ms)
variance introduced by outliers: 15% (moderately inflated)

benchmarking map/conduit
time                 27.65 ms   (27.16 ms .. 28.34 ms)
                     0.997 R²   (0.995 R² .. 0.999 R²)
mean                 27.27 ms   (26.73 ms .. 27.76 ms)
std dev              1.113 ms   (804.9 μs .. 1.614 ms)
variance introduced by outliers: 10% (moderately inflated)

benchmarking drop/machines
time                 12.14 ms   (11.89 ms .. 12.36 ms)
                     0.997 R²   (0.994 R² .. 0.999 R²)
mean                 13.43 ms   (13.07 ms .. 14.14 ms)
std dev              1.303 ms   (823.7 μs .. 1.883 ms)
variance introduced by outliers: 50% (severely inflated)

benchmarking drop/pipes
time                 15.77 ms   (14.70 ms .. 16.93 ms)
                     0.983 R²   (0.969 R² .. 0.994 R²)
mean                 17.11 ms   (16.57 ms .. 17.82 ms)
std dev              1.529 ms   (1.098 ms .. 2.352 ms)
variance introduced by outliers: 44% (moderately inflated)

benchmarking drop/conduit
time                 22.23 ms   (21.24 ms .. 23.03 ms)
                     0.995 R²   (0.991 R² .. 0.998 R²)
mean                 21.29 ms   (20.97 ms .. 21.66 ms)
std dev              820.1 μs   (633.8 μs .. 1.098 ms)
variance introduced by outliers: 13% (moderately inflated)

benchmarking dropWhile/machines
time                 12.55 ms   (11.76 ms .. 13.72 ms)
                     0.982 R²   (0.967 R² .. 0.998 R²)
mean                 11.96 ms   (11.77 ms .. 12.32 ms)
std dev              656.8 μs   (366.8 μs .. 1.248 ms)
variance introduced by outliers: 25% (moderately inflated)

benchmarking dropWhile/pipes
time                 18.04 ms   (17.33 ms .. 18.70 ms)
                     0.995 R²   (0.991 R² .. 0.999 R²)
mean                 18.18 ms   (17.88 ms .. 18.55 ms)
std dev              759.3 μs   (556.6 μs .. 1.121 ms)
variance introduced by outliers: 13% (moderately inflated)

benchmarking dropWhile/conduit
time                 23.02 ms   (22.08 ms .. 24.00 ms)
                     0.995 R²   (0.991 R² .. 0.999 R²)
mean                 21.71 ms   (21.32 ms .. 22.15 ms)
std dev              946.7 μs   (638.4 μs .. 1.410 ms)
variance introduced by outliers: 14% (moderately inflated)

benchmarking scan/machines
time                 38.39 ms   (37.26 ms .. 39.59 ms)
                     0.997 R²   (0.994 R² .. 0.999 R²)
mean                 36.84 ms   (36.26 ms .. 37.56 ms)
std dev              1.278 ms   (987.5 μs .. 1.772 ms)

benchmarking scan/pipes
time                 30.05 ms   (28.65 ms .. 31.34 ms)
                     0.994 R²   (0.989 R² .. 0.998 R²)
mean                 29.38 ms   (28.33 ms .. 30.08 ms)
std dev              1.740 ms   (1.013 ms .. 2.922 ms)
variance introduced by outliers: 22% (moderately inflated)

benchmarking scan/conduit
time                 49.43 ms   (47.13 ms .. 51.99 ms)
                     0.995 R²   (0.991 R² .. 0.999 R²)
mean                 46.13 ms   (44.98 ms .. 47.38 ms)
std dev              2.323 ms   (1.743 ms .. 3.255 ms)
variance introduced by outliers: 13% (moderately inflated)

benchmarking take/machines
time                 32.65 ms   (31.07 ms .. 33.98 ms)
                     0.995 R²   (0.992 R² .. 0.998 R²)
mean                 30.88 ms   (30.32 ms .. 31.48 ms)
std dev              1.249 ms   (933.3 μs .. 1.633 ms)
variance introduced by outliers: 11% (moderately inflated)

benchmarking take/pipes
time                 28.91 ms   (27.52 ms .. 30.28 ms)
                     0.994 R²   (0.990 R² .. 0.998 R²)
mean                 27.77 ms   (27.28 ms .. 28.33 ms)
std dev              1.107 ms   (859.5 μs .. 1.534 ms)
variance introduced by outliers: 10% (moderately inflated)

benchmarking take/conduit
time                 37.10 ms   (36.23 ms .. 38.39 ms)
                     0.998 R²   (0.995 R² .. 0.999 R²)
mean                 36.20 ms   (35.67 ms .. 36.79 ms)
std dev              1.103 ms   (791.4 μs .. 1.579 ms)

benchmarking takeWhile/machines
time                 44.32 ms   (43.13 ms .. 45.40 ms)
                     0.997 R²   (0.994 R² .. 0.999 R²)
mean                 42.29 ms   (41.35 ms .. 43.04 ms)
std dev              1.692 ms   (1.247 ms .. 2.275 ms)
variance introduced by outliers: 13% (moderately inflated)

benchmarking takeWhile/pipes
time                 27.55 ms   (26.85 ms .. 28.47 ms)
                     0.998 R²   (0.995 R² .. 1.000 R²)
mean                 27.03 ms   (26.68 ms .. 27.34 ms)
std dev              727.9 μs   (484.8 μs .. 1.122 ms)

benchmarking takeWhile/conduit
time                 38.16 ms   (37.04 ms .. 40.01 ms)
                     0.995 R²   (0.991 R² .. 0.998 R²)
mean                 37.02 ms   (36.26 ms .. 37.83 ms)
std dev              1.579 ms   (1.259 ms .. 1.870 ms)
variance introduced by outliers: 12% (moderately inflated)

benchmarking fold/machines
time                 14.02 ms   (13.52 ms .. 14.45 ms)
                     0.994 R²   (0.990 R² .. 0.997 R²)
mean                 15.95 ms   (15.32 ms .. 16.74 ms)
std dev              1.837 ms   (1.443 ms .. 2.245 ms)
variance introduced by outliers: 58% (severely inflated)

benchmarking fold/pipes
time                 7.492 ms   (7.168 ms .. 7.829 ms)
                     0.976 R²   (0.951 R² .. 0.991 R²)
mean                 7.770 ms   (7.488 ms .. 8.045 ms)
std dev              831.3 μs   (616.8 μs .. 1.191 ms)
variance introduced by outliers: 60% (severely inflated)

benchmarking fold/conduit
time                 19.58 ms   (18.36 ms .. 20.58 ms)
                     0.988 R²   (0.976 R² .. 0.996 R²)
mean                 20.44 ms   (19.93 ms .. 20.88 ms)
std dev              1.118 ms   (884.9 μs .. 1.505 ms)
variance introduced by outliers: 22% (moderately inflated)

benchmarking filter/machines
time                 30.50 ms   (29.48 ms .. 31.20 ms)
                     0.996 R²   (0.992 R² .. 0.999 R²)
mean                 31.32 ms   (30.61 ms .. 32.18 ms)
std dev              1.648 ms   (1.266 ms .. 2.144 ms)
variance introduced by outliers: 17% (moderately inflated)

benchmarking filter/pipes
time                 30.08 ms   (29.46 ms .. 30.65 ms)
                     0.998 R²   (0.996 R² .. 0.999 R²)
mean                 31.07 ms   (30.55 ms .. 31.95 ms)
std dev              1.293 ms   (653.3 μs .. 1.839 ms)
variance introduced by outliers: 11% (moderately inflated)

benchmarking filter/conduit
time                 28.58 ms   (27.39 ms .. 29.75 ms)
                     0.994 R²   (0.989 R² .. 0.998 R²)
mean                 29.17 ms   (28.45 ms .. 30.04 ms)
std dev              1.682 ms   (1.234 ms .. 2.388 ms)
variance introduced by outliers: 21% (moderately inflated)

benchmarking mapM/machines
time                 31.56 ms   (30.36 ms .. 32.62 ms)
                     0.997 R²   (0.994 R² .. 0.999 R²)
mean                 30.68 ms   (30.19 ms .. 31.18 ms)
std dev              1.021 ms   (742.8 μs .. 1.431 ms)
variance introduced by outliers: 11% (moderately inflated)

benchmarking mapM/pipes
time                 55.48 ms   (52.22 ms .. 58.76 ms)
                     0.994 R²   (0.989 R² .. 0.998 R²)
mean                 51.65 ms   (50.07 ms .. 53.24 ms)
std dev              2.977 ms   (2.449 ms .. 3.592 ms)
variance introduced by outliers: 15% (moderately inflated)

benchmarking mapM/conduit
time                 34.55 ms   (32.39 ms .. 36.68 ms)
                     0.991 R²   (0.988 R² .. 0.997 R²)
mean                 31.37 ms   (30.49 ms .. 32.42 ms)
std dev              1.992 ms   (1.443 ms .. 2.711 ms)
variance introduced by outliers: 22% (moderately inflated)

benchmarking plan-test-machine/final
time                 10.58 ms   (10.27 ms .. 10.88 ms)
                     0.996 R²   (0.994 R² .. 0.998 R²)
mean                 10.19 ms   (9.985 ms .. 10.34 ms)
std dev              462.4 μs   (339.3 μs .. 748.2 μs)
variance introduced by outliers: 19% (moderately inflated)

benchmarking plan-test-machine/finalOr
time                 10.73 ms   (10.52 ms .. 10.95 ms)
                     0.997 R²   (0.995 R² .. 0.999 R²)
mean                 10.52 ms   (10.27 ms .. 10.67 ms)
std dev              506.1 μs   (307.3 μs .. 845.6 μs)
variance introduced by outliers: 20% (moderately inflated)

benchmarking plan-test-machine/buffered
time                 21.41 ms   (20.76 ms .. 22.11 ms)
                     0.994 R²   (0.989 R² .. 0.998 R²)
mean                 21.39 ms   (20.98 ms .. 21.76 ms)
std dev              933.4 μs   (788.4 μs .. 1.124 ms)
variance introduced by outliers: 13% (moderately inflated)

Benchmark benchmarks: FINISH
acowley commented 7 years ago

I'm not against making various core definitions uglier for the sake of substantial performance gains, and I appreciate you taking the initiative to push on this!

Is repeatedly itself a significant culprit? Can we do something more targeted to help there? It's unfortunate that we can't get GHC to push through constructs and repeatedly itself to build the lower-level code we want.

Regardless, I'd be inclined to go ahead with this change.

alang9 commented 7 years ago

I can get part of the performance improvement by dropping 12ec9c8 Rewrite most Processes explicitly and adding INLINE pragmas to construct and repeatedly. See https://github.com/alang9/machines/tree/inline

Running 1 benchmarks...
Benchmark benchmarks: RUNNING...
benchmarking map/machines
time                 29.80 ms   (28.79 ms .. 31.36 ms)
                     0.986 R²   (0.965 R² .. 0.997 R²)
mean                 29.71 ms   (28.90 ms .. 30.89 ms)
std dev              1.994 ms   (1.288 ms .. 2.995 ms)
variance introduced by outliers: 22% (moderately inflated)

benchmarking map/pipes
time                 42.73 ms   (39.38 ms .. 46.30 ms)
                     0.988 R²   (0.975 R² .. 0.998 R²)
mean                 46.50 ms   (45.08 ms .. 48.50 ms)
std dev              3.219 ms   (1.894 ms .. 5.098 ms)
variance introduced by outliers: 20% (moderately inflated)

benchmarking map/conduit
time                 34.29 ms   (31.55 ms .. 36.86 ms)
                     0.983 R²   (0.967 R² .. 0.997 R²)
mean                 34.71 ms   (33.57 ms .. 36.70 ms)
std dev              3.081 ms   (1.613 ms .. 4.434 ms)
variance introduced by outliers: 36% (moderately inflated)

benchmarking drop/machines
time                 18.94 ms   (17.78 ms .. 19.92 ms)
                     0.989 R²   (0.981 R² .. 0.996 R²)
mean                 18.20 ms   (17.80 ms .. 18.74 ms)
std dev              1.125 ms   (884.5 μs .. 1.556 ms)
variance introduced by outliers: 26% (moderately inflated)

benchmarking drop/pipes
time                 36.31 ms   (32.06 ms .. 41.25 ms)
                     0.960 R²   (0.929 R² .. 0.987 R²)
mean                 32.13 ms   (30.53 ms .. 34.22 ms)
std dev              3.665 ms   (2.634 ms .. 4.649 ms)
variance introduced by outliers: 48% (moderately inflated)

benchmarking drop/conduit
time                 24.17 ms   (22.60 ms .. 25.37 ms)
                     0.981 R²   (0.963 R² .. 0.992 R²)
mean                 23.37 ms   (22.43 ms .. 24.37 ms)
std dev              2.212 ms   (1.814 ms .. 2.675 ms)
variance introduced by outliers: 43% (moderately inflated)

benchmarking dropWhile/machines
time                 29.05 ms   (27.73 ms .. 30.38 ms)
                     0.981 R²   (0.951 R² .. 0.996 R²)
mean                 28.85 ms   (27.98 ms .. 30.40 ms)
std dev              2.352 ms   (1.455 ms .. 3.745 ms)
variance introduced by outliers: 32% (moderately inflated)

benchmarking dropWhile/pipes
time                 20.01 ms   (19.11 ms .. 21.07 ms)
                     0.989 R²   (0.976 R² .. 0.997 R²)
mean                 19.77 ms   (19.41 ms .. 20.27 ms)
std dev              968.1 μs   (675.3 μs .. 1.475 ms)
variance introduced by outliers: 18% (moderately inflated)

benchmarking dropWhile/conduit
time                 24.72 ms   (22.80 ms .. 26.46 ms)
                     0.980 R²   (0.966 R² .. 0.991 R²)
mean                 22.99 ms   (22.30 ms .. 24.00 ms)
std dev              1.965 ms   (1.494 ms .. 2.571 ms)
variance introduced by outliers: 38% (moderately inflated)

benchmarking scan/machines
time                 74.13 ms   (71.46 ms .. 77.44 ms)
                     0.997 R²   (0.992 R² .. 1.000 R²)
mean                 74.08 ms   (72.95 ms .. 75.43 ms)
std dev              2.054 ms   (1.385 ms .. 2.915 ms)

benchmarking scan/pipes
time                 40.50 ms   (36.57 ms .. 44.51 ms)
                     0.970 R²   (0.938 R² .. 0.991 R²)
mean                 43.68 ms   (41.60 ms .. 46.81 ms)
std dev              4.896 ms   (3.429 ms .. 7.539 ms)
variance introduced by outliers: 41% (moderately inflated)

benchmarking scan/conduit
time                 71.50 ms   (61.99 ms .. 80.79 ms)
                     0.964 R²   (0.923 R² .. 0.989 R²)
mean                 58.38 ms   (53.86 ms .. 63.93 ms)
std dev              8.760 ms   (6.412 ms .. 11.70 ms)
variance introduced by outliers: 56% (severely inflated)

benchmarking take/machines
time                 59.00 ms   (54.34 ms .. 61.80 ms)
                     0.990 R²   (0.973 R² .. 0.999 R²)
mean                 59.19 ms   (57.15 ms .. 60.53 ms)
std dev              2.946 ms   (1.995 ms .. 4.333 ms)
variance introduced by outliers: 15% (moderately inflated)

benchmarking take/pipes
time                 77.76 ms   (71.07 ms .. 86.97 ms)
                     0.976 R²   (0.947 R² .. 0.994 R²)
mean                 65.20 ms   (60.54 ms .. 70.45 ms)
std dev              8.530 ms   (6.583 ms .. 10.47 ms)
variance introduced by outliers: 44% (moderately inflated)

benchmarking take/conduit
time                 47.82 ms   (38.19 ms .. 59.14 ms)
                     0.894 R²   (0.805 R² .. 0.985 R²)
mean                 44.69 ms   (41.80 ms .. 48.89 ms)
std dev              6.650 ms   (4.951 ms .. 8.876 ms)
variance introduced by outliers: 55% (severely inflated)

benchmarking takeWhile/machines
time                 44.80 ms   (41.23 ms .. 49.81 ms)
                     0.976 R²   (0.946 R² .. 0.994 R²)
mean                 50.66 ms   (47.99 ms .. 53.64 ms)
std dev              5.032 ms   (3.873 ms .. 6.179 ms)
variance introduced by outliers: 37% (moderately inflated)

benchmarking takeWhile/pipes
time                 46.95 ms   (41.92 ms .. 51.76 ms)
                     0.967 R²   (0.907 R² .. 0.993 R²)
mean                 49.77 ms   (46.53 ms .. 52.10 ms)
std dev              4.878 ms   (3.307 ms .. 6.403 ms)
variance introduced by outliers: 37% (moderately inflated)

benchmarking takeWhile/conduit
time                 45.86 ms   (39.98 ms .. 53.58 ms)
                     0.928 R²   (0.846 R² .. 0.994 R²)
mean                 46.01 ms   (43.26 ms .. 50.88 ms)
std dev              6.881 ms   (2.061 ms .. 9.371 ms)
variance introduced by outliers: 56% (severely inflated)

benchmarking fold/machines
time                 31.03 ms   (27.78 ms .. 33.85 ms)
                     0.970 R²   (0.943 R² .. 0.991 R²)
mean                 32.23 ms   (30.79 ms .. 34.18 ms)
std dev              3.447 ms   (2.704 ms .. 4.350 ms)
variance introduced by outliers: 45% (moderately inflated)

benchmarking fold/pipes
time                 13.74 ms   (13.58 ms .. 13.99 ms)
                     0.997 R²   (0.993 R² .. 0.999 R²)
mean                 13.61 ms   (13.45 ms .. 13.78 ms)
std dev              408.4 μs   (292.7 μs .. 628.8 μs)
variance introduced by outliers: 11% (moderately inflated)

benchmarking fold/conduit
time                 22.43 ms   (20.49 ms .. 24.79 ms)
                     0.968 R²   (0.941 R² .. 0.996 R²)
mean                 21.04 ms   (20.50 ms .. 21.98 ms)
std dev              1.671 ms   (915.8 μs .. 2.521 ms)
variance introduced by outliers: 36% (moderately inflated)

benchmarking filter/machines
time                 42.06 ms   (37.85 ms .. 47.26 ms)
                     0.966 R²   (0.943 R² .. 0.999 R²)
mean                 38.52 ms   (37.12 ms .. 41.29 ms)
std dev              3.647 ms   (2.126 ms .. 4.760 ms)
variance introduced by outliers: 38% (moderately inflated)

benchmarking filter/pipes
time                 76.32 ms   (72.25 ms .. 88.09 ms)
                     0.966 R²   (0.892 R² .. 0.999 R²)
mean                 78.47 ms   (76.02 ms .. 84.95 ms)
std dev              6.521 ms   (1.150 ms .. 10.42 ms)
variance introduced by outliers: 28% (moderately inflated)

benchmarking filter/conduit
time                 36.45 ms   (33.16 ms .. 38.64 ms)
                     0.970 R²   (0.926 R² .. 0.990 R²)
mean                 35.74 ms   (34.17 ms .. 38.13 ms)
std dev              3.921 ms   (2.581 ms .. 5.930 ms)
variance introduced by outliers: 42% (moderately inflated)

benchmarking mapM/machines
time                 26.56 ms   (23.29 ms .. 29.10 ms)
                     0.958 R²   (0.922 R² .. 0.982 R²)
mean                 30.99 ms   (29.27 ms .. 33.09 ms)
std dev              4.209 ms   (3.505 ms .. 4.896 ms)
variance introduced by outliers: 57% (severely inflated)

benchmarking mapM/pipes
time                 186.9 ms   (177.8 ms .. 197.0 ms)
                     0.999 R²   (0.997 R² .. 1.000 R²)
mean                 190.2 ms   (186.9 ms .. 195.8 ms)
std dev              5.796 ms   (1.591 ms .. 7.869 ms)
variance introduced by outliers: 14% (moderately inflated)

benchmarking mapM/conduit
time                 48.65 ms   (44.72 ms .. 52.36 ms)
                     0.979 R²   (0.946 R² .. 0.996 R²)
mean                 54.95 ms   (52.44 ms .. 58.57 ms)
std dev              5.599 ms   (4.257 ms .. 7.520 ms)
variance introduced by outliers: 37% (moderately inflated)

benchmarking plan-test-machine/final
time                 21.36 ms   (19.91 ms .. 22.68 ms)
                     0.987 R²   (0.978 R² .. 0.996 R²)
mean                 22.62 ms   (21.87 ms .. 23.71 ms)
std dev              2.100 ms   (1.507 ms .. 2.898 ms)
variance introduced by outliers: 39% (moderately inflated)

benchmarking plan-test-machine/finalOr
time                 14.00 ms   (13.24 ms .. 14.79 ms)
                     0.978 R²   (0.959 R² .. 0.993 R²)
mean                 15.15 ms   (14.51 ms .. 16.14 ms)
std dev              1.818 ms   (1.226 ms .. 2.240 ms)
variance introduced by outliers: 58% (severely inflated)

benchmarking plan-test-machine/buffered
time                 27.44 ms   (26.17 ms .. 28.90 ms)
                     0.993 R²   (0.987 R² .. 0.998 R²)
mean                 26.91 ms   (26.41 ms .. 27.78 ms)
std dev              1.392 ms   (836.7 μs .. 2.177 ms)
variance introduced by outliers: 16% (moderately inflated)

Benchmark benchmarks: FINISH
HuwCampbell commented 7 years ago

Yep, I tried that as well.

It does look like there's a 2x speed up on most benchmarks just by inlining. But there's an additional 2x speed up on fold, scan and a few others, so it might be worth doing both sets of changes.

alang9 commented 7 years ago

In that case, does it make sense to apply the transformation to just fold, scan and other functions that will actually benefit from it?

HuwCampbell commented 7 years ago

I've a branch https://github.com/HuwCampbell/machines/tree/topic/comparisons in which all can be benchmarked and compared. There's quite a few which benefit from explicit rewrites.

I don't really mind if they're all fast.

HuwCampbell commented 7 years ago

I'm pretty happy with these changes. Performance improvements are worthwhile, and there are a few other things like tests and better benchmarks.

Review?

HuwCampbell commented 7 years ago

Ok

YoEight commented 7 years ago

Thanks for your work !