Closed HuwCampbell closed 7 years ago
So I added some INLINABLE pragmas, which brings the perfomance more in line with pipes and conduit (and sometimes faster).
@acowley @treeowl
Running 1 benchmarks...
Benchmark benchmarks: RUNNING...
benchmarking map/machines
time 28.92 ms (24.94 ms .. 33.10 ms)
0.918 R² (0.845 R² .. 0.967 R²)
mean 35.69 ms (32.39 ms .. 40.78 ms)
std dev 8.662 ms (5.867 ms .. 11.04 ms)
variance introduced by outliers: 80% (severely inflated)
benchmarking map/pipes
time 26.05 ms (24.77 ms .. 27.38 ms)
0.992 R² (0.987 R² .. 0.998 R²)
mean 25.84 ms (25.31 ms .. 26.44 ms)
std dev 1.232 ms (953.7 μs .. 1.554 ms)
variance introduced by outliers: 15% (moderately inflated)
benchmarking map/conduit
time 27.65 ms (27.16 ms .. 28.34 ms)
0.997 R² (0.995 R² .. 0.999 R²)
mean 27.27 ms (26.73 ms .. 27.76 ms)
std dev 1.113 ms (804.9 μs .. 1.614 ms)
variance introduced by outliers: 10% (moderately inflated)
benchmarking drop/machines
time 12.14 ms (11.89 ms .. 12.36 ms)
0.997 R² (0.994 R² .. 0.999 R²)
mean 13.43 ms (13.07 ms .. 14.14 ms)
std dev 1.303 ms (823.7 μs .. 1.883 ms)
variance introduced by outliers: 50% (severely inflated)
benchmarking drop/pipes
time 15.77 ms (14.70 ms .. 16.93 ms)
0.983 R² (0.969 R² .. 0.994 R²)
mean 17.11 ms (16.57 ms .. 17.82 ms)
std dev 1.529 ms (1.098 ms .. 2.352 ms)
variance introduced by outliers: 44% (moderately inflated)
benchmarking drop/conduit
time 22.23 ms (21.24 ms .. 23.03 ms)
0.995 R² (0.991 R² .. 0.998 R²)
mean 21.29 ms (20.97 ms .. 21.66 ms)
std dev 820.1 μs (633.8 μs .. 1.098 ms)
variance introduced by outliers: 13% (moderately inflated)
benchmarking dropWhile/machines
time 12.55 ms (11.76 ms .. 13.72 ms)
0.982 R² (0.967 R² .. 0.998 R²)
mean 11.96 ms (11.77 ms .. 12.32 ms)
std dev 656.8 μs (366.8 μs .. 1.248 ms)
variance introduced by outliers: 25% (moderately inflated)
benchmarking dropWhile/pipes
time 18.04 ms (17.33 ms .. 18.70 ms)
0.995 R² (0.991 R² .. 0.999 R²)
mean 18.18 ms (17.88 ms .. 18.55 ms)
std dev 759.3 μs (556.6 μs .. 1.121 ms)
variance introduced by outliers: 13% (moderately inflated)
benchmarking dropWhile/conduit
time 23.02 ms (22.08 ms .. 24.00 ms)
0.995 R² (0.991 R² .. 0.999 R²)
mean 21.71 ms (21.32 ms .. 22.15 ms)
std dev 946.7 μs (638.4 μs .. 1.410 ms)
variance introduced by outliers: 14% (moderately inflated)
benchmarking scan/machines
time 38.39 ms (37.26 ms .. 39.59 ms)
0.997 R² (0.994 R² .. 0.999 R²)
mean 36.84 ms (36.26 ms .. 37.56 ms)
std dev 1.278 ms (987.5 μs .. 1.772 ms)
benchmarking scan/pipes
time 30.05 ms (28.65 ms .. 31.34 ms)
0.994 R² (0.989 R² .. 0.998 R²)
mean 29.38 ms (28.33 ms .. 30.08 ms)
std dev 1.740 ms (1.013 ms .. 2.922 ms)
variance introduced by outliers: 22% (moderately inflated)
benchmarking scan/conduit
time 49.43 ms (47.13 ms .. 51.99 ms)
0.995 R² (0.991 R² .. 0.999 R²)
mean 46.13 ms (44.98 ms .. 47.38 ms)
std dev 2.323 ms (1.743 ms .. 3.255 ms)
variance introduced by outliers: 13% (moderately inflated)
benchmarking take/machines
time 32.65 ms (31.07 ms .. 33.98 ms)
0.995 R² (0.992 R² .. 0.998 R²)
mean 30.88 ms (30.32 ms .. 31.48 ms)
std dev 1.249 ms (933.3 μs .. 1.633 ms)
variance introduced by outliers: 11% (moderately inflated)
benchmarking take/pipes
time 28.91 ms (27.52 ms .. 30.28 ms)
0.994 R² (0.990 R² .. 0.998 R²)
mean 27.77 ms (27.28 ms .. 28.33 ms)
std dev 1.107 ms (859.5 μs .. 1.534 ms)
variance introduced by outliers: 10% (moderately inflated)
benchmarking take/conduit
time 37.10 ms (36.23 ms .. 38.39 ms)
0.998 R² (0.995 R² .. 0.999 R²)
mean 36.20 ms (35.67 ms .. 36.79 ms)
std dev 1.103 ms (791.4 μs .. 1.579 ms)
benchmarking takeWhile/machines
time 44.32 ms (43.13 ms .. 45.40 ms)
0.997 R² (0.994 R² .. 0.999 R²)
mean 42.29 ms (41.35 ms .. 43.04 ms)
std dev 1.692 ms (1.247 ms .. 2.275 ms)
variance introduced by outliers: 13% (moderately inflated)
benchmarking takeWhile/pipes
time 27.55 ms (26.85 ms .. 28.47 ms)
0.998 R² (0.995 R² .. 1.000 R²)
mean 27.03 ms (26.68 ms .. 27.34 ms)
std dev 727.9 μs (484.8 μs .. 1.122 ms)
benchmarking takeWhile/conduit
time 38.16 ms (37.04 ms .. 40.01 ms)
0.995 R² (0.991 R² .. 0.998 R²)
mean 37.02 ms (36.26 ms .. 37.83 ms)
std dev 1.579 ms (1.259 ms .. 1.870 ms)
variance introduced by outliers: 12% (moderately inflated)
benchmarking fold/machines
time 14.02 ms (13.52 ms .. 14.45 ms)
0.994 R² (0.990 R² .. 0.997 R²)
mean 15.95 ms (15.32 ms .. 16.74 ms)
std dev 1.837 ms (1.443 ms .. 2.245 ms)
variance introduced by outliers: 58% (severely inflated)
benchmarking fold/pipes
time 7.492 ms (7.168 ms .. 7.829 ms)
0.976 R² (0.951 R² .. 0.991 R²)
mean 7.770 ms (7.488 ms .. 8.045 ms)
std dev 831.3 μs (616.8 μs .. 1.191 ms)
variance introduced by outliers: 60% (severely inflated)
benchmarking fold/conduit
time 19.58 ms (18.36 ms .. 20.58 ms)
0.988 R² (0.976 R² .. 0.996 R²)
mean 20.44 ms (19.93 ms .. 20.88 ms)
std dev 1.118 ms (884.9 μs .. 1.505 ms)
variance introduced by outliers: 22% (moderately inflated)
benchmarking filter/machines
time 30.50 ms (29.48 ms .. 31.20 ms)
0.996 R² (0.992 R² .. 0.999 R²)
mean 31.32 ms (30.61 ms .. 32.18 ms)
std dev 1.648 ms (1.266 ms .. 2.144 ms)
variance introduced by outliers: 17% (moderately inflated)
benchmarking filter/pipes
time 30.08 ms (29.46 ms .. 30.65 ms)
0.998 R² (0.996 R² .. 0.999 R²)
mean 31.07 ms (30.55 ms .. 31.95 ms)
std dev 1.293 ms (653.3 μs .. 1.839 ms)
variance introduced by outliers: 11% (moderately inflated)
benchmarking filter/conduit
time 28.58 ms (27.39 ms .. 29.75 ms)
0.994 R² (0.989 R² .. 0.998 R²)
mean 29.17 ms (28.45 ms .. 30.04 ms)
std dev 1.682 ms (1.234 ms .. 2.388 ms)
variance introduced by outliers: 21% (moderately inflated)
benchmarking mapM/machines
time 31.56 ms (30.36 ms .. 32.62 ms)
0.997 R² (0.994 R² .. 0.999 R²)
mean 30.68 ms (30.19 ms .. 31.18 ms)
std dev 1.021 ms (742.8 μs .. 1.431 ms)
variance introduced by outliers: 11% (moderately inflated)
benchmarking mapM/pipes
time 55.48 ms (52.22 ms .. 58.76 ms)
0.994 R² (0.989 R² .. 0.998 R²)
mean 51.65 ms (50.07 ms .. 53.24 ms)
std dev 2.977 ms (2.449 ms .. 3.592 ms)
variance introduced by outliers: 15% (moderately inflated)
benchmarking mapM/conduit
time 34.55 ms (32.39 ms .. 36.68 ms)
0.991 R² (0.988 R² .. 0.997 R²)
mean 31.37 ms (30.49 ms .. 32.42 ms)
std dev 1.992 ms (1.443 ms .. 2.711 ms)
variance introduced by outliers: 22% (moderately inflated)
benchmarking plan-test-machine/final
time 10.58 ms (10.27 ms .. 10.88 ms)
0.996 R² (0.994 R² .. 0.998 R²)
mean 10.19 ms (9.985 ms .. 10.34 ms)
std dev 462.4 μs (339.3 μs .. 748.2 μs)
variance introduced by outliers: 19% (moderately inflated)
benchmarking plan-test-machine/finalOr
time 10.73 ms (10.52 ms .. 10.95 ms)
0.997 R² (0.995 R² .. 0.999 R²)
mean 10.52 ms (10.27 ms .. 10.67 ms)
std dev 506.1 μs (307.3 μs .. 845.6 μs)
variance introduced by outliers: 20% (moderately inflated)
benchmarking plan-test-machine/buffered
time 21.41 ms (20.76 ms .. 22.11 ms)
0.994 R² (0.989 R² .. 0.998 R²)
mean 21.39 ms (20.98 ms .. 21.76 ms)
std dev 933.4 μs (788.4 μs .. 1.124 ms)
variance introduced by outliers: 13% (moderately inflated)
Benchmark benchmarks: FINISH
I'm not against making various core definitions uglier for the sake of substantial performance gains, and I appreciate you taking the initiative to push on this!
Is repeatedly
itself a significant culprit? Can we do something more targeted to help there? It's unfortunate that we can't get GHC to push through construct
s and repeatedly
itself to build the lower-level code we want.
Regardless, I'd be inclined to go ahead with this change.
I can get part of the performance improvement by dropping 12ec9c8 Rewrite most Processes explicitly
and adding INLINE
pragmas to construct
and repeatedly
. See https://github.com/alang9/machines/tree/inline
Running 1 benchmarks...
Benchmark benchmarks: RUNNING...
benchmarking map/machines
time 29.80 ms (28.79 ms .. 31.36 ms)
0.986 R² (0.965 R² .. 0.997 R²)
mean 29.71 ms (28.90 ms .. 30.89 ms)
std dev 1.994 ms (1.288 ms .. 2.995 ms)
variance introduced by outliers: 22% (moderately inflated)
benchmarking map/pipes
time 42.73 ms (39.38 ms .. 46.30 ms)
0.988 R² (0.975 R² .. 0.998 R²)
mean 46.50 ms (45.08 ms .. 48.50 ms)
std dev 3.219 ms (1.894 ms .. 5.098 ms)
variance introduced by outliers: 20% (moderately inflated)
benchmarking map/conduit
time 34.29 ms (31.55 ms .. 36.86 ms)
0.983 R² (0.967 R² .. 0.997 R²)
mean 34.71 ms (33.57 ms .. 36.70 ms)
std dev 3.081 ms (1.613 ms .. 4.434 ms)
variance introduced by outliers: 36% (moderately inflated)
benchmarking drop/machines
time 18.94 ms (17.78 ms .. 19.92 ms)
0.989 R² (0.981 R² .. 0.996 R²)
mean 18.20 ms (17.80 ms .. 18.74 ms)
std dev 1.125 ms (884.5 μs .. 1.556 ms)
variance introduced by outliers: 26% (moderately inflated)
benchmarking drop/pipes
time 36.31 ms (32.06 ms .. 41.25 ms)
0.960 R² (0.929 R² .. 0.987 R²)
mean 32.13 ms (30.53 ms .. 34.22 ms)
std dev 3.665 ms (2.634 ms .. 4.649 ms)
variance introduced by outliers: 48% (moderately inflated)
benchmarking drop/conduit
time 24.17 ms (22.60 ms .. 25.37 ms)
0.981 R² (0.963 R² .. 0.992 R²)
mean 23.37 ms (22.43 ms .. 24.37 ms)
std dev 2.212 ms (1.814 ms .. 2.675 ms)
variance introduced by outliers: 43% (moderately inflated)
benchmarking dropWhile/machines
time 29.05 ms (27.73 ms .. 30.38 ms)
0.981 R² (0.951 R² .. 0.996 R²)
mean 28.85 ms (27.98 ms .. 30.40 ms)
std dev 2.352 ms (1.455 ms .. 3.745 ms)
variance introduced by outliers: 32% (moderately inflated)
benchmarking dropWhile/pipes
time 20.01 ms (19.11 ms .. 21.07 ms)
0.989 R² (0.976 R² .. 0.997 R²)
mean 19.77 ms (19.41 ms .. 20.27 ms)
std dev 968.1 μs (675.3 μs .. 1.475 ms)
variance introduced by outliers: 18% (moderately inflated)
benchmarking dropWhile/conduit
time 24.72 ms (22.80 ms .. 26.46 ms)
0.980 R² (0.966 R² .. 0.991 R²)
mean 22.99 ms (22.30 ms .. 24.00 ms)
std dev 1.965 ms (1.494 ms .. 2.571 ms)
variance introduced by outliers: 38% (moderately inflated)
benchmarking scan/machines
time 74.13 ms (71.46 ms .. 77.44 ms)
0.997 R² (0.992 R² .. 1.000 R²)
mean 74.08 ms (72.95 ms .. 75.43 ms)
std dev 2.054 ms (1.385 ms .. 2.915 ms)
benchmarking scan/pipes
time 40.50 ms (36.57 ms .. 44.51 ms)
0.970 R² (0.938 R² .. 0.991 R²)
mean 43.68 ms (41.60 ms .. 46.81 ms)
std dev 4.896 ms (3.429 ms .. 7.539 ms)
variance introduced by outliers: 41% (moderately inflated)
benchmarking scan/conduit
time 71.50 ms (61.99 ms .. 80.79 ms)
0.964 R² (0.923 R² .. 0.989 R²)
mean 58.38 ms (53.86 ms .. 63.93 ms)
std dev 8.760 ms (6.412 ms .. 11.70 ms)
variance introduced by outliers: 56% (severely inflated)
benchmarking take/machines
time 59.00 ms (54.34 ms .. 61.80 ms)
0.990 R² (0.973 R² .. 0.999 R²)
mean 59.19 ms (57.15 ms .. 60.53 ms)
std dev 2.946 ms (1.995 ms .. 4.333 ms)
variance introduced by outliers: 15% (moderately inflated)
benchmarking take/pipes
time 77.76 ms (71.07 ms .. 86.97 ms)
0.976 R² (0.947 R² .. 0.994 R²)
mean 65.20 ms (60.54 ms .. 70.45 ms)
std dev 8.530 ms (6.583 ms .. 10.47 ms)
variance introduced by outliers: 44% (moderately inflated)
benchmarking take/conduit
time 47.82 ms (38.19 ms .. 59.14 ms)
0.894 R² (0.805 R² .. 0.985 R²)
mean 44.69 ms (41.80 ms .. 48.89 ms)
std dev 6.650 ms (4.951 ms .. 8.876 ms)
variance introduced by outliers: 55% (severely inflated)
benchmarking takeWhile/machines
time 44.80 ms (41.23 ms .. 49.81 ms)
0.976 R² (0.946 R² .. 0.994 R²)
mean 50.66 ms (47.99 ms .. 53.64 ms)
std dev 5.032 ms (3.873 ms .. 6.179 ms)
variance introduced by outliers: 37% (moderately inflated)
benchmarking takeWhile/pipes
time 46.95 ms (41.92 ms .. 51.76 ms)
0.967 R² (0.907 R² .. 0.993 R²)
mean 49.77 ms (46.53 ms .. 52.10 ms)
std dev 4.878 ms (3.307 ms .. 6.403 ms)
variance introduced by outliers: 37% (moderately inflated)
benchmarking takeWhile/conduit
time 45.86 ms (39.98 ms .. 53.58 ms)
0.928 R² (0.846 R² .. 0.994 R²)
mean 46.01 ms (43.26 ms .. 50.88 ms)
std dev 6.881 ms (2.061 ms .. 9.371 ms)
variance introduced by outliers: 56% (severely inflated)
benchmarking fold/machines
time 31.03 ms (27.78 ms .. 33.85 ms)
0.970 R² (0.943 R² .. 0.991 R²)
mean 32.23 ms (30.79 ms .. 34.18 ms)
std dev 3.447 ms (2.704 ms .. 4.350 ms)
variance introduced by outliers: 45% (moderately inflated)
benchmarking fold/pipes
time 13.74 ms (13.58 ms .. 13.99 ms)
0.997 R² (0.993 R² .. 0.999 R²)
mean 13.61 ms (13.45 ms .. 13.78 ms)
std dev 408.4 μs (292.7 μs .. 628.8 μs)
variance introduced by outliers: 11% (moderately inflated)
benchmarking fold/conduit
time 22.43 ms (20.49 ms .. 24.79 ms)
0.968 R² (0.941 R² .. 0.996 R²)
mean 21.04 ms (20.50 ms .. 21.98 ms)
std dev 1.671 ms (915.8 μs .. 2.521 ms)
variance introduced by outliers: 36% (moderately inflated)
benchmarking filter/machines
time 42.06 ms (37.85 ms .. 47.26 ms)
0.966 R² (0.943 R² .. 0.999 R²)
mean 38.52 ms (37.12 ms .. 41.29 ms)
std dev 3.647 ms (2.126 ms .. 4.760 ms)
variance introduced by outliers: 38% (moderately inflated)
benchmarking filter/pipes
time 76.32 ms (72.25 ms .. 88.09 ms)
0.966 R² (0.892 R² .. 0.999 R²)
mean 78.47 ms (76.02 ms .. 84.95 ms)
std dev 6.521 ms (1.150 ms .. 10.42 ms)
variance introduced by outliers: 28% (moderately inflated)
benchmarking filter/conduit
time 36.45 ms (33.16 ms .. 38.64 ms)
0.970 R² (0.926 R² .. 0.990 R²)
mean 35.74 ms (34.17 ms .. 38.13 ms)
std dev 3.921 ms (2.581 ms .. 5.930 ms)
variance introduced by outliers: 42% (moderately inflated)
benchmarking mapM/machines
time 26.56 ms (23.29 ms .. 29.10 ms)
0.958 R² (0.922 R² .. 0.982 R²)
mean 30.99 ms (29.27 ms .. 33.09 ms)
std dev 4.209 ms (3.505 ms .. 4.896 ms)
variance introduced by outliers: 57% (severely inflated)
benchmarking mapM/pipes
time 186.9 ms (177.8 ms .. 197.0 ms)
0.999 R² (0.997 R² .. 1.000 R²)
mean 190.2 ms (186.9 ms .. 195.8 ms)
std dev 5.796 ms (1.591 ms .. 7.869 ms)
variance introduced by outliers: 14% (moderately inflated)
benchmarking mapM/conduit
time 48.65 ms (44.72 ms .. 52.36 ms)
0.979 R² (0.946 R² .. 0.996 R²)
mean 54.95 ms (52.44 ms .. 58.57 ms)
std dev 5.599 ms (4.257 ms .. 7.520 ms)
variance introduced by outliers: 37% (moderately inflated)
benchmarking plan-test-machine/final
time 21.36 ms (19.91 ms .. 22.68 ms)
0.987 R² (0.978 R² .. 0.996 R²)
mean 22.62 ms (21.87 ms .. 23.71 ms)
std dev 2.100 ms (1.507 ms .. 2.898 ms)
variance introduced by outliers: 39% (moderately inflated)
benchmarking plan-test-machine/finalOr
time 14.00 ms (13.24 ms .. 14.79 ms)
0.978 R² (0.959 R² .. 0.993 R²)
mean 15.15 ms (14.51 ms .. 16.14 ms)
std dev 1.818 ms (1.226 ms .. 2.240 ms)
variance introduced by outliers: 58% (severely inflated)
benchmarking plan-test-machine/buffered
time 27.44 ms (26.17 ms .. 28.90 ms)
0.993 R² (0.987 R² .. 0.998 R²)
mean 26.91 ms (26.41 ms .. 27.78 ms)
std dev 1.392 ms (836.7 μs .. 2.177 ms)
variance introduced by outliers: 16% (moderately inflated)
Benchmark benchmarks: FINISH
Yep, I tried that as well.
It does look like there's a 2x speed up on most benchmarks just by inlining. But there's an additional 2x speed up on fold
, scan
and a few others, so it might be worth doing both sets of changes.
In that case, does it make sense to apply the transformation to just fold
, scan
and other functions that will actually benefit from it?
I've a branch https://github.com/HuwCampbell/machines/tree/topic/comparisons in which all can be benchmarked and compared. There's quite a few which benefit from explicit rewrites.
I don't really mind if they're all fast.
I'm pretty happy with these changes. Performance improvements are worthwhile, and there are a few other things like tests and better benchmarks.
Review?
Ok
Thanks for your work !
This one might be a little controversial, but I did some experiments rewriting
Process
andSource
Machines
explicitly instead of using aPlan
and got performance improvements essentially across the board. I also believe there were some bugs inbuffered
, as there were situations where it would lead to empty list outputs.This does make these machines less readable and harder to fathom however.
Here's a handful of before and afters
Edit: Further improvements below with inlining.