OlivierSohn opened this PR 6 years ago
This looks quite good. Thanks for the work! I wonder whether maintaining two versions of the same code will become a burden.
I would say the tests are one of the most fundamental parts to implement before merging. Benchmarks would be a nice addition too, especially if we could compare it to the boxed version. That would give this variant a good reason to exist.
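One way to address both the maintenance concern and the testing point is to write a reference algorithm once against a generic array interface and check that the boxed and unboxed instantiations agree. The sketch below is only an illustration under that assumption: it uses the `array` package's boxed `Array` and unboxed `UArray` as stand-ins for the two matrix types, and `multNaive`, `fromRows` and `boxedAgreesWithUnboxed` are hypothetical names, not part of the `matrix` API.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Data.Array (Array)
import Data.Array.IArray (IArray, bounds, elems, listArray, (!))
import Data.Array.Unboxed (UArray)

-- Naive O(n^3) product, written once and generic over the storage:
-- the same code works with boxed 'Array' and unboxed 'UArray'.
-- Assumes 1-based bounds and that a's column count equals b's row count.
multNaive :: (IArray arr e, Num e)
          => arr (Int, Int) e -> arr (Int, Int) e -> arr (Int, Int) e
multNaive a b =
  listArray ((1, 1), (n, p))
    [ sum [a ! (i, k) * b ! (k, j) | k <- [1 .. m]] | i <- [1 .. n], j <- [1 .. p] ]
  where
    (_, (n, m)) = bounds a
    (_, (_, p)) = bounds b

-- Build a matrix from a list of rows (assumed rectangular), row-major.
fromRows :: IArray arr e => [[e]] -> arr (Int, Int) e
fromRows rows = listArray ((1, 1), (length rows, cols)) (concat rows)
  where
    cols = case rows of
      [] -> 0
      (r : _) -> length r

-- Differential test: the unboxed result must match the boxed reference.
boxedAgreesWithUnboxed :: [[Int]] -> [[Int]] -> Bool
boxedAgreesWithUnboxed xs ys = elems boxed == elems unboxed
  where
    boxed = multNaive (fromRows xs) (fromRows ys) :: Array (Int, Int) Int
    unboxed = multNaive (fromRows xs) (fromRows ys) :: UArray (Int, Int) Int
```

With a property-testing library, `boxedAgreesWithUnboxed` could be run over randomly generated square matrices; here it is just a plain `Bool` check.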
I'd say don't worry about all the multiplication algorithms. If the benchmarks say this version multiplies faster (and it probably will), I'm happy to accept it as is. We can always work on that later if we want to.
About the rules: they can probably stay. I don't see any harm besides the warnings. The warnings say the rules might not fire, but when they do, it's a performance win. Ideally we should try to fix #37, but I don't think that should stop this PR from merging.
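For context, GHC emits the "rule may never fire" warning when the function on a rule's left-hand side could be inlined before the rule gets a chance to match; the usual fix is phase control on that function. The sketch below shows the general pattern with a hypothetical `scaleList` function; it does not reproduce the actual rules in the `matrix` package or the specifics of #37.

```haskell
-- Hypothetical function that a rewrite rule can fuse.
scaleList :: Int -> [Int] -> [Int]
scaleList k = map (* k)
-- Keep GHC from inlining scaleList before phase 1, so the rule below
-- (active in all phases) can match first. Without a pragma like this,
-- GHC warns that the rule may never fire because scaleList might
-- inline first.
{-# NOINLINE [1] scaleList #-}

-- Scaling twice equals scaling once by the product (Int multiplication
-- is associative and commutative, even on overflow).
{-# RULES
"scaleList/scaleList" forall a b xs.
  scaleList a (scaleList b xs) = scaleList (a * b) xs
  #-}
```

Rules only fire when compiling with optimisation, and a correct rule must not change the result, so the program behaves the same whether or not it fires.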
@Daniel-Diaz after reading your comment, I put the rules back in place and added benchmarks for the unboxed version: overall, unboxed multiplications are 2-4x faster!
In the process I also added some SPECIALIZE pragmas that were missing (imho) to the boxed version's Strassen multiplication, which improved its performance there by approx. 10%.
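As a reminder of what such a pragma does, here is a minimal sketch on a hypothetical polymorphic `dotProduct`; the element types the boxed Strassen code was actually specialized to aren't stated in this thread, so `Int` and `Double` are assumptions.

```haskell
-- Polymorphic kernel: without specialization, calls at a concrete type
-- pass a Num dictionary and go through it for every (*) and (+).
dotProduct :: Num a => [a] -> [a] -> a
dotProduct xs ys = sum (zipWith (*) xs ys)
-- Ask GHC to also compile monomorphic copies for the hot element types;
-- with optimisation on, calls at these types use the specialized code.
{-# SPECIALIZE dotProduct :: [Int] -> [Int] -> Int #-}
{-# SPECIALIZE dotProduct :: [Double] -> [Double] -> Double #-}
```

The trade-off is code size: each SPECIALIZE adds one more compiled copy, so it is worth limiting it to the types that benchmarks show are used.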
Commit and detailed report will follow...
(For an updated benchmark, see below. This one is kept to show the Strassen and StrassenU times, which led to opening #57 and are not in the updated benchmark.)
Running 1 benchmarks...
Benchmark matrix-mult: RUNNING...
benchmarking mult10/Definition
time 8.683 μs (8.449 μs .. 8.937 μs)
0.991 R² (0.988 R² .. 0.994 R²)
mean 9.054 μs (8.753 μs .. 9.482 μs)
std dev 1.272 μs (877.6 ns .. 1.874 μs)
variance introduced by outliers: 93% (severely inflated)
benchmarking mult10/DefinitionU
time 3.947 μs (3.840 μs .. 4.086 μs)
0.991 R² (0.985 R² .. 0.996 R²)
mean 3.982 μs (3.869 μs .. 4.115 μs)
std dev 418.5 ns (351.6 ns .. 529.4 ns)
variance introduced by outliers: 88% (severely inflated)
benchmarking mult10/Definition 2
time 12.74 μs (12.34 μs .. 13.15 μs)
0.990 R² (0.982 R² .. 0.995 R²)
mean 12.56 μs (12.22 μs .. 13.17 μs)
std dev 1.459 μs (1.018 μs .. 2.388 μs)
variance introduced by outliers: 89% (severely inflated)
benchmarking mult10/Strassen
time 4.471 ms (4.313 ms .. 4.628 ms)
0.985 R² (0.975 R² .. 0.992 R²)
mean 4.639 ms (4.468 ms .. 4.892 ms)
std dev 659.6 μs (469.2 μs .. 1.011 ms)
variance introduced by outliers: 78% (severely inflated)
benchmarking mult10/StrassenU
time 2.191 ms (2.096 ms .. 2.284 ms)
0.980 R² (0.969 R² .. 0.990 R²)
mean 2.199 ms (2.108 ms .. 2.340 ms)
std dev 362.0 μs (224.6 μs .. 525.8 μs)
variance introduced by outliers: 87% (severely inflated)
benchmarking mult10/Strassen mixed
time 13.60 μs (13.09 μs .. 14.09 μs)
0.987 R² (0.980 R² .. 0.992 R²)
mean 13.69 μs (13.20 μs .. 14.28 μs)
std dev 1.764 μs (1.407 μs .. 2.180 μs)
variance introduced by outliers: 91% (severely inflated)
benchmarking mult25/Definition
time 127.4 μs (122.2 μs .. 133.1 μs)
0.985 R² (0.978 R² .. 0.992 R²)
mean 133.4 μs (128.5 μs .. 138.3 μs)
std dev 15.91 μs (12.88 μs .. 21.34 μs)
variance introduced by outliers: 86% (severely inflated)
benchmarking mult25/DefinitionU
time 51.29 μs (49.26 μs .. 53.62 μs)
0.980 R² (0.971 R² .. 0.988 R²)
mean 53.78 μs (51.62 μs .. 56.32 μs)
std dev 7.926 μs (6.468 μs .. 11.38 μs)
variance introduced by outliers: 92% (severely inflated)
benchmarking mult25/Definition 2
time 156.4 μs (149.0 μs .. 162.4 μs)
0.977 R² (0.964 R² .. 0.987 R²)
mean 165.9 μs (157.7 μs .. 177.2 μs)
std dev 34.07 μs (26.76 μs .. 46.15 μs)
variance introduced by outliers: 95% (severely inflated)
benchmarking mult25/Strassen
time 38.92 ms (36.21 ms .. 41.42 ms)
0.982 R² (0.958 R² .. 0.996 R²)
mean 42.10 ms (40.56 ms .. 44.16 ms)
std dev 3.348 ms (2.540 ms .. 4.359 ms)
variance introduced by outliers: 25% (moderately inflated)
benchmarking mult25/StrassenU
time 17.05 ms (15.92 ms .. 18.17 ms)
0.982 R² (0.968 R² .. 0.992 R²)
mean 15.94 ms (15.56 ms .. 16.48 ms)
std dev 1.133 ms (903.2 μs .. 1.396 ms)
variance introduced by outliers: 32% (moderately inflated)
benchmarking mult25/Strassen mixed
time 167.9 μs (161.7 μs .. 173.8 μs)
0.986 R² (0.980 R² .. 0.991 R²)
mean 172.6 μs (166.6 μs .. 178.9 μs)
std dev 20.58 μs (17.22 μs .. 25.33 μs)
variance introduced by outliers: 85% (severely inflated)
benchmarking mult100/Definition
time 10.46 ms (9.535 ms .. 11.20 ms)
0.963 R² (0.931 R² .. 0.986 R²)
mean 9.617 ms (9.218 ms .. 10.01 ms)
std dev 1.033 ms (860.2 μs .. 1.282 ms)
variance introduced by outliers: 58% (severely inflated)
benchmarking mult100/DefinitionU
time 2.827 ms (2.743 ms .. 2.935 ms)
0.987 R² (0.979 R² .. 0.993 R²)
mean 2.923 ms (2.816 ms .. 3.099 ms)
std dev 456.4 μs (261.6 μs .. 689.0 μs)
variance introduced by outliers: 83% (severely inflated)
benchmarking mult100/Definition 2
time 8.699 ms (8.246 ms .. 8.999 ms)
0.985 R² (0.964 R² .. 0.996 R²)
mean 8.137 ms (7.863 ms .. 8.360 ms)
std dev 657.7 μs (513.9 μs .. 863.9 μs)
variance introduced by outliers: 47% (moderately inflated)
benchmarking mult100/Strassen
time 2.404 s (NaN s .. 2.680 s)
0.999 R² (0.995 R² .. 1.000 R²)
mean 2.412 s (2.365 s .. 2.447 s)
std dev 53.23 ms (0.0 s .. 60.55 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking mult100/StrassenU
time 841.0 ms (805.9 ms .. 913.9 ms)
0.999 R² (0.998 R² .. 1.000 R²)
mean 864.6 ms (848.2 ms .. 877.2 ms)
std dev 19.47 ms (0.0 s .. 21.87 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking mult100/Strassen mixed
time 8.105 ms (7.877 ms .. 8.367 ms)
0.990 R² (0.981 R² .. 0.995 R²)
mean 8.220 ms (7.994 ms .. 8.533 ms)
std dev 689.3 μs (507.4 μs .. 1.082 ms)
variance introduced by outliers: 47% (moderately inflated)
benchmarking mult150/Definition
time 36.63 ms (33.81 ms .. 39.36 ms)
0.981 R² (0.961 R² .. 0.993 R²)
mean 34.36 ms (32.88 ms .. 35.85 ms)
std dev 3.088 ms (2.376 ms .. 4.520 ms)
variance introduced by outliers: 36% (moderately inflated)
benchmarking mult150/DefinitionU
time 9.663 ms (8.957 ms .. 10.19 ms)
0.975 R² (0.958 R² .. 0.986 R²)
mean 9.124 ms (8.828 ms .. 9.446 ms)
std dev 883.0 μs (737.9 μs .. 1.120 ms)
variance introduced by outliers: 53% (severely inflated)
benchmarking mult150/Definition 2
time 28.68 ms (27.30 ms .. 30.55 ms)
0.986 R² (0.974 R² .. 0.995 R²)
mean 26.10 ms (24.94 ms .. 27.15 ms)
std dev 2.355 ms (1.818 ms .. 3.080 ms)
variance introduced by outliers: 37% (moderately inflated)
benchmarking mult150/Strassen
time 17.11 s (14.83 s .. 20.24 s)
0.995 R² (0.994 R² .. 1.000 R²)
mean 16.72 s (16.38 s .. 17.05 s)
std dev 562.1 ms (0.0 s .. 572.2 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking mult150/StrassenU
time 5.961 s (5.796 s .. 6.129 s)
1.000 R² (1.000 R² .. 1.000 R²)
mean 5.668 s (5.575 s .. 5.710 s)
std dev 81.23 ms (0.0 s .. 87.14 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking mult150/Strassen mixed
time 22.15 ms (20.66 ms .. 23.80 ms)
0.978 R² (0.957 R² .. 0.993 R²)
mean 23.31 ms (22.60 ms .. 24.19 ms)
std dev 1.804 ms (1.357 ms .. 2.609 ms)
variance introduced by outliers: 33% (moderately inflated)
benchmarking mult250/Definition
time 177.4 ms (161.0 ms .. 203.1 ms)
0.975 R² (0.898 R² .. 0.999 R²)
mean 168.7 ms (155.7 ms .. 182.3 ms)
std dev 19.32 ms (11.88 ms .. 27.36 ms)
variance introduced by outliers: 27% (moderately inflated)
benchmarking mult250/DefinitionU
time 45.56 ms (42.18 ms .. 49.68 ms)
0.986 R² (0.970 R² .. 0.998 R²)
mean 44.99 ms (43.82 ms .. 46.53 ms)
std dev 2.580 ms (1.816 ms .. 3.460 ms)
variance introduced by outliers: 20% (moderately inflated)
benchmarking mult250/Definition 2
time 107.3 ms (96.78 ms .. 117.7 ms)
0.990 R² (0.978 R² .. 0.998 R²)
mean 104.5 ms (100.1 ms .. 110.3 ms)
std dev 7.664 ms (4.986 ms .. 10.91 ms)
variance introduced by outliers: 21% (moderately inflated)
benchmarking mult250/Strassen
time 16.70 s (16.09 s .. 18.08 s)
0.999 R² (0.998 R² .. 1.000 R²)
mean 16.81 s (16.60 s .. 16.93 s)
std dev 189.5 ms (0.0 s .. 213.3 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking mult250/StrassenU
time 5.069 s (4.858 s .. 5.374 s)
1.000 R² (0.999 R² .. 1.000 R²)
mean 5.420 s (5.337 s .. 5.577 s)
std dev 135.8 ms (1.088 fs .. 138.4 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking mult250/Strassen mixed
time 112.5 ms (87.89 ms .. 135.4 ms)
0.939 R² (0.889 R² .. 0.999 R²)
mean 104.6 ms (97.39 ms .. 115.9 ms)
std dev 13.34 ms (3.166 ms .. 16.50 ms)
variance introduced by outliers: 42% (moderately inflated)
benchmarking mult400/Definition
time 564.8 ms (368.5 ms .. 925.5 ms)
0.956 R² (0.915 R² .. 1.000 R²)
mean 975.0 ms (885.5 ms .. 1.061 s)
std dev 146.4 ms (0.0 s .. 149.2 ms)
variance introduced by outliers: 46% (moderately inflated)
benchmarking mult400/DefinitionU
time 182.0 ms (179.0 ms .. 184.5 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 181.5 ms (180.5 ms .. 182.6 ms)
std dev 1.248 ms (815.8 μs .. 1.605 ms)
variance introduced by outliers: 14% (moderately inflated)
benchmarking mult400/Definition 2
time 392.5 ms (120.9 ms .. 647.1 ms)
0.917 R² (0.873 R² .. 1.000 R²)
mean 594.6 ms (561.4 ms .. 620.6 ms)
std dev 40.37 ms (0.0 s .. 45.11 ms)
variance introduced by outliers: 20% (moderately inflated)
Reflecting on these results, I wonder whether Strassen multiplication is of any use to users of the library: it is much slower than the other algorithms, and it blows up memory for square matrices of around 512x512 (4 GB during the benchmark; this may also be due to criterion running tests in parallel, but still...).
Strassen mixed multiplication seems fine, though.
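One plausible reason for the gap: pure Strassen recurses all the way down to 1x1 blocks, and every level performs seven recursive products plus eighteen matrix additions/subtractions, each allocating fresh submatrices, while a mixed variant stops recursing below some size and uses the naive product there. The sketch below illustrates that idea on plain `[[Int]]` square matrices; the `cutoff` parameter, `strassenCutoff` and the helper names are hypothetical, and the threshold the library itself uses is not shown here.

```haskell
import Data.List (transpose)

type Mat = [[Int]]

-- Naive product, used below the cutoff.
naiveMult :: Mat -> Mat -> Mat
naiveMult a b = [[sum (zipWith (*) row col) | col <- cols] | row <- a]
  where
    cols = transpose b

addM, subM :: Mat -> Mat -> Mat
addM = zipWith (zipWith (+))
subM = zipWith (zipWith (-))

-- Split an even-sized square matrix into its four quadrants.
quadrants :: Mat -> (Mat, Mat, Mat, Mat)
quadrants m = (map (take k) top, map (drop k) top, map (take k) bot, map (drop k) bot)
  where
    k = length m `div` 2
    (top, bot) = splitAt k m

joinQuadrants :: Mat -> Mat -> Mat -> Mat -> Mat
joinQuadrants c11 c12 c21 c22 = zipWith (++) c11 c12 ++ zipWith (++) c21 c22

-- Strassen recursion on square matrices of equal size (cutoff >= 1):
-- fall back to the naive product at or below the cutoff, or when the
-- size is odd and cannot be split evenly.
strassenCutoff :: Int -> Mat -> Mat -> Mat
strassenCutoff cutoff a b
  | n <= cutoff || odd n = naiveMult a b
  | otherwise = joinQuadrants c11 c12 c21 c22
  where
    n = length a
    (a11, a12, a21, a22) = quadrants a
    (b11, b12, b21, b22) = quadrants b
    go = strassenCutoff cutoff
    -- The seven Strassen products.
    m1 = go (addM a11 a22) (addM b11 b22)
    m2 = go (addM a21 a22) b11
    m3 = go a11 (subM b12 b22)
    m4 = go a22 (subM b21 b11)
    m5 = go (addM a11 a12) b22
    m6 = go (subM a21 a11) (addM b11 b12)
    m7 = go (subM a12 a22) (addM b21 b22)
    -- Recombine into the result quadrants.
    c11 = addM (subM (addM m1 m4) m5) m7
    c12 = addM m3 m5
    c21 = addM m2 m4
    c22 = addM (addM (subM m1 m2) m3) m6
```

With `cutoff = 1` this is essentially pure Strassen; a larger cutoff trades the asymptotic gain for far fewer small allocations.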
Updated benchmarks with the new multiplication functions (the boxed and unboxed versions now have the same functionality), with the Strassen benchmark removed (cf. #57):
Benchmark matrix-mult: RUNNING...
benchmarking mult10/Definition
time 10.35 μs (9.965 μs .. 10.71 μs)
0.990 R² (0.986 R² .. 0.994 R²)
mean 10.50 μs (10.19 μs .. 10.84 μs)
std dev 1.102 μs (883.5 ns .. 1.458 μs)
variance introduced by outliers: 87% (severely inflated)
benchmarking mult10/Definition U
time 4.152 μs (3.988 μs .. 4.304 μs)
0.988 R² (0.983 R² .. 0.992 R²)
mean 4.317 μs (4.160 μs .. 4.473 μs)
std dev 542.1 ns (463.5 ns .. 688.0 ns)
variance introduced by outliers: 92% (severely inflated)
benchmarking mult10/Definition 2
time 12.60 μs (12.20 μs .. 13.02 μs)
0.992 R² (0.988 R² .. 0.996 R²)
mean 12.45 μs (12.21 μs .. 12.80 μs)
std dev 959.7 ns (745.5 ns .. 1.419 μs)
variance introduced by outliers: 78% (severely inflated)
benchmarking mult10/Definition 2 U
time 6.202 μs (5.955 μs .. 6.439 μs)
0.992 R² (0.989 R² .. 0.996 R²)
mean 6.089 μs (5.940 μs .. 6.252 μs)
std dev 509.1 ns (416.7 ns .. 627.7 ns)
variance introduced by outliers: 82% (severely inflated)
benchmarking mult10/Strassen mixed
time 11.43 μs (11.21 μs .. 11.63 μs)
0.996 R² (0.995 R² .. 0.998 R²)
mean 11.34 μs (11.10 μs .. 11.62 μs)
std dev 853.0 ns (713.8 ns .. 1.166 μs)
variance introduced by outliers: 77% (severely inflated)
benchmarking mult10/Strassen mixed U
time 5.617 μs (5.447 μs .. 5.783 μs)
0.994 R² (0.991 R² .. 0.997 R²)
mean 5.541 μs (5.394 μs .. 5.689 μs)
std dev 499.4 ns (422.3 ns .. 623.6 ns)
variance introduced by outliers: 84% (severely inflated)
benchmarking mult25/Definition
time 119.6 μs (115.8 μs .. 123.6 μs)
0.991 R² (0.984 R² .. 0.996 R²)
mean 118.9 μs (115.8 μs .. 122.3 μs)
std dev 10.52 μs (8.329 μs .. 13.77 μs)
variance introduced by outliers: 77% (severely inflated)
benchmarking mult25/Definition U
time 42.60 μs (41.46 μs .. 43.90 μs)
0.993 R² (0.988 R² .. 0.996 R²)
mean 43.79 μs (42.73 μs .. 44.87 μs)
std dev 3.820 μs (2.998 μs .. 4.977 μs)
variance introduced by outliers: 80% (severely inflated)
benchmarking mult25/Definition 2
time 138.6 μs (135.4 μs .. 142.2 μs)
0.991 R² (0.984 R² .. 0.995 R²)
mean 141.2 μs (137.0 μs .. 148.8 μs)
std dev 18.66 μs (12.56 μs .. 33.16 μs)
variance introduced by outliers: 88% (severely inflated)
benchmarking mult25/Definition 2 U
time 46.94 μs (45.57 μs .. 48.63 μs)
0.990 R² (0.986 R² .. 0.995 R²)
mean 48.59 μs (46.91 μs .. 50.79 μs)
std dev 6.704 μs (4.839 μs .. 11.58 μs)
variance introduced by outliers: 90% (severely inflated)
benchmarking mult25/Strassen mixed
time 137.3 μs (133.1 μs .. 141.1 μs)
0.993 R² (0.989 R² .. 0.996 R²)
mean 136.4 μs (133.0 μs .. 140.6 μs)
std dev 13.16 μs (10.67 μs .. 16.62 μs)
variance introduced by outliers: 80% (severely inflated)
benchmarking mult25/Strassen mixed U
time 44.84 μs (43.81 μs .. 45.93 μs)
0.992 R² (0.988 R² .. 0.996 R²)
mean 45.21 μs (43.96 μs .. 46.68 μs)
std dev 4.649 μs (3.558 μs .. 6.531 μs)
variance introduced by outliers: 84% (severely inflated)
benchmarking mult100/Definition
time 8.493 ms (8.198 ms .. 8.779 ms)
0.990 R² (0.983 R² .. 0.996 R²)
mean 8.402 ms (8.170 ms .. 8.672 ms)
std dev 713.1 μs (544.1 μs .. 991.9 μs)
variance introduced by outliers: 47% (moderately inflated)
benchmarking mult100/Definition U
time 2.282 ms (2.197 ms .. 2.368 ms)
0.990 R² (0.986 R² .. 0.995 R²)
mean 2.312 ms (2.257 ms .. 2.371 ms)
std dev 188.1 μs (157.0 μs .. 235.7 μs)
variance introduced by outliers: 58% (severely inflated)
benchmarking mult100/Definition 2
time 7.430 ms (7.193 ms .. 7.661 ms)
0.988 R² (0.977 R² .. 0.995 R²)
mean 7.463 ms (7.294 ms .. 7.688 ms)
std dev 533.3 μs (412.2 μs .. 801.6 μs)
variance introduced by outliers: 41% (moderately inflated)
benchmarking mult100/Definition 2 U
time 2.086 ms (2.017 ms .. 2.148 ms)
0.991 R² (0.987 R² .. 0.996 R²)
mean 2.087 ms (2.046 ms .. 2.132 ms)
std dev 151.2 μs (126.0 μs .. 200.8 μs)
variance introduced by outliers: 53% (severely inflated)
benchmarking mult100/Strassen mixed
time 6.802 ms (6.578 ms .. 7.045 ms)
0.988 R² (0.979 R² .. 0.996 R²)
mean 7.103 ms (6.929 ms .. 7.354 ms)
std dev 625.7 μs (416.6 μs .. 973.0 μs)
variance introduced by outliers: 50% (moderately inflated)
benchmarking mult100/Strassen mixed U
time 2.064 ms (1.994 ms .. 2.145 ms)
0.983 R² (0.969 R² .. 0.992 R²)
mean 2.087 ms (2.020 ms .. 2.184 ms)
std dev 282.1 μs (211.7 μs .. 416.4 μs)
variance introduced by outliers: 80% (severely inflated)
benchmarking mult150/Definition
time 28.55 ms (26.45 ms .. 30.11 ms)
0.989 R² (0.979 R² .. 0.997 R²)
mean 30.38 ms (29.14 ms .. 33.59 ms)
std dev 3.900 ms (1.355 ms .. 6.454 ms)
variance introduced by outliers: 51% (severely inflated)
benchmarking mult150/Definition U
time 7.540 ms (7.257 ms .. 7.810 ms)
0.992 R² (0.988 R² .. 0.996 R²)
mean 8.016 ms (7.788 ms .. 8.455 ms)
std dev 890.5 μs (560.7 μs .. 1.521 ms)
variance introduced by outliers: 63% (severely inflated)
benchmarking mult150/Definition 2
time 23.45 ms (22.57 ms .. 24.38 ms)
0.992 R² (0.984 R² .. 0.996 R²)
mean 24.72 ms (23.97 ms .. 25.73 ms)
std dev 2.049 ms (1.462 ms .. 2.810 ms)
variance introduced by outliers: 35% (moderately inflated)
benchmarking mult150/Definition 2 U
time 5.905 ms (5.748 ms .. 6.038 ms)
0.994 R² (0.989 R² .. 0.997 R²)
mean 5.942 ms (5.838 ms .. 6.076 ms)
std dev 359.4 μs (257.5 μs .. 492.4 μs)
variance introduced by outliers: 35% (moderately inflated)
benchmarking mult150/Strassen mixed
time 23.86 ms (22.94 ms .. 25.05 ms)
0.986 R² (0.967 R² .. 0.997 R²)
mean 23.18 ms (22.65 ms .. 24.10 ms)
std dev 1.550 ms (1.083 ms .. 2.494 ms)
variance introduced by outliers: 28% (moderately inflated)
benchmarking mult150/Strassen mixed U
time 5.906 ms (5.661 ms .. 6.171 ms)
0.985 R² (0.972 R² .. 0.994 R²)
mean 5.908 ms (5.789 ms .. 6.080 ms)
std dev 427.5 μs (316.4 μs .. 569.1 μs)
variance introduced by outliers: 43% (moderately inflated)
benchmarking mult250/Definition
time 151.8 ms (136.4 ms .. 166.9 ms)
0.990 R² (0.978 R² .. 0.999 R²)
mean 152.7 ms (148.6 ms .. 157.1 ms)
std dev 5.811 ms (4.437 ms .. 7.218 ms)
variance introduced by outliers: 12% (moderately inflated)
benchmarking mult250/Definition U
time 44.24 ms (42.15 ms .. 46.50 ms)
0.992 R² (0.981 R² .. 0.998 R²)
mean 43.94 ms (42.84 ms .. 45.76 ms)
std dev 2.660 ms (1.545 ms .. 4.232 ms)
variance introduced by outliers: 20% (moderately inflated)
benchmarking mult250/Definition 2
time 97.75 ms (93.72 ms .. 101.0 ms)
0.998 R² (0.994 R² .. 0.999 R²)
mean 109.3 ms (105.1 ms .. 115.4 ms)
std dev 7.608 ms (5.114 ms .. 10.40 ms)
variance introduced by outliers: 20% (moderately inflated)
benchmarking mult250/Definition 2 U
time 25.68 ms (23.91 ms .. 27.46 ms)
0.982 R² (0.970 R² .. 0.993 R²)
mean 25.66 ms (24.81 ms .. 27.00 ms)
std dev 2.271 ms (1.555 ms .. 3.308 ms)
variance introduced by outliers: 35% (moderately inflated)
benchmarking mult250/Strassen mixed
time 94.98 ms (87.75 ms .. 100.5 ms)
0.994 R² (0.987 R² .. 0.999 R²)
mean 101.8 ms (98.55 ms .. 107.2 ms)
std dev 6.323 ms (3.278 ms .. 9.698 ms)
variance introduced by outliers: 20% (moderately inflated)
benchmarking mult250/Strassen mixed U
time 24.21 ms (22.97 ms .. 25.53 ms)
0.992 R² (0.987 R² .. 0.997 R²)
mean 24.62 ms (23.94 ms .. 25.74 ms)
std dev 1.894 ms (1.125 ms .. 3.224 ms)
variance introduced by outliers: 30% (moderately inflated)
benchmarking mult400/Definition
time 669.4 ms (116.9 ms .. NaN s)
0.868 R² (0.734 R² .. 1.000 R²)
mean 1.142 s (924.0 ms .. 1.326 s)
std dev 294.8 ms (136.0 as .. 319.3 ms)
variance introduced by outliers: 72% (severely inflated)
benchmarking mult400/Definition U
time 213.7 ms (202.1 ms .. 227.9 ms)
0.997 R² (0.983 R² .. 1.000 R²)
mean 227.7 ms (221.8 ms .. 238.3 ms)
std dev 9.651 ms (4.562 ms .. 13.13 ms)
variance introduced by outliers: 14% (moderately inflated)
benchmarking mult400/Definition 2
time 444.5 ms (129.8 ms .. 697.6 ms)
0.946 R² (0.809 R² .. 1.000 R²)
mean 629.4 ms (597.4 ms .. 648.9 ms)
std dev 29.66 ms (0.0 s .. 33.78 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking mult400/Definition 2 U
time 119.5 ms (113.2 ms .. 128.0 ms)
0.994 R² (0.981 R² .. 1.000 R²)
mean 113.6 ms (106.3 ms .. 117.7 ms)
std dev 7.606 ms (4.117 ms .. 10.95 ms)
variance introduced by outliers: 12% (moderately inflated)
benchmarking mult400/Strassen mixed
time 675.9 ms (591.4 ms .. NaN s)
0.997 R² (0.990 R² .. 1.000 R²)
mean 742.1 ms (712.8 ms .. 767.8 ms)
std dev 41.68 ms (136.0 as .. 44.47 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking mult400/Strassen mixed U
time 105.2 ms (99.46 ms .. 109.2 ms)
0.994 R² (0.979 R² .. 0.999 R²)
mean 112.1 ms (107.8 ms .. 118.3 ms)
std dev 8.083 ms (4.399 ms .. 12.80 ms)
variance introduced by outliers: 21% (moderately inflated)
benchmarking mult500/Definition
time 2.379 s (1.673 s .. 3.803 s)
0.956 R² (0.934 R² .. 1.000 R²)
mean 3.081 s (2.667 s .. 3.384 s)
std dev 460.9 ms (0.0 s .. 525.7 ms)
variance introduced by outliers: 46% (moderately inflated)
benchmarking mult500/Definition U
time 408.3 ms (329.1 ms .. 478.7 ms)
0.995 R² (0.983 R² .. 1.000 R²)
mean 408.9 ms (394.0 ms .. 419.7 ms)
std dev 16.37 ms (0.0 s .. 18.73 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking mult500/Definition 2
time 1.010 s (2.313 ms .. 1.623 s)
0.878 R² (0.668 R² .. 1.000 R²)
mean 1.302 s (1.264 s .. 1.335 s)
std dev 54.27 ms (0.0 s .. 57.81 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking mult500/Definition 2 U
time 204.4 ms (182.8 ms .. 232.4 ms)
0.991 R² (0.984 R² .. 1.000 R²)
mean 199.2 ms (192.2 ms .. 206.5 ms)
std dev 9.200 ms (7.203 ms .. 10.39 ms)
variance introduced by outliers: 14% (moderately inflated)
benchmarking mult500/Strassen mixed
time 1.523 s (1.291 s .. 1.805 s)
0.996 R² (0.986 R² .. 1.000 R²)
mean 1.455 s (1.403 s .. 1.488 s)
std dev 50.40 ms (0.0 s .. 58.07 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking mult500/Strassen mixed U
time 203.2 ms (187.2 ms .. 221.0 ms)
0.996 R² (0.990 R² .. 1.000 R²)
mean 187.6 ms (179.6 ms .. 195.5 ms)
std dev 10.78 ms (5.760 ms .. 15.27 ms)
variance introduced by outliers: 14% (moderately inflated)
Benchmark matrix-mult: FINISH
So the unboxed version is at least 2x faster for smaller matrices and up to 8x faster for bigger ones, which is what we could reasonably expect.
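Taking the speedup as the ratio of the boxed to the unboxed time estimates in the updated run, the two extremes are:

$$
\frac{t_{\text{mult10/Definition 2}}}{t_{\text{mult10/Definition 2 U}}} = \frac{12.60\ \mu\text{s}}{6.202\ \mu\text{s}} \approx 2.03,
\qquad
\frac{t_{\text{mult500/Strassen mixed}}}{t_{\text{mult500/Strassen mixed U}}} = \frac{1.523\ \text{s}}{0.2032\ \text{s}} \approx 7.50
$$

The boxed mult400 and mult500 estimates have very wide confidence intervals (for example 2.313 ms .. 1.623 s for mult500/Definition 2), so the upper end of that range is noisier than the lower end.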
So I guess what remains to be done is adding some tests. As I mentioned in the edited first message, we can't use Integer there, because it cannot be unboxed.
I probably won't have time to work on it in the near future, so contributions on that side are welcome :)
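If test fixtures are written with Integer values, one possible way to reuse them is to narrow them to Int with an explicit range check before building the unboxed matrix. This is a sketch under that assumption: `toIntChecked` and `fixtureToUnboxed` are hypothetical helpers, and `UArray` from the `array` package stands in for the PR's unboxed matrix type. Just as `vector` has no `Unbox Integer` instance, `array` has no `IArray UArray Integer` instance, so the Integer version would not type-check.

```haskell
import Data.Array.Unboxed (UArray, elems, listArray)

-- Narrow an Integer to Int, rejecting values outside Int's range
-- instead of silently wrapping around.
toIntChecked :: Integer -> Maybe Int
toIntChecked x
  | x < toInteger (minBound :: Int) || x > toInteger (maxBound :: Int) = Nothing
  | otherwise = Just (fromInteger x)

-- Turn an Integer fixture (list of rows, assumed rectangular) into an
-- unboxed Int matrix; a 'UArray (Int, Int) Integer' would be rejected
-- by the type checker.
fixtureToUnboxed :: [[Integer]] -> Maybe (UArray (Int, Int) Int)
fixtureToUnboxed rows = do
  xs <- traverse toIntChecked (concat rows)
  let r = length rows
      c = case rows of
        [] -> 0
        (row : _) -> length row
  pure (listArray ((1, 1), (r, c)) xs)
```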
@Daniel-Diaz, this PR would be very interesting performance-wise. Any update? Thanks!
Following up on #54, I implemented an unboxed matrix.
My intent was to have a version with the minimal working functionality my project needed. Nevertheless, I am opening this PR in case someone has the same need or wants to continue the work.
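The actual representation used in this PR is not shown in the thread. As a hypothetical sketch of the general idea, an unboxed matrix can store its dimensions plus one flat, row-major buffer of unboxed elements, so there are no per-element pointers or thunks. Here `UMatrix`, `fromLists` and `getElemU` are made-up names, `UArray` stands in for whatever unboxed storage the PR uses, and the element type is fixed to `Double` for simplicity.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))

-- Dimensions plus a flat, row-major buffer of unboxed Doubles.
data UMatrix = UMatrix
  { urows :: !Int
  , ucols :: !Int
  , ubuf :: !(UArray Int Double)
  }

-- Build from a list of rows (assumed rectangular).
fromLists :: [[Double]] -> UMatrix
fromLists rows = UMatrix r c (listArray (0, r * c - 1) (concat rows))
  where
    r = length rows
    c = case rows of
      [] -> 0
      (row : _) -> length row

-- 1-based (row, column) access, in the style of Data.Matrix's getElem:
-- element (i, j) lives at offset (i - 1) * ncols + (j - 1).
getElemU :: Int -> Int -> UMatrix -> Double
getElemU i j m = ubuf m ! ((i - 1) * ucols m + (j - 1))
```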
What (probably) needs to be done before a merge:
- Port multStd__, multStd2, strassenMixed and multStrassenMixed from the boxed implementation (I had removed them here because I couldn't figure out how to port them). (Done)
- mapMat, which does what fmap would do.
- Also, I removed the rules that trigger warnings (see #37). They should be reintroduced once #37 is fixed. (See discussion below.)
- Tests: Integer cannot be unboxed, so we would have to use Int in the tests.
- Benchmarking: on my project I saw better performance, but it would be nice to document it, with criterion for example. (Done)
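A note on why a separate mapMat is needed: `fmap :: (a -> b) -> f a -> f b` leaves no room for the constraint that `a` and `b` be unboxable, so unboxed containers cannot be `Functor` instances, and a standalone function has to carry that constraint instead. The PR's actual `mapMat` signature is not shown here; the sketch below illustrates the point with a hypothetical `mapMatU` over `UArray`, built on the `array` package's `amap`.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Data.Array.Unboxed (IArray, UArray, amap, elems, listArray)

-- Element-wise map over an unboxed matrix. Unlike fmap, the signature
-- can require that both element types have an unboxed representation.
mapMatU :: (IArray UArray a, IArray UArray b)
        => (a -> b) -> UArray (Int, Int) a -> UArray (Int, Int) b
mapMatU = amap
```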