Open harendra-kumar opened 4 years ago
These are some significant degradation and improvements in run time perf of the Prelude.Serial
benchmark suite when TypeFamilies
extension is enabled by default for everything. Several other benchmarks suites have similar impact. +x in the second column means x percent degradation and -x means x percent improvement (on the lower base value):
Prelude.Serial(cpuTime)
Benchmark default(0)(μs) default(1) - default(0)(%)
------------------------------------------------------------------------ -------------- --------------------------
Prelude.Serial/o-1-space/pipesX4/tee 1993.13 +1209.05
Prelude.Serial/o-1-space/concat-foldable/foldMapWith (<>) (Stream) 1543.25 +140.35
Prelude.Serial/o-1-space/elimination/product 42.85 +120.41
Prelude.Serial/o-1-space/mixed/sum-product-scan 49.43 +90.77
Prelude.Serial/o-1-space/elimination/reduce/IO/foldlM' 39.84 +58.49
Prelude.Serial/o-n-space/foldl/maximumBy 13247.20 +31.22
Prelude.Serial/o-1-space/filtering/drop-one 43.14 +31.19
Prelude.Serial/o-n-space/foldl/minimumBy 13175.00 +30.41
Prelude.Serial/o-1-space/elimination/reduce/IO/foldl' 40.70 +22.61
Prelude.Serial/o-1-space/generation/IsString.fromString 482.02 +21.07
Prelude.Serial/o-1-space/elimination/length . generally 41.53 +20.92
Prelude.Serial/o-1-space/generation/integerFromStep 644.48 +20.70
Prelude.Serial/o-1-space/elimination/length 42.17 +19.04
Prelude.Serial/o-1-space/foldable/maximum 71.68 +16.91
Prelude.Serial/o-1-space/mixed/foldl-map 41.20 +16.09
Prelude.Serial/o-1-space/elimination/sum 43.31 +14.92
Prelude.Serial/o-1-space/concat/concatMapWith (n of 1) 14155.00 +14.19
Prelude.Serial/o-1-space/concat/concatMapRepl (sqrt n of sqrt n) 652.58 -9.24
Prelude.Serial/o-1-space/concat-foldable/foldMapM (List) 5984.40 -9.43
Prelude.Serial/o-1-space/Monad/(>>) (sqrt n x sqrt n) 2979.64 -9.91
Prelude.Serial/o-1-space/concat/concatMap (sqrt n of sqrt n) 642.06 -10.10
Prelude.Serial/o-1-space/concat/concatMap (1 of n) 1268.91 -10.97
Prelude.Serial/o-1-space/concat/concatMapId (n of 1) (fromFoldable) 5139.50 -11.22
Prelude.Serial/o-1-space/Applicative/(<*) (sqrt n x sqrt n) 2989.50 -12.07
Prelude.Serial/o-1-space/foldable/sum 83.35 -13.79
Prelude.Serial/o-1-space/filtering/dropWhile-true 50.27 -18.96
Prelude.Serial/o-1-space/foldable/sequence_ 56.13 -28.92
Prelude.Serial/o-1-space/filtering/intersperse 56.55 -35.41
Raised a GHC ticket here: https://gitlab.haskell.org/ghc/ghc/-/issues/18414 .
Comparison with MonoLocalBinds
disabled and enabled:
Prelude.Serial(time)
Benchmark default(0)(μs) default(1) - default(0)(%)
------------------------------------------------------------------------ -------------- --------------------------
Prelude.Serial/o-1-space/pipesX4/tee 1909.86 +1295.31
Prelude.Serial/o-1-space/concat-foldable/foldMapWith (<>) (Stream) 1492.10 +141.99
Prelude.Serial/o-1-space/elimination/product 39.22 +137.14
Prelude.Serial/o-1-space/mixed/sum-product-scan 46.27 +97.93
Prelude.Serial/o-1-space/elimination/reduce/IO/foldlM' 38.73 +57.75
Prelude.Serial/o-n-space/foldl/minimumBy 12585.61 +35.11
Prelude.Serial/o-n-space/foldl/maximumBy 12850.76 +33.43
Prelude.Serial/o-1-space/elimination/length . generally 38.87 +22.80
Prelude.Serial/o-1-space/elimination/length 39.67 +22.17
Prelude.Serial/o-1-space/elimination/reduce/IO/foldl' 39.22 +21.06
Prelude.Serial/o-1-space/filtering/filter-all-in 41.25 +19.96
Prelude.Serial/o-1-space/mixed/foldl-map 38.94 +19.72
Prelude.Serial/o-n-stack/iterated/nullHeadTail 3416.58 +17.99
Prelude.Serial/o-n-heap/buffered/reverse' 130.26 +14.01
Prelude.Serial/o-1-space/elimination/sum 41.77 +12.35
Prelude.Serial/o-1-space/Applicative/liftA2 (sqrt n x sqrt n) 2708.38 +11.21
Prelude.Serial/o-1-space/filteringX4/mapMaybe 39.05 +10.14
Results are pretty similar, which indicates that the issue is due to MonoLocalBinds
enabled by TypeFamilies
.
Ok, with TypeFamilies
and NoMonoLocalBinds
the regression is fully recovered:
Benchmark default(0)(μs) default(1) - default(0)(%) default(2) - default(0)(%)
------------------------------------------------------------------------ -------------- -------------------------- --------------------------
Prelude.Serial/o-1-space/pipesX4/tee 1910.64 +1293.17 -1.72
Prelude.Serial/o-1-space/concat-foldable/foldMapWith (<>) (Stream) 1490.11 +142.23 -5.27
Prelude.Serial/o-1-space/elimination/product 39.20 +137.39 +0.13
Prelude.Serial/o-1-space/mixed/sum-product-scan 46.28 +97.74 -1.99
Prelude.Serial/o-1-space/elimination/reduce/IO/foldlM' 38.80 +57.45 -2.95
Prelude.Serial/o-n-space/foldl/minimumBy 12575.83 +35.24 +0.16
Prelude.Serial/o-n-space/foldl/maximumBy 12837.00 +33.47 -3.87
Prelude.Serial/o-1-space/elimination/length . generally 38.93 +22.63 -1.32
Prelude.Serial/o-1-space/elimination/length 39.80 +21.75 -0.64
Prelude.Serial/o-1-space/elimination/reduce/IO/foldl' 39.25 +21.02 -3.86
Prelude.Serial/o-1-space/filtering/filter-all-in 41.26 +19.96 -1.13
Prelude.Serial/o-1-space/mixed/foldl-map 38.94 +19.92 -0.04
Prelude.Serial/o-n-stack/iterated/nullHeadTail 3416.00 +17.97 -1.87
Prelude.Serial/o-n-heap/buffered/reverse' 130.00 +14.13 +2.55
Prelude.Serial/o-1-space/elimination/sum 41.82 +12.13 -1.69
Prelude.Serial/o-1-space/Applicative/liftA2 (sqrt n x sqrt n) 2704.15 +11.32 -1.20
Prelude.Serial/o-1-space/filteringX4/mapMaybe 39.01 +10.13 -2.87
PR #566 proposes enabling most extensions by default. During the investigation we found that enabling
TypeFamilies
extension causes several performance regressions as well as improvements. We would like to keep the improvements but avoid the regressions. We need to investigate why the degradation and improvements occur? And if there are already regressions due to the cases where this extension is currently being used?