composewell / streamly

High performance, concurrent functional programming abstractions
https://streamly.composewell.com

Performance regressions: GHC-9.4 vs GHC-9.6 vs GHC-9.8 #2599

Open harendra-kumar opened 10 months ago

harendra-kumar commented 10 months ago
Data.StreamK(Allocated)
Benchmark                                                                 default(0)(Bytes) default(1) - default(0)(%)
------------------------------------------------------------------------- ----------------- --------------------------
All.Data.StreamK/o-1-space.generation.fromFoldable                               3995120.00                     +79.21

Data.StreamK(cpuTime)
Benchmark                                                                 default(0)(μs) default(1) - default(0)(%)
------------------------------------------------------------------------- -------------- --------------------------
All.Data.StreamK/o-1-space.generation.fromFoldable                                591.86                    +144.84

Data.Unfold(Allocated)
Benchmark                                                                      default(0)(Bytes) default(1) - default(0)(%)
------------------------------------------------------------------------------ ----------------- --------------------------
All.Data.Unfold/o-1-space.exceptions.UF.finally_ (1/10)                            1195415662.00                    +196.38
All.Data.Unfold/o-1-space.exceptions.UF.bracket_ (1/10)                            1195415632.00                    +196.38
All.Data.Unfold/o-1-space.exceptions.UF.onException (1/10)                         1195424627.00                    +196.38
All.Data.Unfold/o-1-space.exceptions.UF.finally (1/10)                             1363146579.00                    +172.33
All.Data.Unfold/o-1-space.exceptions.UF.bracket (1/10)                             1363148539.00                    +172.33
All.Data.Unfold/o-1-space.exceptions.UF.handle (1/10)                              3291426736.00                     +38.22

Data.Unfold(cpuTime)
Benchmark                                                                      default(0)(μs) default(1) - default(0)(%)
------------------------------------------------------------------------------ -------------- --------------------------
All.Data.Unfold/o-1-space.exceptions.UF.finally_ (1/10)                             167175.00                    +184.50
All.Data.Unfold/o-1-space.exceptions.UF.onException (1/10)                          167345.00                    +181.55
All.Data.Unfold/o-1-space.exceptions.UF.finally (1/10)                              184675.00                    +169.90
All.Data.Unfold/o-1-space.exceptions.UF.bracket_ (1/10)                             175258.00                    +165.03
All.Data.Unfold/o-1-space.exceptions.UF.bracket (1/10)                              188931.00                    +156.35

Data.Stream.ConcurrentInterleaved(cpuTime)
Benchmark                                                                                        default(0)(ms) default(1) - default(0)(%)
------------------------------------------------------------------------------------------------ -------------- --------------------------
All.Data.Stream.ConcurrentInterleaved/o-n-heap.buffered.mkAsync                                           54.10                     +42.10
All.Data.Stream.ConcurrentInterleaved/o-1-space.concat.concat . fmap (n of 1)                            149.58                     +39.00
All.Data.Stream.ConcurrentInterleaved/o-1-space.concat.parConcatMap (n of 1)                             143.84                     +37.36
All.Data.Stream.ConcurrentInterleaved/o-1-space.concat.parConcatMap (1 of n)                             131.14                     +21.69
All.Data.Stream.ConcurrentInterleaved/o-1-space.joining.async (2 of n/2)                                  66.68                     +20.18
All.Data.Stream.ConcurrentInterleaved/o-1-space.concat-foldable.foldMapWith (<>) (List)                   49.25                     +19.08
All.Data.Stream.ConcurrentInterleaved/o-1-space.concat.parConcatMap (sqrt x of sqrt x)                    67.51                     +18.28
All.Data.Stream.ConcurrentInterleaved/o-1-space.joining.concat async (2 of n/2)                           68.02                     +18.07
All.Data.Stream.ConcurrentInterleaved/o-n-heap.monad-outer-product.toNullAp                              149.29                     +18.06
All.Data.Stream.ConcurrentInterleaved/o-1-space.concat-foldable.S.concatFoldableWith (<>) (List)          54.60                     +17.95
All.Data.Stream.ConcurrentInterleaved/o-1-space.concat-foldable.foldMapWith (<>) (Stream)                 49.31                     +17.48
All.Data.Stream.ConcurrentInterleaved/o-1-space.mapping.mapM                                              49.27                     +16.62

Data.Stream.ConcurrentInterleaved(Allocated)
Benchmark                                                                                        default(0)(MiB) default(1) - default(0)(%)
------------------------------------------------------------------------------------------------ --------------- --------------------------
All.Data.Stream.ConcurrentInterleaved/o-n-heap.buffered.mkAsync                                            80.09                     +32.02
All.Data.Stream.ConcurrentInterleaved/o-1-space.joining.async (2 of n/2)                                   64.28                     +23.22
All.Data.Stream.ConcurrentInterleaved/o-1-space.joining.concat async (2 of n/2)                            67.80                     +22.27
All.Data.Stream.ConcurrentInterleaved/o-1-space.concat.parConcatMap (sqrt x of sqrt x)                     64.77                     +22.25
All.Data.Stream.ConcurrentInterleaved/o-1-space.concat.parConcatMap (1 of n)                              127.82                     +21.40
All.Data.Stream.ConcurrentInterleaved/o-n-heap.monad-outer-product.toNullAp                               173.82                     +19.55
All.Data.Stream.ConcurrentInterleaved/o-1-space.concat.parConcatMap (n of 1)                              175.16                     +16.98
All.Data.Stream.ConcurrentInterleaved/o-1-space.concat.concat . fmap (n of 1)                             188.47                     +15.31
All.Data.Stream.ConcurrentInterleaved/o-1-space.mapping.mapM                                              102.18                     +14.18
All.Data.Stream.ConcurrentInterleaved/o-1-space.concat-foldable.foldMapWith (<>) (Stream)                 101.83                     +14.05
All.Data.Stream.ConcurrentInterleaved/o-1-space.concat-foldable.foldMapWith (<>) (List)                   106.61                     +13.81
All.Data.Stream.ConcurrentInterleaved/o-1-space.concat-foldable.S.concatFoldableWith (<>) (List)          126.87                     +11.59

Data.Stream(Allocated)
Benchmark                                                                 default(0)(Bytes) default(1) - default(0)(%)
------------------------------------------------------------------------- ----------------- --------------------------
All.Data.Stream/o-1-space.filtering.takeInterval-all                              735319.00                     +80.56
All.Data.Stream/o-1-space.filtering.dropInterval-all                            58615738.00                     +46.44
All.Data.Stream/o-1-space.exceptions/serial.retryNone                           27629489.00                     +44.57
All.Data.Stream/o-1-space.exceptions/serial.retryNoneSimple                     30243928.00                     +39.81
All.Data.Stream/o-1-space.exceptions/serial.retryAll                            36637664.00                     +31.85
All.Data.Stream/o-1-space.grouping.classifySessionsOf (64 buckets)             144277875.00                     +18.64
All.Data.Stream/o-1-space.grouping.classifySessionsOfHash (64 buckets)         155469190.00                     +17.30
All.Data.Stream/o-1-space.grouping.classifySessionsOf (10000 buckets)          195256380.00                     +13.71
All.Data.Stream/o-1-space.grouping.classifySessionsOfHash (10000 buckets)      204476770.00                     +13.11

Data.Stream(cpuTime)
Benchmark                                                                 default(0)(μs) default(1) - default(0)(%)
------------------------------------------------------------------------- -------------- --------------------------
All.Data.Stream/o-1-space.filtering.dropInterval-all                            16123.70                     +32.35
All.Data.Stream/o-1-space.filtering.takeInterval-all                              336.21                     +26.81
All.Data.Stream/o-1-space.grouping.classifySessionsOf (64 buckets)              41858.70                     +20.03
All.Data.Stream/o-1-space.grouping.classifySessionsOfHash (64 buckets)          40419.20                     +19.03
All.Data.Stream/o-1-space.concat.concatMapM (1 of n)                             1267.40                     +17.30
All.Data.Stream/o-1-space.exceptions/serial.retryAll                            14477.80                     +15.97
All.Data.Stream/o-1-space.mixed.foldl-map                                        9972.84                     +15.93
All.Data.Stream/o-1-space.mixed.sum-product-fold                                10235.20                     +14.55
All.Data.Stream/o-1-space.filteringX4.elemIndices                                  65.52                     +14.27
All.Data.Stream/o-1-space.exceptions/serial.retryNone                            9119.35                     +11.27
All.Data.Stream/o-1-space.grouping.classifySessionsOfHash (10000 buckets)      114018.00                     +10.75
All.Data.Stream/o-1-space.concat.concatMapPure (1 of n)                          2027.78                     +10.20
All.Data.Stream/o-1-space.exceptions/serial.retryNoneSimple                      9337.75                     +10.13
All.Data.Stream/o-1-space.concat.concatMap (n of 1)                              2123.44                      +9.15
All.Data.Stream/o-1-space.Monad.(>>=) (sqrt n x sqrt n) (breakAfterSome)         5865.57                      +8.95
All.Data.Stream/o-1-space.mixed.sum-product-scan                                27031.40                      +8.21
All.Data.Stream/o-1-space.grouping.classifySessionsOf (10000 buckets)          110534.00                      +7.59

Data.MutArray(Allocated)
Benchmark                                                   default(0)(Bytes) default(1) - default(0)(%)
----------------------------------------------------------- ----------------- --------------------------
All.Data.MutArray/o-1-space.modifyIndices (+ 1)                          0.00                  +Infinity

Data.MutArray(cpuTime)
Benchmark                                                   default(0)(μs) default(1) - default(0)(%)
----------------------------------------------------------- -------------- --------------------------
All.Data.MutArray/o-1-space.modifyIndices (+ 1)                     155.65                   +3804.78
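
For readers interpreting the last column: the sketch below assumes that `default(1) - default(0)(%)` is the change of the comparison run relative to the baseline run, i.e. `(new - base) / base * 100`. That formula is an assumption inferred from the column header and from the `+Infinity` entries that appear wherever the baseline is 0.00; it is not taken from the reporting tool's source. The helper names `percent_diff` and `value_after` are hypothetical.

```python
import math

def percent_diff(base: float, new: float) -> float:
    """Relative change of `new` over `base` in percent (assumed column formula)."""
    if base == 0:
        # A zero baseline has no finite relative change. Mirror the report's
        # "+Infinity" for any increase and return 0 when nothing changed.
        if new == 0:
            return 0.0
        return math.inf if new > 0 else -math.inf
    return (new - base) / base * 100.0

def value_after(base: float, pct: float) -> float:
    """Invert percent_diff: recover the comparison-run value from a table row."""
    return base * (1.0 + pct / 100.0)

# Row from the table above: modifyIndices (+ 1), 155.65 microseconds, +3804.78 %.
modify_indices_us = value_after(155.65, 3804.78)
print(f"{modify_indices_us:.2f} us")

# A zero-allocation baseline that starts allocating. The 64 bytes is a
# made-up figure; the report does not state the new allocation count.
print(percent_diff(0.0, 64.0))
```

Under that assumption the `modifyIndices (+ 1)` benchmark goes from about 0.16 ms to about 6.08 ms, and a `+Infinity` allocation entry only says that a previously allocation-free benchmark now allocates, not by how much.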

Data.Fold.Window(Allocated)
Benchmark                                                                 default(0)(Bytes) default(1) - default(0)(%)
------------------------------------------------------------------------- ----------------- --------------------------
All.Data.Fold.Window/o-1-space.fold.range (window size 100)                            0.00                  +Infinity
All.Data.Fold.Window/o-1-space.fold.maximum (window size 100)                          0.00                  +Infinity
All.Data.Fold.Window/o-1-space.fold.minimum (window size 100)                          0.00                  +Infinity
All.Data.Fold.Window/o-1-space.fold.maximum (window size 1000)                    169535.00                    +943.97
All.Data.Fold.Window/o-1-space.fold.minimum (window size 1000)                    169550.00                    +928.04
All.Data.Fold.Window/o-1-space.fold.range (window size 1000)                      200643.00                    +798.24
All.Data.Fold.Window/o-1-space.fold.sum (window size 100)                        1572859.00                    +101.67
All.Data.Fold.Window/o-1-space.fold.sum (window size 1000)                       1580752.00                     +99.50
All.Data.Fold.Window/o-1-space.fold.mean (window size 100)                       1566357.00                    +102.50
All.Data.Fold.Window/o-1-space.fold.mean (window size 1000)                      1574250.00                    +100.33
All.Data.Fold.Window/o-1-space.scan.range (window size 10)                       1572789.00                    +101.67
All.Data.Fold.Window/o-1-space.scan.maximum (window size 10)                     1572789.00                    +101.67
All.Data.Fold.Window/o-1-space.scan.range (window size 30)                       1572476.00                    +100.03
All.Data.Fold.Window/o-1-space.scan.minimum (window size 30)                     1572477.00                    +100.02
All.Data.Fold.Window/o-1-space.scan.maximum (window size 30)                     1572480.00                    +100.02
All.Data.Fold.Window/o-1-space.scan.minimum (window size 10)                     1572815.00                    +100.00
All.Data.Fold.Window/o-1-space.scan.mean (window size 100)                       1572859.00                    +100.00
All.Data.Fold.Window/o-1-space.scan.sum (window size 100)                        1572857.00                    +100.00
All.Data.Fold.Window/o-1-space.scan.sum (window size 1000)                       1580753.00                     +99.50
All.Data.Fold.Window/o-1-space.scan.mean (window size 1000)                      1580753.00                     +99.50

Data.Fold.Window(cpuTime)
Benchmark                                                                 default(0)(μs) default(1) - default(0)(%)
------------------------------------------------------------------------- -------------- --------------------------
All.Data.Fold.Window/o-1-space.fold.maximum (window size 100)                     418.47                     +94.43
All.Data.Fold.Window/o-1-space.fold.minimum (window size 1000)                    459.81                     +89.30
All.Data.Fold.Window/o-1-space.fold.range (window size 1000)                      478.47                     +85.13
All.Data.Fold.Window/o-1-space.fold.range (window size 100)                       420.11                     +78.49
All.Data.Fold.Window/o-1-space.fold.minimum (window size 100)                     418.79                     +76.98
All.Data.Fold.Window/o-1-space.fold.maximum (window size 1000)                    461.87                     +65.78
All.Data.Fold.Window/o-1-space.scan.range (window size 30)                       2735.69                     +63.87
All.Data.Fold.Window/o-1-space.scan.maximum (window size 30)                     2687.39                     +61.08
All.Data.Fold.Window/o-1-space.scan.minimum (window size 10)                     1190.31                     +58.35
All.Data.Fold.Window/o-1-space.scan.minimum (window size 30)                     2800.91                     +55.03
All.Data.Fold.Window/o-1-space.scan.maximum (window size 10)                     1180.51                     +50.40
All.Data.Fold.Window/o-1-space.scan.range (window size 10)                       1182.92                     +47.06
All.Data.Fold.Window/o-1-space.scan.mean (window size 100)                        822.53                     +21.82
All.Data.Fold.Window/o-1-space.scan.sum (window size 100)                         835.82                     +16.74
All.Data.Fold.Window/o-1-space.scan.mean (window size 1000)                       816.80                     +16.72
All.Data.Fold.Window/o-1-space.fold.powerSum 2 (entire stream)                    710.75                     +15.65
All.Data.Fold.Window/o-1-space.fold.sum (window size 1000)                        838.57                     +15.56
All.Data.Fold.Window/o-1-space.scan.sum (window size 1000)                        833.53                     +11.46
All.Data.Fold.Window/o-1-space.fold.sum (window size 100)                         836.98                      +8.61

harendra-kumar commented 10 months ago

Some of the regressions in 9.6 improved in 9.8, especially ConcurrentInterleaved and classifySessionsOf. We will track only the ones that remain in 9.8. Here is a comparison of GHC-9.4 vs GHC-9.8:

Data.Unfold(Allocated)
Benchmark                                                                      default(0)(Bytes) default(1) - default(0)(%)
------------------------------------------------------------------------------ ----------------- --------------------------
All.Data.Unfold/o-1-space.exceptions.UF.bracket_ (1/10)                            1195415632.00                    +154.26
All.Data.Unfold/o-1-space.exceptions.UF.finally_ (1/10)                            1195415662.00                    +154.26
All.Data.Unfold/o-1-space.exceptions.UF.onException (1/10)                         1195424627.00                    +154.26
All.Data.Unfold/o-1-space.exceptions.UF.finally (1/10)                             1363146579.00                    +135.41
All.Data.Unfold/o-1-space.exceptions.UF.bracket (1/10)                             1363148539.00                    +135.41
All.Data.Unfold/o-1-space.exceptions.UF.handle (1/10)                              3291426736.00                     +22.97

Data.Unfold(cpuTime)
Benchmark                                                                      default(0)(μs) default(1) - default(0)(%)
------------------------------------------------------------------------------ -------------- --------------------------
All.Data.Unfold/o-1-space.exceptions.UF.onException (1/10)                          167345.00                    +217.58
All.Data.Unfold/o-1-space.exceptions.UF.finally_ (1/10)                             167175.00                    +206.43
All.Data.Unfold/o-1-space.exceptions.UF.bracket_ (1/10)                             175258.00                    +193.34
All.Data.Unfold/o-1-space.exceptions.UF.bracket (1/10)                              188931.00                    +187.39
All.Data.Unfold/o-1-space.exceptions.UF.finally (1/10)                              184675.00                    +177.77
All.Data.Unfold/o-1-space.filtering.take                                                37.67                     +66.97
All.Data.Unfold/o-1-space.generation.fromStream                                        490.98                     +20.70
All.Data.Unfold/o-1-space.generation.fromStreamK                                      2416.70                     +10.54

Data.StreamK(Allocated)
Benchmark                                                                 default(0)(Bytes) default(1) - default(0)(%)
------------------------------------------------------------------------- ----------------- --------------------------
All.Data.StreamK/o-1-space.generation.fromFoldable                               3995120.00                     +79.21

Data.StreamK(cpuTime)
Benchmark                                                                 default(0)(μs) default(1) - default(0)(%)
------------------------------------------------------------------------- -------------- --------------------------
All.Data.StreamK/o-1-space.generation.fromFoldable                                591.86                    +150.65

Data.Stream(Allocated)
Benchmark                                                                 default(0)(Bytes) default(1) - default(0)(%)
------------------------------------------------------------------------- ----------------- --------------------------
All.Data.Stream/o-1-space.filtering.takeInterval-all                              735319.00                     +67.19
All.Data.Stream/o-1-space.exceptions/serial.retryNone                           27629489.00                     +26.74
All.Data.Stream/o-1-space.exceptions/serial.retryNoneSimple                     30243928.00                     +22.33
All.Data.Stream/o-1-space.exceptions/serial.retryAll                            36637664.00                     +19.45

Data.Stream(cpuTime)
Benchmark                                                                 default(0)(μs) default(1) - default(0)(%)
------------------------------------------------------------------------- -------------- --------------------------
All.Data.Stream/o-1-space.Applicative.(*>) (sqrt n x sqrt n)                       45.16                     +48.92
All.Data.Stream/o-1-space.filtering.takeInterval-all                              336.21                     +40.81
All.Data.Stream/o-1-space.elimination.uncons                                      708.69                     +20.41
All.Data.Stream/o-1-space.mixed.foldl-map                                        9972.84                     +17.60
All.Data.Stream/o-1-space.mixed.sum-product-fold                                10235.20                     +16.31

Data.MutArray(Allocated)
Benchmark                                                   default(0)(Bytes) default(1) - default(0)(%)
----------------------------------------------------------- ----------------- --------------------------
All.Data.MutArray/o-1-space.modifyIndices (+ 1)                          0.00                  +Infinity

Data.MutArray(cpuTime)
Benchmark                                                   default(0)(μs) default(1) - default(0)(%)
----------------------------------------------------------- -------------- --------------------------
All.Data.MutArray/o-1-space.modifyIndices (+ 1)                     155.65                   +3829.09

Data.Fold.Window(Allocated)
Benchmark                                                                 default(0)(Bytes) default(1) - default(0)(%)
------------------------------------------------------------------------- ----------------- --------------------------
All.Data.Fold.Window/o-1-space.fold.range (window size 100)                            0.00                  +Infinity
All.Data.Fold.Window/o-1-space.fold.maximum (window size 100)                          0.00                  +Infinity
All.Data.Fold.Window/o-1-space.fold.minimum (window size 100)                          0.00                  +Infinity
All.Data.Fold.Window/o-1-space.fold.maximum (window size 1000)                    169535.00                    +943.97
All.Data.Fold.Window/o-1-space.fold.minimum (window size 1000)                    169550.00                    +928.04
All.Data.Fold.Window/o-1-space.fold.range (window size 1000)                      200643.00                    +784.54
All.Data.Fold.Window/o-1-space.fold.sum (window size 100)                        1572859.00                    +101.67
All.Data.Fold.Window/o-1-space.fold.sum (window size 1000)                       1580752.00                    +101.17
All.Data.Fold.Window/o-1-space.fold.mean (window size 100)                       1566357.00                    +100.83
All.Data.Fold.Window/o-1-space.fold.mean (window size 1000)                      1574250.00                    +100.33
All.Data.Fold.Window/o-1-space.scan.range (window size 10)                       1572789.00                    +102.92
All.Data.Fold.Window/o-1-space.scan.maximum (window size 10)                     1572789.00                    +102.92
All.Data.Fold.Window/o-1-space.scan.minimum (window size 10)                     1572815.00                    +101.67
All.Data.Fold.Window/o-1-space.scan.range (window size 30)                       1572476.00                    +100.03
All.Data.Fold.Window/o-1-space.scan.minimum (window size 30)                     1572477.00                    +100.03
All.Data.Fold.Window/o-1-space.scan.maximum (window size 30)                     1572480.00                    +100.02
All.Data.Fold.Window/o-1-space.scan.sum (window size 100)                        1572857.00                    +100.00
All.Data.Fold.Window/o-1-space.scan.mean (window size 100)                       1572859.00                    +100.00
All.Data.Fold.Window/o-1-space.scan.sum (window size 1000)                       1580753.00                     +99.50
All.Data.Fold.Window/o-1-space.scan.mean (window size 1000)                      1580753.00                     +99.50

Data.Fold.Window(cpuTime)
Benchmark                                                                 default(0)(μs) default(1) - default(0)(%)
------------------------------------------------------------------------- -------------- --------------------------
All.Data.Fold.Window/o-1-space.fold.maximum (window size 100)                     418.47                     +94.07
All.Data.Fold.Window/o-1-space.fold.minimum (window size 1000)                    459.81                     +89.91
All.Data.Fold.Window/o-1-space.fold.range (window size 1000)                      478.47                     +85.52
All.Data.Fold.Window/o-1-space.fold.range (window size 100)                       420.11                     +77.46
All.Data.Fold.Window/o-1-space.fold.minimum (window size 100)                     418.79                     +76.28
All.Data.Fold.Window/o-1-space.fold.maximum (window size 1000)                    461.87                     +65.85
All.Data.Fold.Window/o-1-space.scan.maximum (window size 30)                     2687.39                     +61.09
All.Data.Fold.Window/o-1-space.scan.range (window size 30)                       2735.69                     +58.38
All.Data.Fold.Window/o-1-space.scan.minimum (window size 30)                     2800.91                     +56.86
All.Data.Fold.Window/o-1-space.scan.maximum (window size 10)                     1180.51                     +52.16
All.Data.Fold.Window/o-1-space.scan.minimum (window size 10)                     1190.31                     +51.42
All.Data.Fold.Window/o-1-space.scan.range (window size 10)                       1182.92                     +47.50
All.Data.Fold.Window/o-1-space.scan.mean (window size 100)                        822.53                     +20.95
All.Data.Fold.Window/o-1-space.scan.mean (window size 1000)                       816.80                     +16.99
All.Data.Fold.Window/o-1-space.scan.sum (window size 100)                         835.82                     +16.79
All.Data.Fold.Window/o-1-space.fold.powerSum 2 (entire stream)                    710.75                     +15.70
All.Data.Fold.Window/o-1-space.fold.sum (window size 1000)                        838.57                     +15.13
All.Data.Fold.Window/o-1-space.scan.sum (window size 1000)                        833.53                     +11.48
All.Data.Fold.Window/o-1-space.fold.sum (window size 100)                         836.98                      +9.00
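
The triage rule stated at the top of this comment (track only the regressions that remain in 9.8) can be sketched as a filter over two reports. This is a minimal sketch, not the project's tooling: `parse_rows`, `triage`, and the 10 % threshold are hypothetical, rows are assumed to be laid out as in the tables (name, baseline value, signed percentage), and a benchmark missing from the second report is assumed to have dropped below the report's cutoff. The sample rows are copied from the Data.Stream(cpuTime) tables in the two comments.

```python
def parse_rows(report: str) -> dict:
    """Map benchmark name -> percent change for every data row in a report."""
    rows = {}
    for line in report.splitlines():
        # Benchmark names contain spaces, so split the two numeric columns
        # off the right-hand side and keep the rest as the name.
        parts = line.rsplit(None, 2)
        if len(parts) != 3 or not parts[0].startswith("All."):
            continue  # skip section titles, header and separator lines
        name, _baseline, pct = parts
        rows[name.strip()] = float(pct)  # float() also accepts "+Infinity"
    return rows

def triage(older: dict, newer: dict, threshold: float = 10.0):
    """Split the older report's regressions into (still regressed, recovered)."""
    old_bad = {n for n, p in older.items() if p >= threshold}
    new_bad = {n for n, p in newer.items() if p >= threshold}
    return sorted(old_bad & new_bad), sorted(old_bad - new_bad)

# Excerpt of the first comment's Data.Stream(cpuTime) table.
FIRST_REPORT = """\
All.Data.Stream/o-1-space.filtering.dropInterval-all                            16123.70                     +32.35
All.Data.Stream/o-1-space.filtering.takeInterval-all                              336.21                     +26.81
All.Data.Stream/o-1-space.grouping.classifySessionsOf (64 buckets)              41858.70                     +20.03
"""

# Excerpt of this comment's Data.Stream(cpuTime) table (GHC-9.4 vs 9.8).
SECOND_REPORT = """\
All.Data.Stream/o-1-space.filtering.takeInterval-all                              336.21                     +40.81
All.Data.Stream/o-1-space.elimination.uncons                                      708.69                     +20.41
"""

remaining, recovered = triage(parse_rows(FIRST_REPORT), parse_rows(SECOND_REPORT))
print("still regressed:", remaining)
print("recovered:", recovered)
```

On these excerpts only `takeInterval-all` stays on the tracking list, while `dropInterval-all` and `classifySessionsOf (64 buckets)` count as recovered. `uncons` appears only in the second report, so it is a new entry rather than a remaining one.
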

harendra-kumar commented 10 months ago

Memory requirement GHC-9.4 vs GHC-9.6:

The memory requirement for 9.8 is the same as for 9.6.