Gabriella439 / foldl

Composable, streaming, and efficient left folds
BSD 3-Clause "New" or "Revised" License
158 stars 51 forks source link

Severe performance regression in `Control.Foldl.vector` when using `vector-builder` #93

Closed klapaucius closed 6 years ago

klapaucius commented 6 years ago

foldl-1.2.5

module Main where

import qualified Data.Vector.Unboxed as V
import qualified Control.Foldl as L

main :: IO ()
main = do 
    v <- L.foldM L.vector [1..10000000::Int]
    print . V.maximum $ v
10000000
     971,706,320 bytes allocated in the heap
          13,496 bytes copied during GC
      83,887,216 bytes maximum residency (9 sample(s))
       2,071,424 bytes maximum slop
             168 MB total memory in use (5 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1388 colls,     0 par    0.000s   0.002s     0.0000s    0.0000s
  Gen  1         9 colls,     0 par    0.000s   0.016s     0.0018s    0.0147s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.172s  (  0.164s elapsed)
  GC      time    0.000s  (  0.018s elapsed)
  EXIT    time    0.000s  (  0.015s elapsed)
  Total   time    0.172s  (  0.197s elapsed)

  %GC     time       0.0%  (9.2% elapsed)

  Alloc rate    5,653,564,043 bytes per MUT second

  Productivity 100.0% of total user, 90.6% of total elapsed

vector-builder-0.3.1

module Main where

import qualified Data.Vector.Unboxed as V
import qualified Control.Foldl as L
import qualified VectorBuilder.Builder as N
import qualified VectorBuilder.Vector as O

main :: IO ()
main = print . V.maximum $ L.fold (L.foldMap N.singleton O.build) [1..10000000::Int]
10000000
   2,914,242,624 bytes allocated in the heap
   4,551,665,120 bytes copied during GC
     823,919,352 bytes maximum residency (15 sample(s))
      68,425,232 bytes maximum slop
            1697 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5425 colls,     0 par    2.859s   2.754s     0.0005s    0.0012s
  Gen  1        15 colls,     0 par    3.219s   3.557s     0.2371s    0.8895s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.906s  (  0.911s elapsed)
  GC      time    6.078s  (  6.311s elapsed)
  EXIT    time    0.016s  (  0.161s elapsed)
  Total   time    7.000s  (  7.383s elapsed)

  %GC     time      86.8%  (85.5% elapsed)

  Alloc rate    3,215,715,998 bytes per MUT second

  Productivity  13.2% of total user, 14.5% of total elapsed

foldl-1.3.1

module Main where

import qualified Data.Vector.Unboxed as V
import qualified Control.Foldl as L

main :: IO ()
main = print . V.maximum $ L.fold L.vector [1..10000000::Int]
10000000
   2,914,242,624 bytes allocated in the heap
   4,551,665,120 bytes copied during GC
     823,919,352 bytes maximum residency (15 sample(s))
      68,425,232 bytes maximum slop
            1697 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5425 colls,     0 par    2.625s   2.750s     0.0005s    0.0011s
  Gen  1        15 colls,     0 par    3.078s   3.549s     0.2366s    0.8859s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    1.156s  (  0.907s elapsed)
  GC      time    5.703s  (  6.299s elapsed)
  EXIT    time    0.016s  (  0.162s elapsed)
  Total   time    6.875s  (  7.369s elapsed)

  %GC     time      83.0%  (85.5% elapsed)

  Alloc rate    2,520,426,053 bytes per MUT second

  Productivity  17.0% of total user, 14.5% of total elapsed
Gabriella439 commented 6 years ago

cc: @nikita-volkov

nikita-volkov commented 6 years ago

Woops. I'll take a look

nikita-volkov commented 6 years ago

Alright! I've applied some optimizations there, increasing the performance several times. I've also added more benchmarks including the subject case. GC is still hard, but at least the performance is now still better than of the vector-growing fold. That is, even when building the vector of size 10,000,000 (as in the subject).

However, it takes forever to run even larger benchmarks, so I am not certain now that the algorithm will perform better then, which is why I think it'll make sense to restore the older vector fold in this library and have both alternatives coexist. We'll just need to mention the performance implications in the docs.

Here are the benchmark results of "vector-builder" of version 0.3.3:

benchmarking 1000/vector-builder
time                 91.11 μs   (90.75 μs .. 91.51 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 91.54 μs   (91.06 μs .. 92.28 μs)
std dev              1.882 μs   (1.321 μs .. 3.169 μs)
variance introduced by outliers: 16% (moderately inflated)

benchmarking 1000/default
time                 272.8 μs   (271.1 μs .. 275.2 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 273.6 μs   (271.8 μs .. 275.4 μs)
std dev              6.106 μs   (5.212 μs .. 7.302 μs)
variance introduced by outliers: 16% (moderately inflated)

benchmarking 10000/vector-builder
time                 2.008 ms   (1.989 ms .. 2.027 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 2.008 ms   (1.997 ms .. 2.019 ms)
std dev              38.06 μs   (29.94 μs .. 49.11 μs)

benchmarking 10000/default
time                 3.029 ms   (3.008 ms .. 3.045 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 3.012 ms   (2.993 ms .. 3.028 ms)
std dev              60.85 μs   (46.65 μs .. 84.29 μs)

benchmarking 100000/vector-builder
time                 28.51 ms   (27.91 ms .. 28.96 ms)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 28.72 ms   (28.35 ms .. 29.29 ms)
std dev              1.007 ms   (642.3 μs .. 1.607 ms)
variance introduced by outliers: 10% (moderately inflated)

benchmarking 100000/default
time                 32.81 ms   (32.63 ms .. 32.98 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 32.76 ms   (32.67 ms .. 32.87 ms)
std dev              198.4 μs   (153.3 μs .. 257.1 μs)

benchmarking 1000000/vector-builder
time                 299.6 ms   (250.5 ms .. 348.9 ms)
                     0.985 R²   (0.928 R² .. 1.000 R²)
mean                 308.4 ms   (283.7 ms .. 328.2 ms)
std dev              25.19 ms   (12.99 ms .. 34.22 ms)
variance introduced by outliers: 18% (moderately inflated)

benchmarking 1000000/default
time                 332.1 ms   (312.7 ms .. 355.4 ms)
                     0.997 R²   (0.982 R² .. 1.000 R²)
mean                 328.5 ms   (318.7 ms .. 335.9 ms)
std dev              9.820 ms   (5.357 ms .. 13.32 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking 10000000/vector-builder
time                 3.005 s    (2.391 s .. 3.785 s)
                     0.992 R²   (0.975 R² .. NaN R²)
mean                 3.247 s    (3.109 s .. 3.378 s)
std dev              220.4 ms   (543.9 as .. 226.5 ms)
variance introduced by outliers: 20% (moderately inflated)

benchmarking 10000000/default
time                 3.699 s    (3.462 s .. 3.897 s)
                     1.000 R²   (0.998 R² .. 1.000 R²)
mean                 3.594 s    (3.503 s .. 3.641 s)
std dev              78.70 ms   (543.9 as .. 81.27 ms)
variance introduced by outliers: 19% (moderately inflated)
nikita-volkov commented 6 years ago

I've run the benchmarks on 100,000,000 and it's about the point when the vector-growing algo starts beating "vector-builder":

benchmarking 1000/vector-builder
time                 91.29 μs   (90.78 μs .. 91.93 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 91.34 μs   (90.78 μs .. 92.11 μs)
std dev              2.223 μs   (1.440 μs .. 3.850 μs)
variance introduced by outliers: 21% (moderately inflated)

benchmarking 1000/default
time                 284.5 μs   (281.2 μs .. 289.4 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 285.2 μs   (283.8 μs .. 287.3 μs)
std dev              5.460 μs   (4.009 μs .. 8.053 μs)
variance introduced by outliers: 12% (moderately inflated)

benchmarking 10000/vector-builder
time                 2.056 ms   (2.025 ms .. 2.084 ms)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 2.032 ms   (2.019 ms .. 2.045 ms)
std dev              42.98 μs   (35.34 μs .. 56.33 μs)

benchmarking 10000/default
time                 3.068 ms   (3.044 ms .. 3.092 ms)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 3.131 ms   (3.104 ms .. 3.162 ms)
std dev              91.35 μs   (73.99 μs .. 111.9 μs)
variance introduced by outliers: 14% (moderately inflated)

benchmarking 100000/vector-builder
time                 28.88 ms   (28.26 ms .. 29.37 ms)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 28.90 ms   (28.43 ms .. 29.41 ms)
std dev              1.063 ms   (705.3 μs .. 1.658 ms)
variance introduced by outliers: 10% (moderately inflated)

benchmarking 100000/default
time                 33.27 ms   (32.78 ms .. 33.76 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 33.52 ms   (33.28 ms .. 33.81 ms)
std dev              523.8 μs   (360.0 μs .. 840.4 μs)

benchmarking 1000000/vector-builder
time                 302.8 ms   (257.0 ms .. 346.7 ms)
                     0.988 R²   (0.940 R² .. 1.000 R²)
mean                 309.0 ms   (286.7 ms .. 326.0 ms)
std dev              22.40 ms   (10.72 ms .. 30.25 ms)
variance introduced by outliers: 18% (moderately inflated)

benchmarking 1000000/default
time                 336.0 ms   (314.4 ms .. 358.4 ms)
                     0.996 R²   (0.983 R² .. 1.000 R²)
mean                 330.8 ms   (319.9 ms .. 337.3 ms)
std dev              10.77 ms   (3.708 ms .. 14.49 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking 10000000/vector-builder
time                 3.097 s    (2.271 s .. 4.085 s)
                     0.988 R²   (0.961 R² .. 1.000 R²)
mean                 3.343 s    (3.188 s .. 3.647 s)
std dev              263.2 ms   (543.9 as .. 264.7 ms)
variance introduced by outliers: 21% (moderately inflated)

benchmarking 10000000/default
time                 3.784 s    (3.544 s .. 3.952 s)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 3.675 s    (3.586 s .. 3.724 s)
std dev              77.91 ms   (0.0 s .. 83.70 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking 100000000/vector-builder
time                 192.0 s    (144.6 s .. 218.6 s)
                     0.992 R²   (0.984 R² .. NaN R²)
mean                 188.9 s    (182.5 s .. 193.1 s)
std dev              6.319 s    (0.0 s .. 7.296 s)
variance introduced by outliers: 19% (moderately inflated)

benchmarking 100000000/default
time                 163.0 s    (105.0 s .. NaN s)
                     0.976 R²   (0.930 R² .. 1.000 R²)
mean                 160.0 s    (151.1 s .. 174.2 s)
std dev              12.44 s    (0.0 s .. 13.31 s)
variance introduced by outliers: 21% (moderately inflated)
klapaucius commented 6 years ago

is it boxed Vector?

nikita-volkov commented 6 years ago

Yep. Here's the source: https://github.com/nikita-volkov/vector-builder/blob/455ca422c6e97f9268ddfe1234e989b95441c5ac/foldl-benchmark/Main.hs

klapaucius commented 6 years ago

ok, every vector is boxed. Is it vector of boxed values?

klapaucius commented 6 years ago

i'm more interested in the Unboxed/Storable case

Gabriella439 commented 6 years ago

The last set of results are a little surprising because up until then the default implementation's performance scaled linearly with the number of elements, but then a 10x increase in elements ran 40x slower

klapaucius commented 6 years ago

IMO pretty normal for something with 5GB RES.

Gabriella439 commented 6 years ago

What's weird is that the performance difference that @klapaucius is seeing is significantly larger than the one that @nikita-volkov is measuring

@klapaucius: Also, do you get the same results when using Control.Foldl.vector from foldl-1.3.1?

klapaucius commented 6 years ago

i's difference between Data.Vector and Data.Vector.Unboxed , so performance difference is actually understandable.

klapaucius commented 6 years ago

foldl-1.2.5 Data.Vector

10000000
     411,952,312 bytes allocated in the heap
     490,201,792 bytes copied during GC
     169,415,272 bytes maximum residency (9 sample(s))
       2,056,576 bytes maximum slop
             369 MB total memory in use (50 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       298 colls,     0 par    0.578s   0.572s     0.0019s    0.0668s
  Gen  1         9 colls,     0 par    0.219s   0.275s     0.0306s    0.1178s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.094s  (  0.171s elapsed)
  GC      time    0.797s  (  0.847s elapsed)
  EXIT    time    0.000s  (  0.033s elapsed)
  Total   time    0.906s  (  1.051s elapsed)

  %GC     time      87.9%  (80.6% elapsed)

  Alloc rate    4,394,157,994 bytes per MUT second

  Productivity  12.1% of total user, 19.4% of total elapsed

foldl-1.3.1, vector-builder-0.3.3 Data.Vector

10000000
   1,938,887,592 bytes allocated in the heap
   2,076,921,448 bytes copied during GC
     630,191,328 bytes maximum residency (11 sample(s))
       8,271,064 bytes maximum slop
            1088 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3541 colls,     0 par    1.406s   1.482s     0.0004s    0.0468s
  Gen  1        11 colls,     0 par    1.172s   1.565s     0.1423s    0.5815s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.594s  (  0.411s elapsed)
  GC      time    2.578s  (  3.047s elapsed)
  EXIT    time    0.000s  (  0.131s elapsed)
  Total   time    3.172s  (  3.590s elapsed)

  %GC     time      81.3%  (84.9% elapsed)

  Alloc rate    3,265,494,891 bytes per MUT second

  Productivity  18.7% of total user, 15.1% of total elapsed

foldl-1.2.5 Data.Vector.Unboxed

10000000
     971,706,320 bytes allocated in the heap
          13,496 bytes copied during GC
      83,887,216 bytes maximum residency (9 sample(s))
       2,071,424 bytes maximum slop
             168 MB total memory in use (5 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1388 colls,     0 par    0.000s   0.003s     0.0000s    0.0001s
  Gen  1         9 colls,     0 par    0.000s   0.017s     0.0019s    0.0157s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.156s  (  0.198s elapsed)
  GC      time    0.000s  (  0.020s elapsed)
  EXIT    time    0.000s  (  0.016s elapsed)
  Total   time    0.156s  (  0.234s elapsed)

  %GC     time       0.0%  (8.4% elapsed)

  Alloc rate    6,218,920,448 bytes per MUT second

  Productivity 100.0% of total user, 91.5% of total elapsed

foldl-1.3.1, vector-builder-0.3.3 Data.Vector.Unboxed

10000000
   1,938,809,440 bytes allocated in the heap
   2,076,900,352 bytes copied during GC
     630,113,176 bytes maximum residency (11 sample(s))
       8,349,216 bytes maximum slop
            1088 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3541 colls,     0 par    1.172s   1.193s     0.0003s    0.0124s
  Gen  1        11 colls,     0 par    1.172s   1.504s     0.1367s    0.5151s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.438s  (  0.408s elapsed)
  GC      time    2.344s  (  2.697s elapsed)
  EXIT    time    0.016s  (  0.114s elapsed)
  Total   time    2.812s  (  3.220s elapsed)

  %GC     time      83.3%  (83.8% elapsed)

  Alloc rate    4,431,564,434 bytes per MUT second

  Productivity  16.7% of total user, 16.2% of total elapsed
nikita-volkov commented 6 years ago

I've updated my benchmarks to include unboxed vectors. For some reason this time the boxed version starts losing at 100,000. The same applies to the unboxed one as well.

Here are the results:

benchmarking boxed/1000/builder
time                 59.84 μs   (59.29 μs .. 60.50 μs)
                     0.998 R²   (0.997 R² .. 1.000 R²)
mean                 60.16 μs   (59.62 μs .. 60.95 μs)
std dev              2.558 μs   (1.487 μs .. 3.415 μs)
variance introduced by outliers: 46% (moderately inflated)

benchmarking boxed/1000/growing
time                 209.8 μs   (208.9 μs .. 210.6 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 209.6 μs   (209.1 μs .. 210.1 μs)
std dev              1.944 μs   (1.526 μs .. 2.459 μs)

benchmarking boxed/10000/builder
time                 1.907 ms   (1.895 ms .. 1.921 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 1.895 ms   (1.884 ms .. 1.907 ms)
std dev              41.68 μs   (33.66 μs .. 49.92 μs)

benchmarking boxed/10000/growing
time                 2.700 ms   (2.597 ms .. 2.792 ms)
                     0.992 R²   (0.989 R² .. 0.996 R²)
mean                 2.606 ms   (2.568 ms .. 2.652 ms)
std dev              138.9 μs   (109.3 μs .. 165.8 μs)
variance introduced by outliers: 35% (moderately inflated)

benchmarking boxed/100000/builder
time                 32.05 ms   (31.25 ms .. 32.85 ms)
                     0.998 R²   (0.996 R² .. 0.999 R²)
mean                 32.78 ms   (32.09 ms .. 33.95 ms)
std dev              2.099 ms   (726.2 μs .. 3.406 ms)
variance introduced by outliers: 22% (moderately inflated)

benchmarking boxed/100000/growing
time                 30.39 ms   (28.95 ms .. 31.87 ms)
                     0.995 R²   (0.991 R² .. 0.998 R²)
mean                 29.82 ms   (29.16 ms .. 30.65 ms)
std dev              1.470 ms   (852.1 μs .. 1.832 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking boxed/1000000/builder
time                 359.6 ms   (261.1 ms .. 426.5 ms)
                     0.992 R²   (0.972 R² .. 1.000 R²)
mean                 356.3 ms   (335.9 ms .. 366.6 ms)
std dev              17.63 ms   (67.99 as .. 17.83 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking boxed/1000000/growing
time                 304.4 ms   (291.2 ms .. 319.0 ms)
                     0.999 R²   (0.997 R² .. 1.000 R²)
mean                 293.8 ms   (288.8 ms .. 297.6 ms)
std dev              5.068 ms   (2.172 ms .. 6.338 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking boxed/10000000/builder
time                 3.693 s    (3.108 s .. 4.281 s)
                     0.997 R²   (0.988 R² .. 1.000 R²)
mean                 3.945 s    (3.791 s .. 4.059 s)
std dev              173.0 ms   (0.0 s .. 197.2 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking boxed/10000000/growing
time                 3.515 s    (3.139 s .. 3.812 s)
                     0.999 R²   (0.995 R² .. 1.000 R²)
mean                 3.437 s    (3.370 s .. 3.476 s)
std dev              60.32 ms   (0.0 s .. 66.73 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking unboxed/1000/builder
time                 59.20 μs   (57.66 μs .. 60.87 μs)
                     0.996 R²   (0.995 R² .. 0.999 R²)
mean                 59.64 μs   (58.84 μs .. 60.88 μs)
std dev              2.944 μs   (2.280 μs .. 3.745 μs)
variance introduced by outliers: 54% (severely inflated)

benchmarking unboxed/1000/growing
time                 197.5 μs   (195.9 μs .. 199.6 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 197.8 μs   (196.2 μs .. 199.8 μs)
std dev              5.576 μs   (4.635 μs .. 6.943 μs)
variance introduced by outliers: 23% (moderately inflated)

benchmarking unboxed/10000/builder
time                 1.697 ms   (1.672 ms .. 1.722 ms)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 1.709 ms   (1.694 ms .. 1.720 ms)
std dev              42.34 μs   (33.00 μs .. 53.54 μs)
variance introduced by outliers: 13% (moderately inflated)

benchmarking unboxed/10000/growing
time                 2.244 ms   (2.200 ms .. 2.297 ms)
                     0.998 R²   (0.997 R² .. 1.000 R²)
mean                 2.217 ms   (2.207 ms .. 2.231 ms)
std dev              43.72 μs   (33.02 μs .. 64.60 μs)

benchmarking unboxed/100000/builder
time                 30.00 ms   (29.28 ms .. 30.65 ms)
                     0.998 R²   (0.995 R² .. 0.999 R²)
mean                 30.34 ms   (29.81 ms .. 31.07 ms)
std dev              1.568 ms   (787.1 μs .. 2.360 ms)
variance introduced by outliers: 17% (moderately inflated)

benchmarking unboxed/100000/growing
time                 23.12 ms   (22.74 ms .. 23.88 ms)
                     0.996 R²   (0.990 R² .. 1.000 R²)
mean                 23.44 ms   (23.07 ms .. 23.86 ms)
std dev              859.0 μs   (597.7 μs .. 1.101 ms)

benchmarking unboxed/1000000/builder
time                 280.6 ms   (271.3 ms .. 287.9 ms)
                     1.000 R²   (0.998 R² .. 1.000 R²)
mean                 316.7 ms   (306.6 ms .. 334.9 ms)
std dev              17.44 ms   (2.560 ms .. 22.38 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking unboxed/1000000/growing
time                 222.9 ms   (215.0 ms .. 228.4 ms)
                     0.999 R²   (0.995 R² .. 1.000 R²)
mean                 223.1 ms   (219.5 ms .. 225.3 ms)
std dev              3.460 ms   (1.354 ms .. 4.725 ms)
variance introduced by outliers: 14% (moderately inflated)

benchmarking unboxed/10000000/builder
time                 2.975 s    (2.485 s .. 3.566 s)
                     0.995 R²   (0.984 R² .. 1.000 R²)
mean                 3.251 s    (3.109 s .. 3.355 s)
std dev              157.7 ms   (0.0 s .. 180.0 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking unboxed/10000000/growing
time                 2.280 s    (2.047 s .. 2.483 s)
                     0.999 R²   (0.995 R² .. 1.000 R²)
mean                 2.210 s    (2.164 s .. 2.237 s)
std dev              41.37 ms   (0.0 s .. 46.09 ms)
variance introduced by outliers: 19% (moderately inflated)

(Source)

klapaucius commented 6 years ago

More importantly, criterion tries very hard to escape any GC influence on results: performs major collections in between measurements, writes off pauses as outliers, etc. So it tells very little about real performance on large object trees with zillions of refs. After 100K it's literally GC benchmark:

1M
MUT     time    0.031s  (  0.021s elapsed)
GC      time    0.078s  (  0.100s elapsed)

10M
MUT     time    0.125s  (  0.157s elapsed)
GC      time    0.766s  (  0.836s elapsed)

100M
MUT     time    2.953s  (  2.521s elapsed)
GC      time   19.266s  ( 21.591s elapsed)

So it's critical to have vectorM or something with less memory pressure because, if you can afford many gigs of wasted RAM, 0.5 sec pauses and abysmal 15% MUT productivity why bother to use foldl at all? Isn't it library for efficient, low memory footprint consumption of large datasets?

Gabriella439 commented 6 years ago

Either way I'll add vectorM. I was just wanted to make sure that I understood the difference in results

Gabriella439 commented 6 years ago

This is now up on Hackage as foldl-1.3.3