agrafix / superrecord

Haskell: Supercharged anonymous records
BSD 3-Clause "New" or "Revised" License

Compilation performance degrades rapidly while increasing record length #12

Open dredozubov opened 6 years ago

dredozubov commented 6 years ago

The compilation slowdown is evident on records of larger size: a single record of 30-50 fields is enough to make GHC swap. There's a branch and a script I've used to dig up more info on this: https://github.com/dredozubov/superrecord/tree/ghc-test-case https://github.com/dredozubov/superrecord/blob/ghc-test-case/build-all.sh

I did a small investigation into this. I tried it with GHC 8.0.2 and 8.2.1 (the latter can be done with allow-newer). Building it with GHC HEAD is not currently possible due to broken dependencies. Here we go: the construction of a record with 35 fields (https://github.com/dredozubov/superrecord/blob/ghc-test-case/test/Spec.hs.in#L451) takes up to 2 minutes:
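For reference, per-pass timing and allocation lines like the "!!!" output below come from GHC's verbose output; a minimal sketch of reproducing them, assuming a GHC 8.x binary on PATH and the same Spec.hs module:

```shell
# -v2 makes GHC 8.x print per-pass "finished in ... allocated ..." lines;
# on newer GHCs, -ddump-timings produces similar output.
ghc -O2 -v2 Spec.hs
```
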

!!! Chasing dependencies: finished in 11.20 milliseconds, allocated 15.786 megabytes
!!! Parser [Spec]: finished in 1.62 milliseconds, allocated 2.921 megabytes
!!! Renamer/typechecker [Spec]: finished in 5053.19 milliseconds, allocated 4453.209 megabytes
!!! Desugar [Spec]: finished in 8061.64 milliseconds, allocated 12993.168 megabytes
!!! Simplifier [Spec]: finished in 9113.55 milliseconds, allocated 10165.327 megabytes
!!! Specialise [Spec]: finished in 12565.69 milliseconds, allocated 8482.198 megabytes
                   OverSatApps = False}) [Spec]: finished in 12262.97 milliseconds, allocated 13919.365 megabytes
!!! Simplifier [Spec]: finished in 42090.28 milliseconds, allocated 42521.308 megabytes
!!! Simplifier [Spec]: finished in 17468.49 milliseconds, allocated 16998.313 megabytes
!!! Simplifier [Spec]: finished in 21349.30 milliseconds, allocated 32972.807 megabytes
!!! Float inwards [Spec]: finished in 804.30 milliseconds, allocated 1908.825 megabytes
!!! Called arity analysis [Spec]: finished in 1986.35 milliseconds, allocated 1895.883 megabytes
!!! Simplifier [Spec]: finished in 1644.02 milliseconds, allocated 2605.471 megabytes
!!! Demand analysis [Spec]: finished in 870.99 milliseconds, allocated 1954.719 megabytes
!!! Worker Wrapper binds [Spec]: finished in 1924.12 milliseconds, allocated 2049.162 megabytes
!!! Simplifier [Spec]: finished in 3616.14 milliseconds, allocated 3650.656 megabytes
                   OverSatApps = True}) [Spec]: finished in 1670.04 milliseconds, allocated 3925.652 megabytes
!!! Common sub-expression [Spec]: finished in 857.66 milliseconds, allocated 1020.030 megabytes
!!! Float inwards [Spec]: finished in 634.74 milliseconds, allocated 1013.631 megabytes
!!! Simplifier [Spec]: finished in 2484.37 milliseconds, allocated 2723.176 megabytes
!!! CoreTidy [Spec]: finished in 254.93 milliseconds, allocated 405.708 megabytes
!!! CorePrep [Spec]: finished in 163.65 milliseconds, allocated 463.810 megabytes
!!! CodeGen [Spec]: finished in 425.96 milliseconds, allocated 474.554 megabytes

RecCopy produces a huge number of coercions: https://gist.github.com/dredozubov/6ce629d1ec16ac32e9a987ffefda2a2a There's a lot of rewriting going on (the library does a lot of heavy inlining). Here are some core2core dumps with their respective sizes:
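For anyone reproducing the per-pass Core dumps, a sketch of the flags involved (the .split/ directory in the listing below suggests the single dump file was split per pass afterwards; the splitting command shown is an assumption, not what was actually run):

```shell
# Dump Core after every core-to-core pass; -ddump-to-file writes the
# output next to the module as Spec.verbose-core2core (GHC 8.x flags).
ghc -O2 -dverbose-core2core -ddump-to-file Spec.hs

# One way to split the dump into per-pass files (assumption):
# csplit Spec.verbose-core2core '/^==================== /' '{*}'
```
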

% ls -ls dump-35/test/Spec.verbose-core2core.split/ | tail -14 | awk '{print $1, $10}'
8 S-00
13264 S.01-simplifier
49272 S.02-levels-added
5976 S.03-float-out
42720 S.04-simplifier
41784 S.05-simplifier
8416 S.06-simplifier
8416 S.07-float-inwards
8416 S.08-simplifier
8472 S.09-simplifier
12072 S.10-levels-added
1936 S.11-float-out
1944 S.12-float-inwards
1832 S.13-simplifier

These dumps are too big to add as gists, so I've uploaded this tar archive instead: https://www.dropbox.com/s/pql4qgm5lplbe38/dump-35.tar.gz?dl=0

It's possible to rebuild all of this with nix-shell -p python3 --run "ghc=/Users/dr/.stack/programs/x86_64-osx/ghc-8.0.2/bin/ghc m=25 n=35 ./build-all.sh -package-db ~/.stack/snapshots/x86_64-osx/lts-8.20/8.0.2/pkgdb/", skipping the nix-shell bit if you have python3 installed on your system and substituting the ghc variable and package-db with the correct values for your system. The m and n variables mean it'll rebuild the module with records of length 25 to 35.

vagarenko commented 6 years ago

I've encountered this too. Compilation time for large records makes this library unusable :(

Wizek commented 6 years ago

Maybe the slowdown has something to do with SuperRecord.SortInsert?

agrafix commented 6 years ago

No, the slowdown also exists w/o the SortInsert :-(
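For context on why sorted-insertion machinery is a plausible suspect, here is a minimal plain-GHC sketch of an insertion-sort-style type family (not superrecord's actual SortInsert): inserting each label can force a comparison against every label already in the list, so sorting n labels can take O(n²) type-family reduction steps, and the typechecker carries evidence for each step.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, TypeOperators, UndecidableInstances #-}
module Main where

import Data.Type.Equality ((:~:) (Refl))
import GHC.TypeLits (CmpSymbol, Symbol)

-- Insert a label into a sorted type-level list of labels.
type family Insert (x :: Symbol) (xs :: [Symbol]) :: [Symbol] where
  Insert x '[]       = '[x]
  Insert x (y ': ys) = InsertCmp (CmpSymbol x y) x y ys

type family InsertCmp (o :: Ordering) (x :: Symbol) (y :: Symbol) (ys :: [Symbol]) :: [Symbol] where
  InsertCmp 'GT x y ys = y ': Insert x ys  -- keep walking down the list
  InsertCmp o   x y ys = x ': y ': ys      -- found the spot, stop

-- Insertion sort: O(n) inserts, each up to O(n) reductions deep.
type family Sort (xs :: [Symbol]) :: [Symbol] where
  Sort '[]       = '[]
  Sort (x ': xs) = Insert x (Sort xs)

-- Compile-time check: this only typechecks if Sort reduced as expected.
sorted :: Sort '["c", "a", "b"] :~: '["a", "b", "c"]
sorted = Refl

main :: IO ()
main = case sorted of Refl -> putStrLn "ok"
```
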

reactormonk commented 6 years ago

When disabling optimization (stack build --fast), a record of 20 elements takes about 8 seconds to compile on my laptop, as opposed to roughly 100s when fully optimizing.

jvanbruegge commented 5 years ago

I am currently writing a similar library, and I benchmarked it against superrecord (compile time only). The file that gets compiled is here (it just creates a record with 40 entries): https://gist.github.com/jvanbruegge/e2297f8e57e783f845f56f0627afc7ba Results on my laptop for superrecord:

stack build  1054,20s user 15,07s system 98% cpu 18:06,09 total

and my library:

stack build  37,06s user 2,83s system 88% cpu 45,272 total

I am not sure where this difference comes from; my rows are also sorted, and the runtime representation of records is also a SmallArray#.

jvanbruegge commented 5 years ago

I figured that the original order of labels was the best case for my RowAppend type family, as each step would be an O(1) cons (making the whole thing O(n)). But even after reversing the order of labels, which is the O(n²) worst case, it was still way faster:

stack build  123,89s user 9,02s system 95% cpu 2:19,09 total
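The best-case/worst-case asymmetry described above can be sketched with a plain-GHC insert family (hypothetical names, not the actual RowAppend): consing a label that sorts before the current head resolves in a single reduction step, while consing a label that sorts after everything must walk the whole list.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, TypeOperators, UndecidableInstances #-}
module Main where

import Data.Type.Equality ((:~:) (Refl))
import GHC.TypeLits (CmpSymbol, Symbol)

-- Hypothetical insert-based row extension (labels only, values omitted).
type family Ins (x :: Symbol) (xs :: [Symbol]) :: [Symbol] where
  Ins x '[]       = '[x]
  Ins x (y ': ys) = InsCmp (CmpSymbol x y) x y ys

type family InsCmp (o :: Ordering) (x :: Symbol) (y :: Symbol) (ys :: [Symbol]) :: [Symbol] where
  InsCmp 'GT x y ys = y ': Ins x ys  -- one reduction per element passed
  InsCmp o   x y ys = x ': y ': ys   -- done in a single step

-- Best case: "a" sorts before the head, so Ins stops immediately.
best :: Ins "a" '["b", "c", "d"] :~: '["a", "b", "c", "d"]
best = Refl

-- Worst case: "z" sorts after everything, so Ins traverses the whole list.
worst :: Ins "z" '["b", "c", "d"] :~: '["b", "c", "d", "z"]
worst = Refl

main :: IO ()
main = case (best, worst) of (Refl, Refl) -> putStrLn "ok"
```

Building records in ascending label order hits the best case on every cons, which matches the observation that reversing the labels degrades compile time.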