acowley / Frames

Data frames for tabular data.

Compile times/memory #127

Open adamConnerSax opened 5 years ago

adamConnerSax commented 5 years ago

I’m using Frames/vinyl with GHC 8.6.3 and seeing very long compile times (minutes) and very high memory use (> 20GB). This is on a relatively powerful Mac laptop. Memory is the real issue: the laptop has only 16GB of RAM, so it’s swapping.

I think the issue is a large row type. I’m loading from a csv file with 30+ columns.

As long as I rcast to a smaller row (<8 columns) before doing anything else, memory use and compile time are high but not insane.

But even trying to count rows, using Control.Foldl.length, with the full row, leads to huge memory use and thus compile times.

Runtime performance is good in all cases.

Remembering to rcast as early as possible is helpful.

Are there any other ways to address this? Or just a more precise set of coding practices to improve this?

It’s helpful that whatever is going wrong happens after typechecking so it’s not as much of a productivity issue as it might be. But it’s still an issue for iterating on analysis ideas, etc.

Any help/pointers would be appreciated!

acowley commented 5 years ago

Thank you for the report, those are interesting observations! So something like loading a file with 30 columns, then counting the rows of a Frame can trigger the heavy memory use? If so, that means you're not even touching the vinyl part yourself, so the slow compile would be due to de-serialization. Can you try removing all the INLINE pragmas from the InCore module to see what effect that has? It might be possible to not use vinyl for carrying a record of dictionaries in that module, but something less type safe.

We should add a many-column test file to the benchmark suite so we can track this.

adamConnerSax commented 5 years ago

Okay. I tried that and, TL;DR, it's not any better. More details: I've been using "-O0" (not a great solution all the time...) and it helps quite a bit. But on a long enough row, I can still get the behavior, though not as bad (climbs to 14GB instead of 20GB). This is (approximately) unchanged by commenting out the INLINE pragmas in InCore.hs. FWIW, in this case, though not every case where this happened before, the frame is built via "boxedFrame" since it's a (Maybe :. ElField) record in each row. So maybe it's not InCore?

ArturGajowy commented 4 years ago

I've had very similar results, and some new insights:

I'm testing on 40 columns, and inCoreAoS (readTable "in.csv") results in a 2m compile time, failing with an OOM at 9 GB heap.

On a brighter note, runSafeT $ toFrame <$> (P.toListM . readTable) "in.csv" compiles quickly, although it yields longer run times.

I think the reason for slow compile times is specialisation of InCore.RecVec instances, suggested by the following:

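{-# LANGUAGE ScopedTypeVariables #-}
import Frames (Record)
import Schema (S40)
import Frames.InCore (RecVec)

main :: IO ()
main = do
  -- The pattern type signature below needs ScopedTypeVariables.
  rs :: Record S40 <- spawnRecVec
  putStrLn "BOOM"

-- Only here to make GHC resolve a RecVec instance for the chosen row type;
-- it is not meant to be run.
spawnRecVec :: RecVec rs => m (Record rs)
spawnRecVec = undefined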

The type S40 is just the same column repeated 40 times.

Are there any other compiler flags one could try to get a better tradeoff between compilation feasibility and runtime speed?

Maybe one could provide a non-inductive instance for RecVec rs somehow?

ArturGajowy commented 4 years ago

Also worth noting is the -v2 output when RecVec is involved. Notice the explosion in terms after Specialise, and then another one after the second iteration of the following Simplifier phase:

*** Simplifier [Main]:
Result size of Simplifier iteration=1
  = {terms: 239, types: 5,058, coercions: 780, joins: 0/7}
Result size of Simplifier iteration=2
  = {terms: 223, types: 5,046, coercions: 773, joins: 0/0}
Result size of Simplifier
  = {terms: 223, types: 5,046, coercions: 773, joins: 0/0}
!!! Simplifier [Main]: finished in 25.36 milliseconds, allocated 16.744 megabytes
*** Specialise [Main]:
Result size of Specialise
  = {terms: 28,605,
     types: 598,431,
     coercions: 143,424,
     joins: 0/2,521}
!!! Specialise [Main]: finished in 246.24 milliseconds, allocated 374.966 megabytes
*** Float out(FOS {Lam = Just 0,
                   Consts = True,
                   OverSatApps = False}) [Main]:
Result size of Float out(FOS {Lam = Just 0,
                              Consts = True,
                              OverSatApps = False})
  = {terms: 37,631, types: 825,796, coercions: 143,424, joins: 0/600}
!!! Float out(FOS {Lam = Just 0,
                   Consts = True,
                   OverSatApps = False}) [Main]: finished in 292.68 milliseconds, allocated 183.974 megabytes
*** Simplifier [Main]:
Result size of Simplifier iteration=1
  = {terms: 49,355,
     types: 756,986,
     coercions: 141,125,
     joins: 0/1,244}
Result size of Simplifier iteration=2
  = {terms: 701,907,
     types: 4,908,974,
     coercions: 2,060,360,
     joins: 0/23,067}

adamConnerSax commented 4 years ago

Interesting! Though this wouldn't address some cases, this does make me want a version of the TH that can specify a subset of columns to load. Often my data has a lot of columns but I'm only using a few of them. Also, I wonder if the TH could generate the RecVec instance non-inductively? But that's beyond my TH skills, so I don't know.

acowley commented 4 years ago

Thank you for boiling this down so well, @ArturGajowy!

I still haven't figured out why @adamConnerSax's earlier experiment of taking out all the INLINE pragmas from InCore.hs didn't help, as things like this example sure seem to point at it. I suppose the entire recursive formulation of RecVec instances could be the culprit, in which case we could try generating flat, non-recursive instances for types like, '[s1 :-> a1, s2 :-> a2, s3 :-> a3, ...]. I'm not sure if it's worth doing that generation in TH, or if some external script would be less brittle.
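To make the difference between the two formulations concrete, here is a minimal sketch that uses only base. The names HList, SizeRec, and SizeFlat are hypothetical stand-ins and not part of Frames or vinyl; the real RecVec class is considerably more involved. The inductive class needs one instance resolution per column, while the flat class has a single instance for one concrete row shape, which is roughly what generated instances would look like.

```haskell
{-# LANGUAGE DataKinds, FlexibleInstances, GADTs, KindSignatures, TypeOperators #-}
module Main (main) where

import Data.Kind (Type)

-- A tiny heterogeneous list standing in for vinyl's Rec (hypothetical).
data HList (ts :: [Type]) where
  HNil  :: HList '[]
  (:&:) :: t -> HList ts -> HList (t ': ts)
infixr 5 :&:

-- Inductive formulation: one instance per cons cell, so resolving the
-- constraint for an n-column row walks a chain of n+1 instances.
class SizeRec (ts :: [Type]) where
  sizeRec :: HList ts -> Int

instance SizeRec '[] where
  sizeRec HNil = 0

instance SizeRec ts => SizeRec (t ': ts) where
  sizeRec (_ :&: xs) = 1 + sizeRec xs

-- Flat formulation: a single, non-recursive instance for one concrete
-- shape (three columns here). A generator would emit one per row length.
class SizeFlat (ts :: [Type]) where
  sizeFlat :: HList ts -> Int

instance SizeFlat '[a, b, c] where
  sizeFlat (_ :&: _ :&: _ :&: HNil) = 3

main :: IO ()
main = do
  let row = (1 :: Int) :&: True :&: 'x' :&: HNil
  print (sizeRec row)   -- resolved through four instances
  print (sizeFlat row)  -- resolved through one instance
```

Whether flattening actually avoids the blow-up seen above is untested here; the sketch only shows the shape of the two alternatives.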

I do also like @adamConnerSax's idea of keeping the full row type as a stream of unparsed tokens, and only paying a compile-time price for a subset of the columns, but it would be nice to have a solid plan for what to do in case one does want all the columns.

Just so we're for sure on the exact same page, @ArturGajowy can you please paste your Schema module somewhere?

ArturGajowy commented 4 years ago

Thank you :)

Schema.hs:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeOperators #-}
module Schema where

import Data.Vinyl

type S10 = '[ CI, CI, CI, CI, CI, CI, CI, CI, CI, CI ]

type S20 = '[ CI, CI, CI, CI, CI, CI, CI, CI, CI, CI
            , CI, CI, CI, CI, CI, CI, CI, CI, CI, CI ]

type S40 = '[ CI, CI, CI, CI, CI, CI, CI, CI, CI, CI
            , CI, CI, CI, CI, CI, CI, CI, CI, CI, CI
            , CI, CI, CI, CI, CI, CI, CI, CI, CI, CI
            , CI, CI, CI, CI, CI, CI, CI, CI, CI, CI ]

type S80 = '[ CI, CI, CI, CI, CI, CI, CI, CI, CI, CI
            , CI, CI, CI, CI, CI, CI, CI, CI, CI, CI
            , CI, CI, CI, CI, CI, CI, CI, CI, CI, CI
            , CI, CI, CI, CI, CI, CI, CI, CI, CI, CI
            , CI, CI, CI, CI, CI, CI, CI, CI, CI, CI
            , CI, CI, CI, CI, CI, CI, CI, CI, CI, CI
            , CI, CI, CI, CI, CI, CI, CI, CI, CI, CI ]

type CI = "CI" ::: Int
```

EDIT: just noticed the S80 type is just 70 columns 😅

acowley commented 4 years ago

I'm having trouble reproducing the slowdown, so I must be doing something different. I ended up adding a new test target to Frames.cabal,

test-suite longrow
  type:                exitcode-stdio-1.0
  hs-source-dirs:      test
  main-is:             CompileTime.hs
  other-modules:       Schema
  build-depends:       base, Frames, vinyl
  default-language:    Haskell2010
  ghc-options:         -O2 -Wall

My source file looks like this,

{-# LANGUAGE DataKinds, FlexibleInstances, ScopedTypeVariables,
             TypeOperators #-}
import qualified Data.Vinyl as V
import Frames (Record)
import Schema (CI, S80)
import Frames.InCore (RecVec)
import System.Environment (getArgs)

class Def rs where
  def :: Record rs

instance Def '[] where
  def = V.RNil

instance Def rs => Def (CI ': rs) where
  def = 0 V.:& def

main :: IO ()
main = do
  xs <- getArgs
  rs :: Record S80 <- case xs of
    ["self-destruct"] -> spawnRecVec
    _ -> pure def
  putStrLn (show (length (show rs)))

spawnRecVec :: RecVec rs => m (Record rs)
spawnRecVec = undefined

I added the conditional on command line arguments because my builds were finishing very quickly and I was worried it wasn't really compiling, so I wanted something I could run. Can either of you help me figure out what I'm doing wrong in order to reproduce the issue? This is with GHC-8.8.4 and cabal-install-3.2.0.0.

adamConnerSax commented 4 years ago

I also see fast compiles for the source above on ghc-8.8.4 and ghc-8.10.2 (both with cabal-install 3.2.0.0).

ArturGajowy commented 4 years ago

@acowley Fun stuff, I only observe a slowdown after adding a module Main where line to your source. I'm running under stack, in a VM - those should not be factors though. We can compare number of terms just to be sure. After adding the module name, I end up at:

Result size of Simplifier iteration=1
  = {terms: 1,974,301,
     types: 21,067,135,
     coercions: 9,605,750,
     joins: 2,485/54,672}

--  While building package records-poc-0.1.0.0 using:
      ~/.stack/setup-exe-cache/x86_64-linux/Cabal-simple_mPHDZzAJ_3.0.1.0_ghc-8.8.4 --builddir=.stack-work/dist/x86_64-linux/Cabal-3.0.1.0 build lib:records-poc exe:records-poc-exe --ghc-options " -fdiagnostics-color=always"
    Process exited with code: ExitFailure (-9) (THIS MAY INDICATE OUT OF MEMORY)

Without, the compile is pretty snappy.

ArturGajowy commented 4 years ago

Just in case: stack ghc -- -fforce-recomp --make -O2 app/Main.hs -v2 yields same results (the module name being present makes a difference)

acowley commented 4 years ago

Adding a module Main might be increasing inlining of unexported identifiers.
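One relevant detail from the Haskell 2010 Report: a source file with no module header is treated as if it began with module Main(main) where, so only main is exported, whereas module Main where exports every top-level binding. A way to check whether the export list is what matters would be to keep the header but restrict the exports, as in the sketch below. The helper function is a hypothetical placeholder for spawnRecVec and the Def instances in the reproduction; this is an experiment to try, not a confirmed diagnosis.

```haskell
-- Explicit export list: equivalent, export-wise, to having no header at all.
-- Replace the header with `module Main where` to export everything instead.
module Main (main) where

-- Not exported because of the export list above, so GHC is free to
-- inline it into main and drop the standalone definition.
helper :: Int -> Int
helper n = n * 2

main :: IO ()
main = print (helper 21)
```

If the slow compile disappears with the explicit export list and returns with module Main where, that would point at exported bindings being kept and specialised rather than at the header itself.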

adamConnerSax commented 4 years ago

"module Main" does it for me too... As does "module Test"