def- / ghc-datasize

ghc-datasize is a tool to determine the size of Haskell data structures in GHC's memory.

recursiveSize is too slow to measure big data structures #11

Open · sgraf812 opened this issue 3 years ago

sgraf812 commented 3 years ago

recursiveSize uses a set represented as a list to check for boxes it has already encountered. That is very inefficient; I'm measuring a tree-like data structure of about 40 MB in size and I didn't bother letting the call finish. I fixed it by removing the list, but obviously that doesn't work for cyclic data structures.

Not sure what can be done about this; ideally we'd simply put the pointer values into an IntMap, but that doesn't work with a moving GC. Very sad.
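One direction that sidesteps the moving-GC problem is to key the visited set on stable names instead of raw pointer values, since a StableName stays valid across collections. Below is only a rough sketch under that assumption (it relies on the ghc-heap, hashable and unordered-containers packages; countClosures is a made-up name, not anything in ghc-datasize). It counts each reachable closure once, and the same HashSet could be threaded through recursiveSize to deduplicate sizes:

{-# LANGUAGE BangPatterns #-}
-- Sketch only: a visited set keyed on StableNames, which remain valid
-- across GC runs, unlike raw heap addresses. Assumes ghc-heap, hashable
-- (for the Hashable (StableName a) instance) and unordered-containers.
module ClosureCount (countClosures) where

import qualified Data.HashSet as HS
import GHC.Exts.Heap (Box (..), allClosures, asBox, getClosureData)
import System.Mem.StableName (makeStableName)

-- | Count every closure reachable from a value, visiting each one at most
--   once, so sharing and cycles are handled.
countClosures :: a -> IO Int
countClosures x = go HS.empty 0 [asBox x]
  where
    go _    !n []             = pure n
    go seen !n (Box y : rest) = do
      name <- makeStableName y          -- stable identity for this closure
      if name `HS.member` seen
        then go seen n rest             -- already visited: skip
        else do
          closure <- getClosureData y
          go (HS.insert name seen) (n + 1) (allClosures closure ++ rest)

Whether a makeStableName call per closure is cheap enough on a 40 MB structure would have to be measured, but it at least replaces the linear membership test on the list with a hash lookup.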

aryairani commented 2 years ago

Same here. My application footprint is 400MB and I've spent 5 hours of CPU time trying to measure some part of it that's by definition smaller than that. @sgraf812 Can you share your fix?

sgraf812 commented 2 years ago

Here's my version of DataSize.hs:

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

{- |
   Module      : GHC.DataSize
   Copyright   : (c) Dennis Felsing
   License     : 3-Clause BSD-style
   Maintainer  : dennis@felsin9.de
 -}
module DataSize (
  closureSize,
  recursiveSize,
  recursiveSizeNF
  )
  where

import Control.DeepSeq (NFData, force)

import Data.Word (Word)

import GHC.Exts
import GHC.Exts.Heap hiding (size)
import GHC.Exts.Heap.Constants (wORD_SIZE)

import Control.Monad

import System.Mem

-- Inspired by Simon Marlow:
-- https://ghcmutterings.wordpress.com/2009/02/12/53/

-- | Calculate size of GHC objects in Bytes. Note that an object may not be
--   evaluated yet and only the size of the initial closure is returned.
closureSize :: a -> IO Word
closureSize x = do
  rawWds <- getClosureRawWords x
  return . fromIntegral $ length rawWds * wORD_SIZE

-- | Calculate the recursive size of GHC objects in Bytes. Unlike the released
--   ghc-datasize, this version keeps no set of visited closures, so shared
--   values are counted once per reference and cyclic data structures make it
--   loop forever.
--
--   Call with
--   @
--    recursiveSize $! 2
--   @
--   to force evaluation to WHNF before calculating the size.
--
--   Call with
--   @
--    recursiveSize $!! \"foobar\"
--   @
--   ($!! from Control.DeepSeq) to force full evaluation before calculating the
--   size.
--
--   A garbage collection is performed before the size is calculated, because
--   the garbage collector would make heap walks difficult.
--
--   This function works very quickly on small data structures, but can be slow
--   on large and complex ones. If speed is an issue it's probably possible to
--   get the exact size of a small portion of the data structure and then
--   estimate the total size from that.

recursiveSize :: a -> IO Word
recursiveSize x = do
  performGC
  go 0 $ asBox x
  where go !acc (Box y) = do
             size    <- closureSize y
             closure <- getClosureData y
             foldM go (acc + size) $ allClosures closure

-- | Calculate the recursive size of GHC objects in Bytes after calling
-- Control.DeepSeq.force on the data structure to force it into Normal Form.
-- Using this function requires that the data structure has an `NFData`
-- typeclass instance.

recursiveSizeNF :: NFData a => a -> IO Word
recursiveSizeNF = recursiveSize . force

-- | Adapted from 'GHC.Exts.Heap.getClosureRaw' which isn't exported.
--
-- This returns the raw words of the closure on the heap. Once back in the
-- Haskell world, the raw words that hold pointers may be outdated after a
-- garbage collector run.
getClosureRawWords :: a -> IO [Word]
getClosureRawWords x = do
    case unpackClosure# x of
        (# _iptr, dat, _pointers #) -> do
            let nelems = (I# (sizeofByteArray# dat)) `div` wORD_SIZE
                end = fromIntegral nelems - 1
            pure [W# (indexWordArray# dat i) | I# i <- [0.. end] ]

Keep in mind that it most likely will loop endlessly for you if you use it to measure cyclic data structures.
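For reference, a tiny driver for the module above; it is only an illustration (the Main module and xs below are made up here, nothing from ghc-datasize), and the list is acyclic, so the traversal terminates:

-- Illustrative driver only; assumes the DataSize module above is on the
-- path and deepseq is available (it already is, via Control.DeepSeq).
module Main (main) where

import DataSize (closureSize, recursiveSize, recursiveSizeNF)

main :: IO ()
main = do
  let xs = [1 .. 1000] :: [Int]
  -- Size of the outermost closure only (likely still an unevaluated thunk).
  closureSize xs >>= print
  -- Force to WHNF first; unevaluated tails are measured as thunk closures.
  (recursiveSize $! xs) >>= print
  -- Force to normal form first, so the whole list is built before measuring.
  recursiveSizeNF xs >>= print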