I've made this same mistake plenty of times: appending to a long linked list
is expensive, as it requires you to walk the whole list on every append.
That explains our non-linear performance.
The common idiom for solving this (in C at least) is to prepend to the list,
then reverse at the end to preserve ordering. That's what I've done here; feel
free to choose a different solution.
I think you will see significant performance increases across the board
with this patch.
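For illustration, here is a minimal sketch of the two accumulation strategies in plain Haskell (the function names are made up for this example; this is not the library's actual decoder loop):

    -- Build a list in input order two ways.
    import Data.List (foldl')

    -- Quadratic: each step appends, walking the whole accumulator.
    buildAppend :: [a] -> [a]
    buildAppend = foldl' (\acc x -> acc ++ [x]) []

    -- Linear: cons each element in O(1), then reverse once at the end
    -- to restore input order.
    buildConsReverse :: [a] -> [a]
    buildConsReverse = reverse . foldl' (\acc x -> x : acc) []

    main :: IO ()
    main = print (buildConsReverse [1 .. 5 :: Int] == buildAppend [1 .. 5 :: Int])

Both produce the same list, but the cons-and-reverse version does O(n) total work instead of O(n^2).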
Before:
~/tmp/bench$ time ./burst +RTS -s
100000
  17,564,736,936 bytes allocated in the heap
  13,633,135,600 bytes copied during GC
       2,898,608 bytes maximum residency (2932 sample(s))
         423,584 bytes maximum slop
               9 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     18590 colls,     0 par    7.85s    7.84s     0.0004s    0.0015s
  Gen  1      2932 colls,     0 par    4.21s    4.21s     0.0014s    0.0029s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    3.72s  (  3.72s elapsed)
  GC      time   12.06s  ( 12.05s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   15.78s  ( 15.77s elapsed)

  %GC     time      76.4%  (76.4% elapsed)

  Alloc rate    4,719,501,087 bytes per MUT second

  Productivity  23.6% of total user, 23.6% of total elapsed

real    0m15.770s
user    0m15.713s
sys     0m0.067s
After:
~/tmp/bench$ time ./burst +RTS -s
100000
      62,300,968 bytes allocated in the heap
      14,002,224 bytes copied during GC
       2,275,680 bytes maximum residency (6 sample(s))
         731,568 bytes maximum slop
               6 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       115 colls,     0 par    0.01s    0.01s     0.0001s    0.0005s
  Gen  1         6 colls,     0 par    0.00s    0.00s     0.0007s    0.0019s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.02s  (  0.02s elapsed)
  GC      time    0.01s  (  0.01s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.03s  (  0.03s elapsed)

  %GC     time      38.9%  (39.9% elapsed)

  Alloc rate    3,804,130,501 bytes per MUT second

  Productivity  60.8% of total user, 62.5% of total elapsed

real    0m0.030s
user    0m0.020s
sys     0m0.007s
burst.hs being:
    {-# LANGUAGE DeriveGeneric     #-}
    {-# LANGUAGE OverloadedStrings #-}

    module Main where

    import Data.ProtocolBuffers
    import GHC.Generics (Generic)
    import Data.ByteString (ByteString)
    import qualified Data.ByteString as B
    import Data.TypeLevel (D1)
    import Data.Serialize

    data Burst = Burst
      { frames :: Repeated D1 (Value ByteString)
      } deriving (Generic, Eq, Show)

    instance Encode Burst
    instance Decode Burst

    encodeBurst :: [ByteString] -> ByteString
    encodeBurst = runPut . encodeMessage . Burst . putField

    decodeBurst :: ByteString -> Either String Burst
    decodeBurst = runGet decodeMessage

    main :: IO ()
    main = do
      let encoded = encodeBurst $ replicate 20000 "hai"
      print $ B.length encoded
      let decoded = decodeBurst encoded
      either error (const $ return ()) decoded