I've made this same mistake plenty of times: appending to a long linked list
is expensive, as it requires you to walk the whole list on every append.
That explains our non-linear performance.
The common idiom for solving this (in C at least) is to prepend to the list,
then reverse at the end to preserve ordering. That's what I've done here; feel
free to choose a different solution.
I think you will see significant performance increases across the board
with this patch.
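For illustration, here is a minimal sketch of the two accumulation strategies in plain Haskell (the function names are made up for this example; this is not the library's actual decoder loop):

    -- Build a list in input order two ways.
    import Data.List (foldl')

    -- Quadratic: each step appends, walking the whole accumulator.
    buildAppend :: [a] -> [a]
    buildAppend = foldl' (\acc x -> acc ++ [x]) []

    -- Linear: cons each element in O(1), then reverse once at the end
    -- to restore input order.
    buildConsReverse :: [a] -> [a]
    buildConsReverse = reverse . foldl' (\acc x -> x : acc) []

    main :: IO ()
    main = print (buildConsReverse [1 .. 5 :: Int] == buildAppend [1 .. 5 :: Int])

Both produce the same list, but the cons-and-reverse version does O(n) total work instead of O(n^2).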
Before:
~/tmp/bench$ time ./burst +RTS -s
100000
  17,564,736,936 bytes allocated in the heap
  13,633,135,600 bytes copied during GC
       2,898,608 bytes maximum residency (2932 sample(s))
         423,584 bytes maximum slop
               9 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     18590 colls,     0 par    7.85s    7.84s     0.0004s    0.0015s
  Gen  1      2932 colls,     0 par    4.21s    4.21s     0.0014s    0.0029s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    3.72s  (  3.72s elapsed)
  GC      time   12.06s  ( 12.05s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   15.78s  ( 15.77s elapsed)

  %GC     time      76.4%  (76.4% elapsed)

  Alloc rate    4,719,501,087 bytes per MUT second

  Productivity  23.6% of total user, 23.6% of total elapsed

real    0m15.770s
user    0m15.713s
sys     0m0.067s
After:
~/tmp/bench$ time ./burst +RTS -s
100000
      62,300,968 bytes allocated in the heap
      14,002,224 bytes copied during GC
       2,275,680 bytes maximum residency (6 sample(s))
         731,568 bytes maximum slop
               6 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       115 colls,     0 par    0.01s    0.01s     0.0001s    0.0005s
  Gen  1         6 colls,     0 par    0.00s    0.00s     0.0007s    0.0019s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.02s  (  0.02s elapsed)
  GC      time    0.01s  (  0.01s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.03s  (  0.03s elapsed)

  %GC     time      38.9%  (39.9% elapsed)

  Alloc rate    3,804,130,501 bytes per MUT second

  Productivity  60.8% of total user, 62.5% of total elapsed

real    0m0.030s
user    0m0.020s
sys     0m0.007s
burst.hs being:
    {-# LANGUAGE DeriveGeneric     #-}
    {-# LANGUAGE OverloadedStrings #-}

    module Main where

    import Data.ProtocolBuffers
    import GHC.Generics (Generic)
    import Data.ByteString (ByteString)
    import qualified Data.ByteString as B
    import Data.TypeLevel (D1)
    import Data.Serialize

    data Burst = Burst
      { frames :: Repeated D1 (Value ByteString)
      } deriving (Generic, Eq, Show)

    instance Encode Burst
    instance Decode Burst

    encodeBurst :: [ByteString] -> ByteString
    encodeBurst = runPut . encodeMessage . Burst . putField

    decodeBurst :: ByteString -> Either String Burst
    decodeBurst = runGet decodeMessage

    main :: IO ()
    main = do
      let encoded = encodeBurst $ replicate 20000 "hai"
      print $ B.length encoded
      let decoded = decodeBurst encoded
      either error (const $ return ()) decoded