**Closed** — bohde closed this pull request 9 years ago
I'm just about to head out to BayHac, and I'll review the patch sometime today. I'd also like to see some heap profiles.

Back when we first started work on protobuf, binary was missing necessary functionality that cereal had; I think it was lacking an incremental parser. The last attempt at switching (a year or two ago) didn't show much of a performance difference, but they've been doing a lot more work on binary, so I'm not surprised the decodes are magically faster.
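For context on the incremental-parsing point above, here is a minimal sketch (an assumed example, not code from this repo) of the incremental interface binary gained in `Data.Binary.Get`: `runGetIncremental` produces a `Decoder` that can be fed input chunk by chunk, so a parse doesn't need the whole input up front. The `feedChunks` helper is illustrative.

```haskell
import qualified Data.ByteString as BS
import Data.Binary.Get (Decoder (..), getWord32be, runGetIncremental)

-- Drive a Decoder with a list of input chunks, passing Nothing at end of input.
feedChunks :: Decoder a -> [BS.ByteString] -> Either String a
feedChunks (Done _ _ x)   _        = Right x
feedChunks (Fail _ _ err) _        = Left err
feedChunks (Partial k)    (c : cs) = feedChunks (k (Just c)) cs
feedChunks (Partial k)    []       = feedChunks (k Nothing) []

main :: IO ()
main =
  -- the four bytes of a big-endian Word32 arrive split across two chunks
  print (feedChunks (runGetIncremental getWord32be)
                    [BS.pack [0, 0], BS.pack [1, 0]])
```

Parsing the bytes `[0, 0, 1, 0]` as a big-endian `Word32` across two chunks prints `Right 256`.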
Using the following code, modified for the respective branch:

```haskell
{-# LANGUAGE DataKinds         #-}
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data
import Nested

import qualified Data.ByteString as BS

writeBurst :: IO ()
writeBurst =
  BS.writeFile "burst.pb" $ enc $ burst 1000000

readBurst :: IO ()
readBurst = do
  a <- BS.readFile "burst.pb"
  print $ decBurst a

writeNested :: IO ()
writeNested =
  BS.writeFile "nested.pb" $ enc $ nested 100 1000

readNested :: IO ()
readNested = do
  a <- BS.readFile "nested.pb"
  print $ decNested a

main :: IO ()
main = readNested
```
Looks like an easy decision then eh? :+1: This is awesome.
This pull request is backwards incompatible, but by using `binary`'s `Get` and a custom `Builder`, we can get an average 3-5x better performance, with some cases being 10x+. I don't know if this is a path you'd consider, but wanted to let you know about it.

We were experimenting with using protobuf instead of aeson for encoding one of our data types, and found that it was taking longer for both encoding and decoding. A simplified version of this data type is in `bench/Nested.hs`.

A quick change to use `binary` instead of `cereal` improved the decode speed, but neither `Data.Binary.Put` nor `Data.Binary.Builder` was that much faster. Profiling revealed that encoding `Message`s was expensive, because we'd need to encode the message fully, calculate the length, and then write both. If we use the `Builder` interface, with its `Monoid` instance, to build results, then we can memoize the length of the resulting `ByteString` during construction by using a type `(Sum, Builder)`. Because every type we serialize in a protobuf either has a statically known length or is length-prefixed, this adds a minor amount of calculation in the worst case, but a nice improvement in the best case. A version of this builder is included under `src/Data/Binary/Builder/Sized.hs`.
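The idea above can be sketched as follows. This is a minimal illustration of the `(Sum, Builder)` technique, assuming hypothetical names (`Sized`, `lengthPrefixed`, etc.), not the actual API of `src/Data/Binary/Builder/Sized.hs`: the `Monoid` instance sums lengths as it concatenates output, so length-prefixing an embedded message needs no second encoding pass.

```haskell
import Data.Bits ((.&.), (.|.), shiftR)
import Data.Monoid (Sum (..))
import Data.Word (Word8)
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL

-- A builder paired with a running byte count; the tuple Monoid
-- adds lengths as it concatenates output.
newtype Sized = Sized (Sum Int, B.Builder)

instance Semigroup Sized where
  Sized a <> Sized b = Sized (a <> b)

instance Monoid Sized where
  mempty = Sized mempty

-- Each primitive records its statically known length up front.
word8 :: Word8 -> Sized
word8 w = Sized (Sum 1, B.word8 w)

size :: Sized -> Int
size (Sized (Sum n, _)) = n

run :: Sized -> BL.ByteString
run (Sized (_, b)) = B.toLazyByteString b

-- Protobuf's base-128 varint, itself length-tracked.
varint :: Int -> Sized
varint n
  | n < 0x80  = word8 (fromIntegral n)
  | otherwise = word8 (fromIntegral (n .&. 0x7f) .|. 0x80)
             <> varint (n `shiftR` 7)

-- Length-prefix an embedded message without re-encoding it:
-- the body already knows its own size.
lengthPrefixed :: Sized -> Sized
lengthPrefixed body = varint (size body) <> body

main :: IO ()
main = do
  let body = foldMap word8 [1 .. 200]
  print (size body)                              -- 200
  print (BL.length (run (lengthPrefixed body)))  -- 202: 2-byte varint + body
```

Since a nested message's length is known the moment its body is built, the encoder avoids the encode-measure-write-again pattern that profiling showed to be expensive.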
I've added benchmarks for this branch, with corresponding benchmarks for master, in https://github.com/joshbohde/protobuf/tree/benchmark. They include the `Burst` data type from #4 as well. Results from a run on my laptop are here: https://gist.github.com/joshbohde/2b7b763805ddb2c6bfd1. Here they are in a table:

I'm suspicious of some of these numbers, but can't find anything obviously wrong with the benchmarks.
Internally, we're seeing about a 100x speedup in decoding, and a 10x speedup in encoding of real data using this branch.