GaloisInc / mime

A Haskell MIME library

Simpler, faster normalizeCRLF. #10

Closed dato closed 3 years ago

dato commented 7 years ago

Context: I was seeing very bad performance for mails with a couple of attachments (not too big, mind you; 250 KiB in total):

           Slow +RTS -p -RTS

        total time  =       13.45 secs   (13452 ticks @ 1000 us, 1 processor)
        total alloc = 26,975,952,656 bytes  (excludes profiling overheads)

COST CENTRE   MODULE            %time %alloc

normalizeCRLF Codec.MIME.Parse   99.2   94.4
run           Data.Text.Array     0.8    5.6

By leveraging Data.Text’s own functions, total allocation drops from about 27 GB to about 8 MB, and total time from 13.45 secs to 0.02 secs:

           Fast +RTS -p -RTS

        total time  =        0.02 secs   (18 ticks @ 1000 us, 1 processor)
        total alloc =   8,073,744 bytes  (excludes profiling overheads)

COST CENTRE    MODULE                 %time %alloc

MAIN           MAIN                    55.6   33.7
readTextDevice Data.Text.Internal.IO   22.2    0.9
concat         Data.Text               16.7   38.3
readChunk      Data.Text.Internal.IO    5.6    6.6
run            Data.Text.Array          0.0   19.4             

The code I used was:

import Codec.MIME.Type
import Codec.MIME.Parse
import qualified Data.Text.IO as TIO

countAttachments :: MIMEValue -> Int
countAttachments msg =  
  case mime_val_content msg of
    Multi parts -> sum $ map countAttachments parts
    Single _    -> case dispType <$> mime_val_disp msg of
      Just DispAttachment -> 1         
      _ -> 0

main :: IO ()  
main = do      
  msg <- parseMIMEMessage <$> TIO.getContents
  print $ countAttachments msg

dato commented 7 years ago

Ah, no. This is not good, because with this change the function is no longer idempotent. And parseMIMEMessage calls itself recursively, so text that has already been normalized gets normalized again, which makes it worse.
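
To illustrate why idempotence matters here, a minimal sketch follows. The patched code is not reproduced in this thread, so `naiveNormalize` is a hypothetical stand-in (a single `T.replace`), not the actual change, and `idempotentOn` is a helper invented for the example:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T

-- Hypothetical stand-in, NOT the code from this pull request:
-- turn every LF into CRLF without checking for an existing CR.
naiveNormalize :: T.Text -> T.Text
naiveNormalize = T.replace "\n" "\r\n"

-- True when applying f a second time changes nothing for input t.
idempotentOn :: (T.Text -> T.Text) -> T.Text -> Bool
idempotentOn f t = f (f t) == f t
```

With this stand-in, `naiveNormalize "a\nb"` gives `"a\r\nb"`, but a second pass over that result gives `"a\r\r\nb"`, so every extra pass inserts another CR before each line break.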

dato commented 7 years ago

Data.Text used to have a lines' function that parsed CRLF correctly, but it was removed in https://github.com/bos/text/commit/6818295d1a72ae09756fa4b07bc2da289e730e6f.

The following would be idempotent:

normalizeCRLF = T.intercalate "\r\n" . T.lines'
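
Since `lines'` is no longer in Data.Text, one possible way to get idempotent behaviour from functions that still exist is sketched below. It assumes the intended semantics are that CRLF, a lone CR, and a lone LF each count as one line break. The name `normalizeCRLF'` is only for illustration; this is not code from this pull request or from the library:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T

-- Sketch only. Assumed semantics: CRLF, lone CR and lone LF each
-- denote one line break, and every break is emitted as CRLF.
-- The composition runs bottom to top.
normalizeCRLF' :: T.Text -> T.Text
normalizeCRLF' =
    T.replace "\n" "\r\n"   -- 3. expand every LF to CRLF
  . T.replace "\r" "\n"     -- 2. turn every remaining CR into LF
  . T.replace "\r\n" "\n"   -- 1. collapse existing CRLF pairs to LF
```

After step 2 no CR is left in the text, so step 3 cannot produce a doubled CR, and running the function on its own output returns that output unchanged. Each `T.replace` is a single pass, so the cost should stay linear in the input size. Unlike the `intercalate`/`lines'` version, this sketch keeps a trailing line break.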