Closed — ajusa closed this 2 years ago
Nice! I've been doing some benchmarking and it looks like the mark and sweep GC performs significantly better. I was able to get current HEAD to 1.5 million QPS. With this PR this goes up to 1.65 million QPS! This is definitely a hot path.
I think there is even more room for improvement here; we can get rid of allocating this string completely. I'm currently playing with this: https://gist.github.com/dom96/a041fabecd346579744c3b78ba599ec9, with the following results:
```
name ............................... min time      avg time    std dv   runs
% formatting ...................... 39.835 ms     40.321 ms    ±0.164    x124
concating ......................... 15.964 ms     16.314 ms    ±0.139    x307
smart concating ................... 11.044 ms     11.285 ms    ±0.069    x443
pre-alloc concating ................ 5.884 ms      5.983 ms    ±0.043    x835
```
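The gist isn't reproduced here, but the winning "pre-alloc concating" variant presumably comes down to reserving the final capacity before appending, so no `add` ever triggers a reallocation. A minimal sketch (the proc name and header layout are illustrative, not taken from the gist):

```nim
# Hypothetical sketch: build a response string with its capacity reserved
# up front, so the subsequent `add` calls never reallocate.
proc buildResponse(statusLine, headers, body: string): string =
  # Reserve exactly the bytes we'll write: the three parts plus two CRLFs.
  result = newStringOfCap(statusLine.len + headers.len + body.len + 4)
  result.add statusLine
  result.add "\c\L"
  result.add headers
  result.add "\c\L"
  result.add body
```

The same appends on a default-initialized string would grow the buffer in steps instead, which is where the reallocation cost in the slower variants comes from.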
This will require some special casing in httpbeast for small responses, but I'm excited to see how much faster it will make it.
Yeah, the tricky bit is finding the things on the hot path - after that, we can just extract the bit of code that we need and use benchy to write a faster version. The only exception to this is OS/system calls; we don't have as much insight into those (such as their performance metrics).
If there are other hot paths anyone is able to find, opening an issue would be a good first step to getting others to optimize the code!
Inspired by #63. I noticed that a decent chunk of time is spent in string allocation, probably because each time we append to the string past its current capacity, Nim needs to reallocate it into a bigger buffer. In this case, since we know the sizes of most of the things we are allocating, we can simply preallocate the approximate size of the buffer we'll need.
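To illustrate the reallocation point, a small self-contained sketch (the sizes are made up): appending past the current capacity forces a reallocation, while `newStringOfCap` reserves the whole buffer in a single allocation up front.

```nim
# Both strings end up with identical contents; only the allocation
# behavior differs.
var naive = ""                     # capacity grows step by step as we append
var sized = newStringOfCap(1024)   # one up-front allocation (64 * 16 bytes)
for i in 0 ..< 64:
  naive.add "0123456789ABCDEF"     # may reallocate when capacity is exceeded
  sized.add "0123456789ABCDEF"     # never reallocates: capacity was reserved
doAssert naive == sized
```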
I used ~30 to refer to the number of bytes inside of the quotes. There's a bit of headroom there due to the size in bytes of the HTTP status code, but it shouldn't actually affect performance all that much. This probably improved httpbeast performance on my machine by a few percent?
This method of preallocating is about 25% faster than what's there right now from that last PR, though: