erikdubbelboer / fasthttp

This fork has been deprecated!
MIT License
50 stars 12 forks source link

fix RawHeaders for POST #29

Closed illotum closed 6 years ago

illotum commented 6 years ago

I'm sorry for the hassle, but apparently fhttp has fast path header parser for HEAD and GET and rawHeaders isn't properly set for other methods. This PR is a fix to #27.

Admittedly this affects global hot path, but any difference is lost in noise:

# 10 runs of each
name                       old time/op  new time/op  delta
ServerGet10KReqPerConn-8    627ns ± 1%   635ns ± 2%  +1.33%  (p=0.026 n=9+10)
ServerPost10KReqPerConn-8   702ns ± 9%   699ns ± 6%    ~     (p=0.825 n=10+9)

Intuitively, one int counter doesn't change much.

illotum commented 6 years ago

I'm observing odd behaviour with non-trivial binary bodies in my fork, where rawHeaders is taken right from the body.

Please consider this PR work in progress until I'm able to identify and fix the root cause of this issue.

illotum commented 6 years ago

While this fixed the issue, the performance implications are unfortunately not trivial:

name                       old time/op  new time/op  delta
ServerGet10KReqPerConn-8    286ns ± 7%   282ns ±13%     ~     (p=0.424 n=10+10)
ServerPost10KReqPerConn-8   331ns ±14%   390ns ±10%  +17.94%  (p=0.000 n=10+10)

@erikdubbelboer For my particular use case this loss is well worth it, but I understand if you would like to postpone the merge.

Ideas how we can work around Peek's sliding window for this particular code path are definitely welcome.

erikdubbelboer commented 6 years ago

How are you benchmarking this? I wouldn't expect a big difference seeing as rawHeaders is reused so it only needs to copy the raw header bytes.

illotum commented 6 years ago

Fair point! I'm comparing

go test -bench='BenchmarkServer(Post|Get)10KReq' -count=10

between my branch and your master. Laptop unfortunately.


Here's a rerun, this time with lower difference.

name                       old time/op  new time/op  delta
ServerPost10KReqPerConn-8   319ns ± 3%   349ns ±13%  +9.60%  (p=0.000 n=10+10)
erikdubbelboer commented 6 years ago

Could it be that you're mixing up old an new? On my benchmarks the new code always seems to be faster:

benchmark                              old ns/op     new ns/op     delta
BenchmarkServerGet10KReqPerConn-8      278           263           -5.40%
BenchmarkServerPost10KReqPerConn-8     350           329           -6.00%

benchmark                              old allocs     new allocs     delta
BenchmarkServerGet10KReqPerConn-8      0              0              +0.00%
BenchmarkServerPost10KReqPerConn-8     0              0              +0.00%

benchmark                              old bytes     new bytes     delta
BenchmarkServerGet10KReqPerConn-8      0             0             +0.00%
BenchmarkServerPost10KReqPerConn-8     0             0             +0.00%

I'm using Go version go1.10 darwin/amd64.

illotum commented 6 years ago

Funny. No, I'm not mixing it, but this is also not the first time laptop benchmarks fail me. It's Go v1.10 on darwin as well.

I'll post stats from linux/amd64 @ ec2 later today.

illotum commented 6 years ago
~> benchstat --geomean fhttp postfix  # 10 runs, linux/amd4 @ c4.8xlarge
name                        old time/op  new time/op  delta
ServerGet10KReqPerConn-36    610ns ± 8%   624ns ± 9%    ~     (p=0.289 n=10+10)
ServerPost10KReqPerConn-36   654ns ±12%   637ns ± 9%    ~     (p=0.403 n=10+10)
[Geo mean]                   632ns        630ns       -0.29%

I'm getting convinced any real difference will be noticeable only on the most stable setups. I feel comfortable merging it at this point.

erikdubbelboer commented 6 years ago

On my Ubuntu server with 1.10 the new code is still faster.

I also think it’s ready but just to be sure I just put it live on one of our servers to test. If nothing strange happens I’ll merge it tomorrow.