Closed illotum closed 6 years ago
I'm observing odd behaviour with non-trivial binary bodies in my fork, where rawHeaders is taken right from the body.
Please consider this PR work in progress until I'm able to identify and fix the root cause of this issue.
While this fixed the issue, the performance implications are unfortunately not trivial:
name old time/op new time/op delta
ServerGet10KReqPerConn-8 286ns ± 7% 282ns ±13% ~ (p=0.424 n=10+10)
ServerPost10KReqPerConn-8 331ns ±14% 390ns ±10% +17.94% (p=0.000 n=10+10)
@erikdubbelboer For my particular use case this loss is well worth it, but I understand if you would like to postpone the merge.
Ideas how we can work around Peek's sliding window for this particular code path are definitely welcome.
How are you benchmarking this? I wouldn't expect a big difference seeing as rawHeaders
is reused so it only needs to copy the raw header bytes.
Fair point! I'm comparing
go test -bench='BenchmarkServer(Post|Get)10KReq' -count=10
between my branch and your master. Laptop unfortunately.
Here's a rerun, this time with lower difference.
name old time/op new time/op delta
ServerPost10KReqPerConn-8 319ns ± 3% 349ns ±13% +9.60% (p=0.000 n=10+10)
Could it be that you're mixing up old an new? On my benchmarks the new code always seems to be faster:
benchmark old ns/op new ns/op delta
BenchmarkServerGet10KReqPerConn-8 278 263 -5.40%
BenchmarkServerPost10KReqPerConn-8 350 329 -6.00%
benchmark old allocs new allocs delta
BenchmarkServerGet10KReqPerConn-8 0 0 +0.00%
BenchmarkServerPost10KReqPerConn-8 0 0 +0.00%
benchmark old bytes new bytes delta
BenchmarkServerGet10KReqPerConn-8 0 0 +0.00%
BenchmarkServerPost10KReqPerConn-8 0 0 +0.00%
I'm using Go version go1.10 darwin/amd64.
Funny. No, I'm not mixing it, but this is also not the first time laptop benchmarks fail me. It's Go v1.10 on darwin as well.
I'll post stats from linux/amd64 @ ec2 later today.
~> benchstat --geomean fhttp postfix # 10 runs, linux/amd4 @ c4.8xlarge
name old time/op new time/op delta
ServerGet10KReqPerConn-36 610ns ± 8% 624ns ± 9% ~ (p=0.289 n=10+10)
ServerPost10KReqPerConn-36 654ns ±12% 637ns ± 9% ~ (p=0.403 n=10+10)
[Geo mean] 632ns 630ns -0.29%
I'm getting convinced any real difference will be noticeable only on the most stable setups. I feel comfortable merging it at this point.
On my Ubuntu server with 1.10 the new code is still faster.
I also think it’s ready but just to be sure I just put it live on one of our servers to test. If nothing strange happens I’ll merge it tomorrow.
I'm sorry for the hassle, but apparently fhttp has fast path header parser for HEAD and GET and
rawHeaders
isn't properly set for other methods. This PR is a fix to #27.rawHeaders
to a proper sub-slice for methods with possible body.headerScanner
to properly return parsed length.Admittedly this affects global hot path, but any difference is lost in noise:
Intuitively, one int counter doesn't change much.