Time is already recorded on the client side and stored in the currently unused recordReq.t
field. Avoiding repeated calls to time.Now in the worker loop while requests are queued can
significantly reduce worker contention.
I observed the following results on the existing benchmarks with this change:
name old time/op new time/op delta
RecordReqCommand-16 2.91µs ± 4% 2.47µs ± 4% -15.08% (p=0.000 n=20+20)
RecordViaStats-16 3.52µs ± 5% 3.06µs ± 3% -13.00% (p=0.000 n=20+20)
name old alloc/op new alloc/op delta
RecordReqCommand-16 426B ± 0% 426B ± 0% ~ (all equal)
RecordViaStats-16 634B ± 0% 634B ± 0% ~ (all equal)
name old allocs/op new allocs/op delta
RecordReqCommand-16 25.0 ± 0% 25.0 ± 0% ~ (all equal)
RecordViaStats-16 29.0 ± 0% 29.0 ± 0% ~ (all equal)
In addition, when profiling other benchmarks where worker contention proved to be a bottleneck, I observed cases where around half of the worker's time was spent calling time.Now(), e.g.:
(pprof) top view.*start -cum
Active filters:
focus=view.*start
Showing nodes accounting for 3.75s, 10.45% of 35.90s total
Dropped 42 nodes (cum <= 0.18s)
Showing top 10 nodes out of 58
flat flat% sum% cum cum%
0 0% 0% 5.72s 15.93% go.opencensus.io/stats/view.(*worker).start
0.27s 0.75% 0.75% 5.03s 14.01% go.opencensus.io/stats/view.(*recordReq).handleCommand
0.18s 0.5% 1.25% 2.88s 8.02% time.Now
0.08s 0.22% 1.48% 2.70s 7.52% time.now
1.34s 3.73% 5.21% 1.41s 3.93% runtime.walltime (inline)
1.15s 3.20% 8.41% 1.21s 3.37% runtime.nanotime (inline)
0.03s 0.084% 8.50% 0.96s 2.67% go.opencensus.io/stats/view.(*viewInternal).addSample
0.58s 1.62% 10.11% 0.81s 2.26% runtime.mapiternext
0.03s 0.084% 10.19% 0.73s 2.03% go.opencensus.io/stats/view.encodeWithKeys
0.09s 0.25% 10.45% 0.69s 1.92% runtime.selectgo