giltene / wrk2

A constant throughput, correct latency recording variant of wrk
Apache License 2.0
4.24k stars 382 forks source link

Improve aeTimeEvent resolution from 1 msec to 1 usec #78

Open giltene opened 5 years ago

giltene commented 5 years ago

One of the valid "nits" with wrk2 is that it can "over-report" latencies by up to 1msec because the rate-limiting model uses the call: aeCreateTimeEvent(thread->loop, msec_to_wait, delay_request, c, NULL); to wait before sending a request if "its time has not yet come". Because of the 1msec resolution of the ae async framework's aeTimeEvent and aeCreateTimeEvent, this can end up "oversleeping" by up to a millisecond, which ends up looking like a server problem when it is actually a load generator problem.

And the approach of "forgiving up to 1msec" is not a good one, as such an approach would miss real issues. IMO it is better to report pessimistic (could be somewhat worse than reality) latency numbers than ones that are better than reality.

But modern *nix variants can deal with clocks at a much finer resolution than 1msec (with e.g. nanosleep(), and timerfd), and the events should really be using a much finer resolution (e.g. 10-20 usec would not be unreasonable).

The really cool code in ae.c and friends appear to have originated from redis, and have not been touched in "forever". I'd like to work to improve the basic aeTimeEvent in that framework to include microsecond resolution information, along with a configurable quantum for actual time event resolution chunking.

The approach I'd take would probably keep the current external APIs (e.g. aeCreateTimeEvent which takes a 1-msec-unit time parameter) and all the current fields in e.g. aeTimeEvent (including when_sec and when_ms), but add an additional when_usec field for the optional microseconds-within-the-millisecond amount (defaults to 0) that some APIs may supply. We would then add additional APIs for those who want finer resolution (e.g. aeCreateTimeEventUsec(), aeWaitUsec(), aeGetTimeUsec()). We would change the underlying implementations that currently populate and use struct timevals (like aeProcessEvents(), aeApiPoll()), which already supports microsecond resolution, to correctly populate and usec-resolution information, and will use a timerfd to support sub-millisecond-resolution timing in epoll_wait() rather than rely on the timeout parameter.

The benefit of all this will be much more timely wakeups for delayed requests and a less pessimistic reporting of sub-millisecond response time levels, and better per-thread handling when >1000/sec/thread requests rates are actually possible.

rfonseca commented 2 years ago

Is this the reason for the comment "It is important to note that in wrk2's current constant-throughput implementation, measured latencies are [only] accurate to a +/- ~1 msec granularity, due to OS sleep time behavior." or is this a separate issue? Does the imprecision only occur when the rate is > 1000/sec/thread?