TechEmpower / FrameworkBenchmarks

Source for the TechEmpower Framework Benchmarks project
https://www.techempower.com/benchmarks/

Questions about the request timeouts used for benchmarking #1052

Closed: hamiltont closed this issue 6 years ago

hamiltont commented 10 years ago

According to #855, Wrk associates each request with a time limit, such as 5 seconds. If a request returns after the time limit, it is counted as an error. This was apparently a problem when benchmarking frameworks under heavy load: once the request queue filled up, wrk counted every request as a failure, because each one spent more than 5 seconds waiting in the queue while earlier requests were being processed.

This would be hugely frustrating, because the longer this heavy-load benchmark runs, the worse your server appears to perform. That is true even if it is doing as much work as possible with the available CPU: once your queue is in a very full state, you are done. Apparently there's no way to saturate, back off, and repeat; the only option is to hit the server as hard and fast as you can.
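To make the queueing arithmetic concrete, here is a rough back-of-the-envelope sketch in Python (the throughput figure is an assumed number for illustration, not a measurement from the benchmark):

# Rough illustration of the queueing effect described above; the
# throughput value is an assumption chosen for illustration.
outstanding = 256        # concurrent requests queued ahead of a new arrival
throughput = 50.0        # requests/second the server can actually drain
expected_wait = outstanding / throughput
print(expected_wait)     # 5.12 seconds -- already past a 5-second timeout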

The solution to this problem was to make the timeout wrk uses for each request equal to the concurrency. That means that at concurrency level 256, each request that completes within 256 seconds(!) is counted as a successful request. That's over 4 minutes allowed for a single request! The code looks like this:

{wrk} {headers} -d {duration} -c {max_concurrency} --timeout {max_concurrency} <snip>
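For reference, this is roughly how that template expands at concurrency 256 (the header and duration values below are illustrative assumptions, not the toolset's actual settings):

# Hypothetical expansion of the command template above; only the
# concurrency/timeout pairing is the point, the other values are assumed.
command = "{wrk} {headers} -d {duration} -c {max_concurrency} --timeout {max_concurrency}".format(
    wrk="wrk",
    headers='-H "Connection: keep-alive"',
    duration=15,
    max_concurrency=256,
)
print(command)
# wrk -H "Connection: keep-alive" -d 15 -c 256 --timeout 256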

I wanted to open an issue to discuss problems and potential different approaches.

Problems:

Solution thoughts:

msmith-techempower commented 10 years ago

From my point of view, the current implementation is fine.

Essentially, if a site using FrameworkX is under high load (getting /.'d, i.e. slashdotted, for example) and I am a user trying to visit that site, then after anything more than a few seconds of waiting I am going to cancel my request, retry immediately a few times, and then give up for a good long while.

In no conceivable universe could I see myself waiting 120 seconds for a site to load.

As a developer of web applications, I think that 5 seconds is probably still a little long when considering production-grade deployments. In my experience, if a user has to wait more than 1 second, it is considered a poor experience.

To be fair, we are doing a stress test with these benchmarks. We expect highly optimized stacks to perform better under these extreme load conditions, while less performant ones will suffer. To that end, I think this discussion should shift away from arguing magnitude (how do you get every request to return without timing out) and toward the most appropriate way of measuring what we believe we are measuring.

hamiltont commented 10 years ago

the current implementation is fine...no conceivable universe could I see myself waiting 120 seconds

The current implementation waits this long, so how is that fine?

msmith-techempower commented 10 years ago

The current implementation waits this long, so how is that fine?

Clearly, I have misread this post. Chalking it up to "it's early" and tagging @bhauer in.

hamiltont commented 10 years ago

Ah ok. That explains why I was really confused by your reply!!

EDIT: I looked back and realized the "problem" was mired in a wall of text, so I bolded it

bhauer commented 10 years ago

My opinion is that we should set a "reasonable" timeout such as 5, 10, 15, or 30 seconds.

However, I don't believe we're doing anything with the timeout information presently. As I understand it from @wg, requests that exceed the timeout duration are simply indicated as such by Wrk, but these requests are not terminated. Ultimately, when they finally do complete, Wrk will include them as a completed (successful) request.

As such, we're not treating timeouts as if they were errors.

hamiltont commented 10 years ago

but these requests are not terminated

Huh, that seems to be true. It appears from a quick look that wrk just examines the open connections once every --timeout seconds and increments the error count for any connections open longer than this. I was not expecting that :-P
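For concreteness, here is a minimal Python sketch of that accounting behavior as I understand it (an illustration only, not wrk's actual C code):

import time

TIMEOUT = 5  # seconds, illustrative

class Connection:
    def __init__(self, started_at):
        self.started_at = started_at
        self.flagged = False

def sweep(open_connections, stats, now=None):
    # Periodic pass over open connections: anything open longer than
    # TIMEOUT gets flagged and the timeout counter is bumped, but the
    # request itself is left running.
    now = time.monotonic() if now is None else now
    for conn in open_connections:
        if not conn.flagged and now - conn.started_at > TIMEOUT:
            conn.flagged = True
            stats["timeouts"] += 1

def on_complete(conn, stats):
    # When the response finally arrives it still counts as completed,
    # whether or not it was flagged earlier.
    stats["completed"] += 1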

Anyways, there are two thoughts from this. We either want to always set the timeout very high, so that we don't lose any performance to wrk iterating over the open connections frequently, or we want to set the timeout to a reasonable value so that it at least displays some useful information in the detailed log output. At the moment the variable-timeout approach does not make much sense to me.

I'm for changing this to a reasonable value as well, e.g. 5, 15, or 30 seconds.

methane commented 10 years ago

This is a load-testing vs. benchmarking (throughput) issue.

If it's a load test, requests should be sent at a constant rate. When the load is too high, the server can return a 503 error quickly, and the client does not immediately send the next request.

But the current wrk sends the next request immediately after receiving a 503 error, so server resources are consumed producing tons of errors. That's because wrk is designed for benchmarking throughput.

If you want a small (realistic) timeout, how about adding a sleep after receiving a 50x response? The sleep time should be the same as the timeout, I think.
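A tiny sketch of the proposed back-off, just to pin down the idea (an illustrative pseudo-client, not a change to wrk itself; the names are made up):

import time

TIMEOUT = 5  # seconds; the proposal is to back off for the same duration as the timeout

def handle_response(status, send_next_request):
    if 500 <= status < 600:
        # Server is overloaded: pause instead of retrying immediately,
        # so resources are not burned generating more errors.
        time.sleep(TIMEOUT)
    send_next_request()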

methane commented 10 years ago

but these requests are not terminated

Huh, that seems to be true. It appears from a quick look that wrk just examines the open connections once every --timeout seconds and increments the error count for any connections open longer than this. I was not expecting that :-P

Uh, I was wrong, sorry. Given this behavior, I agree with using a shorter timeout (3 or 5 seconds).

msmith-techempower commented 8 years ago

Still not sure where we stand on this one - kicking down the road.

msmith-techempower commented 6 years ago

Here we are two years later, and I'm still not sure where we stand on this one.

It LOOKS like our timeout is now set to 8 seconds across the board, so I am going to close this; if anyone disagrees, feel free to reopen.