From my point of view, the current implementation is fine.
Essentially, if a site using FrameworkX is under high load (getting /.'d, for example) and I am a user trying to visit it, then anything more than a few seconds of waiting means I will cancel my request, immediately retry a few times, and then give up for a good while.
In no conceivable universe could I see myself waiting 120 seconds for a site to load.
As a developer of web applications, I think that 5 seconds is probably still a little long when considering production-grade deployments. In my experience, if a user has to wait more than 1 second, it is considered a poor experience.
To be fair, we are doing a stress test with these benchmarks. We expect that highly optimized stacks will perform better under these extreme load conditions while less performant ones will suffer. To that end, I think this discussion should be shifted away from arguing magnitude (how do you get every request to return without timing out?) and toward the most appropriate way of measuring what we believe we are measuring.
the current implementation is fine...no conceivable universe could I see myself waiting 120 seconds
The current implementation waits this long, so how is that fine?
Clearly, I have misread this post. Chalking it up to "it's early" and tagging @bhauer in.
Ah ok. That explains why I was really confused by your reply!!
EDIT: I looked back and realized the "problem" was mired in a wall of text, so I bolded it
My opinion is that we should set a "reasonable" timeout such as 5, 10, 15, or 30 seconds.
However, I don't believe we're doing anything with the timeout information presently. As I understand it from @wg, requests that exceed the timeout duration are simply indicated as such by Wrk, but these requests are not terminated. Ultimately, when they finally do complete, Wrk includes them as completed (successful) requests.
As such, we're not treating timeouts as if they were errors.
but these requests are not terminated
Huh, that seems to be true. It appears from a quick look that wrk just examines the open connections once every --timeout seconds and increments the error count for any connections open longer than this. I was not expecting that :-P
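As a rough illustration of that behavior, here is a minimal Python sketch (not wrk's actual C source; the names are made up) of a periodic scan that counts long-waiting connections as timeouts without closing them:

```python
import time

class Connection:
    """One in-flight request on an open connection (hypothetical)."""
    def __init__(self):
        self.request_started_at = time.monotonic()
        self.counted_as_timeout = False

def scan_for_timeouts(open_connections, timeout_s, stats):
    """Run once every `timeout_s` seconds, as described above."""
    now = time.monotonic()
    for conn in open_connections:
        waited = now - conn.request_started_at
        if waited > timeout_s and not conn.counted_as_timeout:
            # Count the timeout, but do NOT close the connection;
            # if the response eventually arrives, the request still
            # finishes and is recorded as completed.
            conn.counted_as_timeout = True
            stats["timeout_errors"] += 1
```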
Anyways, there are two thoughts from this. We either want to always set the timeout very high, so that we don't lose any performance from wrk iterating over the open connections frequently, or we want to set the timeout to a reasonable value so that it at least displays some useful information in the detailed log output. At the moment, this variable-timeout approach does not make much sense to me.
I'm for changing this to a reasonable value as well, e.g. 5, 15, or 30 seconds.
This is a load-testing vs. benchmarking (throughput) issue.
If it's a load test, requests should be sent at a constant rate. When the load is too high, the server can quickly return a 503 error, and the client should not immediately send its next request.
But wrk currently sends the next request immediately after receiving a 503 error, so server resources are consumed producing tons of errors. That's because wrk is designed for benchmarking throughput.
If you want a small (realistic) timeout, how about adding a sleep after receiving a 50x response? The sleep time should be the same as the timeout, I think.
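To illustrate that suggestion, here is a standalone Python sketch (using the third-party requests library; this is not a wrk feature, just the proposed behavior in toy form): after any 50x response, the client sleeps for the timeout duration before sending its next request.

```python
import time
import requests  # third-party HTTP client, used only for this sketch

def run_client(url, duration_s=30, timeout_s=5):
    """Send requests in a loop, backing off after 50x responses."""
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        try:
            resp = requests.get(url, timeout=timeout_s)
            if 500 <= resp.status_code < 600:
                # Overloaded server: wait as long as the timeout
                # instead of hammering it with the next request.
                time.sleep(timeout_s)
        except requests.exceptions.Timeout:
            # The request itself timed out; move on to the next one.
            pass
```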
but these requests are not terminated
Huh, that seems to be true. It appears from a quick look that wrk just examines the open connections once every --timeout seconds and increments the error count for any connections open longer than this. I was not expecting that :-P
Uh, I was wrong. Sorry. Given this behavior, I agree with using a shorter (3 or 5 second) timeout.
Still not sure where we stand on this one - kicking it down the road.
Here we are two years later, and I'm still not sure where we stand on this one.
It LOOKS like our timeout is now set to 8 seconds across the board, so I am going to close this; if anyone disagrees, feel free to reopen.
According to #855, Wrk associates each request with a time limit, such as 5 seconds. If a request returns after the time limit, it is counted as an error. This was apparently a problem when benchmarking frameworks under heavy load, because once the request queue was full enough, all requests were counted as failures by wrk: every request waited in the queue for more than 5 seconds while earlier requests were being processed.
This would be hugely frustrating, because the longer this heavy-load benchmark runs, the worse your server appears to perform. This is true even if it was doing as much work as possible with the available CPU: once your queue is in a very full state, you are done. Apparently, there's no way to saturate+back-off+repeat, but only to "hit the server as hard and fast as you can."
The solution to this problem was to make the timeout wrk uses for each request equal to the concurrency. That means that at concurrency level 256, each request that completes within 256 seconds (!) is counted as a successful request. That's over 4 minutes allowed for a single request!! The code looks like this:
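(The referenced snippet isn't reproduced here; the following is only a hypothetical Python sketch of the idea, with made-up names, showing the wrk timeout set equal to the concurrency level:)

```python
def build_wrk_command(url, concurrency, duration_s=15, threads=8):
    """Hypothetical helper: note --timeout is set to the concurrency."""
    return ("wrk -t {threads} -c {conc} -d {dur}s --timeout {conc}s {url}"
            .format(threads=threads, conc=concurrency,
                    dur=duration_s, url=url))

# Example: build_wrk_command("http://server:8080/json", 256) allows each
# request up to 256 seconds before wrk counts it as a timeout.
```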
I wanted to open an issue to discuss the problems and some potential alternative approaches.
Problems:
Solution thoughts: