giltene / wrk2

A constant throughput, correct latency recording variant of wrk
Apache License 2.0

Does `-c100 -R100` mean "100 connections at 1 RPS" or "100 connections at random RPS"? #73

Open Jonarod opened 5 years ago

Jonarod commented 5 years ago

I have a web app exposing an API that will be rate-limited to 1 request per second per IP. Given that, I need to benchmark the maximum number of simultaneous users my app can absorb while keeping P99 latency under 700 ms.

In my understanding, `wrk -t4 -c10000 -d70 -R 10000 -L http://localhost:8080` would translate to: using 4 threads, create 10,000 connections requesting localhost:8080 at 10,000 requests/s all together, for 60 seconds (70 s minus 10 s for initialization).

Is my understanding correct, or will wrk2 reuse some connections to issue more than 1 RPS on some of them? For example, of the 10,000 connections opened, is it possible that only 1 would actually be sending 10,000 RPS while the other 9,999 stay open without sending any requests, or something similar?

EDIT: Just adding an illustration. Does `wrk -t4 -c3 -d70 -R 3 -L http://localhost:8080` mean:

option 1:

           |<  1 second  >|
client #1  |  1 request   |
client #2  |  1 request   |
client #3  |  1 request   |

or

option 2:

           |<  1 second  >|
client #1  |  2 requests  |
client #2  |              |
client #3  |  1 request   |

t-lo commented 5 years ago

Hello @jonarod,

Wrt. your first example, -t 4 -c 10000 will open 10k connections overall, in 4 threads (2500 connections per thread).
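As a quick sanity check on that split, here is a minimal Python sketch. It assumes connections are divided evenly across threads using integer division; the function name is hypothetical and not part of wrk2.

```python
def connections_per_thread(connections: int, threads: int) -> int:
    """Return how many connections each thread handles, assuming an even split."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    # Integer division: any remainder is ignored in this sketch (an assumption).
    return connections // threads


# -t4 -c10000 from the first example in this thread.
print(connections_per_thread(10_000, 4))  # 2500
```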

The third paragraph seems to imply a "connection" (pardon my English) between RPS and connections that in fact does not exist. Connections, once established, are kept alive for the whole run. The request rate is distributed evenly among connections, so your example would issue 1 RPS per connection, on average.
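To make the averaging concrete, here is a small Python sketch of the arithmetic. It only models the even distribution described above, not wrk2's actual scheduler; the function names and the `-c100 -R1000` case are made up for illustration.

```python
def per_connection_rate(total_rps: float, connections: int) -> float:
    """Average requests per second issued on each connection."""
    if connections <= 0:
        raise ValueError("connections must be positive")
    return total_rps / connections


def mean_interval_seconds(total_rps: float, connections: int) -> float:
    """Average gap between two requests on the same connection."""
    return 1.0 / per_connection_rate(total_rps, connections)


# -c10000 -R10000: one request per second per connection, on average.
print(per_connection_rate(10_000, 10_000))  # 1.0
# -c100 -R100 (the case in the issue title) gives the same per-connection rate.
print(per_connection_rate(100, 100))        # 1.0
# Hypothetical -c100 -R1000: 10 RPS per connection, so about 0.1 s between requests.
print(mean_interval_seconds(1_000, 100))    # 0.1
```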

To emulate a load of 10k users, these settings may not suffice. Although they generate a relatively heavy load overall, they imply that a single user issues a meek 1 RPS. Assuming a single connection per user which is reused across requests (e.g. a well-behaved browser), you might want to consider bumping the RPS to a rate that reflects the load you'd expect per user, times 10,000.
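Sizing the rate from a per-user estimate is a single multiplication; a sketch in Python follows. The per-user figure of 2.5 RPS is purely hypothetical, and the function name is not part of wrk2.

```python
def total_rate(users: int, rps_per_user: float) -> float:
    """Value to pass to -R: expected per-user request rate times the user count."""
    if users <= 0 or rps_per_user <= 0:
        raise ValueError("users and rps_per_user must be positive")
    return users * rps_per_user


# 10,000 users at 1 RPS each, as in the original question.
print(total_rate(10_000, 1))    # 10000
# Hypothetical heavier users at 2.5 RPS each.
print(total_rate(10_000, 2.5))  # 25000.0
```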

Tuning-wise, you might also want to consider bumping the thread count: have a look at the CPU load while wrk2 runs, and increase -t up to the number of CPUs in your load generator machine.

Lastly, the connections command line parameter sets the overall connection count, so it needs to be equal to, or larger than, the number of threads. Therefore, your last example (-t4 -c3) would cause an error message, as it uses -t > -c.
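The two tuning points above (threads up to the CPU count, and connections at least equal to threads) can be sketched as a small Python helper that assembles the command line. The helper is hypothetical; it only uses the flags that already appear in this thread (-t, -c, -d, -R, -L) and does not run anything.

```python
import os
import shlex


def build_wrk2_command(url, connections, rate, duration_s, threads=None):
    """Assemble a wrk2 command line from the flags used in this thread."""
    if threads is None:
        # Default: one thread per CPU of the load generator, never more than connections.
        threads = min(os.cpu_count() or 1, connections)
    if connections < threads:
        # Mirrors the constraint described above: -c must not be smaller than -t.
        raise ValueError("connections (-c) must be >= threads (-t)")
    args = ["wrk", f"-t{threads}", f"-c{connections}", f"-d{duration_s}",
            "-R", str(rate), "-L", url]
    return shlex.join(args)


print(build_wrk2_command("http://localhost:8080", 10_000, 10_000, 70, threads=4))
# wrk -t4 -c10000 -d70 -R 10000 -L http://localhost:8080
```

Calling it with `connections=3, threads=4` raises `ValueError`, which corresponds to the `-t4 -c3` case from the illustration.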

Hope this helps!

Jonarod commented 5 years ago

> you might want to consider bumping the RPS to a rate that reflects the load you'd expect per user, times 10,000

So I understand that, in order to emulate a load of 10,000 simultaneous users issuing 1 request at the same time, the correct parameters should be:

wrk -t4 -c10000 -d70 -R 10000 -L http://localhost:8080

Thanks for your insights @t-lo

t-lo commented 5 years ago

Hello @jonarod,

Your statement is correct. As stated, I recommend increasing the number of threads (to match the number of cores in your load generator machine). My concern is that you will max out the 4 cores being used in your example (-t 4) - so you would be measuring the "latency" of your load generator's overloaded 4 cores instead of benchmarking your application.

Happy to help, Thilo

Fatahir commented 4 years ago

Hi @t-lo, correct me if I am wrong. What I understand is that during the run, the number of connections and the number of threads are constant. But what about -R? If I want the number of requests per second to increase by 500 at each step, can I change it?
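One way to approximate a rate that grows by 500 is to run several back-to-back benchmarks, each with its own fixed -R. The Python sketch below only generates the command lines. It assumes -R stays constant within a single run, and the connection count, starting rate, number of steps, and function name are all made up for illustration.

```python
def stepped_commands(url, connections, threads, duration_s, start_rate, step, steps):
    """Return one wrk2 command per step, raising -R by `step` each time."""
    commands = []
    for i in range(steps):
        rate = start_rate + i * step
        commands.append(
            f"wrk -t{threads} -c{connections} -d{duration_s} -R {rate} -L {url}"
        )
    return commands


# Three consecutive runs at 500, 1000 and 1500 requests per second.
for cmd in stepped_commands("http://localhost:8080", 100, 4, 70, 500, 500, 3):
    print(cmd)
```

Each run produces its own latency report, so the results can be compared per rate rather than blended into one distribution.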

Triple-Z commented 3 years ago

> So I understand that, in order to emulate a load of 10,000 simultaneous users issuing 1 request at the same time, the correct parameters should be:
>
> wrk -t4 -c10000 -d70 -R 10000 -L http://localhost:8080

In this scenario, are the first 10000 requests all sent at the beginning of the first second, rather than being sent continuously over the rest of that second? Do I understand this right?