giltene / wrk2

A constant throughput, correct latency recording variant of wrk
Apache License 2.0

Support benchmarking of multiple HTTP(S) endpoints #76

Open t-lo opened 5 years ago

t-lo commented 5 years ago

This PR adds support for specifying, and for benchmarking, multiple HTTP(S) endpoints in a single wrk2 run.

Our main motivation for running a benchmark over multiple endpoints is to allow benchmarking of e.g. a whole web application, instead of individually benchmarking the pages and/or RESTful resources that make up that application.

Most of the heavy lifting is done in a Lua script, multiple-endpoints.lua. The script allows an arbitrary number of HTTP(S) endpoints to be included in the benchmark. Endpoints are connected to in a random, evenly distributed fashion. After a run finishes, the overall latency is reported (i.e. there is currently no per-endpoint latency breakdown).
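
For a sense of the approach, here is a minimal sketch of the selection logic, using a hypothetical hard-coded endpoint table (the actual multiple-endpoints.lua builds its endpoint list from the URLs passed on the command line and also tracks per-URL call counts):

-- Hypothetical, hard-coded endpoint table for illustration only; the real
-- script derives this from the URLs given on the command line.
local endpoints = {
  { host = "app.my-service.io",        path = "/api/data.json" },
  { host = "app2.my-other-service.io", path = "/static.html"   },
}

math.randomseed(os.time())

-- wrk calls request() once per request; picking an index uniformly at
-- random yields the even distribution across endpoints described above.
request = function()
  local ep = endpoints[math.random(#endpoints)]
  wrk.headers["Host"] = ep.host
  return wrk.format("GET", ep.path)
end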

Furthermore, this PR introduces a change in wrk.c that forces a thread to reconnect (i.e. close its socket and open a new one using the current value of wrk.thread.addr) each time wrk.thread.addr is set from a Lua script.
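
To sketch what this enables on the script side (the wrk.lookup call below is an assumption about how an address would be resolved; the actual script's host-switching logic may differ):

-- Sketch only: switch the current thread to a different host mid-run.
-- With this PR, assigning wrk.thread.addr closes the thread's socket and
-- reconnects it to the new address before further requests are issued.
local addrs = wrk.lookup("app2.my-other-service.io", "80")
wrk.thread.addr = addrs[1]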

Lastly, the PR includes a patch by @janmejay to handle remote connection close. @dongsupark identified this issue during our testing.

Known limitations

Please note that benchmarking multiple endpoints currently requires threads == connections: we close and reconnect as soon as a thread assigns wrk.thread.addr, which disrupts any async requests still in flight on that thread's other connections. There are a number of ways to remove this limitation, and we are actively investigating them. However, we would like to start getting early feedback on our direction, hence we created this PR with a known limitation.

t-lo commented 5 years ago

Example usage:

./wrk -s scripts/multiple-endpoints.lua -L -R10000 -t 30 -c 30 -d 60 \
                http://app.my-service.io/api/job-endpoint.json \
                http://app.my-service.io/api/data.json \
                http://app.my-service.io/static/page.html \
                http://app2.my-other-service.io/api/exec.json \
                http://app2.my-other-service.io/static.html
[...]
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.99s     2.43s    8.77s    58.11%
    Req/Sec   292.07    123.44     1.60k    77.88%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    3.78s
 75.000%    6.06s
 90.000%    7.48s
 99.000%    8.43s
 99.900%    8.69s
 99.990%    8.77s
 99.999%    8.77s
100.000%    8.78s

  Detailed Percentile spectrum:
       Value   Percentile   TotalCount 1/(1-Percentile)
[...]
#[Mean    =     3991.360, StdDeviation   =     2426.005]
#[Max     =     8773.632, Total count    =       415581]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  515126 requests in 1.00m, 2.14GB read
Requests/sec:   8591.59
Transfer/sec:     36.58MB
Total Requests: 515126
HTTP errors: 0
Requests timed out: 0
Bytes received: 2300037590
Socket connect errors: 0
Socket read errors: 0
Socket write errors: 0

URL call count
http://app.my-service.io/api/job-endpoint.json : 105330
http://app.my-service.io/api/data.json : 104250
http://app.my-service.io/static/page.html : 99840
http://app2.my-other-service.io/api/exec.json : 103200
http://app2.my-other-service.io/static.html : 103290

giltene commented 5 years ago

Let's open an issue for this to discuss before we pull it in...

One of my main concerns is "forking" too far from the point at which we originally forked wrk, which would make catching up with wrk itself harder. And since I (personally) have not really tracked how wrk has evolved since that point, I don't know how this PR relates to features there.

t-lo commented 5 years ago

Happy to open an issue to discuss, if that's the preferred path - I did not see much benefit over discussing right here on the PR, so I did not cut an issue right away.

Regarding upstream, I'd argue that the feature introduced by this PR makes a lot more sense in the context of benchmarking with constant RPS - something upstream does not support. The main scenario we were aiming at when writing this code was to simulate constant RPS load on a cloud-native (i.e. clustered) web app (consisting of multiple micro-services with multiple URLs each), so basing our PR on wrk2 instead of wrk made more sense to us. To answer your question, no, I do not believe upstream currently has a comparable feature.

That said, I think I now better understand the main concern of not diverging too far from upstream. Let me look into the latest upstream changes, with the goal of producing a PR to update this fork, before we continue discussing this one.