Open berwynhoyt opened 1 year ago
I'm also evaluating which load tool to use, so I'm glad to have come across your write-up and findings. Just wondering about two points:
It's important to note that wrk2 extends the initial calibration period to 10 seconds (from wrk's 0.5 second), so runs shorter than 10-20 seconds may not present useful information
I checked both wrk2 and wrk's documentation and couldn't seem to find what the calibration is for though.
Good questions.
Re (1), there is no reason to use so many connections except that the bogus results became most apparent when I did. Note that I found the most reliable results when I set #threads == #connections. My own project found that between 10 and 40 of each produced the maximum number of requests.
Re (2), I did not try that same test with a 10s period. I will do so now, on your prompting:
wrk2/wrk -d10 -c10 -t10 -R 10000000 "http://localhost:8085/multiply?a=2&b=3"
I get much more reasonable results, though they still range between 200,000 and 500,000 requests, which is 2 to 4 times what I get with any other tool, so I think they're still not correct.
In that last 10-second test, the problem still seems to be the time it thinks it took to finish, which ranges from 3 to 10s (when it actually took 10s).
Not sure if this topic is still of interest, but I recently had some time to read the source code, and I think the following might be an explanation:
The 90th percentile latency received during that period is used to determine the sampling interval used to collect data for the summary stats (`long double interval = MAX(latency * 2, 10)`). So in the case of using 1000 connections, the test duration should ideally be > 15 sec; otherwise it would still be in the middle of the calibration period. I found it's more reliable to test with a duration of 60s.
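To make the arithmetic concrete, here is a rough Python sketch of that expression. It is not wrk2's code: the function names are hypothetical, the millisecond units are my assumption, and the `margin` rule of thumb is only an illustration of the "10-20 seconds" guidance quoted earlier in this thread. Only the `MAX(latency * 2, 10)` expression and the 10-second calibration delay come from the discussion.

```python
CALIBRATION_DELAY_S = 10.0  # wrk2 calibration delay as quoted in this thread


def sampling_interval(p90_latency: float) -> float:
    """Mirror of `MAX(latency * 2, 10)`; units assumed to be milliseconds."""
    return max(p90_latency * 2, 10.0)


def covers_calibration(duration_s: float, margin: float = 2.0) -> bool:
    """Hypothetical rule of thumb: the run lasts at least `margin` times
    the calibration delay, so most of it falls after calibration."""
    return duration_s >= CALIBRATION_DELAY_S * margin


if __name__ == "__main__":
    for latency in (3.0, 40.0):
        print(f"p90={latency} -> interval={sampling_interval(latency)}")
    for duration in (5, 60):
        print(f"duration={duration}s -> long enough: {covers_calibration(duration)}")
```

With these assumptions, a 5s run is flagged as too short and a 60s run passes, which matches the experience reported above.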
Searching wrk's issue discussions, it seems wrk used to have this calibration period too, but it was removed around 2018 (https://github.com/wg/wrk/issues/280#issuecomment-359228266). If I find some time I'll try to remove it in my local build and see if that allows running with a short test duration.
If you are able to improve this, that would be FAB!
As you can see in my write-up here, wrk2 can produce bad results under certain conditions. For example:
See how I specified 250 threads, and a 5s test? Well, it did create 833071 requests in 5s, but as you see, it thinks it did it in 1.25ms, producing a ridiculous figure of 663 million requests/sec.
It doesn't always think it's finished in milliseconds. Sometimes it is more like 1s and other times closer to 5.
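The effect of the wrong duration on the reported rate can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names and the `tolerance` threshold are hypothetical, and the numbers are the ones from the 5s example above. Dividing by the rounded 1.25ms gives roughly 666 million requests/sec, close to the 663 million that was reported (presumably the displayed duration is rounded).

```python
def requests_per_second(requests: int, elapsed_s: float) -> float:
    """Throughput is simply the request count divided by elapsed time."""
    return requests / elapsed_s


def is_suspicious(reported_s: float, requested_s: float, tolerance: float = 0.5) -> bool:
    """Hypothetical sanity check: flag a run whose reported duration is
    less than `tolerance` times the duration that was requested."""
    return reported_s < requested_s * tolerance


if __name__ == "__main__":
    requests = 833_071
    print(requests_per_second(requests, 5.0))      # rate over the real 5s run
    print(requests_per_second(requests, 0.00125))  # rate over the bogus 1.25ms
    print(is_suspicious(0.00125, 5.0))
```

Over the real 5 seconds the same request count works out to about 166,600 requests/sec, so the inflated figure comes entirely from the duration, not from the count.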
You can check out my repository that uses wrk2 here if you want to reproduce the bug.