
Distributed Load Testing on AWS
https://aws.amazon.com/solutions/implementations/distributed-load-testing-on-aws/

Result Widget has confusing number of virtual users #206

Open mcawilcox opened 2 months ago

mcawilcox commented 2 months ago

Describe the bug

After a test run, the results include an image of the main test parameters from a CloudWatch widget (for me, related to just region eu-west-2). In a calibration run with the concurrency set to 10, I expect the metrics to show a nice steady line of 10, preceded by a steady ramp. Instead I get a line that jumps around, but is of the order of 200.

The logfile from Taurus consistently logs "10 vu" after the initial ramp up, but the logging interval varies from 5s down to 2s.

If I examine CloudWatch directly, I can reproduce the view presented in the results when the "Virtual Users" statistic is set to Sum. I get the correct graph by changing the statistic to "Average", "Minimum" or "Maximum".

"Sum" is the wrong statistic to use for VU, as there are multiple samples per minute. It is correct for the "Successes" and "Failures" counts.

However, once corrected, the "Virtual Users" line does not have the right scale to be properly visible against the right-hand y-axis (its values are much smaller than "Successes"). I suggest that the result be split into two graphs, but I'm not sure whether CloudWatch can generate a single widget in that manner.
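To illustrate the kind of thing I mean, here is a rough sketch of a widget definition with the statistic fixed and "Virtual Users" moved to its own right-hand axis. The namespace, metric names and "TestId" dimension are placeholders rather than what DLT actually publishes, so treat it as a sketch, not a tested fix:

```js
// Sketch only: "DLT", the metric names and the "TestId" dimension are placeholders.
// Counts keep Sum on the left axis; "Virtual Users" uses Average on its own right axis.
const widget = {
  view: "timeSeries",
  region: "eu-west-2",
  period: 60,
  metrics: [
    ["DLT", "Successes", "TestId", "myTest", { stat: "Sum", yAxis: "left", label: "Successes" }],
    ["DLT", "Failures", "TestId", "myTest", { stat: "Sum", yAxis: "left", label: "Failures" }],
    ["DLT", "numVu", "TestId", "myTest", { stat: "Average", yAxis: "right", label: "Virtual Users" }]
  ],
  yAxis: {
    left: { label: "Requests per minute", showUnits: false },
    right: { label: "Virtual users", min: 0, showUnits: false }
  }
};
```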

I suggest a fix around line 403 in results-parser/lib/parser/index.js from:

key !== "avgRt" && (metricOptions[key].stat = "Sum");

to:

key !== "avgRt" && key !== "numVu" && (metricOptions[key].stat = "Sum");

but I'm not able to test, and I'm not sure of the impact to the other image, which brings me to the final point...
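For context, I assume that line sits inside a loop over the metric option keys, so the full change would look roughly like this. The loop structure is my guess; only the extra numVu check is the actual proposed change:

```js
// Guessed context for results-parser/lib/parser/index.js around line 403.
// Leave the response-time and virtual-user statistics alone; only the per-minute counts should be summed.
Object.keys(metricOptions).forEach((key) => {
  key !== "avgRt" && key !== "numVu" && (metricOptions[key].stat = "Sum");
});
```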

I am testing with a single region, and can only see the results image for that region. I can see that DLT has generated a "total" image as well, and I can see that the code changes some of the metric calculations ... but I can't get the DLT web GUI to display that "total" image.

To Reproduce

  1. Deploy DLT using CloudFormation
  2. Configure a test with a task count of 1, concurrency of 10, region of eu-west-2, ramp of 5, hold of 90, and a task type of JMeter.
  3. Upload a JMeter script with a single thread group, with number of threads 1, ramp-up 1 and loop count 1. My test script happens to generate 23,000 requests over the 95 minutes (about 240 requests/min).
  4. Let test run
  5. Observe the Test Result page, especially the image on the lower-right of the panel.

Expected behavior

A steady line of 10 virtual users after the initial ramp, matching the concurrency setting and the Taurus log.

Screenshots

  1. Original widget image, with virtual users in blue (01-TestResult-BadVU).
  2. The results-parser lambda logs a widget description into its logfiles, so I used this to recreate a CloudWatch widget. Nothing is visible because the period is set to 10s (02-CW-Metrics-ReplicatedFromResultLambda).
  3. So I changed the period to 1 minute. Note the statistic for "Virtual Users" is set to Sum, and this graph matches the original (03-CW-Metrics-ReplicatedNowVisible).
  4. Sum is a bad choice when the samples don't arrive exactly once per minute; this shows the number of samples (04-CW-Metrics-WithSampleCount).
  5. Here I fixed the statistic, but the line is now hard to see because it is at the wrong scale for the y-axis (05-CW-Metrics-FixedButHardToSee).
  6. In this graph I make the "Virtual Users" value more visible by multiplying by 10, but that factor depends on the details of the test case (06-CW-Metrics-FixedAndVisible).
  7. Better would be to display the users on a third y-axis, or, as here, as a separate graph with the y-axis labelled for users (07-CW-Metrics-FixedGraph-Users),
  8. and a graph with the y-axis labelled for requests per minute (08-CW-Metrics-FixedGraph-Hits).

mcawilcox commented 2 months ago

I've added a snippet from the Taurus logs (09-TaurusLogSnippet).

mcawilcox commented 2 months ago

I found an example in Taurus of separating the results into two graphs, one for Hits and one for Response Times (Taurus Reporting Example).

I've made some changes to my widgets to emulate these two graphs:

  1. I've used a stacked area graph for this one (Taurus-LoadGraph).
  2. I've used a line graph for this one (Taurus-ResponseTimeGraph).

mcawilcox commented 2 months ago
  1. Inspired by that page, can I suggest an enhancement? Showing the average latency is often enough, but sometimes seeing the p90 would be useful (Taurus-ResponseTimeWithp90),
  2. or the p95 (Taurus-ResponseTimeWithp95).
  3. If the enhancement can't be made on the static Test Results page, could a customisable widget be left in CloudWatch (perhaps as part of the dashboard) that allows some of these extra lines to be graphed, so we can generate our own images for test reports?

mcawilcox commented 2 months ago

Addition: I realised the CloudWatch live dashboard has the same underlying issue - it performs sum(@numVu) too - but this one mostly works because the full Logs Insights aggregation is "stat sum(@numVu) by bin(1s)" ... and most of the time the bin(1s) ensures that only a single sample matches, so sum() == avg().

I do see occasional glitches where the graph doubles ... so sometimes there are 2 samples per second (10-CW-Live-UsersGlitch).

Again, using avg(), min() or max() works.
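For reference, the only change I tried in the Logs Insights console was the aggregation function; a sketch of the query string, assuming the standard stats syntax and that the rest of the dashboard query stays as-is:

```js
// Same bin(1s) grouping as the live dashboard, but averaging instead of summing, so a second
// sample landing in the same bin no longer doubles the plotted user count. min() or max()
// behave the same here, since every sample in a bin reports the same VU count.
const numVuQuery = "stats avg(@numVu) by bin(1s)";
```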

kamyarz-aws commented 2 months ago

This is very comprehensive. Thanks for the analysis. I will go over it and update you on this.

mcawilcox commented 1 month ago

Addition: I did all my original analysis using a single load engine, which meant that use of the "Average" statistic worked out well. Since then, I started to scale my tests beyond a single load engine, and realised that the "Average" statistic no longer works - there needs to be something that knows how many engines are running in parallel.

As a quick hack in my own metrics, I added a line for "engines" as "TIME_SERIES(4)" when I have 4 tasks, and then defined the Virtual Users to be "AVG([numVu0]) * engines".
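To make the hack concrete, the metrics array of my widget now looks roughly like this. The namespace, metric name and "TestId" dimension are placeholders for whatever DLT actually publishes, and the engine count is hard-coded to the test's task count:

```js
// Metric-math hack: per-engine average VU, multiplied by a hard-coded engine count.
// "DLT", "numVu" and "TestId" are placeholder names, not necessarily what the solution uses.
const metrics = [
  ["DLT", "numVu", "TestId", "myTest", { id: "numVu0", stat: "Average", visible: false }],
  [{ expression: "TIME_SERIES(4)", id: "engines", visible: false }],  // 4 = number of tasks in this test
  [{ expression: "AVG([numVu0]) * engines", label: "Virtual Users (all engines)" }]
];
```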