aws-samples / distributed-load-testing-using-aws-fargate

Solution to set up AWS Fargate to run multi-region distributed performance testing.

Multiple tasks make logging and metrics tricky #6

Open GregTurner opened 5 years ago

GregTurner commented 5 years ago

Great template; I got up and running quickly with no fuss.

One question I have, though, is about the metrics. The CloudWatch metric for Number of Concurrent Users only reports the average across all LoadTestRunner tasks. For example, if I configure 150 concurrent users and three LoadTestRunner tasks are created by default, the actual load is 450 users. Furthermore, the average response time is somewhat inaccurate because it's an average of averages.
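To illustrate the aggregation I'd expect: assuming every task publishes to the same metric (same namespace, name, and dimensions), querying the Sum statistic instead of Average would report the real total of 450. The namespace and metric name below are placeholders I made up, not necessarily what the template publishes:

```python
# Sketch only: query the concurrent-users metric with the Sum statistic
# so datapoints from all LoadTestRunner tasks in a period are added up
# rather than averaged. Namespace and metric name are hypothetical.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

now = datetime.now(timezone.utc)
response = cloudwatch.get_metric_statistics(
    Namespace="LoadTesting",          # placeholder namespace
    MetricName="ConcurrentUsers",     # placeholder metric name
    StartTime=now - timedelta(minutes=15),
    EndTime=now,
    Period=60,
    Statistics=["Sum"],               # total across tasks, not the mean
)
for point in sorted(response["Datapoints"], key=lambda d: d["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```

This still doesn't fix the response-time metric, of course, since summing averages is no more meaningful than averaging them.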

Is there any way to consolidate logging between Tasks, or a way to group the metrics together?

ferdingler commented 5 years ago

Hi @GregTurner, first of all thanks for trying this project!

The way Taurus prints response times is based on averages and isn't very granular: it aggregates the requests within a given second and prints the average of them, which is what gets captured as a metric in CloudWatch. However, at the end of the test execution, Taurus prints a summary showing response times as percentiles (p50, p90, etc.), which is a lot better than looking at averages. You should see something like this in the CloudWatch Logs stream of each container:

+---------------+---------------+
| Percentile, % | Resp. Time, s |
+---------------+---------------+
|           0.0 |           0.1 |
|          50.0 |         0.405 |
|          90.0 |         0.696 |
|          95.0 |         0.797 |
|          99.0 |         1.001 |
|          99.9 |         1.805 |
|         100.0 |         7.692 |
+---------------+---------------+
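In the meantime, if you want those per-task summaries side by side, one option is to pull the percentile tables out of every container's log stream with a filter. A minimal sketch, assuming boto3 and a placeholder log group name (substitute whatever your task definition actually configures):

```python
# Sketch only: scan every log stream in the load-test log group and
# print the lines belonging to the end-of-test percentile summary,
# tagged with the stream (i.e. container) they came from.
# The log group name is a hypothetical placeholder.
import boto3

logs = boto3.client("logs")

paginator = logs.get_paginator("filter_log_events")
for page in paginator.paginate(
    logGroupName="/ecs/load-testing",  # placeholder log group name
    filterPattern='"Percentile"',      # matches the summary table header
):
    for event in page["events"]:
        print(event["logStreamName"], event["message"])
```

That filter only surfaces the header row of each table; in practice you'd widen the pattern or fetch the surrounding lines with get_log_events, but it shows the grouping idea.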

I have posted a question in the Taurus Forums to see if we can get more granular response times while the tests are being executed. This is definitely an area I want to improve in this project.

However, we need to keep in mind that the most important thing when doing performance load testing is to evaluate the behavior of your System Under Test. It's important to monitor and have metrics around the load tests themselves, but don't lose focus on what actually matters, which is monitoring your service itself. You should be learning how it responds, where the bottlenecks are, how it scales, etc.