DescartesResearch / TeaStore

A micro-service reference test application for model extraction, cloud management, energy efficiency, power prediction, single- and multi-tier auto-scaling
https://se.informatik.uni-wuerzburg.de
Apache License 2.0

Irritating results #180

Closed: Angi2412 closed this issue 3 years ago

Angi2412 commented 3 years ago

Hello,

I used the application to make some test runs with Locust, but the resulting data does not make much sense to me. Maybe you have an idea to explain that behaviour?

The shown data is from scaling the "WebUI" microservice and consists of seven runs (875 data points). Each run consists of 125 different parameter variations, i.e. each pod resource parameter (CPU limit, memory limit and the number of pods) takes five values (5 x 5 x 5 = 125).
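For context, a minimal sketch of how such a full-factorial grid could be generated; the concrete limit values below are illustrative assumptions, not the exact ones used:

```python
from itertools import product

# Assumed example levels for each parameter (illustrative, not the actual values).
cpu_limits = ["100m", "200m", "300m", "400m", "500m"]
memory_limits = ["300Mi", "400Mi", "500Mi", "600Mi", "700Mi"]
pod_counts = [1, 2, 3, 4, 5]

# Full factorial design: 5 x 5 x 5 = 125 configurations per run.
configurations = list(product(cpu_limits, memory_limits, pod_counts))
assert len(configurations) == 125

# Seven repetitions of the whole grid yield 875 data points.
experiments = [(run, cpu, mem, pods)
               for run in range(7)
               for cpu, mem, pods in configurations]
assert len(experiments) == 875
```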

Now if I look at the correlation matrix, the number of pods does not seem correlated with the average response time? [Correlation matrix plot]

And when I, for example, look at the relationship between average response time and CPU limit, which are negatively correlated, it makes no sense either. Here the memory limit is set to its median: [Plot: CPU limit vs. average response time, memory limit at median]
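For reference, a minimal sketch of how the correlation matrix and the median slice could be computed with pandas; the file name and column names are assumptions about the result table, not its actual schema:

```python
import pandas as pd

# Assumed result table with one row per experiment; cpu_limit in millicores,
# memory_limit in MiB, avg_response_time in ms (column names are illustrative).
df = pd.read_csv("results.csv")

# Pearson correlation between the configuration parameters and the target metric.
corr = df[["cpu_limit", "memory_limit", "pods", "avg_response_time"]].corr()
print(corr["avg_response_time"])

# Fix the memory limit at its median and look at CPU limit vs. response time.
median_mem = df["memory_limit"].median()
median_slice = df[df["memory_limit"] == median_mem]
print(median_slice.groupby("cpu_limit")["avg_response_time"].mean())
```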

These are just examples... there are more irritating results.

SimonEismann commented 3 years ago

These results do indeed look weird :)

Before we can dive deeper into what might be going on, could you help me better understand your experiment setup?

As far as I understand it, you have deployed all TeaStore services and are now investigating the impact of different configuration parameters for the WebUI container (CPU, memory, #pods), with 5 potential values for each parameter. You measured every possible configuration (125) seven times (875 experiments in total).

Can you describe how each individual experiment is configured?

Angi2412 commented 3 years ago

Thanks again for the fast response.

SimonEismann commented 3 years ago

A few things off the top of my head:

Overall, your setup looks pretty good, to be honest.

Angi2412 commented 3 years ago

Could there be any weird dependency on the other services that explains this behaviour?

Angi2412 commented 3 years ago

In a 3-minute experiment with minimum resources, Locust counts 306 requests (3.1 per second), of which 117 failed (1.5 per second).
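A minimal sketch of how these rates could be derived from Locust's CSV export (run with `--csv <prefix>`, which writes a `<prefix>_stats.csv`); the column and row names below are assumptions and may differ between Locust versions:

```python
import pandas as pd

# Aggregated row of Locust's <prefix>_stats.csv; column names such as
# "Request Count" and "Failure Count" are assumptions.
stats = pd.read_csv("results_stats.csv")
aggregated = stats[stats["Name"] == "Aggregated"].iloc[0]

duration_s = 180  # 3-minute experiment
requests = aggregated["Request Count"]
failures = aggregated["Failure Count"]

print(f"{requests} requests ({requests / duration_s:.1f}/s), "
      f"{failures} failed ({failures / duration_s:.1f}/s)")
```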

SimonEismann commented 3 years ago

Your experiment setup looks reasonable in terms of experiment durations, load, and cooldown times.

So my theory right now would be that the failed requests are the reason the reported response times look so weird, so let's try getting rid of those first. Here I would look into two things:

SimonEismann commented 3 years ago

In terms of how many requests your configuration can be expected to handle: I don't have any experience with your exact container sizes, but in this paper: https://doi.org/10.1145/3358960.3379124 we had resource requests of 420m and limits of 2000m, and a setup with 8 pods for each service was able to handle at least 900 requests/second. So my gut feeling would be that 1.5 req/s seems okay for a 300m instance, but definitely somewhat on the low side.
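For illustration, a minimal sketch of how requests/limits and the replica count could be set programmatically with the Kubernetes Python client; the deployment and container name "teastore-webui" and the namespace are assumptions about the cluster setup:

```python
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Strategic-merge patch setting the replica count and resource requests/limits
# for an assumed "teastore-webui" deployment.
patch = {
    "spec": {
        "replicas": 8,
        "template": {
            "spec": {
                "containers": [{
                    "name": "teastore-webui",
                    "resources": {
                        "requests": {"cpu": "420m", "memory": "500Mi"},
                        "limits": {"cpu": "2000m", "memory": "700Mi"},
                    },
                }],
            },
        },
    },
}
apps.patch_namespaced_deployment(
    name="teastore-webui", namespace="default", body=patch)
```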

Angi2412 commented 3 years ago

I did a run with 27 (3 x 3 x 3) iterations and 50 users with a spawn rate of 1 per second. The parameter variations were as follows:

CPU limit: 100m, 200m and 300m
Memory limit: 500Mi, 600Mi and 700Mi
Number of pods: 1, 2 and 3
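As a point of reference, a minimal sketch of how one such run could be driven headlessly with Locust from Python; the locustfile path and host URL are placeholders, and the CLI flags assume a Locust 1.x or newer version:

```python
import subprocess

# Headless Locust run: 50 users, spawn rate 1/s, 3 minutes, CSV output.
# The locustfile and host are placeholders for the actual setup.
subprocess.run([
    "locust", "--headless",
    "-f", "locustfile.py",
    "--host", "http://teastore-webui.default.svc.cluster.local:8080",
    "--users", "50",
    "--spawn-rate", "1",
    "--run-time", "3m",
    "--csv", "results",
], check=True)
```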

The resulting plots are still irritating. This image shows the relationship of each parameter with the target average response time; the parameters not being varied are each fixed to their minimum, median and maximum. [Plots: parameter vs. average response time]

The most frequent error codes are 500, 302 and 404. For example, in the minimum-resource variation they occurred in the following amounts:

| Error code | Amount |
|------------|--------|
| 302        | 6      |
| 404        | 12     |
| 500        | 14     |
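A minimal sketch of how these counts could be tallied from a per-request log; the file path and the "status_code" column name are assumptions, not Locust's actual output schema:

```python
import pandas as pd

# Assumed per-request log of the minimum-resource variation, one row per response.
requests_df = pd.read_csv("requests_minimum_variation.csv")

# Count how often each non-2xx status code occurred.
error_counts = (
    requests_df[requests_df["status_code"] >= 300]
    .groupby("status_code")
    .size()
    .sort_index()
)
print(error_counts)
```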
Angi2412 commented 3 years ago

I have now filtered the average response time so that only requests with a response status code smaller than 300 are taken into account. The overall average response time is now lower, but the shape of the plots is still the same, except for the number of pods with maximum CPU and memory limit.

[Plot: average response time, filtered by status code]
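For reference, a minimal sketch of that filtering step, again with assumed file and column names:

```python
import pandas as pd

# Keep only successful responses (status code < 300) before recomputing the
# average response time per configuration (column names are assumptions).
requests_df = pd.read_csv("requests.csv")
ok = requests_df[requests_df["status_code"] < 300]

filtered_avg = (
    ok.groupby(["cpu_limit", "memory_limit", "pods"])["response_time_ms"]
    .mean()
    .reset_index(name="avg_response_time_ms")
)
print(filtered_avg.head())
```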

Angi2412 commented 3 years ago

Is there maybe any kind of dataset already available?

SimonEismann commented 3 years ago

So there still seem to be some things off, but the results already look better. I think to dive deeper here, we probably need some more details; here are some ideas on what might help:

If you want, we can also schedule a skype call to go over this (would need to switch to e-mail to exchange skype ids).

Angi2412 commented 3 years ago

A skype call would be great. My e-mail address is provided on my profile page.