felipegutierrez / explore-flink

This project uses Apache Flink as a stream engine that consumes data from the File system or Kafka brokers and exposes metrics using Prometheus and Grafana, everything deployed on Kubernetes (minikube).

Related papers #5

Closed felipegutierrez closed 3 years ago

felipegutierrez commented 5 years ago

Performance Modeling for Cloud Microservice Applications

The Microservice Capacity (MSC) is defined as the maximum request rate that a microservice can handle without violating the Service Level Objective (SLO). PROBLEM - The actual performance of a cloud deployment can differ from the expected performance depending on the type of VM and the application running on it. In other words, the workload already on the cloud biases the performance you would expect before deploying a VM or an application. OR -> there is no universal performance model that fits every cloud application. It sounds like he will use Microservice Capacity (MSC), application bottleneck detection, and application capacity planning to create his performance model for Cloud Microservice Applications.

Kubernetes uses the CPU requests value (the fraction of CPU time that the system has to guarantee to a container) to decide on which node to place a pod. He defines "Multiple microservices instances per host" and "Single microservices instances per host" and compares them by their ability to guarantee quality of service for the microservice. Only the single approach can guarantee quality of service.

He determines the Microservice Capacity (MSC) based on the SLO of a dummy microservice (with dummy connections). I am not sure what he means by dummy connections. However, it acts like collecting a sample from the Kubernetes cluster under a given workload. Then he uses the Theil-Sen estimator to derive a regression model.

Wikipedia => In non-parametric statistics, the Theil–Sen estimator is a method for robustly fitting a line to sample points in the plane (simple linear regression) by choosing the median of the slopes of all lines through pairs of points. It has also been called Sen's slope estimator, slope selection, the single median method, the Kendall robust line-fit method, and the Kendall–Theil robust line. It is named after Henri Theil and Pranab K. Sen, who published papers on this method in 1950 and 1968 respectively, and after Maurice Kendall because of its relation to the Kendall tau rank correlation coefficient. This estimator can be computed efficiently, and is insensitive to outliers. It can be significantly more accurate than non-robust simple linear regression (least squares) for skewed and heteroskedastic data, and competes well against least squares even for normally distributed data in terms of statistical power. It has been called "the most popular nonparametric technique for estimating a linear trend".
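To make the "median of the slopes of all lines through pairs of points" idea concrete, here is a minimal sketch of the Theil-Sen estimator in plain Python. The function name theil_sen and the sample values are my own illustration, not taken from the paper, and taking the intercept as the median of the residuals y - slope * x is one common convention that I am assuming here.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Fit y = slope * x + intercept with the Theil-Sen estimator.

    Hypothetical helper for illustration; not the paper's code.
    """
    points = list(zip(xs, ys))
    # Slope of the line through every pair of points with distinct x.
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x2 != x1
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    slope = median(slopes)
    # Assumed convention: intercept is the median of y - slope * x.
    intercept = median(y - slope * x for x, y in points)
    return slope, intercept


# Made-up sample: points on y = 2x + 1 plus one gross outlier at x = 5.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 100]
slope, intercept = theil_sen(xs, ys)
print(slope, intercept)
```

Six of the ten pairwise slopes in this sample are exactly 2, so the median ignores the outlier, whereas a least squares fit would be pulled strongly toward it. The pairwise loop is quadratic in the number of points, which is fine for small samples like the ones described above.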

He is comparing CPU utilization with the number of requests. Ideally, if the request rate had remained below about 104 per second (6200/60 ≈ 103.3, rounded up to 104), then all the requests would have been completed without violating SLOs. Hence, MSC is approximately 104 requests per second.
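The arithmetic behind the MSC figure can be sketched as follows. I am assuming the 6200 is a per-minute request count (which is what dividing by 60 suggests). The second helper, which inverts a fitted line cpu = slope * rate + intercept at a CPU limit, is my own hypothetical illustration of how a regression model could give a capacity, and its numbers are made up.

```python
def requests_per_second(requests_per_minute):
    """Convert a per-minute request count to a per-second rate."""
    return requests_per_minute / 60


def capacity_from_line(slope, intercept, cpu_limit):
    """Request rate at which a fitted line cpu = slope * rate + intercept
    reaches cpu_limit. Hypothetical helper, not the paper's method."""
    if slope <= 0:
        raise ValueError("slope must be positive to solve for a capacity")
    return (cpu_limit - intercept) / slope


# Assumed: 6200 requests per minute is the highest count without SLO violations.
msc = requests_per_second(6200)
print(round(msc, 1))  # 103.3, which the notes above round up to 104

# Made-up line: 10% idle CPU plus 0.5% per request/s, with a 60% CPU limit.
print(capacity_from_line(0.5, 10.0, 60.0))  # 100.0
```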