Why?
Soon we are going to migrate the current boxy layer in EC2 (sentinel + twemproxy) to OCP, and we need good serviceability around it:
Recently we added the option of canary deployments on multiple workloads. We will use this feature while we migrate boxy from EC2 to sentinel + twemproxy sidecars in OCP, with both current and new pods handling traffic at the same time.
Recently a sentinel controller was added to saas-operator, so sentinel is deployed in OCP and needs observability (metrics, dashboard, alerts).
Recently a twemproxy controller was added to saas-operator, so a twemproxy sidecar can be injected into selected deployments that need to connect to backend-storage (backend and system deployments). We need observability around these sidecars as well (metrics, dashboard, alerts).
How?
Both sentinel and twemproxy already have dashboards and alerts, based on metrics scraped from EC2 by the prometheus-ec2 deployment. We now need to convert what we have so that it is deployed by the operator, which requires many changes because of the difference in how the components are deployed.
So we need to:
Add a canary dashboard to the current backend dashboard, so we can compare the performance of the backend main and canary deployments while they receive the same production traffic, split between them. That way we can see whether main or canary performs better or worse before doing a full switch to sentinel + twemproxy in OCP.
Add a sentinel dashboard. The sentinel dashboard we have right now is very simple and only monitors the last ping reply, but we now have many interesting sentinel-redis metrics, as well as pod cpu/mem metrics for sentinel. So a new, complete dashboard needs to be built.
Add a twemproxy dashboard. The current twemproxy dashboard uses EC2 labels as dashboard vars, as well as EC2 cpu/mem from node_exporter. Twemproxy deployed as a sidecar has different labels to distinguish between twemproxy types (backend/system), and we have pod metrics for cpu/mem (actually, container metrics from a pod with multiple containers).
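As a rough illustration of the container-level cpu/mem queries the twemproxy dashboard would need, the sketch below builds PromQL strings in Python. It assumes the standard cAdvisor metrics `container_cpu_usage_seconds_total` and `container_memory_working_set_bytes` are available with `pod` and `container` labels. The container name `twemproxy` and the pod name patterns are hypothetical and not taken from saas-operator.

```python
# Sketch only: the container name and pod patterns used below are assumptions,
# not the actual names produced by saas-operator.


def container_selector(container: str, pod_regex: str) -> str:
    """Return a PromQL label selector for one container in matching pods."""
    return f'{{container="{container}", pod=~"{pod_regex}"}}'


def cpu_usage_query(container: str, pod_regex: str, window: str = "5m") -> str:
    """CPU cores used per pod by the given container, as a rate over `window`."""
    selector = container_selector(container, pod_regex)
    return (
        "sum by (pod) "
        f"(rate(container_cpu_usage_seconds_total{selector}[{window}]))"
    )


def memory_usage_query(container: str, pod_regex: str) -> str:
    """Working-set memory in bytes per pod for the given container."""
    selector = container_selector(container, pod_regex)
    return f"sum by (pod) (container_memory_working_set_bytes{selector})"


if __name__ == "__main__":
    # One pair of queries per twemproxy type (backend/system).
    for kind in ("backend", "system"):
        print(cpu_usage_query("twemproxy", f"{kind}-.*"))
        print(memory_usage_query("twemproxy", f"{kind}-.*"))
```

Filtering on the `container` label is what separates the sidecar's usage from the main container in the same pod, which the EC2 node_exporter metrics could not do.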
Requirements
[x] Scrape sentinel metrics from the operator
[x] Scrape twemproxy metrics from sidecars
[x] Add backend canary metrics to current backend grafana dashboard
[x] Add sentinel grafana dashboard
[x] Add twemproxy grafana dashboard
[x] Convert legacy sentinel alerts to new alerts (maybe add new ones)
[x] Convert legacy twemproxy alerts to new alerts (maybe add new ones)