SovereignCloudStack / issues

This repository is used for issues that are cross-repository or not bound to a specific repository.
https://github.com/orgs/SovereignCloudStack/projects/6

Requirements of the operators regarding KaaS Monitoring #299

Closed o-otte closed 1 year ago

o-otte commented 1 year ago

We need to know what exactly an operator needs to see in order to provide a stable, performant managed Kubernetes to their customers:

How do we want to gather the input from our operators?

The outcome should be an ADR on the requirements, serving as the foundation for the subsequent work on the corresponding epic.

Definition of Ready:

Definition of Done:

o-otte commented 1 year ago

Preliminary Survey Results:

  1. What is your understanding of a managed Kubernetes offering?

    • Hassle-free installation and maintenance (customer viewpoint); providing control plane and worker nodes and taking responsibility for their correct function, while staying agnostic to the workload
    • Day 0, 1 and 2 (~planning, provisioning, operations): full lifecycle management, or letting the customer manage some parts of it, depending on the customer contract
  2. What type and depth of observability is needed?

    • CPU, RAM, HDD and network usage; health and function of cluster nodes, the control plane and, if desired, the customer workload
  3. Do you have an observability infrastructure? If yes, how is it built?

  4. Data Must haves

    • CPU, RAM, Disk, Network
    • HTTP Connectivity Metrics
    • Control Plane and Pod metrics (States, Ready, etc.)
    • Workload specific metrics
    • node stats
    • k8s resources (exporters, kube-state-metrics, cAdvisor, parts of the kubelet)
    • ingress controller exporter (http error rate, cert metrics like expiration date)
    • k8s certs metrics
    • metrics of underlying node
    • Logs of control plane, kubelet and containerd
  5. Must Not haves

    • Secrets; otherwise collect as much as possible, to allow anomaly detection over long-term data
  6. Must have Alerts

  7. Must NOT Alert on

    • Anything that should not wake people; nothing that does not lead to action items
  8. Observability from within or outside KaaS? What does the architecture look like?

    • Monitoring Infra on CSP Side
    • Data from the customer clusters and from the monitoring infrastructure on the CSP and KaaS side; collect both. KaaS monitoring can also be used by the customer
  9. Special Constraints

    • HA Setup in different Clusters on Different Sites
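
The two "must not" answers above (no secrets in collected data, no alerts without an action item) can be sketched as simple filters. This is only an illustration of the stated requirements, not something taken from the survey: the key hints, the `Alert` type and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical substrings marking a field as secret material (assumption).
SECRET_KEY_HINTS = ("secret", "token", "password")


def redact_secrets(record: dict) -> dict:
    """Return a copy of a telemetry record without secret-like fields."""
    return {
        key: value
        for key, value in record.items()
        if not any(hint in key.lower() for hint in SECRET_KEY_HINTS)
    }


@dataclass
class Alert:
    name: str
    # What the on-call person is expected to do; None if nothing.
    action_item: Optional[str] = None


def should_page(alert: Alert) -> bool:
    """Page only when the alert leads to a concrete action item."""
    return bool(alert.action_item and alert.action_item.strip())


if __name__ == "__main__":
    sample = {"cpu_usage": 0.4, "api_token": "abc", "node": "worker-1"}
    print(redact_secrets(sample))
    print(should_page(Alert("NodeNotReady", "drain and replace the node")))
    print(should_page(Alert("PodRestartedOnce")))
```

A substring match on key names is deliberately crude; a real setup would more likely rely on not exporting such fields in the first place.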