Open cah-hbaum opened 10 months ago
Notes from prioritization meeting 06-10-23:
Solutions like Gardener deploy components such as logging and monitoring.
What would be best practice, and what is probably required?
Optional for users who want to deploy something like this in a cluster.
Can be seen more as a research issue.
I investigated how Gardener handles monitoring. In short: Gardener provides an integrated logging and monitoring stack that uses a number of prometheus-operators in a federated fashion (see: https://gardener.cloud/docs/gardener/extensions/logging-and-monitoring/). A more detailed description of this concept can be found here: https://github.com/gardener/gardener/blob/master/docs/monitoring/README.md#prometheus
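To make the federation idea a bit more concrete: in a federated Prometheus setup, an upper-level Prometheus scrapes the `/federate` endpoint of lower-level instances and selects series with `match[]` parameters. The following is only a minimal sketch of how such a scrape URL is composed; the host name and the selectors are hypothetical and not taken from Gardener's actual configuration.

```python
from urllib.parse import urlencode


def federate_url(base_url: str, selectors: list[str]) -> str:
    """Build a Prometheus /federate URL with one match[] parameter per selector."""
    # Each selector becomes its own match[] query parameter (URL-encoded).
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Hypothetical lower-level Prometheus and hypothetical series selectors.
url = federate_url(
    "http://prometheus.example:9090/",
    ['{job="kubelet"}', '{__name__=~"kube_.*"}'],
)
print(url)
```

In a real deployment the same selectors would normally go into the `params` section of a scrape job with `metrics_path: /federate`, rather than being built by hand.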
Federated monitoring could also be a suitable approach for monitoring SCS k8s clusters. However, it depends on a certain combination of tools. The use of those tools should not be mandatory, but could be an optional standard. To conclude, I would consider the following aspects of Gardener to be suitable for a definition of a standard:
I also investigated how Gardener handles logging. It depends heavily on the use of certain tools. Compared to the monitoring setup, I would say that it cannot be used as a recommended reference implementation because its composition is too specific. However, I would recommend providing a storage setup for log data, as Kubernetes itself cleans up its container log data after some time.
In addition, Kubernetes does not offer “logging at cluster level”. It is therefore recommended to use one of the various common approaches for “logging at cluster level”. Perhaps we can provide an optional standardized use of “logging agents” on each node to ensure that an SCS k8s cluster provides a logging mechanism.
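As a rough illustration of what a node-level logging agent has to do: with CRI runtimes such as containerd or CRI-O, container logs on the node are written as lines of the form `<timestamp> <stream> <tag> <message>`, where the tag `P` marks a partial line and `F` a full one. The sketch below parses a single such line. It is an assumption-laden example (the `LogRecord` type and `parse_cri_line` function are hypothetical names), not how any particular agent is implemented, and it does not cover Docker's json-file format.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    timestamp: str  # RFC 3339 timestamp as written by the runtime
    stream: str     # "stdout" or "stderr"
    partial: bool   # True if the runtime split a long line (tag "P")
    message: str


def parse_cri_line(line: str) -> LogRecord:
    """Parse one CRI-format container log line into a LogRecord."""
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) < 3:
        raise ValueError(f"not a CRI log line: {line!r}")
    timestamp, stream, tag = parts[:3]
    # An empty message may leave only three fields.
    message = parts[3] if len(parts) == 4 else ""
    return LogRecord(timestamp, stream, tag == "P", message)


# Hypothetical sample line, as it might appear under /var/log/containers/.
record = parse_cri_line("2023-10-06T12:00:00.000000000Z stdout F hello world\n")
print(record)
```

A real agent would additionally tail the files, join partial lines, attach pod metadata, and ship the records to the log storage mentioned above.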
Note that scs-0403-v1-csp-kaas-observability-stack also covers part of the subject of this topic.
In contrast to Gardener's approach, it relies on the use of Thanos.
This issue was created to provide a basis for discussing possible future standards. It is derived from #181 and is one of the points not yet assigned to an issue. The following topics/ideas should be discussed and maybe extended:
Definition of Done: