apache / incubator-kie-kogito-serverless-operator

Kubernetes operator for SonataFlow
Apache License 2.0

EPIC - Add the possibility to integrate the Operator with Prometheus and Grafana #461

Open ricardozanini opened 2 months ago

ricardozanini commented 2 months ago

Description

Placeholder EPIC for issues related to adding monitoring support to the SonataFlow Operator.

Relates to:

Issues

JudeNiroshan commented 2 months ago

As a SonataFlow operator user, I should be able to enable a monitoring flag via the SonataFlowPlatform CR, which will result in the required software components being created.

apiVersion: sonataflow.org/v1alpha08
kind: SonataFlowPlatform
spec:
  services:
    ...
    monitoring:
      enabled: true

1. The Sonataflow operator should install the Prometheus & Grafana operators in the OCP/K8s cluster.
2. Once the Prometheus and Grafana are in place, the operator should create a connection between Prometheus & Grafana via the GrafanaDataSource CR in Grafana. https://grafana.com/docs/grafana-cloud/developer-resources/infrastructure-as-code/grafana-operator/operator-dashboards-folders-datasources/#add-a-data-source
3. Then create a ServiceMonitor object that can capture/collect metrics from all the deployed Serverless Workflows.
4. Finally as the operator user, I expect to see a default Grafana Dashboard.

ricardozanini commented 2 months ago

@JudeNiroshan can you elaborate on this request?

The Sonataflow operator should install the Prometheus & Grafana operators in the OCP/K8s cluster.

The operator won't be responsible for installing third-party operators in a cluster. The reason is that we won't add administrative permissions to the operator such as installing CRDs. Also, installing an operator comes with many configuration options. So it would be highly complex to add an interface and wrappers around these installation procedures, which can change whenever a new operator version is released.

Once the Prometheus and Grafana are in place, the operator should create a connection between Prometheus & Grafana via the GrafanaDataSource CR in Grafana

This is fine. We can certainly try to check if these CRDs are available in the cluster and create CRs to bind Prometheus and Grafana to deployed workflows.
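
As a rough illustration of that check (not the operator's actual code), the decision can be modelled as a pure function over the list of kinds the cluster serves. In this sketch the type and function names (`gvk`, `integrationPlan`, `planIntegrations`) are hypothetical, the list of available kinds is assumed to come from some discovery step that is not shown, and the Grafana group and kind assume a recent Grafana operator release.

```go
package main

// gvk identifies a custom resource kind by API group and kind name.
type gvk struct {
	Group string
	Kind  string
}

// Assumed identifiers of the third-party CRDs. ServiceMonitor belongs to the
// Prometheus operator; the Grafana group/kind is an assumption and may differ
// between Grafana operator versions.
var (
	serviceMonitorKind    = gvk{Group: "monitoring.coreos.com", Kind: "ServiceMonitor"}
	grafanaDatasourceKind = gvk{Group: "grafana.integreatly.org", Kind: "GrafanaDatasource"}
)

// integrationPlan records which monitoring CRs may be created (hypothetical type).
type integrationPlan struct {
	CreateServiceMonitor    bool
	CreateGrafanaDatasource bool
}

// planIntegrations returns what to create, given the monitoring flag from the
// platform CR and the kinds currently served by the cluster. Nothing is
// planned when monitoring is disabled or when a CRD is missing.
func planIntegrations(enabled bool, available []gvk) integrationPlan {
	if !enabled {
		return integrationPlan{}
	}
	served := make(map[gvk]bool, len(available))
	for _, k := range available {
		served[k] = true
	}
	return integrationPlan{
		CreateServiceMonitor:    served[serviceMonitorKind],
		CreateGrafanaDatasource: served[grafanaDatasourceKind],
	}
}
```

Keeping the decision separate from the cluster lookup means a missing CRD simply results in a skipped CR, and the operator never needs permission to install CRDs itself.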

Then create a ServiceMonitor object that can capture/collect metrics from all the deployed Serverless Workflows.

I'll break it down into Grafana and Prometheus integration, so we can verify those PRs separately.

Finally as the operator user, I expect to see a default Grafana Dashboard.

Can you create this dashboard and share it with me? So we can maintain and keep it in this repo. Feel free to send a follow-up PR once the feature is implemented, updating the one I'll use as a placeholder.

ricardozanini commented 1 month ago

@JudeNiroshan one more thing regarding Grafana Data Sources. Please see: https://grafana.github.io/grafana-operator/docs/api/#grafanadatasourcespecdatasource

Looks like a data source requires credentials to access Prometheus. We can deploy the DS using the well-known credentials for a simple Prometheus installation, but in production environments, I don't think we can rely on this.

In this case, we can accept a secret containing the Prometheus credentials or use the well-known if empty.
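
A minimal sketch of that fallback, under stated assumptions: the secret's contents are passed in as a `map[string][]byte` (the shape of a Kubernetes Secret's data), the key names `username` and `password` are assumed rather than agreed in this thread, and the well-known defaults are supplied by the caller because their values are not specified here. The names `credentials` and `resolvePrometheusCredentials` are hypothetical.

```go
package main

import "errors"

// credentials holds basic-auth style credentials for a Prometheus data source.
type credentials struct {
	Username string
	Password string
}

// resolvePrometheusCredentials prefers the user-provided secret and falls back
// to the given defaults when no secret data is supplied. A secret that is
// present but incomplete is an error, so a misconfiguration is not silently
// replaced by the defaults.
func resolvePrometheusCredentials(secretData map[string][]byte, fallback credentials) (credentials, error) {
	if len(secretData) == 0 {
		return fallback, nil
	}
	user, okUser := secretData["username"] // assumed key name
	pass, okPass := secretData["password"] // assumed key name
	if !okUser || !okPass {
		return credentials{}, errors.New(`secret must contain both "username" and "password" keys`)
	}
	return credentials{Username: string(user), Password: string(pass)}, nil
}
```

Rejecting a partially filled secret, instead of falling back, is a design choice made for this sketch: in production it seems safer to fail loudly than to connect with default credentials.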

JudeNiroshan commented 1 month ago

The reason is that we won't add administrative permissions to the operator such as installing CRDs. Also, installing an operator comes with many configuration options

Understood. Let's keep the installation outside the SonataFlow operator (e.g. in a Helm chart).

Can you create this dashboard and share it with me? So we can maintain and keep it in this repo.

Sure.

we can accept a secret containing the Prometheus credentials or use the well-known if empty.

Agreed.

@ricardozanini Will this feature be considered for the next SonataFlow release (13th June 2024)?

ricardozanini commented 1 month ago

@JudeNiroshan I'm afraid not. Also, we already cut upstream for the operator. This one should be on Apache KIE 11.x.