3scale-ops / prometheus-exporter-operator

Operator to centralize the setup of 3rd party prometheus exporters on Kubernetes/OpenShift, with a collection of grafana dashboards

Grafana dashboard API-group change #48

Closed davidkarlsen closed 1 year ago

davidkarlsen commented 1 year ago

It seems like the operator is using the old API-group:

--------------------------- Ansible Task StdOut -------------------------------

TASK [prometheusexporter : Manage GrafanaDashboard (if integreatly.org api-group exists) for PrometheusExporter http-probe on Namespace ftm-dev] ***
task path: /opt/ansible/roles/prometheusexporter/tasks/main.yml:36

while it is now:

k get crd|grep -i GrafanaDashboard
grafanadashboards.grafana.integreatly.org                         2023-08-09T22:31:04Z

Note the additional "grafana" part in the API group.

davidkarlsen commented 1 year ago

/kind bug

slopezz commented 1 year ago

Hi @davidkarlsen thanks for reporting the bug.

I've just seen that there is an API change on grafana-operator v5.0.0 from June 2023 https://github.com/grafana-operator/grafana-operator/releases/tag/v5.0.0

According to the docs, it affects prometheus-exporter-operator only in the API version (from integreatly.org/v1alpha1 to grafana.integreatly.org/v1beta1): https://github.com/grafana-operator/grafana-operator/blob/master/docs/blog/v4-v5-migration.md
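For illustration (just a sketch of the relevant lines, not a full manifest), the generated GrafanaDashboard changes like this:

# grafana-operator v4
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard

# grafana-operator v5
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard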

Since it is a breaking change, and it is quite possible that not everybody has updated to the newer grafana-operator v5 yet, in the meantime I will make it possible to select which API version to use for the GrafanaDashboard CR, probably keeping the old API as the default value (to be changed in a future release), giving people time to move to the new CRD.

davidkarlsen commented 1 year ago

@slopezz That makes sense and sounds like a good solution.

slopezz commented 1 year ago

Hi @davidkarlsen, https://github.com/3scale-ops/prometheus-exporter-operator/pull/49 should fix the issue.

Aside from the apiVersion change, there is also a change in how the grafana label selectors need to be added to the GrafanaDashboard: instead of the usual metadata.labels, the label goes under a new spec field, spec.instanceSelector.matchLabels.
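Roughly, it looks like this (just a sketch using the example autodiscovery label, not a full manifest):

# v1alpha1: the dashboard was matched through its metadata labels
metadata:
  labels:
    autodiscovery: enabled

# v1beta1: the dashboard selects Grafana instances through spec.instanceSelector
spec:
  instanceSelector:
    matchLabels:
      autodiscovery: enabled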

I added an e2e test to cover the two possible GrafanaDashboard apiVersions, and generated an alpha release quay.io/3scale/prometheus-exporter-operator:v0.7.0-alpha.1, which adds a new CRD field grafanaDashboard.apiVersion (with possible values v1alpha1 or v1beta1).

I wonder how you install the operator, and whether it would be easy for you to test this alpha release v0.7.0-alpha.1 before I generate a stable release?
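If you deploy it as a plain Deployment (not via OLM), something like this should be enough to swap the image; the deployment and container names below are only assumptions, adjust them to your installation:

kubectl -n openshift-operators set image deployment/prometheus-exporter-operator-controller-manager manager=quay.io/3scale/prometheus-exporter-operator:v0.7.0-alpha.1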

Once the operator has been updated to this alpha release, you need to add the new CRD field grafanaDashboard.apiVersion=v1beta1:

apiVersion: monitoring.3scale.net/v1alpha1
kind: PrometheusExporter
metadata:
  name: example2-memcached
  namespace: default
spec:
  type: memcached
  grafanaDashboard:
    label:
      key: autodiscovery
      value: enabled
    apiVersion: v1beta1   ### This one
  dbHost: your-memcached-host
  dbPort: 11211

If you cannot test this alpha release on your environment don't worry, just ping me and will generate the stable release v0.7.0.

davidkarlsen commented 1 year ago

I changed the operator image and applied the updated CRD, but now the operator complains about:

E0906 09:45:52.104841       7 leaderelection.go:330] error retrieving resource lock openshift-operators/prometheus-exporter-operator: leases.coordination.k8s.io "prometheus-exporter-operator" is forbidden: User "system:serviceaccount:openshift-operators:prometheus-exporter-operator-controller-manager" cannot get resource "leases" in API group "coordination.k8s.io" in the names

Did the lease mechanism change?
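For reference, the permission the error asks for would be roughly this kind of rule in the operator's Role/ClusterRole (sketched from the message above, not taken from the operator's manifests):

- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]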

davidkarlsen commented 1 year ago

OK, I hacked that in place, but the operator also needs RBAC for the grafanadashboard resources:

grafanadashboards.grafana.integreatly.org "prometheus-exporter-probe" is forbidden: User "system:serviceaccount:openshift-operators:prometheus-exporter-operator-controller-manager" cannot get resource "grafanadashboards" in API group "grafana.integreatly.org" in the namespace "ftm-dev" (reason: Forbidden, code: 403)

PLAY RECAP *********************************************************************
localhost : ok=6   changed=1   unreachable=0   failed=1   skipped=3   rescued=0   ignored=0
{"level":"error","ts":1693993852.4912605,"msg":"Reconciler error","controller":"prometheusexporter-controller","object":{"name":"http-probe","namespace":"ftm-dev"},"namespace":"ftm-dev","name":"http-probe","reconcileID":"ae48e2d3-e7ed-4b34-9c26-ed64da51809d","error":"event runner on failed","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234"}
slopezz commented 1 year ago

Oh yes, sorry @davidkarlsen, there is also a change in the RBAC (https://github.com/3scale-ops/prometheus-exporter-operator/commit/fc43dab4fa968a1f1036252370ae576cb190347d), but since you applied the changes manually to test the alpha release (instead of using OLM), it has not been applied.
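Roughly, the missing permission is this kind of rule (just a sketch based on the error; the exact manifests are in the linked commit):

- apiGroups: ["grafana.integreatly.org"]
  resources: ["grafanadashboards"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]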

I'm going to generate a stable version now, and then you will be able to upgrade your installation.

slopezz commented 1 year ago

@davidkarlsen Stable release v0.7.0 is already available

Please upgrade your installation with OLM (OLM handles the required CRD, RBAC and image updates), or update your manual deployment (I don't know how you are deploying the operator).
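For example, with OLM you can check which version you are running with something like:

kubectl get csv -n openshift-operators | grep prometheus-exporter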

And feel free to re-open the issue if it does not work properly

davidkarlsen commented 1 year ago

@slopezz I use the open-source stream integrated with OpenShift: https://operatorhub.io/operator/prometheus-exporter-operator - do you upgrade this feed too?

davidkarlsen commented 1 year ago

Seems current version is 0.3.4

slopezz commented 1 year ago

Yep, I manage both the OpenShift OperatorHub (latest v0.3.4) and the k8s OperatorHub (latest v0.2.4), but the versions there are pretty old.

I'm going to open a PR on each repo (OpenShift/k8s) in order to update the version to the latest v0.7.0.

I will ping you once the PRs get merged (which sometimes can take a few days) so you can update your installation.

davidkarlsen commented 1 year ago

Great. Thanks so much!

slopezz commented 1 year ago

Hi @davidkarlsen , finally OCP OperatorHub accepted the v0.7.0 at https://github.com/redhat-openshift-ecosystem/community-operators-prod/pull/3218, so you should be able to install it with OLM.

Cheers!

davidkarlsen commented 1 year ago

@slopezz Yes - just upgraded and got the dashboards in place - thanks a lot! :) 👍