GoogleCloudPlatform / prometheus-engine

Google Cloud Managed Service for Prometheus libraries and manifests.
https://g.co/cloud/managedprometheus
Apache License 2.0

OperatorConfig management #388

Open shpml opened 1 year ago

shpml commented 1 year ago

Is it possible to have a better way to manage the OperatorConfig? Possibly with a ConfigMap, or a custom resource similar to a PodMonitoring configuration.

Manually editing the YAML, as the documentation suggests, isn't ideal.

Currently I apply changes to the OperatorConfig across my environments using kubectl patch driven by Terraform, but this method isn't great either. I recently had an issue with too many metrics being scraped because I had enabled kubeletScraping for testing; after I removed the kubeletScraping config from the local YAML file I apply with kubectl patch, the patch did not remove the setting from Kubernetes.
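
As far as I can tell, kubectl patch --type merge follows JSON merge-patch semantics (RFC 7386): the patch only adds or overwrites keys, and a key is deleted only when the patch explicitly sets it to null. So a patch file like the sketch below should remove the setting; I haven't verified this against the OperatorConfig CRD, and I'm assuming kubeletScraping sits directly under collection.

# a null value deletes the key under merge-patch semantics
collection:
  kubeletScraping: null

kubectl patch operatorconfig/config --namespace gmp-public --type merge --patch-file operator_config_patch.yaml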

bjakubski commented 1 year ago

Yeah, it can't be managed with TF (without hacks to patch), and it can't be managed with Helm either (Helm does not want to modify resources it does not own). I'm trying an (also hackish) solution: use the kubernetes_labels/kubernetes_annotations TF resources to make Helm "own" the resource, then manage it from a Helm chart. It is also not great; a rough sketch follows.
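
For anyone who wants to try the same hack, an untested sketch using the provider's kubernetes_labels and kubernetes_annotations resources. Helm treats a resource as its own when it carries the managed-by label plus the release-name/release-namespace annotations; the release name here is hypothetical and should match your chart release:

resource "kubernetes_labels" "adopt_operatorconfig" {
  api_version = "monitoring.googleapis.com/v1"
  kind        = "OperatorConfig"
  metadata {
    name      = "config"
    namespace = "gmp-public"
  }
  labels = {
    "app.kubernetes.io/managed-by" = "Helm"
  }
  # take over fields already set by another field manager
  force = true
}

resource "kubernetes_annotations" "adopt_operatorconfig" {
  api_version = "monitoring.googleapis.com/v1"
  kind        = "OperatorConfig"
  metadata {
    name      = "config"
    namespace = "gmp-public"
  }
  annotations = {
    "meta.helm.sh/release-name"      = "gmp-config" # hypothetical release name
    "meta.helm.sh/release-namespace" = "gmp-public"
  }
  force = true
}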

pintohutch commented 1 year ago

Hi @shpml and @bjakubski,

I recently had an issue with too many metrics being scraped because I had enabled kubeletScraping for testing; after I removed the kubeletScraping config from the local YAML file I apply with kubectl patch, the patch did not remove the setting from Kubernetes.

Are you trying to remove the entire OperatorConfig using patch? I'm not sure I follow.

With regards to managing the OperatorConfig, you should be able to update it and source-control it the same way you would a PodMonitoring, no? It's just another custom resource watched by the operator. The main difference is that it's a singleton in a fixed namespace in the cluster, so in theory there should be fewer of them to manage than PodMonitorings.

But maybe I'm misunderstanding your question 🙂

shpml commented 1 year ago

Hi @pintohutch

I patch the OperatorConfig to limit the metrics I want to send to Cloud Monitoring. I do this using kubectl patch operatorconfig/config --namespace gmp-public --type merge --patch-file ${path.module}/templates/operator_config_patch.yaml. This command is run by terraform using local-exec.

collection:
  filter:
    matchOneOf:
      - '{__name__=~"puma_.+"}'
      - '{__name__=~"action_cable_.+"}'
      - '{__name__=~"sidekiq_.+"}'
      - '{__name__=~"k8s_app:.+"}'

I can't really manage this in Terraform any other way, as the resource is created automatically when Managed Prometheus is enabled on the cluster. For now, Managed Prometheus is enabled with the command gcloud container clusters update --enable-managed-prometheus; I will migrate to the Terraform config now that that issue is closed, but it doesn't fix my problem.
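
(The Terraform route is, as far as I understand it, the monitoring_config block on google_container_cluster; a minimal sketch, with the other cluster arguments elided:)

resource "google_container_cluster" "primary" {
  name     = var.cluster_name
  location = var.region
  # ... other cluster configuration elided ...

  monitoring_config {
    managed_prometheus {
      enabled = true
    }
  }
}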

If I try to manage the OperatorConfig with Terraform using the kubernetes_manifest below, I get an error because the OperatorConfig resource already exists. I assume Managed Prometheus only works with an OperatorConfig named "config"; if not, I can just deploy an OperatorConfig with a different name.

resource "kubernetes_manifest" "operatorconfig_gmp_public_config" {
  manifest = {
    "apiVersion" = "monitoring.googleapis.com/v1"
    "collection" = {
      "filter" = {
        "matchOneOf" = [
          "{__name__=~\"puma_.+\"}",
          "{__name__=~\"action_cable_.+\"}",
          "{__name__=~\"sidekiq_.+\"}",
          "{__name__=~\"k8s_app:.+\"}",
        ]
      }
    }
    "kind" = "OperatorConfig"
    "metadata" = {
      "labels" = {
        "addonmanager.kubernetes.io/mode" = "Reconcile"
        "deployed-by"                     = "terraform"
      }
      "name"      = "config"
      "namespace" = "gmp-public"
    }
  }
}

With PodMonitoring, by contrast, I can deploy as many configurations as I like; the kubernetes_manifest below is one example.

resource "kubernetes_manifest" "prom_app_scaping" {
  manifest = {
    apiVersion = "monitoring.googleapis.com/v1alpha1"
    kind       = "PodMonitoring"
    metadata = {
      name      = "${var.common_labels.env}-prom-scaper"
      namespace = "default"
      labels = {
        "app.kubernetes.io/name" = "${var.common_labels.env}-prom-scaper"
      }

    }
    spec = {
      endpoints = [
        {
          interval = var.scrape_interval
          path     = "/metrics"
          port     = 3000
          scheme   = "http"
        }
      ]
      selector = {
        matchLabels = {
          prometheus-scrape = "true"
        }
      }
    }
  }
}

I hope this makes sense. I could wait for the OperatorConfig to be created and then import it into Terraform, but that workaround is also not ideal and doesn't scale well across multiple projects.
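
(For reference, the import would look roughly like the command below, using the kubernetes_manifest import ID format; I haven't verified it against this CRD:)

terraform import kubernetes_manifest.operatorconfig_gmp_public_config \
  "apiVersion=monitoring.googleapis.com/v1,kind=OperatorConfig,namespace=gmp-public,name=config"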

bjakubski commented 1 year ago

Hi @pintohutch

@shpml described the issue nicely. I'll only add that it is not easily possible to manage the OperatorConfig with Helm either, for the same reason: the object is created by GKE/the operator, so it is "foreign" to Helm, and Helm will refuse to update it.

pintohutch commented 1 year ago

Hi @shpml and @bjakubski,

Apologies for the delayed response. I'm just returning from leave over the holidays.

Thanks for making this use case clearer. I think I see the issue.

I do see an open issue for the Kubernetes Terraform provider to support kubectl patch, and it has quite a lot of activity. It's unclear whether the patch functionality would be supported for all resource types, though (e.g. custom resources).

I assume managed prometheus only works with an OperatorConfig named "config". If not I can just deploy an OperatorConfig with a different name.

Yes, that is correct. The "config" OperatorConfig is created by the GKE control plane as a singleton resource in the cluster and is referenced by that name in the source code.

The workarounds you've mentioned are probably the best options for now. The only other solution would be to deploy managed collection yourself through the install manifests (i.e. not through the GKE API), using kubernetes_manifest for those.
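
Roughly, that self-deployed route looks like the below; check the repo's manifests/ directory for the exact files, and substitute the release tag you want:

# example tag; pin to the release you intend to run
VERSION=v0.5.0
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/${VERSION}/manifests/setup.yaml
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/${VERSION}/manifests/operator.yaml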

There may be a path forward in a future release, but we'd need to evaluate. Do you have a proposal for how this would look using a ConfigMap?

Either way, we can keep this issue open for consideration.

shpml commented 1 year ago

A proposal for how it would work with a ConfigMap: the operator could look for a ConfigMap named config, or something more specific like gmp-operator-config, or select one via a predefined label on a ConfigMap, such as app.kubernetes.io/gmp-operator-config=True. The user can add configuration options there that the operator merges into its OperatorConfig:

apiVersion: v1
kind: ConfigMap
metadata:
  name: gmp-operator-config
  namespace: gmp-public
data:
  config.yaml: |
    collection:
      filter:
        matchOneOf:
        - '{__name__=~"puma_.+"}'
        - '{__name__=~"action_cable_.+"}'
        - '{__name__=~"sidekiq_.+"}'
        - '{__name__=~"k8s_app:.+"}'

Alternatively, let users deploy additional OperatorConfig resources to configure the operator; the operator can merge these together into its effective configuration:

apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  name: gmp-operator-config
  namespace: gmp-public
collection:
  filter:
    matchOneOf:
    - '{__name__=~"puma_.+"}'
    - '{__name__=~"action_cable_.+"}'
    - '{__name__=~"sidekiq_.+"}'
    - '{__name__=~"k8s_app:.+"}'

pintohutch commented 1 year ago

Gotcha - thanks for the suggestions! We'll consider this and maybe some other options to better support this use case.

Btw, in the meantime, I wonder if GKE Config Sync could be an approach to keeping your OperatorConfigs in sync across clusters.

We haven't experimented much with it, but it may be worth trying.
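
A rough sketch of what that could look like: keep the OperatorConfig YAML in a Git repo and point a Config Sync RootSync at it (the repo URL and directory here are hypothetical):

apiVersion: configsync.gke.io/v1beta1
kind: RootSync
metadata:
  name: root-sync
  namespace: config-management-system
spec:
  sourceFormat: unstructured
  git:
    repo: https://example.com/your-org/gmp-config   # hypothetical repo
    branch: main
    dir: clusters/prod                              # directory holding the OperatorConfig YAML
    auth: none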

krishnaindani commented 6 months ago

Hi @pintohutch

(quoting @shpml's earlier comment above in full)

@shpml would you be able to share an example of managing the OperatorConfig with Terraform?

shpml commented 6 months ago

@shpml would you be able to share an example of managing the OperatorConfig with Terraform?

Operator config patch file:

# filename="operator_config_patch.yaml"
collection:
  filter:
    matchOneOf:
      - '{__name__=~"puma_.+"}'
      - '{__name__=~"action_cable_.+"}'
      - '{__name__=~"sidekiq_.+"}'
      - '{__name__=~"k8s_app:.+"}'

Terraform null_resource:

# Patch the operator config deployed by GCP to specify metrics to collect.
# We patch instead of managing the resource as it's deployed by GCP and likely to be updated.
# A patch only adds, it does not remove
resource "null_resource" "patch_operator_config" {
  triggers = {
    yaml_update = "${filesha512("${path.module}/templates/operator_config_patch.yaml")}"
    # To force null_resource recreation un-comment below
    # always_run = timestamp()
  }
  provisioner "local-exec" {
    command = <<EOF
# Authenticate to cluster
gcloud container clusters get-credentials $CLUSTER_NAME --region $REGION --project $PROJECT

# Patch operatorconfig/config
kubectl patch operatorconfig/config --namespace gmp-public --type merge --patch-file ${path.module}/templates/operator_config_patch.yaml
EOF
    environment = {
      # used by gcloud to authenticate to the correct cluster
      CLUSTER_NAME = var.cluster_name
      PROJECT = var.project_id
      REGION  = var.region
    }
  }
}

krishnaindani commented 6 months ago

Have you faced any side effects or issues with this? One thing I am noticing is that if I remove a setting from the patch file, the patch does not remove it from the cluster (I am just trying this locally). Have you thought about deleting the operator config entirely and managing it yourself? @shpml