DopplerHQ / kubernetes-operator

Strange double-deployment when doing helm deploy #50

Closed SGudbrandsson closed 11 months ago

SGudbrandsson commented 11 months ago

Hi,

We've been facing some very strange issues in production lately on GKE Autopilot. When we deploy a new revision of our application, the helm deploy action runs, and then, for some reason, Doppler triggers another deployment very soon after the helm deployment starts. We use an HPA to scale automatically to N pods and expect helm to just update the deployment, leaving the pod count as is.

When Doppler triggers a deployment immediately after the helm deployment starts, the workload scales down to 1 pod.

You can see the effect here: [two screenshots, not included]

We deploy multiple times per day, so this is causing a major headache in production.

We're using Doppler 1.2.5 currently, and I don't know if upgrading will do anything. Have you seen this behavior? What do you recommend?

nmanoogian commented 11 months ago

Hey @SGudbrandsson! Thanks for reaching out (and for your detailed report!)

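Could you confirm a few things for me?

- Are you using the Doppler Kubernetes Operator with automatic redeployments?
- Are there any interesting logs from the operator during the time that the Doppler Operator triggers redeployments for your app?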

Even with a hypothetical helm reinstall of the Doppler Operator, the system is designed to check a hash of your Doppler secrets before redeploying any of your workloads. I'm certainly keen to get to the bottom of this one!

SGudbrandsson commented 11 months ago

Hi @nmanoogian

> Are you using the Doppler Kubernetes Operator with automatic redeployments?

Yes. I tried disabling the automatic redeployment with an annotation to the workload, and the behavior stopped; only one deployment happens for those workloads with the disabled annotations.

> Are there any interesting logs from the operator during the time that the Doppler Operator triggers redeployments for your app?

I guess. Here are the logs, pinned to the deployment workload: [screenshot not included]

Here's the JSON

[
  {
    "textPayload": "2023-09-19T09:55:59.354Z\tINFO\tcontrollers.DopplerSecret\tReconciling dopplersecret\t{\"dopplersecret\": \"doppler-operator-system/master-web-prod\"}",
    "insertId": "1s0ee3jma3v9qlyy",
    "resource": {
      "type": "k8s_container",
      "labels": {
        "container_name": "manager",
        "pod_name": "doppler-operator-controller-manager-5d68689c98-qrb8f",
        "namespace_name": "doppler-operator-system",
        "cluster_name": "REDACTED-gke",
        "project_id": "REDACTED",
        "location": "europe-west2"
      }
    },
    "timestamp": "2023-09-19T09:55:59.354379466Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_name": "gk3-REDACTED-gke-nap-1t7fymkd-64661875-r2wb",
      "k8s-pod/pod-template-hash": "5d68689c98",
      "k8s-pod/control-plane": "controller-manager"
    },
    "logName": "projects/REDACTED/logs/stderr",
    "receiveTimestamp": "2023-09-19T09:56:00.869255595Z"
  },
  {
    "textPayload": "2023-09-19T09:55:59.354Z\tINFO\tcontrollers.DopplerSecret\tRequeue duration set\t{\"dopplersecret\": \"doppler-operator-system/master-web-prod\", \"requeueAfter\": \"1m0s\"}",
    "insertId": "m7y9yp1pi5gtdma4",
    "resource": {
      "type": "k8s_container",
      "labels": {
        "location": "europe-west2",
        "namespace_name": "doppler-operator-system",
        "cluster_name": "REDACTED-gke",
        "container_name": "manager",
        "project_id": "REDACTED",
        "pod_name": "doppler-operator-controller-manager-5d68689c98-qrb8f"
      }
    },
    "timestamp": "2023-09-19T09:55:59.354475856Z",
    "severity": "ERROR",
    "labels": {
      "k8s-pod/control-plane": "controller-manager",
      "k8s-pod/pod-template-hash": "5d68689c98",
      "compute.googleapis.com/resource_name": "gk3-REDACTED-gke-nap-1t7fymkd-64661875-r2wb"
    },
    "logName": "projects/REDACTED/logs/stderr",
    "receiveTimestamp": "2023-09-19T09:56:00.869255595Z"
  },
  {
    "textPayload": "2023-09-19T09:55:59.354Z\tINFO\tcontrollers.DopplerSecret\tFetching Doppler secrets\t{\"dopplersecret\": \"doppler-operator-system/master-web-prod\", \"verifyTLS\": true, \"host\": \"https://api.doppler.com\"}",
    "insertId": "947ig6ozhi5glzw8",
    "resource": {
      "type": "k8s_container",
      "labels": {
        "location": "europe-west2",
        "pod_name": "doppler-operator-controller-manager-5d68689c98-qrb8f",
        "container_name": "manager",
        "project_id": "REDACTED",
        "cluster_name": "REDACTED-gke",
        "namespace_name": "doppler-operator-system"
      }
    },
    "timestamp": "2023-09-19T09:55:59.354499961Z",
    "severity": "ERROR",
    "labels": {
      "k8s-pod/pod-template-hash": "5d68689c98",
      "compute.googleapis.com/resource_name": "gk3-REDACTED-gke-nap-1t7fymkd-64661875-r2wb",
      "k8s-pod/control-plane": "controller-manager"
    },
    "logName": "projects/REDACTED/logs/stderr",
    "receiveTimestamp": "2023-09-19T09:56:00.869255595Z"
  },
  {
    "textPayload": "2023-09-19T09:55:59.661Z\tINFO\tcontrollers.DopplerSecret\t[-] Doppler secrets not modified.\t{\"dopplersecret\": \"doppler-operator-system/master-web-prod\", \"verifyTLS\": true, \"host\": \"https://api.doppler.com\"}",
    "insertId": "v5t3k5y9l44g7y7w",
    "resource": {
      "type": "k8s_container",
      "labels": {
        "container_name": "manager",
        "namespace_name": "doppler-operator-system",
        "location": "europe-west2",
        "project_id": "REDACTED",
        "cluster_name": "REDACTED-gke",
        "pod_name": "doppler-operator-controller-manager-5d68689c98-qrb8f"
      }
    },
    "timestamp": "2023-09-19T09:55:59.662147784Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_name": "gk3-REDACTED-gke-nap-1t7fymkd-64661875-r2wb",
      "k8s-pod/pod-template-hash": "5d68689c98",
      "k8s-pod/control-plane": "controller-manager"
    },
    "logName": "projects/REDACTED/logs/stderr",
    "receiveTimestamp": "2023-09-19T09:56:00.869255595Z"
  },
  {
    "textPayload": "2023-09-19T09:55:59.675Z\tINFO\tcontrollers.DopplerSecret\t[-] Deployment is already running latest version, nothing to do\t{\"deployment\": \"default/master-web\"}",
    "insertId": "qbkp8kk6rz4ajvi5",
    "resource": {
      "type": "k8s_container",
      "labels": {
        "container_name": "manager",
        "project_id": "REDACTED",
        "cluster_name": "REDACTED-gke",
        "location": "europe-west2",
        "namespace_name": "doppler-operator-system",
        "pod_name": "doppler-operator-controller-manager-5d68689c98-qrb8f"
      }
    },
    "timestamp": "2023-09-19T09:55:59.675961038Z",
    "severity": "ERROR",
    "labels": {
      "k8s-pod/control-plane": "controller-manager",
      "compute.googleapis.com/resource_name": "gk3-REDACTED-gke-nap-1t7fymkd-64661875-r2wb",
      "k8s-pod/pod-template-hash": "5d68689c98"
    },
    "logName": "projects/REDACTED/logs/stderr",
    "receiveTimestamp": "2023-09-19T09:56:00.869255595Z"
  },
  {
    "textPayload": "2023-09-19T09:55:59.675Z\tINFO\tcontrollers.DopplerSecret\tFinished reconciling deployments\t{\"dopplersecret\": \"doppler-operator-system/master-web-prod\", \"numDeployments\": 68}",
    "insertId": "df8vqad7g7vd7iq3",
    "resource": {
      "type": "k8s_container",
      "labels": {
        "cluster_name": "REDACTED-gke",
        "namespace_name": "doppler-operator-system",
        "location": "europe-west2",
        "pod_name": "doppler-operator-controller-manager-5d68689c98-qrb8f",
        "container_name": "manager",
        "project_id": "REDACTED"
      }
    },
    "timestamp": "2023-09-19T09:55:59.676017166Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_name": "gk3-REDACTED-gke-nap-1t7fymkd-64661875-r2wb",
      "k8s-pod/control-plane": "controller-manager",
      "k8s-pod/pod-template-hash": "5d68689c98"
    },
    "logName": "projects/REDACTED/logs/stderr",
    "receiveTimestamp": "2023-09-19T09:56:00.869255595Z"
  },
  {
    "textPayload": "2023-09-19T09:55:59.683Z\tINFO\tcontrollers.DopplerSecret\tFinished reconciliation\t{\"dopplersecret\": \"doppler-operator-system/master-web-prod\"}",
    "insertId": "rb8v13p9vw8zfxoo",
    "resource": {
      "type": "k8s_container",
      "labels": {
        "cluster_name": "REDACTED-gke",
        "location": "europe-west2",
        "namespace_name": "doppler-operator-system",
        "pod_name": "doppler-operator-controller-manager-5d68689c98-qrb8f",
        "container_name": "manager",
        "project_id": "REDACTED"
      }
    },
    "timestamp": "2023-09-19T09:55:59.684133322Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_name": "gk3-REDACTED-gke-nap-1t7fymkd-64661875-r2wb",
      "k8s-pod/control-plane": "controller-manager",
      "k8s-pod/pod-template-hash": "5d68689c98"
    },
    "logName": "projects/REDACTED/logs/stderr",
    "receiveTimestamp": "2023-09-19T09:56:00.869255595Z"
  },
  {
    "textPayload": "2023-09-19T09:56:59.684Z\tINFO\tcontrollers.DopplerSecret\tReconciling dopplersecret\t{\"dopplersecret\": \"doppler-operator-system/master-web-prod\"}",
    "insertId": "j9r6hjprjte7ixkr",
    "resource": {
      "type": "k8s_container",
      "labels": {
        "cluster_name": "REDACTED-gke",
        "pod_name": "doppler-operator-controller-manager-5d68689c98-qrb8f",
        "namespace_name": "doppler-operator-system",
        "container_name": "manager",
        "location": "europe-west2",
        "project_id": "REDACTED"
      }
    },
    "timestamp": "2023-09-19T09:56:59.684760584Z",
    "severity": "ERROR",
    "labels": {
      "k8s-pod/control-plane": "controller-manager",
      "k8s-pod/pod-template-hash": "5d68689c98",
      "compute.googleapis.com/resource_name": "gk3-REDACTED-gke-nap-1t7fymkd-64661875-r2wb"
    },
    "logName": "projects/REDACTED/logs/stderr",
    "receiveTimestamp": "2023-09-19T09:57:00.764897121Z"
  },
  {
    "textPayload": "2023-09-19T09:56:59.684Z\tINFO\tcontrollers.DopplerSecret\tRequeue duration set\t{\"dopplersecret\": \"doppler-operator-system/master-web-prod\", \"requeueAfter\": \"1m0s\"}",
    "insertId": "2uzlsgg8w54nhute",
    "resource": {
      "type": "k8s_container",
      "labels": {
        "project_id": "REDACTED",
        "namespace_name": "doppler-operator-system",
        "location": "europe-west2",
        "cluster_name": "REDACTED-gke",
        "pod_name": "doppler-operator-controller-manager-5d68689c98-qrb8f",
        "container_name": "manager"
      }
    },
    "timestamp": "2023-09-19T09:56:59.684819207Z",
    "severity": "ERROR",
    "labels": {
      "k8s-pod/pod-template-hash": "5d68689c98",
      "compute.googleapis.com/resource_name": "gk3-REDACTED-gke-nap-1t7fymkd-64661875-r2wb",
      "k8s-pod/control-plane": "controller-manager"
    },
    "logName": "projects/REDACTED/logs/stderr",
    "receiveTimestamp": "2023-09-19T09:57:00.764897121Z"
  },
  {
    "textPayload": "2023-09-19T09:56:59.684Z\tINFO\tcontrollers.DopplerSecret\tFetching Doppler secrets\t{\"dopplersecret\": \"doppler-operator-system/master-web-prod\", \"verifyTLS\": true, \"host\": \"https://api.doppler.com\"}",
    "insertId": "t10h8l751wv8hu65",
    "resource": {
      "type": "k8s_container",
      "labels": {
        "namespace_name": "doppler-operator-system",
        "cluster_name": "REDACTED-gke",
        "pod_name": "doppler-operator-controller-manager-5d68689c98-qrb8f",
        "container_name": "manager",
        "project_id": "REDACTED",
        "location": "europe-west2"
      }
    },
    "timestamp": "2023-09-19T09:56:59.684864304Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_name": "gk3-REDACTED-gke-nap-1t7fymkd-64661875-r2wb",
      "k8s-pod/pod-template-hash": "5d68689c98",
      "k8s-pod/control-plane": "controller-manager"
    },
    "logName": "projects/REDACTED/logs/stderr",
    "receiveTimestamp": "2023-09-19T09:57:00.764897121Z"
  },
  {
    "textPayload": "2023-09-19T09:56:59.916Z\tINFO\tcontrollers.DopplerSecret\t[-] Doppler secrets not modified.\t{\"dopplersecret\": \"doppler-operator-system/master-web-prod\", \"verifyTLS\": true, \"host\": \"https://api.doppler.com\"}",
    "insertId": "22uotb74k4so9ifj",
    "resource": {
      "type": "k8s_container",
      "labels": {
        "namespace_name": "doppler-operator-system",
        "container_name": "manager",
        "cluster_name": "REDACTED-gke",
        "location": "europe-west2",
        "project_id": "REDACTED",
        "pod_name": "doppler-operator-controller-manager-5d68689c98-qrb8f"
      }
    },
    "timestamp": "2023-09-19T09:56:59.916423672Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_name": "gk3-REDACTED-gke-nap-1t7fymkd-64661875-r2wb",
      "k8s-pod/pod-template-hash": "5d68689c98",
      "k8s-pod/control-plane": "controller-manager"
    },
    "logName": "projects/REDACTED/logs/stderr",
    "receiveTimestamp": "2023-09-19T09:57:00.764897121Z"
  },
  {
    "textPayload": "2023-09-19T09:56:59.951Z\tINFO\tcontrollers.DopplerSecret\t[/] Updated deployment\t{\"deployment\": \"default/master-web\"}",
    "insertId": "p7yrfvdo4cxx4ijv",
    "resource": {
      "type": "k8s_container",
      "labels": {
        "location": "europe-west2",
        "namespace_name": "doppler-operator-system",
        "cluster_name": "REDACTED-gke",
        "project_id": "REDACTED",
        "pod_name": "doppler-operator-controller-manager-5d68689c98-qrb8f",
        "container_name": "manager"
      }
    },
    "timestamp": "2023-09-19T09:56:59.952055849Z",
    "severity": "ERROR",
    "labels": {
      "k8s-pod/pod-template-hash": "5d68689c98",
      "k8s-pod/control-plane": "controller-manager",
      "compute.googleapis.com/resource_name": "gk3-REDACTED-gke-nap-1t7fymkd-64661875-r2wb"
    },
    "logName": "projects/REDACTED/logs/stderr",
    "receiveTimestamp": "2023-09-19T09:57:00.764897121Z"
  },
  {
    "textPayload": "2023-09-19T09:56:59.951Z\tINFO\tcontrollers.DopplerSecret\tFinished reconciling deployments\t{\"dopplersecret\": \"doppler-operator-system/master-web-prod\", \"numDeployments\": 68}",
    "insertId": "3aqfqvefbv739e89",
    "resource": {
      "type": "k8s_container",
      "labels": {
        "location": "europe-west2",
        "project_id": "REDACTED",
        "container_name": "manager",
        "cluster_name": "REDACTED-gke",
        "pod_name": "doppler-operator-controller-manager-5d68689c98-qrb8f",
        "namespace_name": "doppler-operator-system"
      }
    },
    "timestamp": "2023-09-19T09:56:59.952099413Z",
    "severity": "ERROR",
    "labels": {
      "k8s-pod/pod-template-hash": "5d68689c98",
      "k8s-pod/control-plane": "controller-manager",
      "compute.googleapis.com/resource_name": "gk3-REDACTED-gke-nap-1t7fymkd-64661875-r2wb"
    },
    "logName": "projects/REDACTED/logs/stderr",
    "receiveTimestamp": "2023-09-19T09:57:00.764897121Z"
  },
  {
    "textPayload": "2023-09-19T09:56:59.961Z\tINFO\tcontrollers.DopplerSecret\tFinished reconciliation\t{\"dopplersecret\": \"doppler-operator-system/master-web-prod\"}",
    "insertId": "9frjo86idw4i3ff9",
    "resource": {
      "type": "k8s_container",
      "labels": {
        "project_id": "REDACTED",
        "location": "europe-west2",
        "namespace_name": "doppler-operator-system",
        "container_name": "manager",
        "cluster_name": "REDACTED-gke",
        "pod_name": "doppler-operator-controller-manager-5d68689c98-qrb8f"
      }
    },
    "timestamp": "2023-09-19T09:56:59.961736109Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_name": "gk3-REDACTED-gke-nap-1t7fymkd-64661875-r2wb",
      "k8s-pod/pod-template-hash": "5d68689c98",
      "k8s-pod/control-plane": "controller-manager"
    },
    "logName": "projects/REDACTED/logs/stderr",
    "receiveTimestamp": "2023-09-19T09:57:00.764897121Z"
  }
]
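
The export above is long, so a short script can help pick out the one pass where the operator actually touched the deployment. This is a minimal sketch, assuming each entry carries a tab-separated `textPayload` (time, level, logger, message, JSON context) as in the entries above. The helper names `summarize` and `redeploys` are illustrative only and not part of any Doppler tooling.

```python
import json


def summarize(entries):
    """Return (timestamp, message, context) for each operator log entry."""
    rows = []
    for entry in entries:
        # textPayload is tab-separated: time, level, logger, message, JSON context
        parts = entry["textPayload"].split("\t")
        timestamp, message = parts[0], parts[3]
        context = json.loads(parts[4]) if len(parts) > 4 else {}
        rows.append((timestamp, message, context))
    return rows


def redeploys(entries):
    """Keep only the entries where the operator reports updating a deployment."""
    return [row for row in summarize(entries) if "Updated deployment" in row[1]]


# Two entries trimmed from the export above (only textPayload is kept).
sample = [
    {"textPayload": "2023-09-19T09:55:59.675Z\tINFO\tcontrollers.DopplerSecret\t[-] Deployment is already running latest version, nothing to do\t{\"deployment\": \"default/master-web\"}"},
    {"textPayload": "2023-09-19T09:56:59.951Z\tINFO\tcontrollers.DopplerSecret\t[/] Updated deployment\t{\"deployment\": \"default/master-web\"}"},
]

for timestamp, message, context in redeploys(sample):
    print(timestamp, message, context["deployment"])
```

Run against the full export, this would surface only the 09:56:59 pass. In that pass the operator logged "Doppler secrets not modified" and still updated `default/master-web`.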

watsonian commented 11 months ago

@SGudbrandsson You don't happen to be using something like Reloader as well, do you? That could account for a double deploy (i.e., our operator does it as well as another operator like Reloader).

SGudbrandsson commented 11 months ago

@watsonian no, we don't use anything like that. We only use GitHub Actions to deploy helm templates and Doppler for secret management.

nmanoogian commented 11 months ago

@SGudbrandsson The operator achieves the redeployment by setting a special annotation on the deployment resource - specifically, secrets.doppler.com/secretsupdate.<secret_name>. When you perform your Helm deployment, does it clear this annotation?
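
To illustrate why a cleared annotation matters, here is a toy model of the check described above. It is a sketch under assumptions, not the operator's actual code: the `reconcile` function, the `secret_version` value, and the secret name `example-secret` are all hypothetical, and the annotations are modeled as a plain dict.

```python
ANNOTATION_PREFIX = "secrets.doppler.com/secretsupdate."


def reconcile(annotations, secret_name, secret_version):
    """Return True if the annotation had to be (re)written, which would trigger a redeploy."""
    key = ANNOTATION_PREFIX + secret_name
    if annotations.get(key) == secret_version:
        return False  # annotation already matches: nothing to do
    # Hypothetical model: writing the annotation is what causes the redeploy.
    annotations[key] = secret_version
    return True


annotations = {}
first = reconcile(annotations, "example-secret", "v1")   # annotation missing, so it is written
second = reconcile(annotations, "example-secret", "v1")  # unchanged, so nothing happens

annotations.clear()  # something external wipes the annotation
third = reconcile(annotations, "example-secret", "v1")   # written again, although the secret is the same

print(first, second, third)
```

In this model the third pass redeploys even though the secret never changed. That would match a log sequence where "Doppler secrets not modified" is followed by "Updated deployment".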

SGudbrandsson commented 11 months ago

@nmanoogian yes, it seems that's the case. [screenshot not included]

What would be the best course of action to get this fixed?

nmanoogian commented 11 months ago

Good question! I'm not sure if I've run into this before. How exactly are you doing your deploys/releases with Helm? I'm wondering if there's a Helm option to leave certain metadata (like annotations) alone during that process.

SGudbrandsson commented 11 months ago

Good point about the helm install bit. I did a little research and found this issue in Helm (https://github.com/helm/helm/issues/11823). It mentions that upgrading workloads with --force will perform a replacement strategy. For some reason, we have this option in all our deployments, which might explain this behavior.

I'm going to remove it and test the results. I'll post the results in a bit.

SGudbrandsson commented 11 months ago

Okay, confirmed.

Removing --force from the helm upgrade command fixed the deployments. Now they are deployed correctly and only once per deploy. The annotation remains during helm upgrades.
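
For anyone landing here later, the difference between the two upgrade modes can be pictured with a deliberately simplified model. This is a sketch, not Helm's implementation: `patch_upgrade` and `replace_upgrade` are hypothetical names, annotations are plain dicts, and the real patch logic (which also removes keys dropped from the chart) is not modeled.

```python
DOPPLER_KEY = "secrets.doppler.com/secretsupdate.example-secret"  # hypothetical secret name


def patch_upgrade(live, chart):
    """Merge the chart's annotations into the live ones, keeping keys the chart does not mention."""
    return {**live, **chart}


def replace_upgrade(live, chart):
    """Replace the live annotations wholesale with what the chart renders."""
    return dict(chart)


# Live object: one annotation from the chart, one added later by the operator.
live = {"example.com/team": "web", DOPPLER_KEY: "v1"}
# The chart only knows about its own annotation.
chart = {"example.com/team": "web"}

print(DOPPLER_KEY in patch_upgrade(live, chart))    # operator's annotation survives
print(DOPPLER_KEY in replace_upgrade(live, chart))  # operator's annotation is gone
```

With the replacement model the operator's annotation disappears on every upgrade, so the operator would set it again on its next pass and cause the second deployment.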

Thanks for your help @nmanoogian and @watsonian !

nmanoogian commented 11 months ago

Outstanding, @SGudbrandsson! Thanks for reaching out about this; it's definitely good to know about helm's --force flag!