GoogleCloudPlatform / k8s-config-connector

GCP Config Connector, a Kubernetes add-on for managing GCP resources
https://cloud.google.com/config-connector/docs/overview
Apache License 2.0

ContainerCluster updates infinitely every 10 minutes #264

Closed: jonnylangefeld closed this issue 2 years ago

jonnylangefeld commented 4 years ago

Describe the bug

As the title says, the ContainerCluster gets updated every 10 minutes, indefinitely. I believe it has to do with the masterAuthorizedNetworksConfig field: I haven't seen this happen on clusters that don't set it, and every 10 minutes Stackdriver shows log entries indicating that masterAuthorizedNetworksConfig is being updated.

Config Connector version: 1.11.1

To Reproduce

Create the ContainerCluster below.

YAML snippets:

apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerCluster
metadata:
  annotations:
    cnrm.cloud.google.com/project-id: cruise-paas-platform-dev-ac6a
  name: mango
  namespace: clusters
spec:
  addonsConfig:
    networkPolicyConfig:
      disabled: false
  clusterAutoscaling:
    autoscalingProfile: BALANCED
    enabled: false
  clusterIpv4Cidr: 10.27.0.0/17
  defaultMaxPodsPerNode: 110
  enableIntranodeVisibility: true
  initialNodeCount: 1
  ipAllocationPolicy:
    clusterIpv4CidrBlock: 10.27.0.0/17
    clusterSecondaryRangeName: container-cidr
    servicesIpv4CidrBlock: 172.16.104.0/22
    servicesSecondaryRangeName: service-cidr
  location: us-central1
  loggingService: logging.googleapis.com/kubernetes
  masterAuth:
    clientCertificateConfig:
      issueClientCertificate: true
  masterAuthorizedNetworksConfig:
    cidrBlocks:
    - cidrBlock: 10.10.10.10/32
      displayName: aws-corp
    - cidrBlock: 10.10.10.11/30
      displayName: sjc-1
    - cidrBlock: 10.10.10.12/28
      displayName: sfo3
    - cidrBlock: 10.10.10.13/32
      displayName: anyconnect-vpn
  minMasterVersion: 1.15.12-gke.9
  monitoringService: monitoring.googleapis.com/kubernetes
  networkPolicy:
    enabled: true
  networkRef:
    external: projects/cruise-neteng-dev-f9fe/global/networks/cruise-neteng-dev-f9fe
  nodeConfig:
    diskSizeGb: 100
    diskType: pd-standard
    imageType: COS
    machineType: n1-standard-1
    metadata:
      disable-legacy-endpoints: "true"
    oauthScopes:
    - https://www.googleapis.com/auth/logging.write
    - https://www.googleapis.com/auth/monitoring
    serviceAccountRef:
      name: mango-default
    shieldedInstanceConfig:
      enableIntegrityMonitoring: true
  nodeLocations:
  - us-central1-a
  - us-central1-c
  - us-central1-f
  nodeVersion: 1.15.12-gke.9
  privateClusterConfig:
    enablePrivateEndpoint: false
    enablePrivateNodes: true
    masterIpv4CidrBlock: 10.10.0.10/28
  subnetworkRef:
    name: mango

jonnylangefeld commented 4 years ago

If I query the Stackdriver logs with a filter like

protoPayload.serviceName="container.googleapis.com"
protoPayload.methodName="google.container.v1beta1.ClusterManager.UpdateCluster"
protoPayload.authenticationInfo.principalEmail="<cnrm service account>"

I see constant updates to "desiredMasterAuthorizedNetworksConfig": {}. When I diff successive updates, the only difference is the order of the cidrBlock and displayName keys within each entry (probably because they are serialized from a map?). A typical diff looks like this:

<         "cidrBlock": "35.237.67.185/32",
<         "displayName": "gcp-staging-cloud-nat-us-west2-4"
---
>         "displayName": "gcp-staging-cloud-nat-us-west2-4",
>         "cidrBlock": "35.237.67.185/32"

I haven't figured out yet whether the API server IP rotation is related to this.
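The diff above is purely a key-ordering change inside one JSON object. The minimal Python sketch below (not Config Connector's code; the function names are hypothetical) shows why such a diff is spurious: a byte-level comparison flags the two serializations as different, while comparing the parsed objects, or a key-sorted canonical form, does not.

```python
import json

# Two serializations of the same authorized-network entry, copied from the
# log diff in the comment above; only the key order differs.
before = '{"cidrBlock": "35.237.67.185/32", "displayName": "gcp-staging-cloud-nat-us-west2-4"}'
after = '{"displayName": "gcp-staging-cloud-nat-us-west2-4", "cidrBlock": "35.237.67.185/32"}'


def text_differs(a: str, b: str) -> bool:
    """Naive comparison: any change in the serialized text counts as a diff."""
    return a != b


def semantically_differs(a: str, b: str) -> bool:
    """Parse both payloads; Python dict equality ignores key order."""
    return json.loads(a) != json.loads(b)


def canonical(payload: str) -> str:
    """Re-serialize with sorted keys so equal objects yield identical text."""
    return json.dumps(json.loads(payload), sort_keys=True, separators=(",", ":"))


print(text_differs(before, after))            # True
print(semantically_differs(before, after))    # False
print(canonical(before) == canonical(after))  # True
```

Canonicalizing before comparing is one common way to keep a reconcile loop from treating a reordered payload as drift.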

spew commented 4 years ago

Hi @jonnylangefeld thanks for reporting this along with the detailed information and reproduction steps. We will look into fixing this issue.

kibbles-n-bytes commented 4 years ago

Hey Jonny, the infinite diff issue is now fixed in 1.23.0. The cause was that the diff calculation on spec.minMasterVersion detected differences too aggressively. We are working with our sister team to solve the master authorized network config ordering issue, which was exacerbated by the min master version diff triggering these constant updates.
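The comment above does not describe the fix itself. As an illustration only, assuming that a "minimum" version should count as satisfied whenever the live master version is at least as new (for example after GKE upgraded the control plane past the requested version), a less aggressive comparison could look like the sketch below. The version parser, the function names, and the sample live version are hypothetical, not Config Connector's actual logic.

```python
import re
from typing import Tuple

# Hypothetical parser for GKE-style versions such as "1.15.12-gke.9".
_GKE_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-gke\.(\d+))?$")


def parse_gke_version(v: str) -> Tuple[int, int, int, int]:
    """Split a version into numeric parts so it compares numerically."""
    m = _GKE_VERSION.match(v)
    if m is None:
        raise ValueError(f"unrecognized version: {v!r}")
    major, minor, patch, gke = m.groups()
    # Assumption for illustration: a missing -gke.N suffix is treated as 0.
    return (int(major), int(minor), int(patch), int(gke or 0))


def strict_diff(desired_min: str, actual: str) -> bool:
    """Too aggressive: any textual difference triggers an update."""
    return desired_min != actual


def min_version_diff(desired_min: str, actual: str) -> bool:
    """Report a diff only if the live master is older than the minimum."""
    return parse_gke_version(actual) < parse_gke_version(desired_min)


spec_min = "1.15.12-gke.9"
live = "1.15.12-gke.20"  # hypothetical newer version after an upgrade
print(strict_diff(spec_min, live))       # True: would update on every reconcile
print(min_version_diff(spec_min, live))  # False: minimum already satisfied
```

Comparing numeric tuples rather than strings matters here: as plain strings, "1.15.12-gke.20" sorts before "1.15.12-gke.9".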

jonnylangefeld commented 3 years ago

Hi @kibbles-n-bytes, any updates to the master authorized network config ordering issue?

caieo commented 3 years ago

Hi @jonnylangefeld, this issue should no longer be happening. We checked in with our sister team and confirmed that the diff calculation should not detect a reordering of the master authorized network config fields as a diff. Is this issue still affecting you?