GoogleContainerTools / skaffold

Easy and Repeatable Kubernetes Development
https://skaffold.dev/
Apache License 2.0

ConfigConnector CRD status checking #7207

Open jsok opened 2 years ago

jsok commented 2 years ago

Expected behavior

skaffold apply of a manifest which contains a ConfigConnector CRD should succeed once the resource becomes healthy.

i.e. https://cloud.google.com/config-connector/docs/how-to/install-upgrade-uninstall#addon-configuring

Actual behavior

BUILD
Pulling image: gcr.io/k8s-skaffold/skaffold:v1.35.2
v1.35.2: Pulling from k8s-skaffold/skaffold
Digest: sha256:dd7b38c20839d081a2e17f3c99b7639c0e201bca8e2804ca66bdd462df714d3d
Status: Downloaded newer image for gcr.io/k8s-skaffold/skaffold:v1.35.2
gcr.io/k8s-skaffold/skaffold:v1.35.2
<SNIP>
Operation completed over 3 objects/3.4 KiB.                                      
Fetching cluster endpoint and auth data.
kubeconfig entry generated for CLUSTER
Starting deploy...
 - configconnector.core.cnrm.cloud.google.com/configconnector.core.cnrm.cloud.google.com configured
Waiting for deployments to stabilize...
 - :config-connector-resource/core.cnrm.cloud.google.com/v1beta1, Kind=ConfigConnector, Name=configconnector.core.cnrm.cloud.google.com: could not stabilize within 10m0s
 - :config-connector-resource/core.cnrm.cloud.google.com/v1beta1, Kind=ConfigConnector, Name=configconnector.core.cnrm.cloud.google.com failed. Error: could not stabilize within 10m0s.
1/1 deployment(s) failed
ERROR
ERROR: build step 0 "gcr.io/k8s-skaffold/skaffold:v1.35.2" failed: step exited with non-zero status: 1

Checking the resource shows it's Healthy and up to date:

kubectl describe configconnector configconnector.core.cnrm.cloud.google.com
Name:         configconnector.core.cnrm.cloud.google.com
Namespace:
Labels:       app.kubernetes.io/managed-by=google-cloud-deploy
              deploy.cloud.google.com/delivery-pipeline-id=config-connector
              deploy.cloud.google.com/location=XXX
              deploy.cloud.google.com/project-id=XXX
              deploy.cloud.google.com/release-id=XXX
              deploy.cloud.google.com/target-id=XXX
              skaffold.dev/run-id=XXX
Annotations:  <none>
API Version:  core.cnrm.cloud.google.com/v1beta1
Kind:         ConfigConnector
Metadata:
  Creation Timestamp:  2021-09-24T05:13:04Z
  Finalizers:
    configconnector.cnrm.cloud.google.com/finalizer
  Generation:  4
  Managed Fields:
    API Version:  core.cnrm.cloud.google.com/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
    Manager:      gke_addon_poststart
    Operation:    Update
    Time:         2021-09-24T05:13:04Z
    API Version:  core.cnrm.cloud.google.com/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"configconnector.cnrm.cloud.google.com/finalizer":
      f:status:
        .:
        f:healthy:
    Manager:      manager
    Operation:    Update
    Time:         2021-09-24T05:13:06Z
    API Version:  core.cnrm.cloud.google.com/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
        f:labels:
          .:
          f:app.kubernetes.io/managed-by:
          f:deploy.cloud.google.com/delivery-pipeline-id:
          f:deploy.cloud.google.com/location:
          f:deploy.cloud.google.com/project-id:
          f:deploy.cloud.google.com/release-id:
          f:deploy.cloud.google.com/target-id:
          f:skaffold.dev/run-id:
      f:spec:
        f:googleServiceAccount:
        f:mode:
    Manager:         kubectl-client-side-apply
    Operation:       Update
    Time:            2022-03-21T03:38:20Z
  Resource Version:  294911712
  UID:               cc821634-355a-4de5-8b21-6f7bac53ccba
Spec:
  Google Service Account:  XXX@XXX.iam.gserviceaccount.com
  Mode:                    cluster
Status:
  Healthy:  true
Events:
  Type    Reason    Age                      From                        Message
  ----    ------    ----                     ----                        -------
  Normal  UpToDate  4m49s (x24532 over 16d)  configconnector-controller  ConfigConnector is up to date

Information

apiVersion: skaffold/v2beta16
kind: Config
metadata:
  name: config-connector
profiles:
  - name: dev
    deploy:
      kubectl:
        manifests:
          - configconnector.yaml

Steps to reproduce the behavior

Create a configconnector.yaml:

# configconnector.yaml
apiVersion: core.cnrm.cloud.google.com/v1beta1
kind: ConfigConnector
metadata:
  # the name is restricted to ensure that there is only one
  # ConfigConnector resource installed in your cluster
  name: configconnector.core.cnrm.cloud.google.com
spec:
  mode: cluster
  googleServiceAccount: "SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com"

jsok commented 2 years ago

It seems like #6766 didn't account for the top-level CRD that configures Config Connector itself. This resource doesn't appear to have a Ready status either, which is probably why the status check can't handle it.

I'm also unable to disable the status check functionality. The modified skaffold.yaml looks like:

apiVersion: skaffold/v2beta16
kind: Config
metadata:
  name: config-connector
deploy:
  statusCheck: false
profiles:
  - name: dev
    deploy:
      kubectl:
        manifests:
          - configconnector.yaml

Edit: looks like I'm a victim of #7089; Cloud Deploy's latest version is 1.35.2.

MarlonGamez commented 2 years ago

@jsok thanks for opening this issue. I'll bring it up with the team so that we can get a fix out ASAP.

aaron-prindle commented 2 years ago

It seems there might not be an exposed Go struct for ConfigConnector currently (see https://github.com/GoogleCloudPlatform/k8s-config-connector/issues/472). We will have to do our own unmarshalling and examination of the status.
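
To illustrate what "our own unmarshalling" could look like without a typed struct, here is a minimal sketch. It is written in Python for brevity (skaffold itself is Go), and it assumes the resource is available as a JSON document, for example from kubectl get ... -o json. The function name is_config_connector_healthy and the sample object are hypothetical and are not part of skaffold.

```python
import json


def is_config_connector_healthy(resource_json: str) -> bool:
    """Return True only if the object reports status.healthy == true.

    Hypothetical helper: parses the raw JSON instead of relying on a
    typed client struct, which the comment above says is not exposed.
    """
    obj = json.loads(resource_json)
    # status may be missing or null before the controller has reconciled
    status = obj.get("status") or {}
    return status.get("healthy") is True


# Sample object shaped like the resource described in this issue
# (field values are illustrative).
sample = json.dumps({
    "apiVersion": "core.cnrm.cloud.google.com/v1beta1",
    "kind": "ConfigConnector",
    "metadata": {"name": "configconnector.core.cnrm.cloud.google.com"},
    "spec": {"mode": "cluster"},
    "status": {"healthy": True},
})

if __name__ == "__main__":
    print(is_config_connector_healthy(sample))
```

A missing or null status is treated as not healthy, so a freshly applied resource would keep being polled rather than being reported as ready too early.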

aaron-prindle commented 2 years ago

While investigating this, I found what seem to be two problems with deploying this ConfigConnector resource in skaffold currently:

  1. Because the ConfigConnector resource is not namespaced (it is a global resource), it does not come up in our resource selection for status checking; the error at the following line occurs in the code: https://github.com/GoogleContainerTools/skaffold/blob/main/pkg/diag/validator/custom_resource.go#L50 This is what is occurring currently: the status check code (healthy vs. normal k8s status) is not actually called.

  2. As mentioned in this issue, assuming the above were fixed, we would also need to special-case the health checking logic for ConfigConnector to look for .status.healthy == true. A few other Config Connector resources use this status pattern and would need to be accounted for as well, for example ConfigConnectorContext.
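
The two points above can be sketched roughly as follows. This is a Python illustration of the logic only, not skaffold's Go implementation. The names select_for_status_check, is_ready and HEALTHY_FIELD_KINDS are hypothetical. It also assumes that other resources report readiness through a standard status.conditions entry of type Ready, and that an object with no metadata.namespace is cluster-scoped, which is a simplification.

```python
# Kinds assumed to report health via .status.healthy
# (per the discussion above).
HEALTHY_FIELD_KINDS = {"ConfigConnector", "ConfigConnectorContext"}


def select_for_status_check(objs, namespace):
    """Keep objects in `namespace` plus objects with no namespace at all.

    Simplification: a missing metadata.namespace is treated as
    cluster-scoped, so global resources are no longer skipped.
    """
    selected = []
    for obj in objs:
        ns = (obj.get("metadata") or {}).get("namespace")
        if not ns or ns == namespace:
            selected.append(obj)
    return selected


def is_ready(obj):
    """Special-case the healthy-field kinds; otherwise use a Ready condition."""
    status = obj.get("status") or {}
    if obj.get("kind") in HEALTHY_FIELD_KINDS:
        return status.get("healthy") is True
    conditions = status.get("conditions") or []
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


if __name__ == "__main__":
    cc = {
        "kind": "ConfigConnector",
        "metadata": {"name": "configconnector.core.cnrm.cloud.google.com"},
        "status": {"healthy": True},
    }
    print([o["kind"] for o in select_for_status_check([cc], "dev")], is_ready(cc))
```

Keeping the kind list in one place would make it straightforward to add further resources that follow the same status pattern, if special-casing turns out to be the chosen approach.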

Initially this was prioritized for v1.37.1, but after further inspection we are no longer including it in that release, as this is not a regression. We need to understand more about how we want to handle global resources in general, or possibly decide whether it makes sense to special-case ConfigConnector (and possibly ConfigConnectorContext).