Resources suddenly in state UpdateFailed because of "field not declared in schema"

RolandOtta commented 2 years ago

Bug Description

Since last night, some of our resources are suddenly in state UpdateFailed, even though we did not update them. The events show the error "field not declared in schema" for various fields.

Our k8s cluster is a managed Config Controller instance, as described in https://cloud.google.com/anthos-config-management/docs/tutorials/landing-zone#preparing_the_environment

For example, in our computesubnetwork.compute.cnrm.cloud.google.com:

Events:
  Type     Reason        Age                   From                          Message
  ----     ------        ----                  ----                          -------
  Warning  UpdateFailed  18m (x267 over 9h)    computesubnetwork-controller  Update call failed: error expanding resource configuration: error resolving externally-managed fields: error converting state to typed value: .stackType: field not declared in schema
  Warning  UpdateFailed  3m42s (x25 over 46m)  computesubnetwork-controller  Update call failed: error expanding resource configuration: error resolving externally-managed fields: error converting spec to typed value: .stackType: field not declared in schema

In all of our containercluster.container.cnrm.cloud.google.com:

Events:
  Type     Reason        Age                From                         Message
  ----     ------        ----               ----                         -------
  Warning  UpdateFailed  16m (x33 over 9h)  containercluster-controller  Update call failed: error expanding resource configuration: error resolving externally-managed fields: error converting state to typed value: errors:
  .nodeConfig.workloadMetadataConfig.mode: field not declared in schema
  .monitoringConfig: field not declared in schema
  Warning  UpdateFailed  10m (x118 over 9h)  containercluster-controller  Update call failed: error expanding resource configuration: error resolving externally-managed fields: error converting state to typed value: .loggingConfig: field not declared in schema
  Warning  UpdateFailed  28s (x96 over 9h)   containercluster-controller  Update call failed: error expanding resource configuration: error resolving externally-managed fields: error converting state to typed value: .monitoringConfig: field not declared in schema
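
To get an overview of which fields are affected across many such events, the messages can be scanned for the reported paths. The following is only a minimal sketch: the regular expression and the helper name `undeclared_fields` are assumptions made for illustration, based purely on the message format shown above, and are not part of Config Connector.

```python
import re
from typing import List

# Matches ".some.field.path: field not declared in schema" anywhere in a message.
# The pattern is inferred from the event text above, not from any official format spec.
UNDECLARED_FIELD = re.compile(r"(\.[A-Za-z0-9_.\[\]]+): field not declared in schema")


def undeclared_fields(message: str) -> List[str]:
    """Return the distinct field paths reported as undeclared, in order of appearance."""
    seen: List[str] = []
    for path in UNDECLARED_FIELD.findall(message):
        if path not in seen:
            seen.append(path)
    return seen


# Message text copied from the containercluster event above.
message = (
    "Update call failed: error expanding resource configuration: "
    "error resolving externally-managed fields: "
    "error converting state to typed value: errors:\n"
    "  .nodeConfig.workloadMetadataConfig.mode: field not declared in schema\n"
    "  .monitoringConfig: field not declared in schema"
)
print(undeclared_fields(message))
```
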

Additional Diagnostic Information

Kubernetes Cluster Version

Client Version: v1.19.3 Server Version: v1.20.10-gke.1600

Config Connector Version

1.65.0

Config Connector Mode

no output

maqiuyujoyce commented 2 years ago

Hi @RolandOtta , could you do a sanity check and confirm that the fields you mentioned above are defined in the CRDs installed in your cluster? You can run the following commands to verify:

kubectl describe crd computesubnetworks.compute.cnrm.cloud.google.com | grep "Stack Type:"
kubectl describe crd containerclusters.container.cnrm.cloud.google.com | grep "Logging Config:"
kubectl describe crd containerclusters.container.cnrm.cloud.google.com | grep "Monitoring Config:"

In addition, could you share the YAML manifests you used to create the resources so that I can try to reproduce the issue?
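
The CRD check above can also be done structurally instead of with `grep`, by walking the CRD's OpenAPI v3 schema. This is a hedged sketch, not how Config Connector itself validates fields: the helper `is_declared` and the trimmed `crd_schema` are hypothetical, and it assumes that the paths in the events (such as `.stackType`) are relative to `spec`. It ignores `additionalProperties` and `x-kubernetes-preserve-unknown-fields`.

```python
from typing import Any, Dict


def is_declared(schema: Dict[str, Any], path: str) -> bool:
    """Return True if a dotted path such as 'spec.stackType' is declared in the schema."""
    node = schema
    for part in path.strip(".").split("."):
        # Step through array wrappers so list items are checked by their own properties.
        while node.get("type") == "array" and "items" in node:
            node = node["items"]
        props = node.get("properties", {})
        if part not in props:
            return False
        node = props[part]
    return True


# Hypothetical, heavily trimmed schema used only to illustrate the walk.
crd_schema = {
    "type": "object",
    "properties": {
        "spec": {
            "type": "object",
            "properties": {
                "ipCidrRange": {"type": "string"},
                "stackType": {"type": "string"},
            },
        }
    },
}

print(is_declared(crd_schema, "spec.stackType"))
print(is_declared(crd_schema, "spec.monitoringConfig"))
```

For a real cluster, the schema would come from the output of `kubectl get crd <name> -o json`, where each entry under `.spec.versions` carries its schema at `.schema.openAPIV3Schema`.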

RolandOtta commented 2 years ago

Hi @maqiuyujoyce,

The resources magically changed their state back to UpToDate 14 hours ago without any intervention from our side.

The CRDs are fine and have not changed for days:

kubectl describe crd containerclusters.container.cnrm.cloud.google.com | grep -i "transition time"
      Description:  The last transition time for the value in 'Status'
                    Last Transition Time:
    Last Transition Time:  2021-10-21T11:29:01Z
    Last Transition Time:  2021-10-21T11:29:01Z

quite mysterious ....

RolandOtta commented 2 years ago

It seems that our Config Controller cluster was completely restarted. I have no idea what else was changed there at the same time.

cnrm-system                       cnrm-controller-manager-c5oktskgkgt2glbff2t0-0                   2/2     Running   0          14h
cnrm-system                       cnrm-controller-manager-c5ola6sgkgt2glbff2tg-0                   2/2     Running   0          14h
cnrm-system                       cnrm-controller-manager-c5ola6sgkgt2glbff2u0-0                   2/2     Running   0          14h
cnrm-system                       cnrm-controller-manager-c5olaakgkgt2glbff2ug-0                   2/2     Running   0          14h
cnrm-system                       cnrm-controller-manager-c5olaakgkgt2glbff2v0-0                   2/2     Running   0          14h
cnrm-system                       cnrm-controller-manager-c5olaakgkgt2glbff2vg-0                   2/2     Running   0          14h
cnrm-system                       cnrm-controller-manager-c5olaakgkgt2glbff300-0                   2/2     Running   0          14h
cnrm-system                       cnrm-controller-manager-c5olaakgkgt2glbff30g-0                   2/2     Running   0          14h
cnrm-system                       cnrm-controller-manager-c5olaakgkgt2glbff310-0                   2/2     Running   0          14h
cnrm-system                       cnrm-controller-manager-c5olafkgkgt2glbff31g-0                   2/2     Running   0          14h
cnrm-system                       cnrm-controller-manager-c5olafkgkgt2glbff320-0                   2/2     Running   0          14h
cnrm-system                       cnrm-controller-manager-c5olafkgkgt2glbff32g-0                   2/2     Running   0          14h
cnrm-system                       cnrm-controller-manager-c5olafkgkgt2glbff330-0                   2/2     Running   0          14h
cnrm-system                       cnrm-controller-manager-c5p8nhkgkgt2glbff33g-0                   2/2     Running   0          14h
cnrm-system                       cnrm-controller-manager-c5t9bgsgkgt2glbff34g-0                   2/2     Running   0          14h
cnrm-system                       cnrm-deletiondefender-0                                          1/1     Running   0          14h
cnrm-system                       cnrm-resource-stats-recorder-9dd64c574-jhwxl                     2/2     Running   0          14h
cnrm-system                       cnrm-webhook-manager-5df6dbd77f-n62rk                            1/1     Running   0          14h
cnrm-system                       cnrm-webhook-manager-5df6dbd77f-t2xs6                            1/1     Running   0          14h
config-management-system          git-importer-594ddf4b4-th7xn                                     3/3     Running   0          14h
config-management-system          monitor-77c6b8b869-qkg69                                         1/1     Running   0          14h
configconnector-operator-system   configconnector-operator-0                                       1/1     Running   0          14h
gatekeeper-system                 gatekeeper-audit-65b49f875f-gr4z5                                1/1     Running   0          14h
gatekeeper-system                 gatekeeper-controller-manager-6dc946bf9b-pzq6p                   1/1     Running   0          14h
krmapihosting-monitoring          krmapihosting-metrics-agent-45tl2                                1/1     Running   0          14h
krmapihosting-monitoring          krmapihosting-metrics-agent-92qpp                                1/1     Running   0          14h
krmapihosting-monitoring          krmapihosting-metrics-agent-cqgsz                                1/1     Running   0          14h
krmapihosting-system              bootstrap-7d9458ccb7-sv7wg                                       1/1     Running   0          14h
kube-system                       config-management-operator-65bd7f9599-x5blp                      1/1     Running   0          14h
kube-system                       event-exporter-gke-67986489c8-rtp95                              2/2     Running   0          14h
kube-system                       fluentbit-gke-2xm8b                                              2/2     Running   0          14h
kube-system                       fluentbit-gke-6j465                                              2/2     Running   0          14h
kube-system                       fluentbit-gke-qvw2r                                              2/2     Running   0          14h
kube-system                       gke-metadata-server-5s56l                                        1/1     Running   0          14h
kube-system                       gke-metadata-server-7tc45                                        1/1     Running   0          14h
kube-system                       gke-metadata-server-zq2l4                                        1/1     Running   0          14h
kube-system                       gke-metrics-agent-bdml4                                          1/1     Running   0          14h
kube-system                       gke-metrics-agent-ld6pz                                          1/1     Running   0          14h
kube-system                       gke-metrics-agent-sptlx                                          1/1     Running   0          14h
kube-system                       kube-dns-autoscaler-844c9d9448-vkgbx                             1/1     Running   0          14h
kube-system                       kube-dns-b4f5c58c7-hsczv                                         4/4     Running   0          14h
kube-system                       kube-dns-b4f5c58c7-lc4vc                                         4/4     Running   0          14h
kube-system                       kube-proxy-gke-krmapihost-confi-krmapihost-confi-25f5886d-xv9c   1/1     Running   0          14h
kube-system                       kube-proxy-gke-krmapihost-confi-krmapihost-confi-2c989cd1-6qdf   1/1     Running   0          14h
kube-system                       kube-proxy-gke-krmapihost-confi-krmapihost-confi-6323a232-gnfv   1/1     Running   0          14h
kube-system                       l7-default-backend-56cb9644f6-6qbxp                              1/1     Running   0          14h
kube-system                       metrics-server-v0.3.6-9c5bbf784-q9ppz                            2/2     Running   0          14h
kube-system                       netd-jnwp4                                                       1/1     Running   0          14h
kube-system                       netd-xk48x                                                       1/1     Running   0          14h
kube-system                       netd-z7gbc                                                       1/1     Running   0          14h
kube-system                       pdcsi-node-226h8                                                 2/2     Running   0          14h
kube-system                       pdcsi-node-bp2fc                                                 2/2     Running   0          14h
kube-system                       pdcsi-node-lrmn7                                                 2/2     Running   0          14h
resource-group-system             resource-group-controller-manager-5449bc55f4-ks9pd               2/2     Running   0          14h

maqiuyujoyce commented 2 years ago

Hi @RolandOtta , glad to hear that your resources are back to normal!

As to the restart of your config controller cluster, with the limited information, it's hard for us to determine if it was the root cause or not. For now, if you have no further questions, I'll close this issue.

If you are concerned about the behavior of the config controller cluster, you can also reach out to Config Controller experts via GCP support.

GoogleCloudPlatform / k8s-config-connector