GoogleContainerTools / kpt-config-sync

Config Sync - used to sync Git, OCI and Helm charts to your clusters.
Apache License 2.0

[WIP] Fix remediator errors causing sync failure #1253

Closed karlkfi closed 2 days ago

karlkfi commented 3 weeks ago

Dependencies:

google-oss-prow[bot] commented 3 weeks ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please ask for approval from karlkfi. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/GoogleContainerTools/kpt-config-sync/blob/main/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

karlkfi commented 3 weeks ago

/hold

Needs rebase on https://github.com/GoogleContainerTools/kpt-config-sync/pull/1257

karlkfi commented 3 weeks ago

/unhold

karlkfi commented 3 weeks ago

/retest

TestMultiSyncs_Unstructured_MixedControl failed. Not sure if it's a flake or not.

Weird error:

        E0613 17:05:34.263560       1 reconciler_base.go:648] "Removal of reconciler-manager finalizer failed" err=<
            KNV2002: failed to update RootSync to remove the reconciler-manager finalizer: APIServer error: Operation cannot be fulfilled on rootsyncs.configsync.gke.io "root-sync": StorageError: invalid object, Code: 4, Key: /registry/configsync.gke.io/rootsyncs/config-management-system/root-sync, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 134dd304-9f16-4b9f-842b-e4f401d9ca11, UID in object meta: 

            For more information, see https://g.co/cloud/acm-errors#knv2002
         > logger="controllers.RootSync" syncKind="RootSync" sync="config-management-system/root-sync" reconciler="config-management-system/root-reconciler"
        E0613 17:05:34.263609       1 controller.go:329] "Reconciler error" err=<
            KNV2002: failed to update RootSync to remove the reconciler-manager finalizer: APIServer error: Operation cannot be fulfilled on rootsyncs.configsync.gke.io "root-sync": StorageError: invalid object, Code: 4, Key: /registry/configsync.gke.io/rootsyncs/config-management-system/root-sync, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 134dd304-9f16-4b9f-842b-e4f401d9ca11, UID in object meta: 

            For more information, see https://g.co/cloud/acm-errors#knv2002
         > controller="rootsync" controllerGroup="configsync.gke.io" controllerKind="RootSync" RootSync="config-management-system/root-sync" namespace="config-management-system" name="root-sync" reconcileID="980fa133-a380-48c1-a1d2-b1112591a182"

That kind of update error should never happen: the reconciler-manager is the only component that should remove its own finalizer, and the finalizer blocks deletion of the RootSync until it is removed. It may be a red herring, though, caused by an aggressive test cleanup step that strips finalizers.

The other error could be anything:

        testlogger.go:77: 2024-06-13 17:13:53.615060134 +0000 UTC WatchObject(RepoSync test-ns/nr1) watched for 6m0.000159361s
            new.go:501: 2024-06-13 17:13:53.615118764 +0000 UTC ERROR: waiting for sync: WatchObject(RepoSync test-ns/nr1): predicates not satisfied: object not found; object not found; object not found; object not found: context deadline exceeded

karlkfi commented 3 weeks ago

This PR still seemed too big, so I extracted the dedupe logic to https://github.com/GoogleContainerTools/kpt-config-sync/pull/1274

karlkfi commented 2 days ago

This is obsolete. Replaced by https://github.com/GoogleContainerTools/kpt-config-sync/pull/1276, https://github.com/GoogleContainerTools/kpt-config-sync/pull/1290, and https://github.com/GoogleContainerTools/kpt-config-sync/pull/1296