cockroachdb / cockroach-operator

k8s operator for CRDB
Apache License 2.0

Issues in e2e log #536

Open chrislovecnm opened 3 years ago

chrislovecnm commented 3 years ago

We are having two problems during e2e.

  1. Jobs looping
  2. Decommission is starting after an upgrade

The attached log file shows both problems:

log-example.txt

keith-mcclellan commented 3 years ago

My diagnosis is that after we upgrade the first partition, we try to update an old version of the StatefulSet definition rather than pulling the latest one before updating the next pod, and the update fails. @chrisseto suggested it might just be a stale resourceVersion.

So the error breaks us out of the actor and requeues, but since Upgrade is no longer the first actor, it conflicts with Decommission and VersionChecker. I bet this is how partitionedUpgrade ended up as the first actor to begin with; we've probably had this bug for a while now.

Can we force a reload of the object every time before we update a partition, @chrislovecnm @alinadonisa? If it's just the resourceVersion, that should fix it. cc: @udnay
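To make the reload-before-update idea concrete, here is a minimal, self-contained sketch of optimistic concurrency with a re-read before every write. It is not the operator's code: `Store`, `StatefulSet`, `ErrConflict`, and `UpdatePartition` are hypothetical stand-ins for the API server, the real StatefulSet object, the 409 conflict error, and the partitioned-update step. In the real operator the same pattern would likely be a fresh Get followed by an Update, possibly wrapped in `retry.RetryOnConflict` from `k8s.io/client-go/util/retry`.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict stands in for the API server's "the object has been modified" error.
var ErrConflict = errors.New("conflict: the object has been modified")

// StatefulSet is a toy object carrying only the fields this sketch needs.
type StatefulSet struct {
	ResourceVersion int
	Partition       int
}

// Store is a hypothetical in-memory API server that enforces optimistic
// concurrency: a write is accepted only if it carries the current version.
type Store struct {
	current StatefulSet
}

// Get returns a copy of the latest stored object.
func (s *Store) Get() StatefulSet { return s.current }

// Update rejects writes based on a stale ResourceVersion and bumps the
// version on success.
func (s *Store) Update(obj StatefulSet) error {
	if obj.ResourceVersion != s.current.ResourceVersion {
		return ErrConflict
	}
	obj.ResourceVersion++
	s.current = obj
	return nil
}

// UpdatePartition re-reads the object before every attempt, so the write is
// always based on the latest ResourceVersion. Conflicts are retried up to
// maxAttempts times; any other error is returned immediately.
func UpdatePartition(s *Store, partition, maxAttempts int) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		latest := s.Get() // force a reload instead of reusing a cached copy
		latest.Partition = partition
		err := s.Update(latest)
		if err == nil {
			return nil
		}
		if !errors.Is(err, ErrConflict) {
			return err
		}
	}
	return fmt.Errorf("partition %d: giving up after %d attempts: %w",
		partition, maxAttempts, ErrConflict)
}

func main() {
	store := &Store{current: StatefulSet{ResourceVersion: 1, Partition: 2}}

	// A copy fetched once and reused, as the diagnosis above suspects.
	stale := store.Get()

	// The first partition update goes through and bumps the version.
	fmt.Println("first update:", UpdatePartition(store, 1, 3))

	// Reusing the old copy for the next partition is rejected.
	stale.Partition = 0
	fmt.Println("stale update:", store.Update(stale))

	// Reloading before the write succeeds.
	fmt.Println("reloaded update:", UpdatePartition(store, 0, 3))
}
```

The point of the sketch is only that the second write fails when it reuses the copy fetched before the first write, and succeeds once the object is re-read. Whether that is the whole story for the failure in the log below is still the open question in this thread.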

2021-05-27T20:12:42.1880356Z logger.go:130: 2021-05-27T20:12:42.105Z ERROR crdb-test-zqwfhx Error getting statefulset Operation cannot be fulfilled on statefulsets.apps "crdb": the object has been modified; please apply your changes to the latest version and try again {"action": "partitionedUpdate", "CrdbCluster": "crdb-test-zqwfhx/crdb", "stsName": "crdb", "namespace": "crdb-test-zqwfhx", "error": "Operation cannot be fulfilled on statefulsets.apps \"crdb\": the object has been modified; please apply your changes to the latest version and try again"}
2021-05-27T20:12:42.1887375Z logger.go:130: 2021-05-27T20:12:42.108Z INFO Error on action {"CrdbCluster": "crdb-test-zqwfhx/crdb", "Action": "PartialUpdate", "err": "failed to update sts with partitioned update: crdb: error applying updateStrategyFunc to crdb crdb-test-zqwfhx: Operation cannot be fulfilled on statefulsets.apps \"crdb\": the object has been modified; please apply your changes to the latest version and try again"}
2021-05-27T20:12:42.1910112Z logger.go:130: 2021-05-27T20:12:42.108Z ERROR action failed {"CrdbCluster": "crdb-test-zqwfhx/crdb", "error": "failed to update sts with partitioned update: crdb: error applying updateStrategyFunc to crdb crdb-test-zqwfhx: Operation cannot be fulfilled on statefulsets.apps \"crdb\": the object has been modified; please apply your changes to the latest version and try again", "errorVerbose": "Operation cannot be fulfilled on statefulsets.apps \"crdb\": the object has been modified; please apply your changes to the latest version and try again\nerror applying updateStrategyFunc to crdb 
crdb-test-zqwfhx\ngithub.com/cockroachdb/cockroach-operator/pkg/update.UpdateClusterRegionStatefulSet\n\tpkg/update/update.go:151\ngithub.com/cockroachdb/cockroach-operator/pkg/update.updateClusterStatefulSets\n\tpkg/update/update_cockroach_version.go:132\ngithub.com/cockroachdb/cockroach-operator/pkg/update.UpdateClusterCockroachVersion\n\tpkg/update/update_cockroach_version.go:117\ngithub.com/cockroachdb/cockroach-operator/pkg/actor.(*partitionedUpdate).Act\n\tpkg/actor/partitioned_update.go:224\ngithub.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n\tpkg/controller/cluster_controller.go:130\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99\nruntime.goexit\n\tGOROOT/src/runtime/asm_amd64.s:1371\nfailed to update sts with partitioned update: 
crdb\ngithub.com/cockroachdb/cockroach-operator/pkg/actor.(*partitionedUpdate).Act\n\tpkg/actor/partitioned_update.go:237\ngithub.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n\tpkg/controller/cluster_controller.go:130\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99\nruntime.goexit\n\tGOROOT/src/runtime/asm_amd64.s:1371"}