apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0
2.2k stars 184 forks source link

chore: fix a flaky test #8440

Closed cjc7373 closed 4 days ago

cjc7373 commented 2 weeks ago

Today I've encountered a very weird test fail (also this run).

log

```bash • [PANICKED] [0.874 seconds] Cluster Controller [AfterEach] cluster status test cluster conditions when cluster definition non-exist [AfterEach] /home/runner/work/kubeblocks/kubeblocks/controllers/apps/cluster_controller_test.go:123 [It] /home/runner/work/kubeblocks/kubeblocks/controllers/apps/cluster_controller_test.go:929 Timeline >> STEP: clean resources @ 11/11/24 06:36:09.526 STEP: clear resources Cluster @ 11/11/24 06:36:09.526 STEP: clear resources ClusterDefinition @ 11/11/24 06:36:09.529 STEP: clear resources ComponentDefinition @ 11/11/24 06:36:09.532 STEP: clear resources ComponentVersion @ 11/11/24 06:36:09.535 STEP: clear resources Component @ 11/11/24 06:36:09.54 STEP: clear resources PersistentVolumeClaim @ 11/11/24 06:36:09.616 STEP: clear resources Pod @ 11/11/24 06:36:09.626 STEP: clear resources Backup @ 11/11/24 06:36:09.629 STEP: clear resources BackupPolicy @ 11/11/24 06:36:09.647 STEP: clear resources VolumeSnapshot @ 11/11/24 06:36:09.7 STEP: clear resources Service @ 11/11/24 06:36:09.705 STEP: clear resources BackupPolicyTemplate @ 11/11/24 06:36:09.712 STEP: clear resources ActionSet @ 11/11/24 06:36:09.716 STEP: clear resources StorageClass @ 11/11/24 06:36:09.722 STEP: Create a componentDefinition obj @ 11/11/24 06:36:09.735 STEP: Create a componentVersion obj @ 11/11/24 06:36:09.768 STEP: Create a clusterDefinition obj @ 11/11/24 06:36:09.799 STEP: Create a bpt obj @ 11/11/24 06:36:09.846 STEP: create actionSet @ 11/11/24 06:36:09.846 STEP: Creating a BackupPolicyTemplate @ 11/11/24 06:36:09.9 STEP: Wait objects available @ 11/11/24 06:36:09.912 STEP: create a cluster with cluster definition non-exist @ 11/11/24 06:36:09.945 STEP: check conditions @ 11/11/24 06:36:09.958 STEP: clean resources @ 11/11/24 06:36:09.989 STEP: clear resources Cluster @ 11/11/24 06:36:09.989 STEP: clear resources ClusterDefinition @ 11/11/24 06:36:10.032 STEP: clear resources ComponentDefinition @ 11/11/24 06:36:10.058 STEP: clear resources ComponentVersion @ 11/11/24 06:36:10.136 STEP: clear resources Component @ 11/11/24 06:36:10.172 STEP: clear resources PersistentVolumeClaim @ 11/11/24 06:36:10.198 STEP: clear resources Pod @ 11/11/24 06:36:10.201 STEP: clear resources Backup @ 11/11/24 06:36:10.203 STEP: clear resources BackupPolicy @ 11/11/24 06:36:10.206 STEP: clear resources VolumeSnapshot @ 11/11/24 06:36:10.263 STEP: clear resources Service @ 11/11/24 06:36:10.27 STEP: clear resources BackupPolicyTemplate @ 11/11/24 06:36:10.277 STEP: clear resources ActionSet @ 11/11/24 06:36:10.285 STEP: clear resources StorageClass @ 11/11/24 06:36:10.292 STEP: clean resources @ 11/11/24 06:36:10.295 STEP: clear resources Cluster @ 11/11/24 06:36:10.295 STEP: clear resources ClusterDefinition @ 11/11/24 06:36:10.304 STEP: clear resources ComponentDefinition @ 11/11/24 06:36:10.308 STEP: clear resources ComponentVersion @ 11/11/24 06:36:10.312 STEP: clear resources Component @ 11/11/24 06:36:10.316 STEP: clear resources PersistentVolumeClaim @ 11/11/24 06:36:10.345 STEP: clear resources Pod @ 11/11/24 06:36:10.351 STEP: clear resources Backup @ 11/11/24 06:36:10.357 STEP: clear resources BackupPolicy @ 11/11/24 06:36:10.364 STEP: clear resources VolumeSnapshot @ 11/11/24 06:36:10.383 STEP: clear resources Service @ 11/11/24 06:36:10.389 [PANICKED] in [AfterEach] - /home/runner/go/pkg/mod/github.com/onsi/gomega@v1.31.0/internal/async_assertion.go:321 @ 11/11/24 06:36:10.4 << Timeline [PANICKED] Test Panicked In [AfterEach] at: /home/runner/go/pkg/mod/github.com/onsi/gomega@v1.31.0/internal/async_assertion.go:321 @ 11/11/24 06:36:10.4 expected DeletionTimestamp is not nil, obj: {"metadata":{"name":"kubernetes","namespace":"default","uid":"fc241288-27cc-4897-bab9-fb2cf0d3e547","resourceVersion":"2397","creationTimestamp":"2024-11-11T06:36:10Z","labels":{"component":"apiserver","provider":"kubernetes"},"managedFields":[{"manager":"kube-apiserver","operation":"Update","apiVersion":"v1","time":"2024-11-11T06:36:10Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:labels":{".":{},"f:component":{},"f:provider":{}}},"f:spec":{"f:clusterIP":{},"f:internalTrafficPolicy":{},"f:ipFamilyPolicy":{},"f:ports":{".":{},"k:{\"port\":443,\"protocol\":\"TCP\"}":{".":{},"f:name":{},"f:port":{},"f:protocol":{},"f:targetPort":{}}},"f:sessionAffinity":{},"f:type":{}}}}]},"spec":{"ports":[{"name":"https","protocol":"TCP","port":443,"targetPort":42583}],"clusterIP":"10.0.0.1","clusterIPs":["10.0.0.1"],"type":"ClusterIP","sessionAffinity":"None","ipFamilies":["IPv4"],"ipFamilyPolicy":"SingleStack","internalTrafficPolicy":"Cluster"},"status":{"loadBalancer":{}}} Full Stack Trace github.com/onsi/gomega/internal.(*AsyncAssertion).buildActualPoller.func3.1() /home/runner/go/pkg/mod/github.com/onsi/gomega@v1.31.0/internal/async_assertion.go:321 +0x1c5 panic({0x339c2c0?, 0xc0054e6ca0?}) /opt/hostedtoolcache/go/1.21.13/x64/src/runtime/panic.go:914 +0x21f github.com/apecloud/kubeblocks/pkg/testutil/apps.ClearResourcesWithRemoveFinalizerOption[...].func1() /home/runner/work/kubeblocks/kubeblocks/pkg/testutil/apps/common_util.go:338 +0x554 reflect.Value.call({0x33a8f40?, 0xc004af79e0?, 0x2540be400?}, {0x394aed3, 0x4}, {0xc004f5f920, 0x1, 0x76?}) /opt/hostedtoolcache/go/1.21.13/x64/src/reflect/value.go:596 +0xce7 reflect.Value.Call({0x33a8f40?, 0xc004af79e0?, 0xc00006e000?}, {0xc004f5f920?, 0x6cf1db0d3b?, 0x[380](https://github.com/apecloud/kubeblocks/actions/runs/11773368367/job/32790184670#step:6:381)ae40?}) /opt/hostedtoolcache/go/1.21.13/x64/src/reflect/value.go:380 +0xb9 github.com/onsi/gomega/internal.(*AsyncAssertion).buildActualPoller.func3() /home/runner/go/pkg/mod/github.com/onsi/gomega@v1.31.0/internal/async_assertion.go:325 +0x11f github.com/onsi/gomega/internal.(*AsyncAssertion).match(0xc0003ae9a0, {0x3d8d1d8?, 0x555c3e0}, 0x1, {0x0, 0x0, 0x0}) /home/runner/go/pkg/mod/github.com/onsi/gomega@v1.31.0/internal/async_assertion.go:398 +0x179 github.com/onsi/gomega/internal.(*AsyncAssertion).Should(0xc0003ae9a0, {0x3d8d1d8, 0x555c3e0}, {0x0, 0x0, 0x0}) /home/runner/go/pkg/mod/github.com/onsi/gomega@v1.31.0/internal/async_assertion.go:145 +0x86 github.com/apecloud/kubeblocks/pkg/testutil/apps.ClearResourcesWithRemoveFinalizerOption[...](0x5527a20, 0x1, 0x1?, {0xc0054e6a10, 0x1, 0x1}) /home/runner/work/kubeblocks/kubeblocks/pkg/testutil/apps/common_util.go:352 +0x565 github.com/apecloud/kubeblocks/controllers/apps.glob..func1.2() /home/runner/work/kubeblocks/kubeblocks/controllers/apps/cluster_controller_test.go:110 +0x55e github.com/apecloud/kubeblocks/controllers/apps.glob..func1.4() /home/runner/work/kubeblocks/kubeblocks/controllers/apps/cluster_controller_test.go:124 +0x13 ```

At first I thought it was because of a stale read. The failed code is:

g.Expect(testCtx.Cli.DeleteAllOf(testCtx.Ctx, PT(&obj), opts...)).ShouldNot(gomega.HaveOccurred())
g.Expect(testCtx.Cli.List(testCtx.Ctx, PL(&objList), listOptions...)).Should(gomega.Succeed())
items := reflect.ValueOf(&objList).Elem().FieldByName("Items").Interface().([]T)
for _, obj := range items {
    pobj := PT(&obj)
    if pobj.GetDeletionTimestamp().IsZero() {
        d, _ := json.Marshal(pobj)
        panic("expected DeletionTimestamp is not nil, obj: " + string(d))
    }

Although testCtx.cli here won't use client-go's cache, it still suffers from apiserver's stale cache. But after some investigation, I know that if resoureceVersion parameter is not set in the list request, it is guaranteed that returned data will be at the most recent resource version (i.e. read directly from etcd).

So I suspect it was because we have deleted the kubernetes service, and it would be recreated by apiserver after deleted. This behaviour can be verified by a simple test:

FIt("debug", func() {
  inNS := client.InNamespace(testCtx.DefaultNamespace)
  oldSvc := &corev1.Service{}
  nn := client.ObjectKey{Namespace: "default", Name: "kubernetes"}
  Expect(testCtx.Cli.Get(testCtx.Ctx, nn, oldSvc)).Should(Succeed())
  fmt.Println(oldSvc)
  testapps.ClearResourcesWithRemoveFinalizerOption(&testCtx, generics.ServiceSignature, true, inNS)
  Eventually(testapps.CheckObj(&testCtx, nn, func(g Gomega, svc *corev1.Service) {
      g.Expect(svc.DeletionTimestamp.IsZero()).Should(BeTrue())
      g.Expect(svc.UID).ShouldNot(Equal(oldSvc.UID))
      fmt.Println(svc)
  })).Should(Succeed())
})

So here's the fix.

codecov[bot] commented 2 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 61.40%. Comparing base (21fa706) to head (f7dd0e2). Report is 3 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #8440 +/- ## ========================================== + Coverage 60.97% 61.40% +0.43% ========================================== Files 351 351 Lines 41764 41855 +91 ========================================== + Hits 25465 25702 +237 + Misses 13987 13840 -147 - Partials 2312 2313 +1 ``` | [Flag](https://app.codecov.io/gh/apecloud/kubeblocks/pull/8440/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apecloud) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/apecloud/kubeblocks/pull/8440/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apecloud) | `61.40% <ø> (+0.43%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apecloud#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

wangyelei commented 4 days ago

/cherry-pick release-1.0.0-beta

github-actions[bot] commented 4 days ago

🤖 says: ‼️ cherry pick action failed.
See: https://github.com/apecloud/kubeblocks/actions/runs/11970358392

wangyelei commented 4 days ago

/cherry-pick release-1.0-beta

github-actions[bot] commented 4 days ago

🤖 says: cherry pick action finished successfully 🎉!
See: https://github.com/apecloud/kubeblocks/actions/runs/11970368921