gardener / etcd-backup-restore

Collection of components to backup and restore the etcd of a Kubernetes cluster.
Apache License 2.0

[Flaky Test] ☂️ Issue for all flaky tests #398

Open timuthy opened 3 years ago

timuthy commented 3 years ago

How to categorize this issue? /area testing /kind flake

Which test(s)/suite(s) are flaking:

time="2021-10-27T08:32:00Z" level=info msg="Defragmenting etcd member[127.0.0.1:39459]" job=defragmentor suite=defragmentor
time="2021-10-27T08:32:00Z" level=info msg="Finished defragmenting etcd member[127.0.0.1:39459]" job=defragmentor suite=defragmentor
time="2021-10-27T08:32:00Z" level=info msg="Probable DB size change for etcd member [127.0.0.1:39459]: 376832B -> 372736B after defragmentation" job=defragmentor suite=defragmentor

• Failure [14.864 seconds]
Defrag Defragmentation [It] should defragment and reduce size of DB within time
/tmp/build/bc4dbee3/pull-request-gardener.etcd-backup-restore-pr.master/pkg/defragmentor/defrag_test.go:67

  Expected
      <int64>: 376832
  to be <
      <int64>: 376832

  /tmp/build/bc4dbee3/pull-request-gardener.etcd-backup-restore-pr.master/pkg/defragmentor/defrag_test.go:89

  Full Stack Trace
  github.com/gardener/etcd-backup-restore/pkg/defragmentor_test.glob..func1.2.2()
    /tmp/build/bc4dbee3/pull-request-gardener.etcd-backup-restore-pr.master/pkg/defragmentor/defrag_test.go:89 +0xa69
  github.com/gardener/etcd-backup-restore/pkg/defragmentor_test.TestDefragmentor(0xc000643980)
    /tmp/build/bc4dbee3/pull-request-gardener.etcd-backup-restore-pr.master/pkg/defragmentor/defragmentor_suite_test.go:47 +0x109
  testing.tRunner(0xc000643980, 0x2936380)
    /usr/local/go/src/testing/testing.go:1193 +0x203
  created by testing.(*T).Run
    /usr/local/go/src/testing/testing.go:1238 +0x5d8

Logs
time="2022-09-05T13:00:08Z" level=info msg="GC: Total number garbage collected snapshots: 5236" actor=snapshotter suite=snapshotter
time="2022-09-05T13:00:12Z" level=info msg="GC: Stop signal received. Closing garbage collector." actor=snapshotter suite=snapshotter
[BeforeEach] Snapshotter
  /tmp/build/bc4dbee3/git-gardener.etcd-backup-restore-master.master/pkg/snapshot/snapshotter/snapshotter_test.go:57
[BeforeEach] ##GarbageCollector
  /tmp/build/bc4dbee3/git-gardener.etcd-backup-restore-master.master/pkg/snapshot/snapshotter/snapshotter_test.go:343
[It] should garbage collect exponentially
  /tmp/build/bc4dbee3/git-gardener.etcd-backup-restore-master.master/pkg/snapshot/snapshotter/snapshotter_test.go:352

------------------------------

• Failure [39.530 seconds]
Snapshotter running snapshotter ##GarbageCollector [It] should garbage collect exponentially 
/tmp/build/bc4dbee3/git-gardener.etcd-backup-restore-master.master/pkg/snapshot/snapshotter/snapshotter_test.go:352

  Expected
      : 26
  to equal
      : 27

  /tmp/build/bc4dbee3/git-gardener.etcd-backup-restore-master.master/pkg/snapshot/snapshotter/snapshotter_test.go:383

  Full Stack Trace
  github.com/gardener/etcd-backup-restore/pkg/snapshot/snapshotter_test.glob..func5.3.3.2()
    /tmp/build/bc4dbee3/git-gardener.etcd-backup-restore-master.master/pkg/snapshot/snapshotter/snapshotter_test.go:383 +0x8e5
  github.com/gardener/etcd-backup-restore/pkg/snapshot/snapshotter_test.TestSnapshotter(0x0)
    /tmp/build/bc4dbee3/git-gardener.etcd-backup-restore-master.master/pkg/snapshot/snapshotter/snapshotter_suite_test.go:45 +0x105
  testing.tRunner(0xc00017c820, 0x2ac9360)
    /usr/local/go/src/testing/testing.go:1259 +0x230
  created by testing.(*T).Run
    /usr/local/go/src/testing/testing.go:1306 +0x727

Logs

time="2022-06-07T08:00:10Z" level=info msg="GC: Total number garbage collected snapshots: 5211" actor=snapshotter suite=snapshotter
time="2022-06-07T08:00:14Z" level=info msg="GC: Stop signal received. Closing garbage collector." actor=snapshotter suite=snapshotter
[BeforeEach] Snapshotter
  /tmp/build/bc4dbee3/git-gardener.etcd-backup-restore-master.master/pkg/snapshot/snapshotter/snapshotter_test.go:57
[BeforeEach] ##GarbageCollector
  /tmp/build/bc4dbee3/git-gardener.etcd-backup-restore-master.master/pkg/snapshot/snapshotter/snapshotter_test.go:343
[It] should garbage collect exponentially with only v1 dir structure present (backward compatible test)
  /tmp/build/bc4dbee3/git-gardener.etcd-backup-restore-master.master/pkg/snapshot/snapshotter/snapshotter_test.go:440

------------------------------

• Failure [33.257 seconds]
Snapshotter running snapshotter ##GarbageCollector [It] should garbage collect exponentially with only v1 dir structure present (backward compatible test) 
/tmp/build/bc4dbee3/git-gardener.etcd-backup-restore-master.master/pkg/snapshot/snapshotter/snapshotter_test.go:440

  Expected
      : 21
  to equal
      : 22

  /tmp/build/bc4dbee3/git-gardener.etcd-backup-restore-master.master/pkg/snapshot/snapshotter/snapshotter_test.go:471

  Full Stack Trace
  github.com/gardener/etcd-backup-restore/pkg/snapshot/snapshotter_test.glob..func5.3.3.4()
    /tmp/build/bc4dbee3/git-gardener.etcd-backup-restore-master.master/pkg/snapshot/snapshotter/snapshotter_test.go:471 +0x8e5
  github.com/gardener/etcd-backup-restore/pkg/snapshot/snapshotter_test.TestSnapshotter(0x0)
    /tmp/build/bc4dbee3/git-gardener.etcd-backup-restore-master.master/pkg/snapshot/snapshotter/snapshotter_suite_test.go:45 +0x105
  testing.tRunner(0xc000141a00, 0x2a99d08)
    /usr/local/go/src/testing/testing.go:1259 +0x230
  created by testing.(*T).Run
    /usr/local/go/src/testing/testing.go:1306 +0x727

ishan16696 commented 3 years ago

It is a known flaky test case. It never occurs when you run the test cases locally; it only occurs in the pipeline.

timuthy commented 3 years ago

It probably occurs due to CPU throttling or in general less computing power during the pipeline runs. However, it'd be beneficial to improve this test and eliminate its flakiness because it blocks PRs and releases on a regular basis.

ishan16696 commented 3 years ago

> It probably occurs due to CPU throttling or in general less computing power during the pipeline runs

OK, if that’s the case, then migrating the unit test to use the gomock package will solve this issue.

ishan16696 commented 2 years ago

Updated the issue to track all flaky tests in backup-restore.