FoundationDB / fdb-kubernetes-operator

A kubernetes operator for FoundationDB
Apache License 2.0

Fix race condition in e2e test suite when checking if a pod is deleted #2092

Closed: johscheuer closed this pull request 5 days ago

johscheuer commented 6 days ago

Description

Fix a race condition in the e2e test suite when checking whether a pod has been deleted. The race can happen when a pod is deleted and the operator recreates it quickly enough, between two checks, so that fetching the pod by name returns the new pod and the test concludes that the old pod still exists. The additional check on the pod's UID fixes this: if the fetched pod has a different UID, it is a different pod, and the original pod was in fact deleted.

Type of change


Discussion

-

Testing

Manually ran tests.

Documentation

-

Follow-up

-

foundationdb-ci commented 6 days ago

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

foundationdb-ci commented 5 days ago

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

johscheuer commented 5 days ago

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 81adfb7
  • Duration: 2:29:07
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log: terminal output (available for 30 days)
  • Build Workspace: zip file of the working directory (available for 30 days)
• [FAILED] [979.040 seconds]
Operator HA Upgrades when no remote storage processes are restarted [It] Upgrade from 7.1.57 to 7.3.33 [e2e, pr]
/codebuild/output/src4146627943/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/fixtures/upgrade_test_configuration.go:115

  [FAILED] Unexpected error:
      <*fmt.wrapError | 0xc003919a40>: 
      timeout waiting for all clusters to be upgraded to 7.3.33, original error: timed out waiting for the condition
      {
          msg: "timeout waiting for all clusters to be upgraded to 7.3.33, original error: timed out waiting for the condition",
          err: <*errors.errorString | 0x29134d0>{
              s: "timed out waiting for the condition",
          },
      }
  occurred
  In [It] at: /codebuild/output/src4146627943/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/fixtures/ha_fdb_cluster.go:314 @ 07/02/24 18:12:57.52
------------------------------

That test failure is unrelated to this change. I'll dig into it.