FoundationDB / fdb-kubernetes-operator

A Kubernetes operator for FoundationDB
Apache License 2.0

Improve the tester handling when a cluster is upgraded #2130

Closed: johscheuer closed this 2 weeks ago

johscheuer commented 3 weeks ago

Description

A cluster that contains tester processes can block an upgrade. This PR changes the pending upgrade check to ignore tester processes, because those processes report to the cluster but cannot be restarted with fdbcli.

Type of change

Discussion

I added a new e2e test for this setup.

Testing

Ran the test manually.

Documentation

-

Follow-up

-

foundationdb-ci commented 3 weeks ago

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

johscheuer commented 3 weeks ago

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 973e6ae
  • Duration: 2:43:53
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
Summarizing 1 Failure:
  [FAIL] Operator HA Upgrades when no remote storage processes are restarted [It] Upgrade from 7.1.63 to 7.3.43 [e2e, pr]
  /codebuild/output/src714216153/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/fixtures/ha_fdb_cluster.go:314

Ran 8 of 10 Specs in 5763.040 seconds
FAIL! -- 7 Passed | 1 Failed | 2 Pending | 0 Skipped
--- FAIL: TestOperatorHaUpgrade (5851.08s)
FAIL
FAIL    github.com/FoundationDB/fdb-kubernetes-operator/e2e/test_operator_ha_upgrades   5851.102s
FAIL

That's another test that failed. I'll spend some time next week looking into the test stability.