FoundationDB / fdb-kubernetes-operator

A kubernetes operator for FoundationDB
Apache License 2.0
241 stars 82 forks source link

Investigate unit test flakiness in "remove instances command" #1078

Closed 09harsh closed 2 years ago

09harsh commented 2 years ago

What happened?

After raising PR for some other component this test failed. On rerunning the checks the test passed on its own. We need to investigate the root cause for this flakiness.

• Failure [0.013 seconds]
66
[plugin] remove instances command
67
/home/runner/work/fdb-kubernetes-operator/fdb-kubernetes-operator/kubectl-fdb/cmd/remove_process_group_test.go:40
68
  when running remove instances command
69
  /home/runner/work/fdb-kubernetes-operator/fdb-kubernetes-operator/kubectl-fdb/cmd/remove_process_group_test.go:60
70
    when removing instances from a cluster
71
    /home/runner/work/fdb-kubernetes-operator/fdb-kubernetes-operator/kubectl-fdb/cmd/remove_process_group_test.go:133
72
      should cordon all targeted processes
73
      /home/runner/work/fdb-kubernetes-operator/fdb-kubernetes-operator/kubectl-fdb/cmd/remove_process_group_test.go:260
74
        Remove all failed instances [It]
75
        /home/runner/work/fdb-kubernetes-operator/fdb-kubernetes-operator/kubectl-fdb/cmd/remove_process_group_test.go:330
76

77
        Expected
78
            <[]string | len:1, cap:1>: ["failed"]
79
        to contain elements
80
            <[]string | len:2, cap:2>: ["failed", "failed"]
81
        the missing elements were
82
            <[]string | len:1, cap:1>: ["failed"]
83

84
        /home/runner/work/fdb-kubernetes-operator/fdb-kubernetes-operator/kubectl-fdb/cmd/remove_process_group_test.go:276
85
------------------------------
86
•••••••••••••••••••••••••••••••••••••••2022/03/03 16:26:33 Could not find process for coordinator IP 127.0.0.1:4501
87
••••
88

89
Summarizing 1 Failure:
90

91
[Fail] [plugin] remove instances command when running remove instances command when removing instances from a cluster should cordon all targeted processes [It] Remove all failed instances 
92
/home/runner/work/fdb-kubernetes-operator/fdb-kubernetes-operator/kubectl-fdb/cmd/remove_process_group_test.go:276
93

What did you expect to happen?

All unit test running without failure.

How can we reproduce it (as minimally and precisely as possible)?

I discovered this after raising a PR.

Anything else we need to know?

No response

FDB Kubernetes operator

Kubernetes version

Cloud provider

johscheuer commented 2 years ago

Here is another occurrence of. this bug: https://github.com/FoundationDB/fdb-kubernetes-operator/runs/5575318465?check_suite_focus=true + https://github.com/FoundationDB/fdb-kubernetes-operator/runs/5521840618?check_suite_focus=true