argoproj-labs / argo-rollouts-manager

Kubernetes Operator for Argo Rollouts controller.
https://argo-rollouts-manager.readthedocs.io/en/latest/
Apache License 2.0

Investigate why TestBlueGreenPromoteFull upstream E2E test is failing when run against Argo Rollouts installed by Argo Rollouts operator #48

Closed jgwest closed 2 months ago

jgwest commented 3 months ago

At present, we are running the upstream Argo Rollouts E2E automated tests against argo-rollouts-manager PRs. With each PR, we:

When the Argo Rollouts tests run against our operator, most tests pass! But some fail. For example, here is a list of failures from a recent E2E test run:

--- FAIL: TestAPISIXSuite/TestAPISIXCanarySetHeaderStep (0.48s)
--- FAIL: TestAPISIXSuite/TestAPISIXCanarySetHeaderStep (0.68s)
--- FAIL: TestAPISIXSuite/TestAPISIXCanarySetHeaderStep (0.69s)
--- FAIL: TestFunctionalSuite/TestBlueGreenPromoteFull (2.74s)
--- FAIL: TestFunctionalSuite/TestBlueGreenPromoteFull (2.85s)
--- FAIL: TestFunctionalSuite/TestBlueGreenPromoteFull (2.89s)
--- FAIL: TestFunctionalSuite/TestBlueGreenPromoteFull (2.90s)
--- FAIL: TestFunctionalSuite/TestBlueGreenPromoteFull (2.92s)
--- FAIL: TestFunctionalSuite/TestBlueGreenPromoteFull (3.25s)
--- FAIL: TestFunctionalSuite/TestControllerMetrics (0.13s)
--- FAIL: TestFunctionalSuite/TestControllerMetrics (0.17s)
--- FAIL: TestFunctionalSuite/TestControllerMetrics (0.18s)

(source)
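To make a failure list like the one above easier to compare between runs, here is a rough sketch that tallies `--- FAIL:` lines per test name. It assumes the log uses the standard `go test` verbose format shown above; `CountFailures` and the sample input are hypothetical names made up for illustration, not part of the Argo Rollouts or argo-rollouts-manager code.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// failLine matches lines such as
// "--- FAIL: TestFunctionalSuite/TestBlueGreenPromoteFull (2.74s)"
// and captures the test name (assumed go test -v output format).
var failLine = regexp.MustCompile(`^\s*--- FAIL: (\S+) \(`)

// CountFailures (hypothetical helper) returns how many times each test
// name appears in a "--- FAIL:" line of the given test output.
func CountFailures(output string) map[string]int {
	counts := make(map[string]int)
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		if m := failLine.FindStringSubmatch(scanner.Text()); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	// Sample input copied from a few lines of the failure list above.
	sample := `--- FAIL: TestFunctionalSuite/TestBlueGreenPromoteFull (2.74s)
--- FAIL: TestFunctionalSuite/TestBlueGreenPromoteFull (2.85s)
--- FAIL: TestFunctionalSuite/TestControllerMetrics (0.13s)`

	counts := CountFailures(sample)

	// Sort names so the summary is stable from run to run.
	names := make([]string, 0, len(counts))
	for name := range counts {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: %d\n", name, counts[name])
	}
}
```

Applied to the full list above, this kind of tally would show TestBlueGreenPromoteFull failing six times, versus three each for the APISIX and TestControllerMetrics entries.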

We expect TestControllerMetrics to fail: the test expects the Argo Rollouts controller to be running locally (via make start-e2e), whereas in this case it's running in a Pod on the cluster.

However, it's not clear why TestBlueGreenPromoteFull is failing: I've glanced over the test and everything it's doing seems like it should work (and often does work, on first run).

So, this issue is to investigate why it's failing. This is also a good opportunity to dig into the Rollouts code, both the controller code and the test code.

To Reproduce:

Strangely, what I have seen is that TestBlueGreenPromoteFull will initially pass a few times, but after a few runs it switches to failing 100% of the time.

jgwest commented 3 months ago

Red Hat external issue tracker: https://issues.redhat.com/browse/GITOPS-4212