At present, we are running the upstream Argo Rollouts E2E automated tests against argo-rollouts-manager PRs. With each PR, we:
A) install and start Argo Rollouts manager
B) clone the latest version of Argo Rollouts
C) call `make test-e2e` in the argo-rollouts repo to run the Argo Rollouts E2E tests cloned in the previous step.
Argo Rollouts then runs the E2E tests via gotestsum, a utility that runs Go tests and can, for example, automatically retry tests.
D) scan the results and ensure they pass.
When the Argo Rollouts tests run against our operator, most tests pass! But some fail: for example, here is a list of failures from a recent E2E test run:
We expect TestControllerMetrics to fail: the test assumes that the Argo Rollouts controller is running locally (via `make start-e2e`), whereas in this case it's running in a Pod on the cluster.
However, it's not clear why TestBlueGreenPromoteFull is failing: I've glanced over the test and everything it's doing seems like it should work (and often does work, on the first run).
So, this issue is to investigate why it's failing. This is also a good opportunity to dig into the Rollouts code, both the controller code and the test code.
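When scanning results, it can help to separate failures that are expected in this setup (such as TestControllerMetrics) from ones that are not. Below is a minimal sketch, assuming the captured log contains standard `go test -v` style `--- FAIL: <name>` lines. gotestsum output formats may differ, so the pattern may need adjusting. `unexpected_failures` is a hypothetical helper, not something in either repository:

```shell
#!/bin/sh
# Hypothetical helper (not part of argo-rollouts or argo-rollouts-manager).
# Reads test output on stdin and prints the names of failed tests that are
# NOT in the expected-failure list passed as arguments.
unexpected_failures() (
  expected=" $* "
  # Pull the test name out of lines such as:
  #   "    --- FAIL: TestFunctionalSuite/TestBlueGreenPromoteFull (10.00s)"
  sed -n 's/^[[:space:]]*--- FAIL: \([^[:space:]]*\).*/\1/p' |
  while IFS= read -r name; do
    name="${name##*/}"              # drop any "Suite/" prefix
    case "$expected" in
      *" $name "*) ;;               # expected failure: ignore it
      *) printf '%s\n' "$name" ;;   # unexpected failure: report it
    esac
  done
)
```

For example, `unexpected_failures TestControllerMetrics TestFunctionalSuite < e2e.log` (with `e2e.log` a hypothetical capture of the test output) would print only the remaining failed test names. A failing suite parent also gets its own `--- FAIL:` line, which is why the suite name is passed as expected here.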
To Reproduce:
To run a single upstream Rollouts E2E test, in hack/run-upstream-argo-rollouts-e2e-tests.sh:
Change `make test-e2e` to `E2E_TEST_OPTIONS="-run 'TestFunctionalSuite' -testify.m 'TestBlueGreenPromoteFull'" "until-fail.sh" make test-e2e`
This will run the TestBlueGreenPromoteFull test over and over, until it fails.
Then run hack/run-upstream-argo-rollouts-e2e-tests.sh
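The contents of until-fail.sh are not reproduced in this issue. A wrapper with the behavior described above (rerun a command until it fails) might look roughly like the sketch below. This is an assumption about the script's behavior, not a copy of it, and `until_fail` is a name chosen for illustration:

```shell
#!/bin/sh
# Hypothetical sketch, NOT the actual until-fail.sh.
# Reruns the given command until it exits non-zero, then reports how many
# runs it took and exits with that command's status.
until_fail() (
  count=0
  while :; do
    count=$((count + 1))
    if "$@"; then
      echo "run $count passed" >&2
    else
      status=$?
      echo "run $count failed with exit status $status" >&2
      exit "$status"
    fi
  done
)

# When saved as a script and given arguments, wrap them, e.g.:
#   ./until-fail.sh make test-e2e
if [ "$#" -gt 0 ]; then
  until_fail "$@"
fi
```

The loop never stops on its own if the wrapped command always succeeds, so it is meant for interactive investigation rather than for CI.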
Strangely, what I have seen is that TestBlueGreenPromoteFull will initially pass a few times, but after a few runs it will switch to always failing, 100% of the time.
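To make the pass-then-always-fail pattern visible, one option is to keep running past the first failure and record each outcome. Below is a minimal sketch; `record_runs` is a hypothetical helper, and it discards the command's own output for brevity (a real investigation would likely want to keep the logs):

```shell
#!/bin/sh
# Hypothetical helper: run a command a fixed number of times and print one
# PASS/FAIL line per run, so a switch from passing to failing is easy to spot.
# Usage: record_runs <number-of-runs> <command> [args...]
record_runs() (
  runs=$1
  shift
  i=1
  while [ "$i" -le "$runs" ]; do
    # The command's output is discarded here; only its exit status is used.
    if "$@" >/dev/null 2>&1; then
      echo "run $i: PASS"
    else
      echo "run $i: FAIL"
    fi
    i=$((i + 1))
  done
)
```

For instance, with `E2E_TEST_OPTIONS` exported as in the reproduction steps, `record_runs 20 make test-e2e` would show at which run the test starts failing and whether it ever recovers.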
The until-fail.sh script: https://gist.github.com/jgwest/7048a765d398519837f990120cf3fdd0