Improve upstream Rollouts E2E test parsing, by allowing test retries

jgwest commented 4 months ago

What does this PR do / why we need it:

Improve upstream Rollouts E2E test parsing by replacing the simple bash-based parsing with a short Go script.
The previous version of the code ignored retries.
The new version parses retries and reports a failure only if a test never succeeds (after 5 retries).
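
For illustration, the retry-aware idea can be sketched roughly as below. This is not the script from this PR, only a minimal sketch that assumes the input is `go test -v` style output with `--- PASS:` / `--- FAIL:` result lines. The names `neverPassed` and `resultLine`, and the sample test names, are made up for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// resultLine matches result lines such as
// "    --- FAIL: TestSuite/TestName (4.52s)".
// Assumption: the E2E output uses the standard `go test -v` line format.
var resultLine = regexp.MustCompile(`^\s*--- (PASS|FAIL): (\S+) \(`)

// neverPassed (hypothetical helper) returns, in sorted order, the names of
// tests that failed at least once and never passed anywhere in the output.
// A test that fails and later passes on a retry is not reported.
func neverPassed(output string) []string {
	passed := map[string]bool{}
	failed := map[string]bool{}

	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		m := resultLine.FindStringSubmatch(scanner.Text())
		if m == nil {
			continue // not a result line
		}
		if m[1] == "PASS" {
			passed[m[2]] = true
		} else {
			failed[m[2]] = true
		}
	}

	var names []string
	for name := range failed {
		if !passed[name] {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// Made-up sample: TestFlaky passes on its second attempt, TestBroken never does.
	sample := `    --- FAIL: TestSuite/TestFlaky (1.00s)
    --- PASS: TestSuite/TestFlaky (1.10s)
    --- FAIL: TestSuite/TestBroken (2.00s)
    --- FAIL: TestSuite/TestBroken (2.10s)
`
	for _, name := range neverPassed(sample) {
		fmt.Println("never passed:", name)
	}
}
```

The sketch only tracks whether each test name ever passed. It does not count attempts or enforce the retry limit, which the real script may handle differently.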

Have you updated the necessary documentation?

[ ] Documentation update is required by this PR, and has been updated.

Which issue(s) this PR fixes: N/A

chetan-rns commented 3 months ago

Is it expected that some of the tests fail against OpenShift? Everything works well in the CI against K3s, though.

jgwest commented 3 months ago

Thanks @chetan-rns!

Re: OpenShift, I ran it on a clusterbot cluster last night (rosa create 4.14) and these were the ones that failed for me:

These were the tests that failed:
    --- FAIL: TestAPISIXSuite/TestAPISIXCanarySetHeaderStep (4.52s)
    --- FAIL: TestAPISIXSuite/TestAPISIXCanarySetHeaderStep (4.57s)
    --- FAIL: TestAPISIXSuite/TestAPISIXCanarySetHeaderStep (4.65s)
    --- FAIL: TestAPISIXSuite/TestAPISIXCanarySetHeaderStep (4.67s)
    --- FAIL: TestAPISIXSuite/TestAPISIXCanarySetHeaderStep (4.79s)
    --- FAIL: TestAPISIXSuite/TestAPISIXCanarySetHeaderStep (6.37s)
    --- FAIL: TestExperimentSuite/TestExperimentWithDryRunMetrics (36.68s)
    --- FAIL: TestExperimentSuite/TestExperimentWithDryRunMetrics (36.92s)
    --- FAIL: TestExperimentSuite/TestExperimentWithDryRunMetrics (37.03s)
    --- FAIL: TestExperimentSuite/TestExperimentWithDryRunMetrics (37.51s)
    --- FAIL: TestExperimentSuite/TestExperimentWithDryRunMetrics (37.72s)
    --- FAIL: TestExperimentSuite/TestExperimentWithDryRunMetrics (38.24s)
    --- FAIL: TestFunctionalSuite/TestControllerMetrics (1.99s)
    --- FAIL: TestFunctionalSuite/TestControllerMetrics (2.01s)
    --- FAIL: TestFunctionalSuite/TestControllerMetrics (2.02s)
    --- FAIL: TestFunctionalSuite/TestControllerMetrics (2.04s)
    --- FAIL: TestFunctionalSuite/TestControllerMetrics (2.05s)

The *Metrics tests fail because they expect Rollouts to be running locally alongside the test runner (whereas, in this case, it's running on the cluster). I'm not sure about the APISIX test; I haven't investigated it. Were these the failures you saw?

argoproj-labs / argo-rollouts-manager

Improve upstream Rollouts E2E test parsing, by allowing test retries #44