The KAT tests can fail in CI because they time out, never pass readiness checks, and so on. This leads engineers to assume that CI is flaking, and it adds a lot of time to the release process whenever a failure happens for this reason. It also wastes time: we assume the failure is a flake, retry the GitHub Action, and wait 20+ minutes before realizing that it is an actual error.
Goal
When CI fails, it is either because the Docker registry had a temporary blip or because we have an actual error.
The 2.y branch has these improvements too.
Note: some recent PRs that landed on master have made this better, so this might not be too far off. Let's address any other issues that we see, and I think we should backport these fixes as well.