The smoke tests have often been failing, since switching to the new operator. Reproduced this locally. Failing because the test was scaling the deployment to 3, but the new operator would then scale this back down to 1 (to be reconciled with the stateful service settings), the deployment availability check would pass, and the test would be using a deployment with terminating pods. Updated to scale the stateful service resource now, instead of the deployment directly. And added a check that we've actually scaled up to 3 replicas.
When trying this out, also had the service unavailable for the first request sometimes (connection refused). Added a retry for initial requests in case we see this in CI too.
Let's see if the smoke tests are more stable now...
The smoke tests have often been failing, since switching to the new operator. Reproduced this locally. Failing because the test was scaling the deployment to 3, but the new operator would then scale this back down to 1 (to be reconciled with the stateful service settings), the deployment availability check would pass, and the test would be using a deployment with terminating pods. Updated to scale the stateful service resource now, instead of the deployment directly. And added a check that we've actually scaled up to 3 replicas.
When trying this out, also had the service unavailable for the first request sometimes (connection refused). Added a retry for initial requests in case we see this in CI too.
Let's see if the smoke tests are more stable now...