argoproj / argo-rollouts

Progressive Delivery for Kubernetes
https://argo-rollouts.readthedocs.io/
Apache License 2.0

Restarting 1-replica rollout causes stable to terminate immediately #1583

Open bpoland opened 3 years ago

bpoland commented 3 years ago

Summary

In a Rollout with 1 replica, triggering a restart causes the running pod to be terminated immediately, at the same time that the new pod starts up. This happens even when explicitly setting maxUnavailable: 0 (the default is documented as 25%, which I would have expected to be rounded down to 0 pods). This leaves no pods available until the new one becomes ready.

I know 1 replica is risky anyway, but triggering a restart of a Deployment with 1 pod does what I would expect: it starts a new pod, waits for it to become ready, then terminates the old pod.
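
A minimal sketch of the kind of Rollout I mean (the name, labels, and image below are placeholders rather than my real manifest, and it assumes a canary strategy with no steps):

# minimal 1-replica Rollout to reproduce (placeholder names/image)
kubectl apply -f - <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example
        image: repo/example:latest   # placeholder image
  strategy:
    canary:
      maxUnavailable: 0   # explicitly set, yet the stable pod is still killed on restart
EOF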

Diagnostics

Rollouts 1.1


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

huikang commented 3 years ago

@bpoland , could you elaborate on how you trigger a restart, to help reproduce the behavior? Is the new pod a new version? It would be helpful if some output from get rollout <rollout name> could be provided. Thanks.

bpoland commented 3 years ago

> @bpoland , could you elaborate on how you trigger a restart, to help reproduce the behavior? Is the new pod a new version? It would be helpful if some output from get rollout <rollout name> could be provided. Thanks.

Ah sorry -- I just mean manually triggering a restart of pods with kubectl argo rollouts restart <name> or the rollouts dashboard.

With a deployment it would be kubectl rollout restart deployment/<name>

Here's the output before starting:

# kubectl argo rollouts get rollout example
Name:            example
Namespace:       default
Status:          ✔ Healthy
Strategy:        Canary
  Step:
  SetWeight:     100
  ActualWeight:  100
Images:          repo/example:994dab92f26ef64d40a3249389f82a7843fa14b4 (stable)
Replicas:
  Desired:       1
  Current:       1
  Updated:       1
  Ready:         1
  Available:     1

NAME                                 KIND        STATUS        AGE  INFO
⟳ example                           Rollout     ✔ Healthy     2d
├──# revision:3
│  └──⧉ example-57b9f46fb7          ReplicaSet  ✔ Healthy     2h  stable
│     └──□ example-57b9f46fb7-sks58  Pod         ✔ Running     2h  ready:2/2
├──# revision:2
│  └──⧉ example-6cc8d67b74           ReplicaSet  • ScaledDown  1d
└──# revision:1
   └──⧉ example-5f87868bcc           ReplicaSet  • ScaledDown  2d

Then I run kubectl argo rollouts restart example and right afterwards:

# kubectl argo rollouts get rollout example
Name:            example
Namespace:       default
Status:          ◌ Progressing
Message:         updated replicas are still becoming available
Strategy:        Canary
  Step:
  SetWeight:     100
  ActualWeight:  100
Images:          repo/example:994dab92f26ef64d40a3249389f82a7843fa14b4 (stable)
Replicas:
  Desired:       1
  Current:       1
  Updated:       1
  Ready:         0
  Available:     0

NAME                                 KIND        STATUS             AGE  INFO
⟳ example                           Rollout     ◌ Progressing      2d
├──# revision:3
│  └──⧉ example-57b9f46fb7          ReplicaSet  ◌ Progressing      2h  stable
│     ├──□ example-57b9f46fb7-sks58  Pod         ◌ Terminating      2h  ready:2/2
│     └──□ example-57b9f46fb7-kh9kh  Pod           PodInitializing  4s   ready:0/2
├──# revision:2
│  └──⧉ example-6cc8d67b74           ReplicaSet  • ScaledDown       1d
└──# revision:1
   └──⧉ example-5f87868bcc           ReplicaSet  • ScaledDown       2d

As you can see, the running pod is immediately terminated and we are left with 0 ready/available pods until the new one starts.

perenesenko commented 3 years ago

Hi @jessesuen. Could you comment on this? Is that expected behavior? If not, how could that be implemented in the case of a Rollout: do we need to provide some specific logic, or just repeat the canary/blue-green behavior?

jessesuen commented 3 years ago

Yes. The documentation is clear on this: https://argoproj.github.io/argo-rollouts/features/restart/

Note: Unlike deployments, where a "restart" is nothing but a normal rolling upgrade that happened to be triggered by a timestamp in the pod spec annotation, Argo Rollouts facilitates restarts by terminating pods and allowing the existing ReplicaSet to replace the terminated pods. This design choice was made in order to allow a restart to occur even when a Rollout was in the middle of a long-running blue-green/canary update (e.g. a paused canary). However, some consequences of this are:

  • Restarting a Rollout which has a single replica will cause downtime since Argo Rollouts needs to terminate the pod in order to replace it.
  • Restarting a rollout will be slower than a deployment's rolling update, since maxSurge is not used to bring up newer pods faster.
  • maxUnavailable will be used to restart multiple pods at a time (starting in v0.10). But if maxUnavailable pods is 0, the controller will still restart pods one at a time.

jessesuen commented 3 years ago

I would love a way to handle this, but the hard part of this problem is that we need to support restarts in the middle of canary & blue-green updates. This makes the problem much harder than a rolling update's "restart", which is nothing but another update.

bpoland commented 3 years ago

Ah thanks, I had missed that in the docs. Just trying to understand a bit better -- why would respecting the maxSurge cause problems during a canary/blue-green update? It seems like removing and recreating pods would already change the balance of pods while it was happening.

jessesuen commented 3 years ago

Sorry, I think I didn't convey a lot of context. First, it's important to understand that we have a requirement to perform rollout restarts in the middle of an update (e.g. a paused rollout).

Note that with the normal rollingUpdate strategy, a restart is just another update triggered via a pod annotation. This is how kubectl rollout restart DEPLOYMENT works: it simply annotates the pod template spec with a timestamp and takes advantage of the fact that a new rolling update will happen, which restarts all pods indirectly.

Rollouts cannot do this same trick with a timestamp in the pod annotation. As with a Deployment, any change to the pod template spec is considered an update, but unlike Deployments, a Rollout update goes through canary steps, analysis, etc. This is not what we want to do if a user is requesting just a restart of all their pods. So we had to invent a different mechanism to perform a restart. The mechanism is to evict/delete pods (going only as low as maxUnavailable allows).

So because our restart mechanism is just deleting pods and allowing them to be recreated (as opposed to managing ReplicaSet scale), we don't have a way to surge up -- at least not without complicating the code greatly.

Also, one technical limitation we discovered about ReplicaSets is that if you scale a ReplicaSet up and then down, the ReplicaSet controller will choose the newest pods to delete when scaling down. In other words, if I have a ReplicaSet of size 5 and I want to restart all 5 pods, and I scale the ReplicaSet to 10 and then back down to 5, I will be left with the original 5 pods. This goes against our desire to restart the oldest pods (those with a creationTimestamp before the restart timestamp).
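
A quick way to see the ReplicaSet behavior described above, using a throwaway standalone ReplicaSet (the name, label, and image here are hypothetical and unrelated to Argo Rollouts):

# create a throwaway ReplicaSet with 5 pods
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: rs-scale-demo
spec:
  replicas: 5
  selector:
    matchLabels:
      app: rs-scale-demo
  template:
    metadata:
      labels:
        app: rs-scale-demo
    spec:
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
EOF

# note the creation timestamps of the original 5 pods
kubectl get pods -l app=rs-scale-demo --sort-by=.metadata.creationTimestamp

# scale up to 10, then back down to 5
kubectl scale replicaset rs-scale-demo --replicas=10
kubectl scale replicaset rs-scale-demo --replicas=5

# the controller prefers to delete the newest pods, so the original 5 typically
# survive -- the opposite of what a restart (oldest-first) needs
kubectl get pods -l app=rs-scale-demo --sort-by=.metadata.creationTimestamp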

bpoland commented 3 years ago

Thanks for the detailed info, that's very helpful!

It does seem like it would be possible to scale up the ReplicaSet and then manually evict the oldest pods as it does now, but I can see how that would complicate the code. I also understand this is an edge case that isn't the highest priority :)

stek29 commented 2 years ago

kubectl-argorollout-restart:

#!/usr/bin/env bash
# Restart a Rollout the way "kubectl rollout restart" restarts a Deployment:
# bump a pod template annotation so the change is handled as a normal
# canary/blue-green update instead of the controller's delete-and-recreate restart.
set -euo pipefail

if [ "$#" -ne "2" ]; then
  echo "usage: $0 NAMESPACE ROLLOUT"
  exit 2
fi

NAMESPACE="$1"
ROLLOUT="$2"

# Refuse to touch a paused Rollout: the template change would otherwise sit
# as a pending update until someone unpauses it.
paused="$(kubectl get rollout -ojsonpath="{.spec.paused}" -n "$NAMESPACE" "$ROLLOUT")"
if [ "$paused" = "true" ]; then
  echo "can't restart paused rollout, unpause by removing .spec.paused=true first" >&2
  exit 1
fi

# Merge-patch the pod template with a fresh timestamp annotation
# (same annotation kubectl uses; --rfc-3339 requires GNU date).
kubectl patch rollout --type merge -n "$NAMESPACE" "$ROLLOUT" -p "$(cat <<EOF
{"spec": { "template": { "metadata": { "annotations": {
  "kubectl.kubernetes.io/restartedAt": "$(date --rfc-3339=seconds)"
}}}}}
EOF
)"
jngbng commented 2 years ago

Hello. In my case, argo rollouts restart causes random short downtime even with multiple replicas. It seems that traffic goes to the killed pod for a short time, which leads to 500 errors from the ingress controller (in my case the AWS ingress controller). I guess the kubectl.kubernetes.io/restartedAt feature might not provide zero downtime (stop forwarding traffic to the pod, send the kill signal, wait for the grace period, then kill: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination).

I hope that argo rollouts restart can follow the normal rollout process, the way kubectl rollout restart does for a Deployment. Restarting pods for whatever reason (e.g. hygiene, reloading secrets) SHOULD be validated like any other change operation, since there can be a lot of unexpected problems: cache stampede, an updated secret containing an invalid value, etc. Better safe than sorry.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity.

kulaseba commented 1 year ago

Resolved it by using a workloadRef to a Deployment and controlling the rolling restart via the Deployment. The only downside is that the replicas are managed by the Rollout, so the Deployment is not aware of how many replicas we are actually running. But at least it solves the immediate termination.
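
In case it helps others, a rough sketch of that setup (names are placeholders, and it assumes a Deployment named example already defines the pod template): the Rollout references the Deployment via spec.workloadRef, and restarts are then issued against the Deployment, which the Rollout picks up as a regular update.

# Rollout that takes its pod template from an existing Deployment
kubectl apply -f - <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example
spec:
  replicas: 1          # replicas are managed by the Rollout, not the Deployment
  selector:
    matchLabels:
      app: example
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example
  strategy:
    canary:
      maxUnavailable: 0
EOF

# restarting via the Deployment just bumps the pod template annotation,
# so the Rollout treats it as a normal update rather than a pod eviction
kubectl rollout restart deployment/example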

AnhQKatalon commented 1 year ago

I am having the same issue with a Rollout that has 2 replicas. When I use restart, the Rollout terminates one currently running stable pod and creates a new one simultaneously.

I expected the behavior to be the same as when I roll out a new version: the canary pods start first, and after that the stable pod is terminated.

For the Blue-Green strategy, the logic works fine when I restart the rollout: new blue pods start first --> traffic switches from green to blue --> the green pods are terminated.

kulaseba commented 1 year ago

@AnhQKatalon : did you try with a workloadRef to a Deployment?

Hronom commented 4 months ago

Same here. How can I make Argo Rollouts restart pods one by one, and not terminate the old pod until the new one becomes healthy?

zimmertr commented 4 months ago

Define a PodDisruptionBudget. Or use canary with dynamicStableScale and incremental setWeight steps. Or use canary without steps but with maxUnavailable and maxSurge.

This story only applies to single-replica Rollouts. The behavior you want is already supported. https://argo-rollouts.readthedocs.io/en/stable/features/canary/#mimicking-rolling-update
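
For reference, a sketch of the "mimicking rolling update" shape from that link (the name, image, and surge values are illustrative):

# canary with no steps: updates behave like a Deployment rolling update
kubectl apply -f - <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example
        image: repo/example:latest   # placeholder image
  strategy:
    canary:
      maxSurge: "25%"
      maxUnavailable: 0
EOF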

yacine-1 commented 1 month ago

Hi, when using a BlueGreen rollout with only 1 pod, the restart action via the Argo CD UI kills my pod and recreates a new one, causing unavailability. I'm looking for the same behavior as with a classic Deployment object that uses the "RollingUpdate" strategy.