ajax-bychenok-y opened this issue 1 month ago (status: Open)
I think this is fixed in 1.7.2, or at least improved. Can you try?
This is the log for v1.7.1+6a99ea9 (we are going to update to the latest version soon, as advised): rollouts-fail-old-replica.json
The most interesting part is:
{"level":"info","msg":"Set 'scale-down-deadline' annotation on 'some-svc-55468fc7cb' to 2024-09-19T09:35:41Z (30s)","namespace":"staging-a","rollout":"some-svc","time":"2024-09-19T09:35:11Z"}
{"level":"info","msg":"synced ephemeral metadata nil to Pod some-svc-55468fc7cb-lkr5l","namespace":"staging-a","rollout":"some-svc","time":"2024-09-19T09:35:12Z"}
{"level":"info","msg":"synced ephemeral metadata nil to Pod some-svc-55468fc7cb-p25vp","namespace":"staging-a","rollout":"some-svc","time":"2024-09-19T09:35:12Z"}
{"level":"info","msg":"Conflict when updating replicaset some-svc-55468fc7cb, falling back to patch","namespace":"staging-a","rollout":"some-svc","time":"2024-09-19T09:35:12Z"}
{"level":"info","msg":"Patching replicaset with patch: {\"metadata\":{\"annotations\":{\"rollout.argoproj.io/desired-replicas\":\"2\",\"rollout.argoproj.io/revision\":\"235\",\"scale-down-deadline\":\"\"},\"labels\":{\"rollouts-pod-template-hash\":\"55468fc7cb\"}},\"spec\":{\"replicas\":2,\"selector\":{\"matchLabels\":{\"rollouts-pod-template-hash\":\"55468fc7cb\"}},\"template\":{\"metadata\":{\"annotations\":{\"ad.datadoghq.com/some-svc.checks\":\"{\\n \\\"jmx\\\": {\\n \\\"init_config\\\": {\\n \\\"is_jmx\\\": true,\\n \\\"collect_default_metrics\\\": true,\\n \\\"collect_default_jvm_metrics\\\": true,\\n \\\"new_gc_metrics\\\": true\\n },\\n \\\"instances\\\": [{\\n \\\"host\\\": \\\"%%host%%\\\",\\n \\\"port\\\": 8855\\n }]\\n }\\n}\\n\"},\"labels\":{\"app.kubernetes.io/instance\":\"some-svc\",\"app.kubernetes.io/managed-by\":\"Helm\",\"app.kubernetes.io/name\":\"some-svc\",\"env_name\":\"staging\",\"env_tag\":\"a\",\"helm.sh/chart\":\"some-svc-0.26.0-773.RELEASE\",\"rollouts-pod-template-hash\":\"55468fc7cb\"}}}}}","namespace":"staging-a","rollout":"some-svc","time":"2024-09-19T09:35:12Z"}
{"level":"info","msg":"synced ephemeral metadata nil to ReplicaSet some-svc-55468fc7cb","namespace":"staging-a","rollout":"some-svc","time":"2024-09-19T09:35:12Z"}
{"generation":485,"level":"info","msg":"No status changes. Skipping patch","namespace":"staging-a","resourceVersion":"185480139","rollout":"some-svc","time":"2024-09-19T09:35:12Z"}
{"generation":485,"level":"info","msg":"Reconciliation completed","namespace":"staging-a","resourceVersion":"185480139","rollout":"some-svc","time":"2024-09-19T09:35:12Z","time_ms":74.843767}
As a result, the argo-rollouts.argoproj.io/scale-down-deadline annotation is set to '' (an empty string) and the old ReplicaSet never scales down.
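The log sequence above suggests the mechanism: the deadline is set at 09:35:11, an update then hits a conflict, and the fallback patch issued at 09:35:12 still carries a stale empty value for the same annotation. A JSON merge patch treats "" as a real value and writes it over the fresh deadline (only null would delete the key). Here is a minimal sketch of that merge-patch semantics, using the evanphx/json-patch library common in the Kubernetes ecosystem; the object shapes are simplified, not the real ReplicaSet:

```go
package main

import (
	"fmt"

	jsonpatch "github.com/evanphx/json-patch"
)

func main() {
	// Server state right after the controller set the deadline (09:35:11):
	original := []byte(`{"metadata":{"annotations":{"scale-down-deadline":"2024-09-19T09:35:41Z"}}}`)

	// The fallback patch from the log (09:35:12) still carries a stale
	// empty value for the same annotation:
	patch := []byte(`{"metadata":{"annotations":{"scale-down-deadline":""}}}`)

	merged, err := jsonpatch.MergePatch(original, patch)
	if err != nil {
		panic(err)
	}

	// Merge-patch semantics (RFC 7386): "" is a real value and replaces
	// the deadline; only null would delete the key. The timestamp is gone.
	fmt.Println(string(merged))
	// Output: {"metadata":{"annotations":{"scale-down-deadline":""}}}
}
```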
Describe the bug
Sometimes Argo Rollouts switches the release to the new version (ReplicaSet) but doesn't remove the old one, so its pods keep running. After some digging I realized that the cause is a blank value in the annotation
argo-rollouts.argoproj.io/scale-down-deadline: ""
where a valid deadline timestamp should be. That's why the controller can't scale the old ReplicaSet down later.
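To illustrate why a blank value wedges things: in the healthy case the annotation holds an RFC 3339 timestamp (see 2024-09-19T09:35:41Z in the log above). Assuming the controller parses it as such, an empty string never parses, so a "deadline has passed" check can never succeed. A minimal, illustrative sketch (not the actual controller code):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Healthy case: the annotation holds an RFC 3339 deadline.
	deadline, _ := time.Parse(time.RFC3339, "2024-09-19T09:35:41Z")
	fmt.Println("scale down once now > deadline:", time.Now().After(deadline))

	// Buggy case: a blank value can never be parsed, so a check like
	// "has the deadline passed?" never evaluates to true and the old
	// ReplicaSet is never scaled down.
	if _, err := time.Parse(time.RFC3339, ""); err != nil {
		fmt.Println("blank deadline never parses:", err)
	}
}
```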
To Reproduce
I have no reproduction steps for this problem because it occurs sporadically after the rollout process. Here is the trail of my digging:
https://github.com/argoproj/argo-rollouts/issues/1761#issuecomment-2331739689 https://github.com/argoproj/argo-rollouts/issues/1761#issuecomment-2332187024
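Since there are no deterministic steps, a detection script may help others confirm they are hitting the same state. Here is a minimal client-go sketch (my own, not part of Argo Rollouts) that lists ReplicaSets whose scale-down-deadline annotation is present but blank; the namespace and kubeconfig path are assumptions:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes the default kubeconfig location; adjust for your setup.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// "staging-a" is the namespace from my logs; change as needed.
	rsList, err := client.AppsV1().ReplicaSets("staging-a").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, rs := range rsList.Items {
		// A present-but-empty deadline is the stuck state described above.
		val, ok := rs.Annotations["argo-rollouts.argoproj.io/scale-down-deadline"]
		if ok && val == "" {
			fmt.Printf("stuck ReplicaSet: %s/%s\n", rs.Namespace, rs.Name)
		}
	}
}
```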
Expected behavior
The controller should scale down the pods in the non-active ReplicaSet.
Screenshots
svc-c6dcc48cb is still alive even though the newer ReplicaSet svc-865f9fcf88 has already been scaled down and an even newer one, svc-74ff5588fb, is currently serving traffic.
Version
app version: v1.7.1+6a99ea9, helm chart version: 2.37.1
Logs
No logs for now.
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.