Open taer opened 16 hours ago
I've traced the logic down to here:
https://github.com/argoproj/argo-rollouts/blob/master/rollout/trafficrouting/istio/istio.go#L1501
func (r *Reconciler) RemoveManagedRoutes() error {
This smells like it wants to reset the managed routes. it's called from reconcileTrafficRouting
https://github.com/argoproj/argo-rollouts/blob/master/rollout/trafficrouting.go#L176
} else if rolloututil.IsFullyPromoted(c.rollout) {
err := reconciler.RemoveManagedRoutes()
if err != nil {
return err
}
I don't see any logs about RemoveManagedRoutes
getting called, and that's about as far as I can make it. I'm wondering if the IsFullyPromoted is saying false ro.Status.StableRS == ro.Status.CurrentPodHash
in my rollout status, they are equal. Of course, so is the canary since it's complete
status:
HPAReplicas: 1
availableReplicas: 1
blueGreen: {}
canary:
weights:
canary:
podTemplateHash: 689f7f6654
serviceName: httpbin-canary
weight: 0
stable:
podTemplateHash: 689f7f6654
serviceName: httpbin
weight: 100
conditions:
- lastTransitionTime: '2024-09-26T20:19:49Z'
lastUpdateTime: '2024-09-26T20:19:49Z'
message: Rollout has minimum availability
reason: AvailableReason
status: 'True'
type: Available
- lastTransitionTime: '2024-09-26T20:39:08Z'
lastUpdateTime: '2024-09-26T20:39:08Z'
message: RolloutCompleted
reason: RolloutCompleted
status: 'True'
type: Completed
- lastTransitionTime: '2024-09-26T20:39:08Z'
lastUpdateTime: '2024-09-26T20:39:08Z'
message: Rollout is paused
reason: RolloutPaused
status: 'False'
type: Paused
- lastTransitionTime: '2024-09-26T20:39:39Z'
lastUpdateTime: '2024-09-26T20:39:39Z'
message: Rollout is healthy
reason: RolloutHealthy
status: 'True'
type: Healthy
- lastTransitionTime: '2024-09-26T20:39:08Z'
lastUpdateTime: '2024-09-26T20:39:39Z'
message: ReplicaSet "httpbin-689f7f6654" has successfully progressed.
reason: NewReplicaSetAvailable
status: 'True'
type: Progressing
currentPodHash: 689f7f6654
currentStepHash: 84d9c65666
currentStepIndex: 2
observedGeneration: '6'
phase: Healthy
readyReplicas: 1
replicas: 1
selector: app=httpbin
stableRS: 689f7f6654
updatedReplicas: 1
Checklist:
Describe the bug We use rollouts with istio traffic management. Initially, when argo-rollouts changed the weights in the virtual service, argocd would fight it, causing sync flapping. We used to have auto-sync on, so this was a problem. To remedy, we added this to the application
This solved the sync issue.
Now, when a rollout occurs, Rollouts correctly modifies the VirtualService weights for the canary, and argo doesn't fight it because rollouts set this in the VS prior to changing the weights
All is well for the rollout. However, once one rollout happens, that managedField remains in the virtualService, making Argocd never see any changes to the VS unless we manually remove that managed field. Then it instantly sees it
To Reproduce reproducible deploy here: https://github.com/taer/argo-rollout-issue/tree/master
use the linked application. Deploy it. No VS changes occur. Change the VS http section. I changed the prefix
Prior to a rollout, changing this results in an expected diff. Now change the pod template. I edited
to force a rollout. ArgoRollouts starts, deploys the new pod, then adds the ManagedField to the VS and starts to alter the weights for the canary deploy. When complete, the managed fields still exist. Now this means any future changes to the VirtualService in git are ignored.
Expected behavior
I expect the managedField argo rollouts inserted to be removed when the rollout is complete so that future virtual service changes can be noticed by argocd.
Version Argo-rollouts: 1.7.2 argo-rollouts-helm: 2.37.5 istio: 1.22.3
Logs
Logs are in the github repro repo. https://github.com/taer/argo-rollout-issue/blob/master/errorLog.log
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.