thecosmicfrog opened 2 days ago
Can you share the Helm charts, please? In particular, the container spec.
@andrii-korotkov-verkada Yes, of course. You can find the chart here: https://github.com/thecosmicfrog/helm-charts/tree/main/charts%2Ftest-app
There are also corresponding Git tags for `0.0.1` and `0.0.2`.
When this happens, can you look at the current vs desired manifest, please? It's probably a bug in removing webhook fields for comparison (unless you actually have a webhook in the cluster that does it), but worth a shot.
@andrii-korotkov-verkada The diff seems to be in a bit of a "confused" state from what I can see. See screenshots below.

The live `Deployment` object shows the `resources` block, as expected. The desired manifest does not include the `resources` block (again, as expected).

I hope that helps. Let me know what additional information I can provide.
Can you copy-paste the whole manifests, please? Sorry for bugging you; it's just that I'm looking for anything out of line and need the full manifests to check that.
@andrii-korotkov-verkada No problem at all! Here are the `Deployment` manifests.
Live Manifest (managed fields unhidden):
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: '3'
  creationTimestamp: '2024-11-13T17:14:51Z'
  generation: 3
  labels:
    app.kubernetes.io/instance: test-app
    argocd.argoproj.io/instance: test-app
  managedFields:
    - apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:labels:
            f:app.kubernetes.io/instance: {}
            f:argocd.argoproj.io/instance: {}
        f:spec:
          f:minReadySeconds: {}
          f:progressDeadlineSeconds: {}
          f:replicas: {}
          f:revisionHistoryLimit: {}
          f:selector: {}
          f:template:
            f:metadata:
              f:labels:
                f:app.kubernetes.io/instance: {}
            f:spec:
              f:containers:
                k:{"name":"http-echo"}:
                  .: {}
                  f:env:
                    k:{"name":"PORT"}:
                      .: {}
                      f:name: {}
                      f:value: {}
                    k:{"name":"VERSION"}:
                      .: {}
                      f:name: {}
                      f:value: {}
                  f:image: {}
                  f:imagePullPolicy: {}
                  f:livenessProbe:
                    f:httpGet:
                      f:path: {}
                      f:port: {}
                  f:name: {}
                  f:ports:
                    k:{"containerPort":5678,"protocol":"TCP"}:
                      .: {}
                      f:containerPort: {}
                      f:name: {}
                      f:protocol: {}
                  f:readinessProbe:
                    f:httpGet:
                      f:path: {}
                      f:port: {}
                  f:resources:
                    f:requests:
                      f:cpu: {}
                      f:memory: {}
                  f:securityContext:
                    f:allowPrivilegeEscalation: {}
                    f:capabilities:
                      f:drop: {}
                    f:privileged: {}
                  f:startupProbe:
                    f:httpGet:
                      f:path: {}
                      f:port: {}
              f:hostIPC: {}
              f:hostNetwork: {}
              f:hostPID: {}
              f:securityContext:
                f:fsGroup: {}
                f:runAsGroup: {}
                f:runAsNonRoot: {}
                f:runAsUser: {}
                f:seccompProfile:
                  f:type: {}
              f:terminationGracePeriodSeconds: {}
      manager: argocd-controller
      operation: Apply
      time: '2024-11-14T14:43:51Z'
    - apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:deployment.kubernetes.io/revision: {}
        f:status:
          f:availableReplicas: {}
          f:conditions:
            .: {}
            k:{"type":"Available"}:
              .: {}
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
              f:type: {}
            k:{"type":"Progressing"}:
              .: {}
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
              f:type: {}
          f:observedGeneration: {}
          f:readyReplicas: {}
          f:replicas: {}
          f:updatedReplicas: {}
      manager: kube-controller-manager
      operation: Update
      subresource: status
      time: '2024-11-14T14:44:34Z'
  name: test-app
  namespace: sandbox-aaron
  resourceVersion: '200923886'
  uid: 7b7349f2-9000-4fe3-a443-9eb4e1a1a659
spec:
  minReadySeconds: 10
  progressDeadlineSeconds: 300
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: test-app
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: test-app
    spec:
      containers:
        - env:
            - name: VERSION
              value: 0.0.1
            - name: PORT
              value: '5678'
          image: hashicorp/http-echo
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: 5678
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          name: http-echo
          ports:
            - containerPort: 5678
              name: http
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: 5678
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            requests:
              cpu: 100m
              memory: 32Mi
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            privileged: false
          startupProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: 5678
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1000
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
        seccompProfile:
          type: RuntimeDefault
      terminationGracePeriodSeconds: 80
status:
  availableReplicas: 2
  conditions:
    - lastTransitionTime: '2024-11-14T06:03:11Z'
      lastUpdateTime: '2024-11-14T06:03:11Z'
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: 'True'
      type: Available
    - lastTransitionTime: '2024-11-13T17:14:51Z'
      lastUpdateTime: '2024-11-14T14:44:34Z'
      message: ReplicaSet "test-app-74dfc69c76" has successfully progressed.
      reason: NewReplicaSetAvailable
      status: 'True'
      type: Progressing
  observedGeneration: 3
  readyReplicas: 2
  replicas: 2
  updatedReplicas: 2
```
Desired Manifest:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/instance: test-app
    argocd.argoproj.io/instance: test-app
  name: test-app
  namespace: sandbox-aaron
spec:
  minReadySeconds: 10
  progressDeadlineSeconds: 300
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: test-app
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: test-app
    spec:
      containers:
        - env:
            - name: VERSION
              value: 0.0.2
            - name: PORT
              value: '5678'
          image: hashicorp/http-echo
          imagePullPolicy: IfNotPresent
          livenessProbe:
            httpGet:
              path: /
              port: 5678
          name: http-echo
          ports:
            - containerPort: 5678
              name: http
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /
              port: 5678
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            privileged: false
          startupProbe:
            httpGet:
              path: /
              port: 5678
      hostIPC: false
      hostNetwork: false
      hostPID: false
      securityContext:
        fsGroup: 1000
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
        seccompProfile:
          type: RuntimeDefault
      terminationGracePeriodSeconds: 80
```
Let me know if you'd like me to re-post with the managed fields hidden, or anything else. Thanks!
Hm, I don't see anything obviously wrong at the moment.
Indeed. Notably, setting `IncludeMutationWebhook=true` in the `SyncOptions` appears to resolve the issue, but this doesn't seem like it should be necessary for such a simple change (removal of a block from `containers[]`)? Hence why I'm hesitant to proceed with setting that flag.
Are you able to reproduce on your side? I believe the instructions and charts I provided should be enough to do so, but please advise if I can provide anything else.
One more thing - do you have mutating webhooks set up in the cluster?
We have three MutatingWebhooks:

- `aws-load-balancer-webhook`
- `pod-identity-webhook`
- `vpc-resource-mutating-webhook`

As I understand it, all are part of Amazon EKS and the AWS Load Balancer Controller.
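(For anyone checking their own cluster, the equivalent list can be pulled with `kubectl get mutatingwebhookconfigurations`.)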
Hi @andrii-korotkov-verkada. I have some additional information which should help you find the root cause of this.

I use Argo Rollouts for most of my applications (thus I use `Rollout` objects instead of `Deployment`). But the original error was triggered for an app using a `Deployment`, and thus that is what I used in my reproduction Helm charts.

Out of curiosity, I decided to see if the same error would trigger when using a `Rollout`. I figured it would, since that is mostly a drop-in replacement for `Deployment`, but, to my surprise, it seems to work without issue!
Please see my latest chart versions:
- `0.3.1`: This chart is essentially the same as `0.0.1`, but using a `Rollout` instead of a `Deployment`.
- `0.3.2`: This is `0.3.1` with the `resources` block removed from `spec.template.spec.containers[0]`.

See the code for 0.3.1 and 0.3.2 here. I had to add two very basic `Service` objects, as this is required by Argo Rollouts, but you can likely just ignore them.
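For context on why this is close to a drop-in swap: a `Rollout` is shaped almost exactly like a `Deployment`. The snippet below is only a rough sketch of that idea, not the actual chart contents (those are in the repo linked above); the blue-green strategy and the `test-app-active`/`test-app-preview` Service names are illustrative assumptions.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: test-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/instance: test-app
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: test-app
    spec:
      containers:
        - name: http-echo
          image: hashicorp/http-echo
          ports:
            - containerPort: 5678
  strategy:
    # Illustrative assumption: a strategy is required, and a blue-green setup
    # is one reason to add the two extra Service objects mentioned above.
    blueGreen:
      activeService: test-app-active
      previewService: test-app-preview
```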
The chart artifacts are built and uploaded as before, so you can simply update `spec.source.targetRevision` in the `application.yaml` file I provided in the original post and `kubectl apply` it.
I hope this helps.
Thanks - Aaron
Describe the bug
I am seeing an error in Argo CD when upgrading a Helm chart from one version to another. The only difference between the Helm chart versions is that the new version removes the `resources` block from `spec.template.spec.containers[0]` in the `Deployment` object. I have noticed that removing other blocks (e.g. `env`) results in the same issue, so it is not just a `resources` problem.

The specific error is:
Additional details:

- The app is synced with `ServerSideApply=true`.
- `ServerSideDiff` is enforced on the server side by setting `controller.diff.server.side: true` in the `argo-helm` chart values.
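In the `argo-helm` chart, that corresponds to a values excerpt roughly like the following (assuming a recent argo-cd chart version where `argocd-cmd-params-cm` entries are set via `configs.params`):

```yaml
# argo-cd (argo-helm) values excerpt
configs:
  params:
    # Enforce server-side diff at the application controller level
    controller.diff.server.side: true
```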
To Reproduce

I have built a Helm chart to reproduce this, with two versions (`0.0.1` and `0.0.2`). Updating will not work, as you will see.

Prerequisites:
- Argo CD installed in the `kube-system` namespace.
- `ServerSideDiff` enabled (`controller.diff.server.side: true`).
- A namespace called `bug` (to host the k8s objects).

Create a file called `application.yaml` to represent our Argo CD `Application` object:
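A minimal `Application` along the following lines is enough; the exact `repoURL` and `project` depend on your setup (here I assume the chart is pulled straight from the Git repo linked above, using the `0.0.1` tag):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: test-app
  namespace: kube-system
spec:
  project: default
  source:
    repoURL: https://github.com/thecosmicfrog/helm-charts
    path: charts/test-app
    targetRevision: 0.0.1
  destination:
    server: https://kubernetes.default.svc
    namespace: bug
  syncPolicy:
    # Auto Sync is enabled, and we sync with server-side apply
    automated: {}
    syncOptions:
      - ServerSideApply=true
```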
Apply this to the cluster:

```shell
kubectl apply -n kube-system -f application.yaml
```
Pods should come up without issue and everything should be correctly synced, as expected for a simple `Deployment`.
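(For example, `kubectl get pods -n bug` should show the two `http-echo` Pods in a `Running` state.)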
Update `application.yaml` to bump the Helm chart to the new version:
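The only change needed is bumping `spec.source.targetRevision`, e.g.:

```yaml
spec:
  source:
    targetRevision: 0.0.2
```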
Apply this to the cluster:

```shell
kubectl apply -n kube-system -f application.yaml
```
Expected behavior
The new chart version should install without error, as it is such a straightforward change (`resources` block removed from `containers[0]`).

Actual behavior
Sync Status enters an Unknown state with the new chart version, and App Conditions displays 1 Error. That error is:
The only way to seemingly complete the update is to manually sync, which works without issue. We have Auto Sync enabled, so I'm not sure why that does not resolve the issue.
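(The manual sync can also be triggered from the CLI, e.g. `argocd app sync test-app`.)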
Screenshots
Argo CD UI after applying chart `0.0.2`:

Several (5-ish?) minutes later - with no intervention from me - the error "appears" to be resolved... but note the versions are not matching between both sync fields:

Then, clicking Refresh...

...results in the same 1 Error outcome as before...

...and the Pods present on the cluster are still from the "old" (`0.0.1`) chart version:

The only way to fix is to manually Sync:

Which finally brings the app into sync at `0.0.2`:

Version
Logs
Let me know if the above is enough information to reproduce the issue.
Thanks for your time - Aaron