envoyproxy / gateway

Manages Envoy Proxy as a Standalone or Kubernetes-based Application Gateway
https://gateway.envoyproxy.io
Apache License 2.0

gateway fails to re-create the proxy when the proxy deployment is deleted #3632

Closed christiancadieux closed 2 weeks ago

christiancadieux commented 3 months ago

Description:

I deleted the proxy manually with 'kubectl delete deploy envoy-tenant1-ns1-eg-33b93dd6' and it was never re-created.

I saw this in the log:

{"runner": "infrastructure", "error": "failed to create or update deployment tenant1-eg/envoy-tenant1-ns1-eg-33b93dd6: failed to create/update resource with server-side apply for obj 
&Deployment{ObjectMeta:{envoy-tenant1-ns1-eg-33b93dd6  tenant1-eg    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/component:proxy app.kubernetes.io/managed-by:envoy-gateway app.kubernetes.io/name:envoy gateway.envoyproxy.io/owning-gateway-name:eg gateway.envoyproxy.io/owning-gateway-namespace:tenant1-ns1] map[] [] [] 
[]},Spec:DeploymentSpec{Replicas:nil,Selector:&v1.LabelSelector{MatchLabels:map[string]string{app.kubernetes.io/component: proxy,app.kubernetes.io/managed-by: envoy-gateway,app.kubernetes.io/name: envoy,gateway.envoyproxy.io/owning-gateway-name: eg,gateway.envoyproxy.io/owning-gateway-namespace: tenant1-ns1,tsf.io/tenant: tenant1,},MatchExpressions:[]LabelSelectorRequirement{},},Template:{{      0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/component:proxy app.kubernetes.io/managed-by:envoy-gateway app.kubernetes.io/name:envoy gateway.envoyproxy.io/owning-gateway-name:eg gateway.envoyproxy.io/owning-gateway-namespace:tenant1-ns1 tsf.io/tenant:tenant1] map[prometheus.io/path:/stats/prometheus prometheus.io/port:19001 prometheus.io/scrape:true] [] [] []} {[{certs {nil nil nil nil nil SecretVolumeSource{SecretName:envoy,Items:[]KeyToPath{},DefaultMode:*420,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}} {sds {nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil &ConfigMapVolumeSource{LocalObjectReference:LocalObjectReference{Name:envoy-tenant1-ns1-eg-33b93dd6,},
Items:[]KeyToPath{KeyToPath{Key:xds-trusted-ca.json,Path:xds-trusted-ca.json,Mode:nil,},KeyToPath{Key:xds-certificate.json,Path:xds-certificate.json,Mode:nil,},},DefaultMode:*420,Optional:*false,} nil nil nil nil nil nil nil nil nil nil}}] [] [{envoy hub.comcast.net/k8s-eng/envoyproxy/envoy:distroless-dev [envoy] [--service-cluster tenant1-ns1/eg --service-node $(ENVOY_POD_NAME) --config-yaml admin:

 --log-level warn --cpuset-threads]  [{http-8080 0 8080 TCP } {metrics 0 19001 TCP }] [] [{ENVOY_GATEWAY_NAMESPACE  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}} {ENVOY_POD_NAME  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.name,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}}] {map[] map[cpu:{{100 -3} {<nil>} 100m DecimalSI} memory:{{536870912 0} {<nil>}  BinarySI}] []} [] <nil> [{certs true <nil> /certs  <nil> } {sds false <nil> /sds  <nil> }] [] nil &Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/ready,Port:{0 19001 },Host:,Scheme:HTTP,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,TerminationGracePeriodSeconds:nil,} nil &Lifecycle{PostStart:nil,PreStop:&LifecycleHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/shutdown/ready,Port:{0 19002 },Host:,Scheme:HTTP,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,Sleep:nil,},} /dev/termination-log File IfNotPresent nil false false false} {shutdown-manager hub.comcast.net/k8s-eng/envoyproxy/gateway-dev:latest [envoy-gateway] [envoy shutdown-manager]  [] [] [{ENVOY_GATEWAY_NAMESPACE  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}} {ENVOY_POD_NAME  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.name,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}}] {map[] map[cpu:{{10 -3} {<nil>} 10m DecimalSI} memory:{{33554432 0} {<nil>}  BinarySI}] []} [] <nil> [] [] &Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:{0 19002 },Host:,Scheme:HTTP,HTTPHeaders:
[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,TerminationGracePeriodSeconds:nil,} &Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:{0 19002 },Host:,Scheme:HTTP,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,TerminationGracePeriodSeconds:nil,} nil &Lifecycle{PostStart:nil,PreStop:&LifecycleHandler{Exec:&ExecAction{Command:[envoy-gateway envoy shutdown],},HTTPGet:nil,TCPSocket:nil,Sleep:nil,},} /dev/termination-log File IfNotPresent nil false false false}] [] Always 0xc0013bef10 <nil> ClusterFirst map[] envoy-tenant1-ns1-eg-33b93dd6  0xc0013bee2d  false false false <nil> nil []   nil default-scheduler [] []  <nil> nil [] <nil> <nil> <nil> map[] [] <nil> nil <nil> [] 
[]}},Strategy:DeploymentStrategy{Type:RollingUpdate,RollingUpdate:nil,},MinReadySeconds:0,RevisionHistoryLimit:*10,Paused:false,ProgressDeadlineSeconds:*600,},Status:DeploymentStatus{ObservedGeneration:0,Replicas:0,UpdatedReplicas:0,AvailableReplicas:0,UnavailableReplicas:0,Conditions:[]DeploymentCondition{},ReadyReplicas:0,CollisionCount:nil,},}: Deployment.apps \"envoy-tenant1-ns1-eg-33b93dd6\" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app.kubernetes.io/component\":\"proxy\", \"app.kubernetes.io/managed-by\":\"envoy-gateway\", \"app.kubernetes.io/name\":\"envoy\", \"gateway.envoyproxy.io/owning-gateway-name\":\"eg\", \"gateway.envoyproxy.io/owning-gateway-namespace\":\"tenant1-ns1\", \"tsf.io/tenant\":\"tenant1\"}, 
MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable"}

If I delete the gateway deployment and reinstall, the proxy deployment and pod come back:

NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/envoy-gateway                   1/1     1            1           5m5s
deployment.apps/envoy-tenant1-ns1-eg-33b93dd6   1/1     1            1           4m48s

NAME                                                 READY   STATUS    RESTARTS   AGE
pod/envoy-gateway-b8f7d4fc6-nxmff                    1/1     Running   0          5m5s
pod/envoy-tenant1-ns1-eg-33b93dd6-5fd47fc666-74v8j   2/2     Running   0          4m48s

$ k get gateway,httproute,envoyproxy -n tenant1-ns1
NAME                                   CLASS        ADDRESS   PROGRAMMED   AGE
gateway.gateway.networking.k8s.io/eg   eg-tenant1             False        30m

NAME                                          HOSTNAMES                     AGE
httproute.gateway.networking.k8s.io/backend   ["www.tenant1.example.com"]   24h

NAME                                                   AGE
envoyproxy.gateway.envoyproxy.io/custom-proxy-config   21m

This was a multi-namespace install:

TENANT=${TENANT:-tenant1}
NAMESPACES=${NAMESPACES:-tenant1-ns1,tenant1-ns2}

helm template gateway-helm \
    --set global.images.envoyGateway.pullPolicy="Always" \
    --set config.envoyGateway.gateway.controllerName=gateway.envoyproxy.io/${TENANT}-gatewayclass-controller \
    --set config.envoyGateway.provider.kubernetes.watch.namespaces={$NAMESPACES} \
    --set config.envoyGateway.provider.kubernetes.shutdownManager.image="hub.comcast.net/k8s-eng/envoyproxy/gateway-dev:latest" \
    --set deployment.envoyGateway.image.repository="hub.comcast.net/k8s-eng/envoyproxy/gateway-dev" \
    --set config.envoyGateway.provider.kubernetes.watch.type=Namespaces \
    --set deployment.pod.labels.tsf\\.io/tenant=tenant1 \
    --version v1.0.1 \
    --name-template eg-${TENANT} -n ${TENANT}-eg

Environment:

image: envoyproxy/envoy:distroless-dev
image: envoyproxy/gateway-dev:latest

shawnh2 commented 3 months ago

What is your GatewayClass status? Did it get accepted?

shawnh2 commented 3 months ago

We have a multi-tenancy deployment doc; can you check whether it helps?

According to that doc, this problem should not happen.

christiancadieux commented 3 months ago

I did follow the multi-tenant doc. Since the gateway controller created the Envoy deployment in the first place, I expected it to re-create the deployment when it is deleted, but it failed with the error above.

The status is:

status:
  conditions:
  - lastTransitionTime: "2024-06-19T22:09:45Z"
    message: 'Invalid parametersRef: failed to list envoyproxies in namespace tenant1-eg:
      unable to list: tenant1-eg because of unknown namespace for the cache'
    observedGeneration: 1
    reason: InvalidParameters
    status: "False"
    type: Accepted
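
The status message above says the list call for namespace tenant1-eg was rejected as an "unknown namespace for the cache". The install command watches only the namespaces in NAMESPACES (tenant1-ns1, tenant1-ns2), so one possible reading is that the parametersRef points at a namespace outside that watch list. A minimal sketch of that behaviour, assuming a namespace-scoped cache that refuses lists for unwatched namespaces; `list_in_namespace` is a hypothetical helper, not Envoy Gateway code:

```python
from typing import List, Set


def list_in_namespace(watched: Set[str], namespace: str) -> List[str]:
    """Hypothetical namespace-scoped cache lookup.

    Assumption: the cache only holds objects for watched namespaces and
    rejects any list call that targets another namespace.
    """
    if namespace not in watched:
        raise LookupError(
            f"unable to list: {namespace} because of unknown namespace for the cache"
        )
    # The sketch holds no objects; a watched namespace just returns an empty list.
    return []


# Watch list taken from the NAMESPACES default in the install command.
watched_namespaces = {"tenant1-ns1", "tenant1-ns2"}
```

Under this assumption, referencing an EnvoyProxy in tenant1-ns1 would succeed, while one in tenant1-eg would fail the way the status condition reports.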

christiancadieux commented 3 months ago

I noticed that this 'field is immutable' error goes away if I remove the pod labels from the EnvoyProxy spec. With the pod labels present, the proxy image is never updated and I see the 'field is immutable' error in the logs.

FAILS:

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: proxy-config-{{ .Values.tenant }}
  namespace: {{ .Values.tenant }}-ns1
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        pod:
          labels:
            tsf.io/tenant: {{ .Values.tenant }}
        container:
          image: {{ .Values.envoyproxyimage }}

But if I remove the pod section and keep container.image, it all works.
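
The error in the log shows tsf.io/tenant inside spec.selector, and the apps/v1 API does not allow a Deployment's spec.selector to change after creation. A minimal sketch of the comparison that would surface such a mismatch follows. It assumes the existing Deployment was created without the tsf.io/tenant label in its selector, which this thread does not confirm, and `selector_diff` is a hypothetical helper:

```python
from typing import Dict, Optional, Tuple


def selector_diff(
    existing: Dict[str, str], desired: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (existing_value, desired_value)} for every differing label."""
    keys = sorted(set(existing) | set(desired))
    return {
        k: (existing.get(k), desired.get(k))
        for k in keys
        if existing.get(k) != desired.get(k)
    }


# Selector labels copied from the error message, minus the tenant label.
# Assumption: this is what the stored Deployment selector looked like.
existing_selector = {
    "app.kubernetes.io/component": "proxy",
    "app.kubernetes.io/managed-by": "envoy-gateway",
    "app.kubernetes.io/name": "envoy",
    "gateway.envoyproxy.io/owning-gateway-name": "eg",
    "gateway.envoyproxy.io/owning-gateway-namespace": "tenant1-ns1",
}

# Selector reported as invalid in the log, including the pod label.
desired_selector = dict(existing_selector, **{"tsf.io/tenant": "tenant1"})

diff = selector_diff(existing_selector, desired_selector)
```

Any non-empty diff means an in-place update would be rejected with "field is immutable", which would match the observation that dropping the pod labels makes the error go away.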

arkodg commented 3 months ago

Ah, this ties to https://github.com/envoyproxy/gateway/issues/1844. We need a way to opt in to force recreation.
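
One way to picture an opt-in force recreation is a fallback that deletes and re-creates the Deployment only when an update is rejected for an immutable field. The sketch below is an illustration under that assumption, not the approach taken in the linked issue or any pull request; `FakeCluster`, `ImmutableFieldError`, `apply_with_recreate`, and the `force_recreate` flag are all hypothetical:

```python
from typing import Any, Dict


class ImmutableFieldError(Exception):
    """Raised by the fake cluster when an update changes spec.selector."""


class FakeCluster:
    """In-memory stand-in for an API server (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.deployments: Dict[str, Dict[str, Any]] = {}

    def apply(self, name: str, spec: Dict[str, Any]) -> None:
        current = self.deployments.get(name)
        # Mimic the immutability rule: the selector may not change on update.
        if current is not None and current["selector"] != spec["selector"]:
            raise ImmutableFieldError(f"{name}: spec.selector: field is immutable")
        self.deployments[name] = dict(spec)

    def delete(self, name: str) -> None:
        self.deployments.pop(name, None)


def apply_with_recreate(
    cluster: FakeCluster, name: str, spec: Dict[str, Any], force_recreate: bool = False
) -> str:
    """Apply the spec; on an immutable-field rejection, recreate only if opted in."""
    try:
        cluster.apply(name, spec)
        return "applied"
    except ImmutableFieldError:
        if not force_recreate:
            raise
        # Opt-in path: drop the old object, then create it with the new selector.
        cluster.delete(name)
        cluster.apply(name, spec)
        return "recreated"
```

Keeping the fallback behind an explicit flag matters because deleting a Deployment also removes its pods, so traffic is interrupted until the new pods become ready.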

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

arkodg commented 2 weeks ago

Fixed with https://github.com/envoyproxy/gateway/pull/3995