aws / aws-application-networking-k8s

A Kubernetes controller for Amazon VPC Lattice
https://www.gateway-api-controller.eks.aws.dev/
Apache License 2.0

Gateway and HTTPRoute unable to be deleted -- stuck finalizing #115

Open ellistarn opened 1 year ago

ellistarn commented 1 year ago

Reproduction steps:

  1. Create gateway and gateway class
  2. Delete gateway class
  3. Delete gateway (stuck deleting)
  4. Delete HTTPRoute (stuck deleting)
  5. Observe finalizer still on resource.

This log line looks like the culprit.

gateway-api-controller-6d499b898c-nq4sr manager 2023-03-05T01:56:33.540Z    INFO    controller.gateway  Ignore it since not link to any gatewayclass    {"reconciler group": "gateway.networking.k8s.io", "reconciler kind": "Gateway", "name": "carverspring-2-v1bayfdckt", "namespace": "default"}
- apiVersion: gateway.networking.k8s.io/v1alpha2
  kind: HTTPRoute
  metadata:
    creationTimestamp: "2023-03-03T23:47:55Z"
    deletionGracePeriodSeconds: 0
    deletionTimestamp: "2023-03-04T02:28:55Z"
    finalizers:
    - httproute.k8s.aws/resources
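
To check step 5 of the reproduction on another cluster, something like the sketch below can print the deletion timestamp and remaining finalizers of a stuck object. It assumes kubectl is on the PATH and pointed at the affected cluster; show_finalizers is a hypothetical helper name, and the resource names in the usage comment are placeholders.

```shell
# Print the deletion timestamp and the finalizers of one resource.
# Usage: show_finalizers <kind> <name> [namespace]
show_finalizers() {
  # ${3:+-n "$3"} adds "-n <namespace>" only when a namespace is given,
  # so the helper also works for cluster-scoped kinds such as GatewayClass.
  kubectl get "$1" "$2" ${3:+-n "$3"} \
    -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
}

# Example (placeholder names):
#   show_finalizers httproute my-route default
#   show_finalizers gateway my-gateway default
```

A non-empty deletion timestamp together with a non-empty finalizer list means the API server is still waiting for some controller to remove its finalizer.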

aaroniscode commented 1 year ago

This may be due to #105. I'm working on the PR now to add the finalizer for GatewayClass.

liwenwu-amazon commented 1 year ago

Today, we expect the amazon-vpc-lattice GatewayClass to be an object managed by the infrastructure provider, much like a built-in capability of an EKS cluster. We do not expect a cluster operator to be able to delete it.

In particular, we want to prevent it from being deleted (#105), which would accidentally wipe out all of the VPC Lattice resources.

aaroniscode commented 1 year ago

Hi @liwenwu-amazon -- I'm not sure I understand how we would prevent a resource from being deleted. And as I mentioned in #105, the community is likely not moving forward with finalizers for GatewayClass, so we would want a mechanism to prevent the system from hanging if someone deletes the GatewayClass.

One alternative is to have the controller re-create the GatewayClass if deleted?

liwenwu-amazon commented 1 year ago

I'm not sure what the semantic meaning of deleting the amazon-vpc-lattice GatewayClass is. Does it mean the EKS cluster no longer supports VPC Lattice?

liwenwu-amazon commented 1 year ago

I am hoping that, in the long run, the Lattice controller will be an automatic EKS cluster add-on, that EKS clusters will always support AWS VPC Lattice, and that the AWS EKS infrastructure will automatically create the amazon-vpc-lattice GatewayClass.

ellistarn commented 1 year ago

The Kubernetes pattern is typically to allow arbitrary resource application and deletion, and to handle these issues at runtime. It's common for users to install or uninstall YAML in any order, and validating the existence of interdependent resources will break many flows (e.g. Helm, Flux). This is in strong contrast to AWS APIs, which are highly interdependent.

As linked by @aaroniscode, the upstream spec https://gateway-api.sigs.k8s.io/references/spec/#gateway.networking.k8s.io%2fv1beta1.GatewayClass states that the gateway implementation must put a finalizer on the GatewayClass to govern its deletion. It's not clear to me how this is supposed to work in practice. For example, what happens if the user uninstalls the gateway implementation? Who is responsible for removing the finalizer?

Maybe we can reach out to sig-networking to get an authoritative answer from the spec designers.

edit: just realized @aaroniscode already did this here: https://github.com/aws/aws-application-networking-k8s/issues/105#issuecomment-1455128641

Perhaps we should bundle the gateway class as part of the helm install, and never require a user to create one at runtime.

ellistarn commented 1 year ago

Discussed w/ @liwenwu-amazon. As a path forward, we will include the gateway class as part of the installation. Users will not need to create this manually. @aaroniscode, thoughts?
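
For illustration only, a bundled GatewayClass could look roughly like the manifest emitted below. The name amazon-vpc-lattice and the controllerName value are taken from elsewhere in this thread; the v1beta1 apiVersion and the gatewayclass_manifest helper name are assumptions, and the actual installation may ship something different.

```shell
# Emit a GatewayClass manifest that an installer could bundle.
# The apiVersion is an assumption; match it to the Gateway API CRDs
# installed in the cluster.
gatewayclass_manifest() {
  cat <<'EOF'
apiVersion: gateway.networking.k8s.io/v1beta1
kind: GatewayClass
metadata:
  name: amazon-vpc-lattice
spec:
  controllerName: application-networking.k8s.aws/gateway-api-controller
EOF
}

# Usage (requires cluster access):
#   gatewayclass_manifest | kubectl apply -f -
```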

aaroniscode commented 1 year ago

Sounds good to me, @ellistarn. I think other projects have taken this approach as well. The question remains what to do if the GatewayClass is deleted. I mentioned here that the community is likely not going forward with a finalizer, and no other projects appear to be implementing one.

If that's the case, to avoid a hung system with the GatewayClass deleted, I see two paths:

What do you think?

Hu1buerger commented 5 months ago

Is a workaround known to remove the GatewayClass?

zijun726911 commented 5 months ago

Did you run into an issue or get stuck when deleting the GatewayClass? A GatewayClass doesn't have any finalizer, so it should be deleted without a hitch.

Hu1buerger commented 5 months ago

In my case it gets stuck when deleting the GatewayClass, even after running "microk8s reset".

$ kubectl get gatewayclasses.gateway.networking.k8s.io
NAME   CONTROLLER                                      ACCEPTED   AGE
eg     gateway.envoyproxy.io/gatewayclass-controller   True       91m

It still exists.

Hu1buerger commented 5 months ago

It survives a reset and a reboot.

with

$ sudo microk8s kubectl describe gatewayclasses
Name:         eg
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  gateway.networking.k8s.io/v1
Kind:         GatewayClass
Metadata:
  Creation Timestamp:             2024-04-15T23:38:42Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2024-04-15T23:52:09Z
  Finalizers:
    gateway-exists-finalizer.gateway.networking.k8s.io
  Generation:        2
  Resource Version:  221422
  UID:               75afcac3-ce42-4f86-bbd8-39df6dcb8b8d
Spec:
  Controller Name:  gateway.envoyproxy.io/gatewayclass-controller
Status:
  Conditions:
    Last Transition Time:  2024-04-15T23:38:42Z
    Message:               Valid GatewayClass
    Observed Generation:   1
    Reason:                Accepted
    Status:                True
    Type:                  Accepted
Events:                    <none>

and

$ sudo microk8s kubectl delete gatewayclass eg
gatewayclass.gateway.networking.k8s.io "eg" deleted

[HANGS HERE]

zijun726911 commented 5 months ago

This project, aws-application-networking-k8s (aws-gateway-api-controller), only manages a GatewayClass with controllerName: application-networking.k8s.aws/gateway-api-controller.

https://github.com/aws/aws-application-networking-k8s/blob/8c2ad0f102526e41ab3975799732652b69963cde/pkg/controllers/gatewayclass_controller.go#L68

But your GatewayClass has Controller Name: gateway.envoyproxy.io/gatewayclass-controller, which is not managed by the aws-gateway-api-controller. You could search for an answer, or ask, in the Envoy Gateway repo: https://github.com/envoyproxy/gateway https://gateway.envoyproxy.io/

From my limited knowledge of k8s, you could probably try running kubectl edit gatewayclass eg, deleting these lines (in the editor they appear as finalizers: under metadata), and saving:

Finalizers:
    gateway-exists-finalizer.gateway.networking.k8s.io

Then try kubectl delete gatewayclass eg again to hard delete this GatewayClass. (I am not sure what the consequences of a hard delete are; you should look for an answer in https://github.com/envoyproxy/gateway.)
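
As a non-interactive alternative to kubectl edit, the finalizers can usually be cleared with a single kubectl patch. This is a minimal sketch, assuming kubectl is on the PATH (on MicroK8s, substitute microk8s kubectl); clear_finalizers is a hypothetical helper name. Clearing finalizers skips whatever cleanup the owning controller intended to perform, so treat it as a last resort.

```shell
# Remove every finalizer from a resource so a pending delete can complete.
# WARNING: this bypasses the cleanup the owning controller would have done.
# Usage: clear_finalizers <kind> <name> [namespace]
clear_finalizers() {
  # A JSON merge patch that sets metadata.finalizers to null drops the list.
  # ${3:+-n "$3"} adds "-n <namespace>" only when a namespace is given.
  kubectl patch "$1" "$2" ${3:+-n "$3"} --type=merge \
    -p '{"metadata":{"finalizers":null}}'
}

# Example for the GatewayClass discussed above (cluster-scoped, no namespace):
#   clear_finalizers gatewayclass eg
```

If a delete was already requested, as in the output above where a deletion timestamp is set, the object should disappear once its finalizer list is empty.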