GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0

Namespace deletion stuck after make undeploy #161

Closed pzou1974 closed 4 years ago

pzou1974 commented 4 years ago

I manually ran "kubectl delete ns/[my_name_space]" and it hangs.

kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get --show-kind --ignore-not-found -n [my-namespace]

error: unable to retrieve the complete list of server APIs: webhook.certmanager.k8s.io/v1beta1: the server is currently unable to handle the request
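The error above says that the aggregated API group webhook.certmanager.k8s.io/v1beta1 cannot be served, which is a common reason for a namespace to hang in Terminating. A minimal sketch for spotting such APIServices follows. The function names are hypothetical, and the filter assumes the default column order of kubectl get apiservices (NAME, SERVICE, AVAILABLE, AGE).

```shell
#!/bin/sh
# Reads `kubectl get apiservices --no-headers` output on stdin and prints the
# names of APIServices whose AVAILABLE column is anything other than "True".
# Assumption: columns are NAME SERVICE AVAILABLE AGE, so AVAILABLE is field 3.
unavailable_apiservices() {
  awk '$3 != "True" { print $1 }'
}

# Hypothetical wrapper: query the current cluster and apply the filter.
list_unavailable_apiservices() {
  kubectl get apiservices --no-headers | unavailable_apiservices
}
```

Calling list_unavailable_apiservices prints the candidates. In the situation reported here one would expect an entry for the webhook.certmanager.k8s.io group, which can then be inspected with kubectl describe apiservice before deciding whether it is a stale leftover.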

functicons commented 4 years ago

I have encountered this problem as well, but only once. I'm not sure whether it is a Kubernetes problem or a problem on our side.

functicons commented 4 years ago

As a workaround, I think we can update make undeploy to leave the namespace in place, since the namespace itself doesn't consume compute or storage resources.

pzou1974 commented 4 years ago

Yes, I ran make undeploy, but it is still stuck: the namespace is in Terminating status. The problem is that our Kubernetes framework adds a namespace status check across the k8s cluster, and that check blocks new deployments :( I suggest we document which resources may still be associated with the namespace when deleting it fails.

Mrart commented 4 years ago

I encountered the problem as well. It happens because some things, such as certmanager, are left undeleted.

chethanuk commented 4 years ago

the namespace is in Terminating status.

Since the namespace is stuck in Terminating status, do the following:

  1. Run kubectl proxy

  2. Open a new terminal and run the commands below:

NS is the name of the namespace that is stuck.

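NS=flink-operator-system
# Dump the namespace object, then drop the "kubernetes" finalizer from it.
kubectl get namespace "$NS" -o json > tmp.json
sed 's/"kubernetes"//' tmp.json > tmp1.json
# PUT the edited object to the finalize endpoint through kubectl proxy (default port 8001).
curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp1.json "http://127.0.0.1:8001/api/v1/namespaces/$NS/finalize"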

Now check with kubectl get namespaces: the namespace should be deleted within about 5 seconds.
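The steps above can also be sketched as a single pipeline that needs no kubectl proxy and no temporary files. This is only a sketch under stated assumptions: the function names are hypothetical, it relies on kubectl replace --raw accepting the object on stdin via -f -, and it reuses the same sed substitution, so it likewise skips whatever cleanup the finalizer was waiting for.

```shell
#!/bin/sh
# Removes the "kubernetes" finalizer string from namespace JSON read on stdin.
# Same substitution as the sed command in the steps above.
strip_kubernetes_finalizer() {
  sed 's/"kubernetes"//'
}

# Hypothetical helper: force-finalize the namespace named in $1.
# Assumption: `kubectl replace --raw <path> -f -` sends stdin as the PUT body.
force_finalize_namespace() {
  ns="$1"
  kubectl get namespace "$ns" -o json \
    | strip_kubernetes_finalizer \
    | kubectl replace --raw "/api/v1/namespaces/$ns/finalize" -f -
}
```

With these definitions loaded, force_finalize_namespace flink-operator-system would target the same namespace as the example above.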

Mrart commented 4 years ago

@ChethanUK I know your method can resolve this problem, but I think we still need to find the root cause and fix it!

functicons commented 4 years ago

I encountered the problem as well. It happens because some things, such as certmanager, are left undeleted.

We have removed the dependency on certmanager.

pzou1974 commented 4 years ago

When will you release the fix that removes the certmanager dependency?

functicons commented 4 years ago

@pzou1974 It is done in https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/pull/167 and released to gcr.io/flink-operator/flink-operator:latest and gcr.io/flink-operator/flink-operator:v1beta1-2.