carvel-dev / kapp

kapp is a simple deployment tool focused on the concept of "Kubernetes application" — a set of resources with the same label
https://carvel.dev/kapp
Apache License 2.0

Skip checking resources when `--wait=false` is specified #577

Open firgavin opened 2 years ago

firgavin commented 2 years ago

What steps did you take: I currently use kapp as a CI tool to manage lots of YAML files. I used --wait=false when I deleted the app because deleting custom resources can sometimes take a long time.

What happened: kapp exits with a non-zero code, which makes CI fail.

$ kapp delete -a app1 --wait=false -y
Target cluster 'https://127.0.0.1:6443' (nodes: firgavin)

Changes

Namespace  Name        Kind        Age  Op      Op st.  Wait to  Rs  Ri  
default    simple-app  Deployment  22s  delete  -       -        ok  -  
^          simple-app  Service     22s  delete  -       -        ok  -  

Op:      0 create, 2 delete, 0 update, 0 noop, 0 exists
Wait to: 0 reconcile, 0 delete, 2 noop

11:18:59AM: ---- applying 2 changes [0/2 done] ----
11:18:59AM: delete deployment/simple-app (apps/v1) namespace: default
11:18:59AM: delete service/simple-app (v1) namespace: default
11:18:59AM: ---- waiting on 2 changes [0/2 done] ----
11:18:59AM: ok: noop service/simple-app (v1) namespace: default
11:18:59AM: ok: noop deployment/simple-app (apps/v1) namespace: default
11:18:59AM: ---- applying complete [2/2 done] ----
11:18:59AM: ---- waiting complete [2/2 done] ----

kapp: Error: Expected all resources to be gone, but found: endpointslice/simple-app-vp2dw (discovery.k8s.io/v1) namespace: default, pod/simple-app-64c66864f5-g9sb8 (v1) namespace: default, replicaset/simple-app-64c66864f5 (apps/v1) namespace: default

What did you expect: kapp should skip checking resources when --wait=false is specified.

Anything else you would like to add: I did some research and I found that kapp checks the existence of related resources after applying changes. But resources will be deleted eventually. See https://github.com/vmware-tanzu/carvel-kapp/blob/v0.52.0/pkg/kapp/cmd/app/delete.go#L159. It would be great if kapp could default to skipping checking resources when --wait=false is specified or add a flag to control this logic. And if that makes sense, I'd like to help implement this ;)
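To make the request concrete, here is a minimal Go sketch of the behaviour described above. It is not kapp's actual code: `DeleteOpts`, `SkipLeftoverCheck`, and `checkLeftovers` are hypothetical names made up for illustration. It only shows that the leftover check runs regardless of `--wait`, and how an opt-out could short-circuit it.

```go
package main

import (
	"fmt"
	"strings"
)

// DeleteOpts is a hypothetical options struct, not kapp's real type.
type DeleteOpts struct {
	Wait              bool // mirrors --wait
	SkipLeftoverCheck bool // hypothetical opt-out for the post-delete check
}

// checkLeftovers models the post-delete check: it fails if any resources
// labelled for the app still exist. Note that it never consults opts.Wait,
// which is why --wait=false alone does not avoid the error in the report.
func checkLeftovers(remaining []string, opts DeleteOpts) error {
	if opts.SkipLeftoverCheck || len(remaining) == 0 {
		return nil
	}
	return fmt.Errorf("Expected all resources to be gone, but found: %s",
		strings.Join(remaining, ", "))
}

func main() {
	// Resources that are still terminating right after the delete calls return.
	remaining := []string{
		"pod/simple-app-64c66864f5-g9sb8",
		"replicaset/simple-app-64c66864f5",
	}

	// Current behaviour: the check fails even though waiting was disabled.
	fmt.Println(checkLeftovers(remaining, DeleteOpts{Wait: false}))

	// Requested behaviour: an explicit opt-out skips the check.
	fmt.Println(checkLeftovers(remaining, DeleteOpts{Wait: false, SkipLeftoverCheck: true}))
}
```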

Environment:


praveenrewar commented 2 years ago

Yeah, it seems like setting the wait flag to false would currently lead to an error while deleting recorded apps. So definitely it's a bug.

> It would be great if kapp could default to skipping checking resources when --wait=false is specified or add a flag to control this logic.

It does make sense to allow that behaviour; I am just trying to think of any side effects it could have. One obvious thing that could happen is that one or more resources are not deleted but the app itself (metadata configmap) is deleted. @cppforlife Any thoughts?

> And if that makes sense, I'd like to help implement this ;)

That would be great, we will definitely review it on priority once we finalize the approach :)

renuy commented 2 years ago

Hey @firgavin, good to see you here. Looking forward to your PR for this issue.

100mik commented 2 years ago

> One obvious thing that could happen is that one or more resources are not deleted but the app itself

This would be a "known risk" I guess?

We might also lose out on some "retryable cases", where kapp would retry in case of a failed delete due to a retryable error.

cppforlife commented 2 years ago

> I did some research and I found that kapp checks the existence of related resources after applying changes. But resources will be deleted eventually.

I think an additional flag to disable this check would be reasonable. Maybe under dangerous?

100mik commented 2 years ago

> I think an additional flag to disable this check would be reasonable. Maybe under dangerous?

This approach makes sense to me

firgavin commented 2 years ago

Hi @cppforlife, @100mik, @praveenrewar - Thanks for your insights! Here's my proposal:

We can add a flag --dangerous-disable-checking-app-deletion to enable or disable the check.

Before I work on it, I'd like to discuss the interaction between the two flags. When --dangerous-disable-checking-app-deletion=false, should we make sure that the value of --wait is overwritten to True? If not, users can still hit the same issue. Of course, we can explain the usage in the docs if we think they should be "orthogonal". Any suggestions?

praveenrewar commented 2 years ago

> When --dangerous-disable-checking-app-deletion=false, should we make sure that the value of --wait is overwritten to True?

I think we should keep these two flags independent of each other, because a user should be able to use --dangerous-disable-checking-app-deletion regardless of whether --wait is enabled or disabled.
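A rough sketch of what "independent" could look like, assuming a default of `--wait=true` and using Go's standard `flag` package rather than whatever flag library kapp actually uses. `deleteFlags` and `parseDeleteFlags` are hypothetical names; the point is only that neither flag's value is derived from the other, so all four combinations are valid.

```go
package main

import (
	"flag"
	"fmt"
)

// deleteFlags holds the two settings discussed in this thread.
// The struct and its field names are illustrative, not kapp's.
type deleteFlags struct {
	Wait         bool
	DisableCheck bool
}

// parseDeleteFlags parses both flags independently: setting one never
// changes the other.
func parseDeleteFlags(args []string) (deleteFlags, error) {
	var f deleteFlags
	fs := flag.NewFlagSet("delete", flag.ContinueOnError)
	fs.BoolVar(&f.Wait, "wait", true, "wait for resources to be deleted (assumed default)")
	fs.BoolVar(&f.DisableCheck, "dangerous-disable-checking-app-deletion", false,
		"skip the check that all app resources are gone (proposed flag)")
	err := fs.Parse(args)
	return f, err
}

func main() {
	combos := [][]string{
		{},
		{"--wait=false"},
		{"--dangerous-disable-checking-app-deletion"},
		{"--wait=false", "--dangerous-disable-checking-app-deletion"},
	}
	for _, args := range combos {
		f, err := parseDeleteFlags(args)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%v -> wait=%t disableCheck=%t\n", args, f.Wait, f.DisableCheck)
	}
}
```

With this shape, `--wait=false` on its own would still hit the original error, which is the case the error-message hint is meant to cover.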

> If not, users can still hit the same issue. Of course, we can explain the usage in the docs if we think they should be "orthogonal". Any suggestions?

Maybe we can add a hint in the error message?