carvel-dev / kapp

kapp is a simple deployment tool focused on the concept of "Kubernetes application" — a set of resources with the same label
https://carvel.dev/kapp
Apache License 2.0
921 stars 110 forks source link

kapp deploy with "--logs-all" doesn't stop after deployment #220

Open StarpTech opened 3 years ago

StarpTech commented 3 years ago

What happened: Sometimes when running kapp deploy --allow-ns staging --allow-ns production -a ${profile} -f- -c -y --logs-all in Github action the command will never stop and I see consecutive logs from the pods. When I remove --logs-all it works normally.

What did you expect: Printing logs should stop after the successful deployment.

Environment:


Vote on this request

This is an invitation to the community to vote on issues, to help us prioritize our backlog. Use the "smiley face" up to the right of this comment to vote.

👍 "I would like to see this addressed as soon as possible" 👎 "There are other more important things to focus on right now"

We are also happy to receive and review Pull Requests if you want to help working on this issue.

cppforlife commented 3 years ago

@StarpTech any correlation to anything going on? e.g. gets stuck when deployment has failed, or anything like that? how frequently does this happen?

cppforlife commented 3 years ago

actually im not seeing how this could be possible since tailing of logs happens in a go routine so it doesnt block exit of deploy command. what is the last line of output you are seeing? are you sure that kapp deploy is blocking and not something else? are you seeing https://github.com/vmware-tanzu/carvel-kapp/blob/develop/cmd/kapp/kapp.go#L48 printed?

StarpTech commented 3 years ago

any correlation to anything going on? e.g. gets stuck when deployment has failed, or anything like that? how frequently does this happen?

No, the deployment was successful but the tailing process has never stopped.

are you seeing https://github.com/vmware-tanzu/carvel-kapp/blob/develop/cmd/kapp/kapp.go#L48 printed?

Yes, after 7min and 300MB of logs in the Github action. According to the logs, the probes endpoints were called multiple times per second.

 readinessProbe:
    httpGet:
      path: /health
      port: 5002
    initialDelaySeconds: 20
    periodSeconds: 30
    timeoutSeconds: 3
    successThreshold: 1
    failureThreshold: 5
  livenessProbe:
    httpGet:
      path: /health
      port: 5002
    initialDelaySeconds: 20
    periodSeconds: 30
    timeoutSeconds: 3
    successThreshold: 1
    failureThreshold: 5
cppforlife commented 3 years ago

No, the deployment was successful but the tailing process has never stopped.

so you are saying this is happening consistently? that's ultimately my question, does it happen all the time or occasionally?

According to the logs, the probes endpoints were called multiple times per second.

thats all kubernetes. kapp has no idea what probes even are. it just puts a resource into k8s and waits for it to say im all done.

hmm it's a bit hard without output to understand what is happening. for example when is the last update for deployment status: there should be a line that indicates <timestamp> | ok: deployment .... and how far that line is from the end.

StarpTech commented 3 years ago

so you are saying this is happening consistently? that's ultimately my question, does it happen all the time or occasionally?

occasionally

thats all kubernetes. kapp has no idea what probes even are. it just puts a resource into k8s and waits for it to say im all done.

Yeah, I'm also not sure if it is really caused by kapp. It started after we switched to Kapp. I'll keep an eye of it.

cppforlife commented 3 years ago

here is a snippet i have from my own deploy with --logs-all. what im curious about is how certain markers show up to you:

...
logs | kapp-controller-5d488cc9b-pr5lm > kapp-controller | {"level":"info","ts":1621605746.3387737,"logger":"controller-runtime.manager.controller.kapp-controller-package-repository","msg":"Starting EventSource","source":"kind source: /, Kind="}
10:02:26AM: ok: reconcile apiservice/v1alpha1.package.carvel.dev (apiregistration.k8s.io/v1) cluster
10:02:26AM: ---- applying complete [13/13 done] ----
10:02:26AM: ---- waiting complete [13/13 done] ----
logs | kapp-controller-5d488cc9b-pr5lm > kapp-controller | {"level":"info","ts":1621605746.4378045,"logger":"controller-runtime.manager.controller.kapp-controller-app","msg":"Starting EventSource","source":"kind source: /, Kind="}

Succeeded

as you can see there is waiting complete message and there is one line of log that is printed because stopping tailing of logs is racing with deploy command ending. can you tell me what you are seeing in your output. where is waiting complete relative to Succeeded relative to logs. (keep in mind that --logs-all continues to stream output for all Pods in the kapp deploy until command ends so if you have some other non-deployment resource that takes long time to wait for, logs for deployment will keep on coming in).

cppforlife commented 3 years ago

@StarpTech bump

StarpTech commented 3 years ago

Sorry, nothing new to add for now. I will enable it again and see if I can report some details.