k9s (0.6.x) very very slow on Mac

eldada commented 5 years ago

Describe the bug The 0.6.x versions of k9s are very slow on my Mac. So slow that an arrow key or command takes more than 10 seconds to respond.

To Reproduce Steps to reproduce the behavior:

Was running 0.5.2 with no issues
brew upgrade k9s
Started k9s with exact same k8s configuration
So slow... 😟

Expected behavior No performance regression

Versions (please complete the following information):

OS: Mac OSX
K9s: 0.6.0 and 0.6.1
K8s: 1.12.x

Additional context Upgraded from 0.5.2 to 0.6.0 (and later to 0.6.1) using brew upgrade

eldada commented 5 years ago

BTW - rolled back to 0.5.2 and it's back to normal.

despairblue commented 5 years ago

On Linux as well. I'm using Arch Linux.

derailed commented 5 years ago

@eldada @despairblue. Yikes, not good ;( We've added a lot of synchronization recently, which could be the source of these issues. Can you guys give me more info on this? I.e., what does CPU/mem for k9s look like? What action/resource are you trying to perform? Is this cluster-specific or all clusters? Are you specifying skins or just using the defaults? Does this happen for all K8s resources or specific ones?

I've run K9s on all my clusters but I'm not seeing this 10s+ slowdown on OSX. Starting with 0.6.0 a lot of the dependencies were updated. Please send me as much info as you can so we can narrow this down. Thank you!

eldada commented 5 years ago

I tested it on my Mac. CPU is not that high. Between 0 to 4%. No high IO or Network loads. Same behaviour with any of my clusters.

Resources wise - Just wanting to describe a pod and choose a namespace. BTW - after a few steps, k9s hangs completely. Had to kill it. Several times.

derailed commented 5 years ago

@eldada Thank you for reporting back! Is the default view pod view when K9s comes up? If so how many pods are in your current namespace? Also could you try k9s -l debug and see if there is anything in the k9s logs, perhaps connectivity issue? Lastly, could you try mv ~/.k9s/config.yml ~/.k9s/config_old.yml and rerun? I don't think this will be an issue, but I'm trying to eliminate possibilities, since only 2 people have reported this so far and I can't seem to repro any of it on my clusters. Tx!
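
The config check suggested above can be sketched as a small shell helper. This is only an illustration: it assumes the config lives at ~/.k9s/config.yml as in this thread, and the function name `backup_k9s_config` and its optional directory argument are hypothetical.

```shell
# Hypothetical helper: move the existing K9s config aside so that
# K9s starts without it. Pass a directory to override ~/.k9s.
backup_k9s_config() {
  dir="${1:-$HOME/.k9s}"
  if [ -f "$dir/config.yml" ]; then
    mv "$dir/config.yml" "$dir/config_old.yml"
    echo "moved $dir/config.yml to $dir/config_old.yml"
  else
    echo "no config.yml in $dir, nothing to move"
  fi
}

# Usage (not run here):
#   backup_k9s_config
#   k9s -l debug
```

Checking that the file exists first keeps the helper safe to run twice, and the old config stays available as config_old.yml if it needs to be restored.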

eldada commented 5 years ago

Looks like it's much better after I remove the config.yml and start fresh (although a bit slower than before). BUT, when I go into the node view (command: no), it slows back to a crawl again (and eventually hangs). Even after exiting the node view. My biggest cluster size is 34 nodes.

So the common pattern for reproducing is the nodes view.

BTW - where is the log saved (I used -l debug)?

despairblue commented 5 years ago

@derailed removing the config helped a bit. It's kind of usable, but every couple of seconds it just hangs for a couple of seconds. It's not accepting keyboard input but it buffers it, when it stops hanging it actually handles all keys. So I just need to be patient when I type something and nothing happens.

As for the other questions:

Is the default view pod view when K9s comes up?

Usually it's the context view (ctx)

If so how many pods are in your current namespace?

4 contexts. The clusters I usually use have 55 to 245 pods.

Also could you try k9s -l debug and see if there is anything in the k9s logs, perhaps connectivity issue?

Did that, but where do I find them?

Thanks for looking into this. k9s has been indispensable for me :+1:

derailed commented 5 years ago

@eldada @despairblue Thank you both for the feedback and kind words! Still not able to repro but do notice some lags while typing or scrolling, so I will look into that. If you notice anything else odd, please report back too, any clues here would help me narrow this down.

I need to make this a bit more explicit, but you can find the log location by typing:

k9s info

eldada commented 5 years ago

Additional info that might help:

My clusters are GKE
My clusters are physically far from me. Over 80ms ping time to API server
My biggest cluster has between 600 to 700 pods in up to 34 nodes

I found a few errors in the log:

7:21AM ERR Reconciliation failed error="Get https://3.3.3.3/api/v1/namespaces/matank-ui/pods: error executing access token command \"/Users/eldada/google-cloud-sdk/bin/gcloud config config-helper --format=json\": err=exit status 1 output= stderr=ERROR: (gcloud.config.config-helper) There was a problem refreshing your current auth tokens: Unable to find the server at accounts.google.com\nPlease run:\n\n  $ gcloud auth login\n\nto obtain new credentials, or if you have already logged in with a\ndifferent account:\n\n  $ gcloud config set account ACCOUNT\n\nto select an already authenticated account to use.\n"
7:21AM WRN Access Post https://3.3.3.3/apis/authorization.k8s.io/v1/selfsubjectaccessreviews: error executing access token command "/Users/eldada/google-cloud-sdk/bin/gcloud config config-helper --format=json": err=exit status 1 output= stderr=ERROR: (gcloud.config.config-helper) There was a problem refreshing your current auth tokens: Unable to find the server at accounts.google.com
Please run:

  $ gcloud auth login

to obtain new credentials, or if you have already logged in with a
different account:

  $ gcloud config set account ACCOUNT

to select an already authenticated account to use.

But my connection was ok.

Also found:

9:30AM ERR Tail logs failed `dganit/dganit-artifactory-edge2-artifactory-0:manager-log -- Get https://3.3.3.3/api/v1/namespaces/dganit/pods/dganit-artifactory-edge2-artifactory-0/log?container=manager-log&follow=true&tailLines=200: context canceled
9:30AM ERR Tail logs failed `dganit/dganit-artifactory-edge2-artifactory-0:host-manager-log -- Get https://3.3.3.3/api/v1/namespaces/dganit/pods/dganit-artifactory-edge2-artifactory-0/log?container=host-manager-log&follow=true&tailLines=200: context canceled

But they are not from lagging actions.

Also saw error about the skin file:

8:55AM ERR No skin file found. Loading defaults. error="open /Users/eldada/.k9s/skin.yml: no such file or directory"

IMHO, this should not be an error. Just a WRN.

Nothing else special in the logs.

Aracki commented 5 years ago

I have tried the new version on my master k8s instance and remotely from macOS.

In both cases, very slow.

derailed commented 5 years ago

@eldada Awesome report! Thank you so much for the follow up on this. I did change the skin err to a warning last nite. Tx for pointing this out!

You Sir, kept me up last nite trying to figure out what's up ;( I think I've found a few issues perf wise...

Nodes are very expensive to display in light of metrics (which I guess are on in your clusters), since we have to get the metrics and also compute the requested resources for each pod on each node. I think it's a super useful metric but so expensive to compute, i.e. nodes * pods * containers. Might need to revisit volunteering this info at a later time.
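
To put rough numbers on the nodes × pods × containers figure, here is a back-of-the-envelope sketch using the 34 nodes and roughly 700 pods reported earlier in this thread. The containers-per-pod value is a pure assumption and the function name is hypothetical; this only illustrates an upper bound for the expression, not a measurement of what K9s actually does.

```shell
# Rough upper bound on per-refresh lookups for the node view,
# following the nodes * pods * containers expression.
node_view_cost() {
  nodes="$1"; pods="$2"; containers_per_pod="$3"
  echo $(( nodes * pods * containers_per_pod ))
}

# 34 nodes and ~700 pods come from the report in this thread;
# 2 containers per pod is an assumed value.
node_view_cost 34 700 2   # prints 47600
```

At the default 2s refresh, a figure of that size would be recomputed on every cycle, which is consistent with the node view being the common trigger for the slowdown.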

In the mean time and just for shits and giggles would you mind changing the refresh beat and see if that helps?

Try this to change K9s refresh rate to 60s vs 2s (default):

k9s -r 60

I think this would get rid of the sluggishness for ~1min. Is this true?
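
For reference, the arithmetic behind that suggestion as a minimal sketch (the function name is hypothetical): at a 2s interval there are 30 refresh cycles per minute, at 60s only one.

```shell
# Number of refresh cycles per minute for a given refresh interval in seconds.
refreshes_per_minute() {
  echo $(( 60 / $1 ))
}

refreshes_per_minute 2    # default interval mentioned above, prints 30
refreshes_per_minute 60   # interval set by k9s -r 60, prints 1
```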

derailed commented 5 years ago

@Aracki Thanks for the report! Working on this issue right now. Hopefully we can get a resolve soon.

derailed commented 5 years ago

@eldada @despairblue @Aracki. Dropping 0.6.2! Hopefully I've moved the needle in the right direction... 🙏 Thank you all for helping me track some of these down!!

eldada commented 5 years ago

@derailed - a couple of updates.

Running 0.6.1 with -r 60 improved performance and did not show any lag. So indeed this is a good workaround
I tried 0.6.2 with defaults, and it works much better! There is a small performance hit, but is very usable. @Aracki , @despairblue - can you try the new version and comment?

If indeed 0.6.2 is better for all, I think we can close this.

Thanks @derailed for the prompt responses and care!

Aracki commented 5 years ago

0.6.2 works faster!

paivagustavo commented 5 years ago

Hi! 0.6.2 is still orders of magnitude slower than 0.5.2, am I missing something?

derailed commented 5 years ago

@eldada @Aracki Thank you so much for reporting back! Whoosh! am I glad to hear this! Was holding back on doing a perf pass until things stabilize a bit more but I am glad we did. So happy to hear, you guys are functional again! Not the end of this story I am afraid so many diff use cases and configurations...

derailed commented 5 years ago

@paivagustavo Could you qualify specifically which use cases yield this statement, so I can track down and address?

paivagustavo commented 5 years ago

@derailed The pods view just freezes. When I try to select another pod (pressing arrows) there is a ~10 sec delay. When trying to view the yaml, it just hangs and nothing happens.

I'm trying to load all-namespaces which contains 859 pods atm.

my current k9s.yaml:

k9s:
  refreshRate: 2
  logBufferSize: 1000
  logRequestSize: 200
  currentContext: general
  currentCluster: my-cluster
  clusters:
    my-cluster:
      namespace:
        active: all
        favorites:
        - all
        - somenamespace
        - kube-system
        - default
      view:
        active: po

derailed commented 5 years ago

@paivagustavo Thank you for reporting back and for the great details!! I think I understand the problem. With that many pods the refresh rate is dragging K9s to a crawl. I am working on a better way to handle this.

ghost commented 5 years ago

@derailed After upgrading from 0.5.2 to 0.6.2 I found that parts of the UI only get refreshed after keyboard events (arrow keys, for example).

Reproduce:

open pods view
delete one or more pods
wait -> list does not get refreshed
press up or down -> list gets refreshed

Or:

simply change the context -> cluster info on the top left does not get refreshed and still shows the info about the previous cluster

Let me know if I should move this into another issue.

OS: macOS
k9s: 0.6.2
k8s: 1.12.6

derailed commented 5 years ago

@swe-covis. Thank you for reporting this! I noticed it last nite too. I am reworking the observer with what I think will be a much better approach. Sorry about this mess! I will have a new drop soon. Thank you all for your patience and understanding!!

ghost commented 5 years ago

@derailed :-) It's pre-release so we have to expect issues. Nevertheless your tools are incredibly useful

derailed commented 5 years ago

@swe-covis Thank you so much for your very kind and touching comment! This is a great encouragement, as K9s is sailing in a bit of rough waters at the moment as we're going around the cape ;(

Humbled!

eldada commented 5 years ago

0.6.3 is MUCH better! Smooth and responsive. I think this issue can be closed. @despairblue , @Aracki - do you guys see the improvement?

derailed commented 5 years ago

@eldada Thank you so much for your feedback and appreciation!! Serenity now I hope... 🙏

paivagustavo commented 5 years ago

wow, @derailed it's incredibly fast now :tada: .

Thank you so much for your efforts for building and maintaining such an awesome tool!

ghost commented 5 years ago

@derailed good shot :-)

derailed commented 5 years ago

@eldada @swe-covis @paivagustavo. You're making me blush. Thank you so much for your kindness, patience and support! I am so happy to hear all this. Closing til next time...

derailed / k9s

k9s (0.6.x) very very slow on Mac #176