Unable to connect to cluster, some kind of API server error; earlier versions of k9s do not run into this issue

nawarnoori commented 2 years ago

Describe the bug: Unable to connect to my production cluster using the latest version of k9s (0.25.21). We're using EKS; the version output is:

$ k --context prod version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:59:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.15-eks-a64ea69", GitCommit:"03450cdabfc4162d4e447e6d8c5037efe6d29742", GitTreeState:"clean", BuildDate:"2022-05-12T18:44:04Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}

To Reproduce (steps to reproduce the behavior):

1. I run k9s --context prod to connect to our prod cluster.
2. k9s hangs for some time, and I see the 'dial k8s toast' message in the top right corner.
3. k9s then exits abruptly.

Expected behavior: I should be able to connect to my prod cluster and see all its pods.

Versions (please complete the following information):

OS: Ubuntu 21.10
K9s: [0.25.21]
K8s: [1.20.15-eks-a64ea69]

Additional context: I downgraded to 0.25.18, which is the latest version that does not exhibit this issue.

Running k9s with logs (k9s -l debug --context prod) I get:

9:38AM INF 🐶 K9s starting up...
9:38AM DBG Active Context "prod"
9:38AM WRN Unable to dial discovery API error="exec plugin: invalid apiVersion \"client.authentication.k8s.io/v1alpha1\""
9:38AM ERR Fail to locate metrics-server error="exec plugin: invalid apiVersion \"client.authentication.k8s.io/v1alpha1\""
9:38AM ERR failed to connect to cluster error="exec plugin: invalid apiVersion \"client.authentication.k8s.io/v1alpha1\""
9:38AM WRN No context specific skin file found -- /home/nawarnoori/.config/k9s/prod_skin.yml
9:38AM WRN No skin file found -- /home/nawarnoori/.config/k9s/skin.yml. Loading stock skins.
9:38AM DBG Factory START with ns `""
9:38AM ERR Load cluster resources - No API server connection
9:38AM DBG CustomView watching `/home/nawarnoori/.config/k9s/views.yml
9:38AM WRN Custom view load failed /home/nawarnoori/.config/k9s/views.yml error="open /home/nawarnoori/.config/k9s/views.yml: no such file or directory"
9:38AM WRN CustomView watcher failed error="no such file or directory"
9:38AM ERR Unable to connect to api server error="exec plugin: invalid apiVersion \"client.authentication.k8s.io/v1alpha1\""
9:38AM ERR Load cluster resources - No API server connection
9:38AM DBG Fetching latest k9s rev...
9:38AM WRN Unable to dial discovery API error="No connection to cached dial"
9:38AM DBG K9s latest rev: "v0.25.21"
9:38AM DBG SWITCH CTX "contexts"--"prod"
9:38AM DBG Switching context "prod"
9:38AM DBG TABLE-UPDATER canceled -- "contexts"
9:38AM ERR Unable to connect to api server error="exec plugin: invalid apiVersion \"client.authentication.k8s.io/v1alpha1\""
9:38AM ERR Context switch failed error="Unable to connect to context \"prod\""
9:38AM ERR Unable to connect to context "prod"
9:39AM ERR Unable to connect to api server error="exec plugin: invalid apiVersion \"client.authentication.k8s.io/v1alpha1\""
9:39AM ERR ClusterUpdater failed error="Conn check failed (1/5)"
9:39AM ERR Unable to connect to api server error="exec plugin: invalid apiVersion \"client.authentication.k8s.io/v1alpha1\""
9:39AM ERR ClusterUpdater failed error="Conn check failed (2/5)"
9:39AM ERR Unable to connect to api server error="exec plugin: invalid apiVersion \"client.authentication.k8s.io/v1alpha1\""
9:39AM ERR ClusterUpdater failed error="Conn check failed (3/5)"
9:39AM ERR Unable to connect to api server error="exec plugin: invalid apiVersion \"client.authentication.k8s.io/v1alpha1\""
9:39AM ERR ClusterUpdater failed error="Conn check failed (4/5)"
9:39AM ERR Unable to connect to api server error="exec plugin: invalid apiVersion \"client.authentication.k8s.io/v1alpha1\""
9:39AM ERR Conn check failed (5/5). Bailing out!

In contrast, 0.25.18's logs (the latest version that works for me) look something like:

$ tail -f /tmp/k9s-nawarnoori.log
...
9:39AM INF 🐶 K9s starting up...
9:39AM DBG Active Context "prod"
9:39AM INF ✅ Kubernetes connectivity
9:39AM WRN No context specific skin file found -- /home/nawarnoori/.config/k9s/prod_skin.yml
9:39AM WRN No skin file found -- /home/nawarnoori/.config/k9s/skin.yml. Loading stock skins.
9:39AM DBG Factory START with ns `"all"
9:39AM WRN Cluster metrics failed error="`list access denied for user on \"\":v1/nodes"
9:39AM DBG Fetching latest k9s rev...
9:39AM DBG K9s latest rev: "v0.25.21"
9:39AM WRN Fail CRDs load error="`list access denied for user on \"\":apiextensions.k8s.io/v1/customresourcedefinitions"
9:39AM DBG CustomView watching `/home/nawarnoori/.config/k9s/views.yml
9:39AM WRN Custom view load failed /home/nawarnoori/.config/k9s/views.yml error="open /home/nawarnoori/.config/k9s/views.yml: no such file or directory"
9:39AM WRN CustomView watcher failed error="no such file or directory"
9:39AM WRN Cluster metrics failed error="user is not authorized to list nodes"
9:39AM WRN Fail CRDs load error="[list watch] access denied on resource \"-\":\"apiextensions.k8s.io/v1/customresourcedefinitions\""
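The error repeated throughout the 0.25.21 log points at the exec credential plugin entry in the kubeconfig rather than at the cluster itself. As a rough sketch of how one might check for it, the Python below scans kubeconfig text for exec entries that still declare client.authentication.k8s.io/v1alpha1. The function names and the sample kubeconfig are illustrative assumptions, not part of k9s or the AWS CLI, and the line-based match is deliberately simple rather than a full YAML parse.

```python
import re

# Matches an `apiVersion:` line that names any client.authentication.k8s.io version.
# The top-level `apiVersion: v1` of a kubeconfig does not match this pattern.
_EXEC_API = re.compile(
    r"^\s*apiVersion:\s*[\"']?(client\.authentication\.k8s\.io/\w+)"
)


def find_exec_api_versions(kubeconfig_text):
    """Return (line_number, api_version) pairs for exec-plugin apiVersion lines."""
    hits = []
    for number, line in enumerate(kubeconfig_text.splitlines(), start=1):
        match = _EXEC_API.match(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


def find_alpha_entries(kubeconfig_text):
    """Return only the entries that still use the v1alpha1 exec API."""
    return [
        (number, version)
        for number, version in find_exec_api_versions(kubeconfig_text)
        if version.endswith("/v1alpha1")
    ]


# Hypothetical kubeconfig fragment, for illustration only.
SAMPLE = """\
users:
- name: prod
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: aws
"""

if __name__ == "__main__":
    for number, version in find_alpha_entries(SAMPLE):
        print(f"line {number}: {version}")
```

To check a real file, pass in its contents (for example, the text of ~/.kube/config). Any hit is an entry that the AWS CLI upgrade and kubeconfig update mentioned later in this thread would be expected to replace.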

qglover commented 2 years ago

Having the same issue. How do you downgrade to an older version?

slimus commented 2 years ago

@qglover @nawarnoori hello! Have you tried recommendations from #1619? Thanks!

techdragon commented 2 years ago

I've got the same issue. Following the discussion in #1619, it looks like it's the library upgrade that caused the problem. Rolling k9s back to 0.25.18 works for me.

Edit: Are we likely to see an update that fixes EKS support? It seems like a bad idea to expect every EKS user to either downgrade or work around this issue.

nawarnoori commented 2 years ago

> Having the same issue. How do you downgrade to an older version?

Grab it from here for your OS.

nawarnoori commented 2 years ago

> @qglover @nawarnoori hello! Have you tried recommendations from #1619? Thanks!

Okay, that seems to have done it, thank you for the suggestion.

But I am not entirely sure of the implications of upgrading the AWS CLI and also updating my k8s config, so I echo what techdragon says about supporting users for whom this worked before.

Do feel free to close if the official advice is to upgrade the AWS CLI for newer versions of k9s, though a diagnostic would be helpful here. Perhaps even re-version the release, since going from 0.25.18 to 0.25.21 suggests a patch rather than a breaking change.
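As a sketch of the kind of diagnostic suggested here, the Python below pulls the rejected apiVersion out of k9s log lines like the ones quoted earlier in the thread and turns it into a one-line hint. The function names and the hint wording are hypothetical; this is not how k9s itself reports the error.

```python
import re

# The log escapes the inner quotes as \" so the backslash is treated as optional.
_INVALID_API = re.compile(r'exec plugin: invalid apiVersion \\?"([^"\\]+)\\?"')


def rejected_api_versions(log_text):
    """Return the distinct apiVersions rejected by the exec plugin, in first-seen order."""
    seen = []
    for match in _INVALID_API.finditer(log_text):
        if match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def hint(log_text):
    """Return a short human-readable hint, or None if the error is not present."""
    versions = rejected_api_versions(log_text)
    if not versions:
        return None
    return (
        "exec plugin apiVersion not accepted by this client: "
        + ", ".join(versions)
        + "; check the exec section of your kubeconfig"
    )


# Two lines copied from the 0.25.21 log above.
SAMPLE_LOG = "\n".join([
    r'9:38AM WRN Unable to dial discovery API error="exec plugin: invalid apiVersion \"client.authentication.k8s.io/v1alpha1\""',
    r'9:38AM ERR Load cluster resources - No API server connection',
])

if __name__ == "__main__":
    print(hint(SAMPLE_LOG))
```

Run against the 0.25.18 log, which contains no such error, the same function returns None, so the hint only appears when the exec plugin rejection is actually in the log.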

derailed / k9s #1650