is-it-ayush opened this issue 8 months ago
We are running the AWS CNI outside of EKS. We also have the AWS credential provider installed, which allows the kubelet to use the instance credentials to pull from private ECR registries. Before Kubernetes 1.28 (I think, might be off by a version), this functionality was bundled as part of the kubelet.
That's interesting @kwohlfahrt! I've never used the aws-credential-provider. After reading into it, I have a few questions. I earlier tried adding a registry username and password in /etc/containerd/config.toml, but it didn't work. I was able to manually pull the image with sudo ctr images pull 602401143452.dkr.ecr.ap-south-1.amazonaws.com/amazon-k8s-cni-init:v1.16.4 -u AWS:$TOKEN where TOKEN=$(aws ecr get-login-password --region ap-south-1), but it didn't really seem to fix the above problem. Should I just deploy it by applying all the files with kubectl apply -f listed here on github.com/kubernetes/cloud-provider-aws/tree/master/examples/existing-cluster/base?
AFAIK, the credential provider can't be installed by applying manifests; it must be installed on your node, since you have to change the kubelet's flags to use it. The binary and configuration must be placed on disk, and then the kubelet's flags have to be modified to point at the configuration file and at the directory to search for the binary. This is documented on this page, which also includes an example config.
Where do I get the binary aws-credential-provider?
Pre-built binaries can be found here (source)
Does it work with containerd?
Yes, we've used it with containerd in the past, though we are using cri-o now. AFAIK, the container runtime never interacts with the credential provider directly - the credential provider is called by the kubelet, which then passes the received credentials on to your container runtime. So it shouldn't matter whether you are using containerd, cri-o, etc.
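For context on how that plumbing works: the kubelet execs the provider binary, writes a CredentialProviderRequest as JSON to its stdin, and reads a CredentialProviderResponse from stdout. A rough sketch of the exchange (the apiVersion must match the one declared in your CredentialProviderConfig; the image and token values here are illustrative):

{
  "apiVersion": "credentialprovider.kubelet.k8s.io/v1",
  "kind": "CredentialProviderRequest",
  "image": "602401143452.dkr.ecr.ap-south-1.amazonaws.com/amazon-k8s-cni:v1.16.4"
}

and the provider answers with credentials plus caching hints:

{
  "apiVersion": "credentialprovider.kubelet.k8s.io/v1",
  "kind": "CredentialProviderResponse",
  "cacheKeyType": "Registry",
  "cacheDuration": "12h",
  "auth": {
    "*.dkr.ecr.*.amazonaws.com": { "username": "AWS", "password": "<token>" }
  }
}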
Thank you so much @kwohlfahrt! I was able to follow through and resolve this, and all the pods are successfully running now. These are the steps I took:

1. Updated /etc/kubernetes/manifests/kube-controller-manager.yaml & /etc/kubernetes/manifests/kube-apiserver.yaml with --cloud-provider=external.
2. Ran systemctl daemon-reload && systemctl restart kubelet.service.
3. Downloaded ecr-credential-provider via curl -o ecr-credential-provider https://storage.googleapis.com/k8s-artifacts-prod/binaries/cloud-provider-aws/v1.29.0/linux/amd64/ecr-credential-provider-linux-amd64.
4. Installed it with mv ecr-credential-provider /usr/bin/ecr-credential-provider and chmod +x /usr/bin/ecr-credential-provider.
5. Created credential-config.yaml with the following:

   apiVersion: kubelet.config.k8s.io/v1
   kind: CredentialProviderConfig
   providers:
     - name: ecr-credential-provider
       matchImages:
         - "*.dkr.ecr.*.amazonaws.com"
       defaultCacheDuration: "12h"
       apiVersion: credentialprovider.kubelet.k8s.io/v1
       env:

6. Created /etc/systemd/system/kubelet.service.d/aws.conf with the following:

   [Service]
   Environment="KUBELET_EXTRA_ARGS=--node-ip=<x.x.x.x> --node-labels=node.kubernetes.io/node= --cloud-provider=external --image-credential-provider-config=/home/admin/.aws/ecr-credential-config.yaml --image-credential-provider-bin-dir=/usr/bin"

7. Ran systemctl daemon-reload && systemctl restart kubelet.service again.
8. Applied the CNI manifest with kubectl apply -f aws-vpc-cni.yaml.
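To double-check that the kubelet actually picked the provider up after step 7, its logs should show the provider being invoked on image pulls; something like the following works (assuming systemd and the paths configured above):

systemctl cat kubelet.service | grep image-credential-provider   # confirm the drop-in took effect
journalctl -u kubelet -f | grep -i credential                    # watch for provider invocations or errors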
This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one.
Hey @kwohlfahrt! It seems this wasn't resolved entirely. As soon as I joined another node, I ran into trouble with the aws-node pod failing to communicate with ipamd from aws-vpc-cni, but the logs from ipamd didn't indicate any errors, so I was unable to understand what's wrong. The setup hasn't changed & I only added one worker (1 master [10.0.32.163], 1 worker [10.0.32.104]). Here are a few outputs from my master node:
kubectl get nodes -A
admin@ip-10-0-32-163:~$ kubectl get nodes -A
NAME STATUS ROLES AGE VERSION
ip-10-0-32-104.ap-south-1.compute.internal NotReady <none> 15h v1.29.2
ip-10-0-32-163.ap-south-1.compute.internal Ready control-plane 16h v1.29.2
kubectl get pods -A
admin@ip-10-0-32-163:~$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system aws-cloud-controller-manager-khnq6 1/1 Running 1 (72m ago) 16h
kube-system aws-node-56hf4 1/2 CrashLoopBackOff 7 (4m55s ago) 19m
kube-system aws-node-ghvzc 2/2 Running 2 (72m ago) 16h
kube-system coredns-76f75df574-rg724 0/1 CrashLoopBackOff 34 (63s ago) 16h
kube-system coredns-76f75df574-svglz 0/1 CrashLoopBackOff 7 (4m43s ago) 22m
kube-system etcd-ip-10-0-32-163.ap-south-1.compute.internal 1/1 Running 1 (72m ago) 16h
kube-system kube-apiserver-ip-10-0-32-163.ap-south-1.compute.internal 1/1 Running 2 (72m ago) 16h
kube-system kube-controller-manager-ip-10-0-32-163.ap-south-1.compute.internal 1/1 Running 2 (72m ago) 16h
kube-system kube-proxy-kj778 1/1 Running 1 (72m ago) 15h
kube-system kube-proxy-xgzzf 1/1 Running 1 (72m ago) 16h
kube-system kube-scheduler-ip-10-0-32-163.ap-south-1.compute.internal 1/1 Running 1 (72m ago) 16h
kubectl describe pods aws-node-56hf4 -n kube-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning MissingIAMPermissions 7m42s (x2 over 7m42s) aws-node Unauthorized operation: failed to call ec2:CreateTags due to missing permissions. Please refer https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/iam-policy.md to attach relevant policy to IAM role
Warning MissingIAMPermissions 6m8s (x2 over 6m9s) aws-node Unauthorized operation: failed to call ec2:CreateTags due to missing permissions. Please refer https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/iam-policy.md to attach relevant policy to IAM role
Warning MissingIAMPermissions 4m38s (x2 over 4m39s) aws-node Unauthorized operation: failed to call ec2:CreateTags due to missing permissions. Please refer https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/iam-policy.md to attach relevant policy to IAM role
Warning MissingIAMPermissions 3m8s (x2 over 3m9s) aws-node Unauthorized operation: failed to call ec2:CreateTags due to missing permissions. Please refer https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/iam-policy.md to attach relevant policy to IAM role
Warning MissingIAMPermissions 98s (x2 over 99s) aws-node Unauthorized operation: failed to call ec2:CreateTags due to missing permissions. Please refer https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/iam-policy.md to attach relevant policy to IAM role
Warning MissingIAMPermissions 8s (x2 over 9s) aws-node Unauthorized operation: failed to call ec2:CreateTags due to missing permissions. Please refer https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/iam-policy.md to attach relevant policy to IAM role
Normal Scheduled 7m46s default-scheduler Successfully assigned kube-system/aws-node-56hf4 to ip-10-0-32-104.ap-south-1.compute.internal
Normal Pulled 7m45s kubelet Container image "602401143452.dkr.ecr.ap-south-1.amazonaws.com/amazon-k8s-cni-init:v1.16.4" already present on machine
Normal Created 7m45s kubelet Created container aws-vpc-cni-init
Normal Started 7m45s kubelet Started container aws-vpc-cni-init
Normal Pulled 7m44s kubelet Container image "602401143452.dkr.ecr.ap-south-1.amazonaws.com/amazon-k8s-cni:v1.16.4" already present on machine
Normal Started 7m44s kubelet Started container aws-eks-nodeagent
Normal Created 7m44s kubelet Created container aws-eks-nodeagent
Normal Pulled 7m44s kubelet Container image "602401143452.dkr.ecr.ap-south-1.amazonaws.com/amazon/aws-network-policy-agent:v1.0.8" already present on machine
Normal Started 7m44s kubelet Started container aws-node
Normal Created 7m44s kubelet Created container aws-node
Warning Unhealthy 7m38s kubelet Readiness probe failed: {"level":"info","ts":"2024-03-14T05:02:54.811Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
Warning Unhealthy 7m33s kubelet Readiness probe failed: {"level":"info","ts":"2024-03-14T05:02:59.865Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
Warning Unhealthy 7m28s kubelet Readiness probe failed: {"level":"info","ts":"2024-03-14T05:03:04.915Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
Warning Unhealthy 7m20s kubelet Readiness probe failed: {"level":"info","ts":"2024-03-14T05:03:12.342Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
Warning Unhealthy 7m10s kubelet Readiness probe failed: {"level":"info","ts":"2024-03-14T05:03:22.350Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
Warning Unhealthy 7m kubelet Readiness probe failed: {"level":"info","ts":"2024-03-14T05:03:32.350Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
Warning Unhealthy 6m50s kubelet Readiness probe failed: {"level":"info","ts":"2024-03-14T05:03:42.342Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
Warning Unhealthy 6m40s kubelet Readiness probe failed: {"level":"info","ts":"2024-03-14T05:03:52.347Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
Warning Unhealthy 6m30s kubelet Readiness probe failed: {"level":"info","ts":"2024-03-14T05:04:02.344Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"}
Normal Killing 6m10s kubelet Container aws-node failed liveness probe, will be restarted
Warning Unhealthy 2m40s (x43 over 6m30s) kubelet (combined from similar events): Readiness probe failed: {"level":"info","ts":"2024-03-14T05:07:52.354Z","caller":"/usr/local/go/src/runtime/proc.go:267","msg":"timeout: failed to connect service \":50051\" within 5s"
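The MissingIAMPermissions warnings above mean the node's instance role lacks ec2:CreateTags. Per the iam-policy.md document linked in the event message, the missing piece is a statement roughly like this (a sketch, scoped to network interfaces):

{
  "Effect": "Allow",
  "Action": ["ec2:CreateTags"],
  "Resource": ["arn:aws:ec2:*:*:network-interface/*"]
}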
kubectl logs coredns-76f75df574-rg724 -n kube-system
admin@ip-10-0-32-163:~$ kubectl logs coredns-76f75df574-rg724 -n kube-system
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
.:53
[INFO] plugin/reload: Running configuration SHA512 = 591cf328cccc12bc490481273e738df59329c62c0b729d94e8b61db9961c2fa5f046dd37f1cf888b953814040d180f52594972691cd6ff41be96639138a43908
CoreDNS-1.11.1
linux/amd64, go1.20.7, ae2bbc2
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:46941->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:48624->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:35195->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:36595->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:37395->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:53769->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:39372->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:49266->10.0.0.2:53: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[870704998]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:10:50.372) (total time: 30001ms):
Trace[870704998]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:11:20.374)
Trace[870704998]: [30.001959325s] [30.001959325s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1121138999]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:10:50.372) (total time: 30001ms):
Trace[1121138999]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:11:20.374)
Trace[1121138999]: [30.001824712s] [30.001824712s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[757947080]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:10:50.373) (total time: 30001ms):
Trace[757947080]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:11:20.374)
Trace[757947080]: [30.001669002s] [30.001669002s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:59870->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1113266275012896724.8518814352627412410. HINFO: read udp 10.0.32.235:36793->10.0.0.2:53: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[308293075]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:21.583) (total time: 30001ms):
Trace[308293075]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:11:51.584)
Trace[308293075]: [30.00153721s] [30.00153721s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1924537645]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:21.772) (total time: 30001ms):
Trace[1924537645]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:11:51.773)
Trace[1924537645]: [30.001441343s] [30.001441343s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1601989491]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:21.892) (total time: 30000ms):
Trace[1601989491]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (05:11:51.893)
Trace[1601989491]: [30.000541411s] [30.000541411s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1839797281]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:53.729) (total time: 30002ms):
Trace[1839797281]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30002ms (05:12:23.731)
Trace[1839797281]: [30.002135986s] [30.002135986s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[2131737096]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:54.116) (total time: 30001ms):
Trace[2131737096]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (05:12:24.117)
Trace[2131737096]: [30.001094761s] [30.001094761s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[342939726]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:54.708) (total time: 30001ms):
Trace[342939726]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (05:12:24.709)
Trace[342939726]: [30.001121228s] [30.001121228s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s
[INFO] plugin/kubernetes: Trace[731275138]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:12:28.220) (total time: 11342ms):
Trace[731275138]: [11.342820089s] [11.342820089s] END
[INFO] plugin/kubernetes: Trace[1946198945]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:12:28.081) (total time: 11481ms):
Trace[1946198945]: [11.481121164s] [11.481121164s] END
[INFO] plugin/kubernetes: Trace[1707910341]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:12:27.480) (total time: 12082ms):
Trace[1707910341]: [12.082670995s] [12.082670995s] END
kubectl logs coredns-76f75df574-svglz -n kube-system
admin@ip-10-0-32-163:~$ kubectl logs coredns-76f75df574-svglz -n kube-system
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
.:53
[INFO] plugin/reload: Running configuration SHA512 = 591cf328cccc12bc490481273e738df59329c62c0b729d94e8b61db9961c2fa5f046dd37f1cf888b953814040d180f52594972691cd6ff41be96639138a43908
CoreDNS-1.11.1
linux/amd64, go1.20.7, ae2bbc2
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:39153->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:34390->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:34202->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:44007->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:40443->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:47108->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:59620->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:39071->10.0.0.2:53: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[244891391]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:24.389) (total time: 30001ms):
Trace[244891391]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:11:54.390)
Trace[244891391]: [30.001548794s] [30.001548794s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[106582316]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:24.389) (total time: 30002ms):
Trace[106582316]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:11:54.391)
Trace[106582316]: [30.00208516s] [30.00208516s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1365423089]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:24.389) (total time: 30001ms):
Trace[1365423089]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:11:54.390)
Trace[1365423089]: [30.001969555s] [30.001969555s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:57291->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 1600033383188009841.8067679233946884018. HINFO: read udp 10.0.32.13:52147->10.0.0.2:53: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1202752718]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:55.195) (total time: 30000ms):
Trace[1202752718]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (05:12:25.196)
Trace[1202752718]: [30.000482356s] [30.000482356s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[528314086]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:55.738) (total time: 30004ms):
Trace[528314086]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30004ms (05:12:25.742)
Trace[528314086]: [30.00474037s] [30.00474037s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[401932378]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:11:55.919) (total time: 30001ms):
Trace[401932378]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (05:12:25.921)
Trace[401932378]: [30.001416591s] [30.001416591s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1029911745]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:12:27.513) (total time: 30000ms):
Trace[1029911745]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (05:12:57.514)
Trace[1029911745]: [30.000923168s] [30.000923168s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1647125159]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:12:27.996) (total time: 30003ms):
Trace[1647125159]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (05:12:57.997)
Trace[1647125159]: [30.003270334s] [30.003270334s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1397932663]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (14-Mar-2024 05:12:28.082) (total time: 30000ms):
Trace[1397932663]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (05:12:58.083)
Trace[1397932663]: [30.000758193s] [30.000758193s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s
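Both coredns pods time out reaching the kubernetes Service VIP (10.96.0.1:443) as well as the VPC resolver (10.0.0.2:53), which usually points at node-level networking or kube-proxy rather than coredns itself. A quick node-side sanity check, assuming kube-proxy in iptables mode (the kubeadm default):

sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.96.0.1    # is the Service VIP programmed at all?
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=20  # is kube-proxy itself healthy?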
/var/log/aws-routed-eni/ipamd.log: ipamd-log.tar.gz
/var/log/aws-routed-eni/plugin.log (worker node):
{"level":"error","ts":"2024-03-14T04:10:43.568Z","caller":"routed-eni-cni-plugin/cni.go:283","msg":"Error received from DelNetwork gRPC call for container 75d411ca04ea3ea9d079947801458b9938aaf07cbefc8803364c316d28588972: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:50051: connect: connection refused\""}
I did assign the ec2:CreateTags permission, which seemed to be missing, & I recreated my entire cluster. The readiness and liveness probes still throw the same x.x.x.x:xxx -> 10.x.0.x:53 errors and coredns is unable to get ready.
Hm, I'm not sure. My only suspicion is you might be hitting #2840, which I reported the other day.

You can easily check by connecting to your node and seeing if /run/xtables.lock is a directory - it should be a file. If it is created as a directory, it causes kube-proxy to fail, which means the CNI cannot reach the API server.

You can see the linked PR in that issue for the fix (the volume needs to be defined with type: FileOrCreate); just make sure to SSH to the node and rmdir /run/xtables.lock after applying the fix.
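For reference, the fix described there boils down to giving the hostPath volume an explicit type in the kube-proxy DaemonSet, along these lines (a sketch of the relevant fragment):

volumes:
  - name: xtables-lock
    hostPath:
      path: /run/xtables.lock
      type: FileOrCreate   # creates a file, never a directory, if it doesn't exist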
Thank you @kwohlfahrt! I had some missing IAM permissions, which I added to the master node. It seems, though, that this still hasn't really resolved the problem where coredns isn't being reached, apparent from the logs when running kubectl logs coredns-76f75df574-49gs5 -n kube-system. I'm not entirely sure what's causing this.
[ERROR] plugin/errors: 2 4999722014791650549.7690820414208347954. HINFO: read udp 10.0.43.148:57589->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 4999722014791650549.7690820414208347954. HINFO: read udp 10.0.43.148:38940->10.0.0.2:53: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
Update! I was ultimately unable to resolve the coredns issues with aws-vpc-cni & aws-cloud-controller-manager. There are multiple issues:

1. The controller-manager fails to get the providerID from the AWS cloud for nodes in random order, even if you set the hostname to the private IPv4 DNS name and add the correct tags. It fails to initialise newly joined nodes, or even the master node itself, which leads to the worker nodes getting deleted and the master node tainted as NotReady.
2. The coredns pod fails to run regardless of the first issue, and there is no way to debug why. The logs collected by /opt/cni/bin/aws-cni-support.sh are not enough to debug the coredns problem.

I switched to Cilium and let go of my dream to connect k8s and aws.
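For context on the providerID failures: cloud-provider-aws matches nodes to EC2 instances by node name (the private DNS name) and by the cluster ownership tag, so a typical non-EKS setup needs something like the following on each node. This is a sketch - "my-cluster" and the instance ID are hypothetical, and the IMDSv1 call is shown for brevity:

# register the node under its EC2 private DNS name
sudo hostnamectl set-hostname "$(curl -s http://169.254.169.254/latest/meta-data/local-hostname)"

# tag the instance so the cloud controller manager can claim it
aws ec2 create-tags --resources i-0123456789abcdef0 \
  --tags Key=kubernetes.io/cluster/my-cluster,Value=owned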
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
This seems like the coredns pod got the IP address, but it wasn't able to communicate with the API server - due to missing permissions? The nodes/pods should be able to communicate with the API server given the necessary permissions.
Were you able to narrow it down to any permission issue?
Not really! I did all I could and scanned all of journalctl to find something. I wrote about it here, & I couldn't get aws-vpc-cni working as far as I remember. I double-checked permissions and instance roles, but it didn't seem like they were the problem.
It seems like both of them are broken. The controller-manager fails to get the providerID from the AWS cloud for nodes in random order, even if you set the hostname to the private IPv4 DNS name and add the correct tags; it fails to initialise newly joined nodes, or even the master node itself, which leads to the worker nodes getting deleted and the master node tainted as NotReady. The coredns pod fails to run regardless of the first issue, and there is no way to debug why. The logs collected by /opt/cni/bin/aws-cni-support.sh are not enough to debug the coredns problem.
I am hitting the same issue. The pod cannot communicate with any endpoints, including
@terryjix - This is a question about setting up the VPC CNI on a non-EKS cluster. How did you go about this?
Closing this due to lack of more information.
This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one.
This issue needs to be reopened - it seems to be a fairly ubiquitous issue when attempting to use the amazon-vpc-cni in a non-EKS environment.
I've also encountered it (coredns not able to communicate):
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
.:53
[INFO] plugin/reload: Running configuration SHA512 = 591cf328cccc12bc490481273e738df59329c62c0b729d94e8b61db9961c2fa5f046dd37f1cf888b953814040d180f52594972691cd6ff41be96639138a43908
CoreDNS-1.11.3
linux/amd64, go1.21.11, a6338e9
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:57241->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:42295->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:33996->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:50361->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:58932->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:35147->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:47365->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:60287->10.0.0.2:53: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[2115550610]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229 (16-Sep-2024 19:24:38.357) (total time: 30000ms):
Trace[2115550610]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (19:25:08.358)
Trace[2115550610]: [30.000916518s] [30.000916518s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[935094613]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229 (16-Sep-2024 19:24:38.358) (total time: 30000ms):
Trace[935094613]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (19:25:08.358)
Trace[935094613]: [30.000403807s] [30.000403807s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1423531700]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229 (16-Sep-2024 19:24:38.358) (total time: 30000ms):
Trace[1423531700]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (19:25:08.359)
Trace[1423531700]: [30.000293311s] [30.000293311s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:44224->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 5717391959630560116.4828385316436471351. HINFO: read udp 10.0.0.75:60914->10.0.0.2:53: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1341126722]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229 (16-Sep-2024 19:25:09.591) (total time: 30000ms):
Trace[1341126722]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (19:25:39.592)
Trace[1341126722]: [30.000759936s] [30.000759936s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1646410435]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229 (16-Sep-2024 19:25:09.695) (total time: 30001ms):
Trace[1646410435]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (19:25:39.696)
Trace[1646410435]: [30.001364482s] [30.001364482s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1072212733]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229 (16-Sep-2024 19:25:09.753) (total time: 30000ms):
Trace[1072212733]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (19:25:39.754)
Trace[1072212733]: [30.000533915s] [30.000533915s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
Closing this due to lack of more information.
@orsenthil Why was this closed? It seems like there's plenty of information and repro steps?
fairly ubiquitous issue when attempting to use the amazon-vpc-cni in a non-EKS environment.
We will need to reproduce this and investigate. Re-opened.
Thanks!
I've got a cluster that reproduces this and am willing to screen share/support as needed.
I've fixed my issue by running vpc-cni-k8s on the EKS-optimized AMI. The vpc-cni-k8s plugin conflicts with ec2-net-utils: ec2-net-utils adds extra route rules, which broke pod-to-pod communication in my case. The EKS-optimized AMI avoids this issue.
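If you want to verify the same conflict on a non-EKS image, compare the policy routing that aws-vpc-cni programs (one route table per secondary ENI) with what ec2-net-utils rewrites on ENI hotplug; the EKS AMI build simply removes the package. On Amazon Linux 2, as a sketch:

ip rule list                       # policy rules added by aws-vpc-cni
ip route show table main           # look for routes ec2-net-utils re-added
sudo yum remove -y ec2-net-utils   # what the EKS AMI build does to avoid the conflict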
Does this work even outside EKS? I think this bug was about running outside EKS (for example, I'm running self-managed on Ubuntu AMIs with kubeadm).
Yes, I used kubeadm to create a Kubernetes cluster on an Amazon Linux 2 AMI and found the pod could not communicate with the outside. Some strange rules were created on the route table which overwrote the rules vpc-cni created.
You can find optimized Ubuntu AMIs at https://cloud-images.ubuntu.com/aws-eks/ . Maybe that can fix your issue. You can build your self-managed Kubernetes control plane on these AMIs. The optimized AMI has disabled some services that may affect network configuration in the OS.
It says clearly on the page: "These images are customised specifically for the EKS service, and are not intended as general OS images."
What happened:
Hi! I have an ec2 instance & containerd as the container runtime inside a private subnet (which has outbound internet access) in ap-south-1. I have initialized a new cluster with kubeadm init on this master node. It ran successfully. I then wanted to install amazon-vpc-cni as the network manager for my k8s cluster. I ran kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/master/config/master/aws-k8s-cni.yaml and checked the pods with kubectl get pods -n kube-system. One of the pods created by amazon-vpc-cni-k8s, named aws-node-xxxx, throws an error when trying to initialise. I did kubectl describe pod aws-node-xxx -n kube-system and I get the following.

I don't understand why this fails. Is it not possible to use amazon-vpc-cni outside EKS in a self-managed cluster? I also looked around here in the issues & it seems like other people have had this issue before, but I was unable to resolve it myself. Here is my policy k8s_master_ecr inside a k8s_master role, which is connected to this master instance via an instance profile.

Environment:
- Kubernetes version (use kubectl version):
- CNI Version: master branch
- OS (e.g: cat /etc/os-release):
- Kernel (e.g. uname -a): Linux ip-x-x-x-x.ap-south-1.compute.internal 6.1.0-13-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux
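The policy body didn't survive the copy above; for reference, a typical ECR pull policy attached to a node role looks like this (a generic sketch, not necessarily the author's k8s_master_ecr):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Resource": "*"
    }
  ]
}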