This is a great start. Could you please do another thorough edit, and then I'll give my comments over the phone Monday or Tuesday on where you're leaving out the story.
In particular, on your reread, please look at the following things, which are quite distracting:
- "(newly documented)" and not make it a link.
- What is the extraneous flag you're referring to?
- Template: I expected x, I got y, based on url, I did z to fix the problem.
- Possible other template sections:
Follow up to #63.
Beginning of Problems
At some point in time a known-good deployment stopped succeeding on newly created clusters. This was caused by several disparate issues across several versions/configurations/components.
Services should get created (`kubectl get svc`) and point at pods with correct endpoints (`kubectl describe service`). The first step to check if a service is working correctly is actually a simple DNS check (`nslookup service`). By chance, this would often appear to be functioning as expected, indicating the problem must be elsewhere (not necessarily with Kubernetes).

However, not to bury the lede: running nslookup in a loop would later expose that it was timing out sporadically. That is the sort of thing that makes a bug sinister, as it misdirects debugging efforts away from the problem.
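A minimal sketch of the kind of loop that exposed the sporadic timeouts; the probe pod name, the busybox image, and `my-service` are illustrative, not from the original setup:

```sh
# Run nslookup in a loop from inside the cluster; intermittent
# "can't resolve" / timeout lines are what betray the flaky DNS.
kubectl run dns-probe --image=busybox --restart=Never --command -- \
  sh -c 'while true; do nslookup my-service || echo "LOOKUP FAILED"; sleep 1; done'

# Watch the output for sporadic failures:
kubectl logs -f dns-probe
```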
Known KubeDNS Issues Encountered
Secrets volume & SELinux permissions
The SELinux context was missing `svirt_sandbox_file_t` on the secrets volume, and therefore from the perspective of the KubeDNS pod `/var/run/secrets/kubernetes.io/serviceaccount/` was mangled and it couldn't in turn use that to connect to the master.
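A quick way to confirm this class of problem is to inspect the SELinux labels on the secrets volume from the minion; the path below is illustrative (the actual pod UID will differ):

```sh
# List SELinux contexts of the mounted secrets volume on the host.
# A correctly labeled volume carries svirt_sandbox_file_t (container_file_t
# on newer policies); anything else and the pod cannot read its token.
ls -lZ /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~secret/
```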
Secrets volume got stale
The kube-controller is responsible for injecting the secrets volume into pods and keeping it up to date. There were/are known bugs where it would fail to do that. As a result, KubeDNS would mysteriously stop working because its tokens to connect to the master had grown stale. (This sort of thing: kubernetes/kubernetes#24928)
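One way to check for a stale token is to compare what the pod actually has mounted against the secret the API server currently holds; the pod name and secret name below are placeholders, and this assumes the container image ships a shell:

```sh
# Token as seen from inside the KubeDNS pod.
kubectl exec --namespace=kube-system <kube-dns-pod> -- \
  cat /var/run/secrets/kubernetes.io/serviceaccount/token

# Token currently stored in the service account's secret; the two should match.
kubectl get secret --namespace=kube-system <token-secret-name> \
  -o jsonpath='{.data.token}' | base64 -d
```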
Typo
The official skydns-rc.yaml had a typo at some point, with `--domain=` missing the trailing dot.
Scalability
It is now recommended to scale KubeDNS pods proportionally to the number of nodes in a cluster.
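As a rough illustration of what "proportional" means in practice (the replication controller name and the one-replica-per-16-nodes ratio are assumptions; newer clusters delegate this to the DNS horizontal autoscaler add-on):

```sh
# Scale kube-dns replicas by hand in proportion to the node count.
NODES=$(kubectl get nodes --no-headers | wc -l)
kubectl scale rc <kube-dns-rc-name> --namespace=kube-system \
  --replicas=$(( NODES / 16 + 1 ))
```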
These problems would crop up and get resolved, yet errors would stubbornly persist.
Known Kubernetes Networking Issues Encountered
Initial Checks
- Networking in Kubernetes
In other words, to make sure networking is not seriously broken/misconfigured, check:
At first blush these were looking fine, but pod creation was sluggish (30-60 seconds), and that is a red flag.
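A concrete version of that check looks something like the following; the pod names and the busybox image are placeholders:

```sh
# Start two throwaway pods and watch how long they take to reach Running;
# 30-60 seconds is the sluggishness called out above.
kubectl run ping-a --image=busybox --restart=Never -- sleep 3600
kubectl run ping-b --image=busybox --restart=Never -- sleep 3600
kubectl get pods -w

# Verify basic pod-to-pod connectivity across the overlay.
POD_A_IP=$(kubectl get pod ping-a -o jsonpath='{.status.podIP}')
kubectl exec ping-b -- ping -c 3 "$POD_A_IP"
```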
Missing Dependencies
As described in #62, at some version the CNI folder started missing binaries.
More undocumented dependencies (#64) were found by staring at logs and noting weirdness. The really important ones are conntrack-tools, socat, and bridge-utils; these are now being pinned down upstream.
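On a CentOS 7 image like the one used here, a quick way to verify (and install) those dependencies is:

```sh
# Check whether the runtime dependencies kubelet/kube-proxy rely on are present.
rpm -q conntrack-tools socat bridge-utils

# Install anything that is missing.
yum install -y conntrack-tools socat bridge-utils
```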
The errors were time-consuming to understand because their phrasing would often leave something to be desired. Unfortunately there's at least one known false-positive warning (kubernetes/kubernetes#23385).
Cluster CIDR overlaps
--cluster-cidr="": CIDR Range for Pods in cluster.
--service-cluster-ip-range="": CIDR Range for Services in cluster.
In my case services got a /16 starting at 10.0.0.0 and the cluster-cidr got a /16 at 10.244.0.0. The service CIDR is routable because kube-proxy is constantly writing iptables rules on every minion.
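Concretely, that corresponds to non-overlapping ranges along these lines (exact flag placement depends on how the components are launched):

```sh
# kube-apiserver: virtual IP range handed out to Services.
--service-cluster-ip-range=10.0.0.0/16

# kube-controller-manager / kube-proxy: real pod network range.
--cluster-cidr=10.244.0.0/16
```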
For Weave in particular, `--ipalloc-range` needs to be passed to exactly match what's given to the Kubernetes cluster-cidr. Whatever your network overlay, it must not clobber the service range!
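For example, with the ranges above (a sketch; the exact invocation depends on how Weave is launched in your setup):

```sh
# Weave must hand out pod IPs from the same range Kubernetes expects pods in,
# and must stay clear of the 10.0.0.0/16 service range.
weave launch --ipalloc-range 10.244.0.0/16
```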
Iptables masquerade conflicts
Flannel
If using Flannel be sure to follow the newly documented instructions:
DOCKER_OPTS="--iptables=false --ip-masq=false"
Kube-proxy makes extensive use of masquerading rules; similar to an overlay clobbering the service range, another component (like the Docker daemon itself) mucking about with masq rules will cause unexpected behavior.
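A quick way to see which component is adding masquerade rules on a minion:

```sh
# Dump the nat POSTROUTING chain; look for MASQUERADE entries added by docker
# or the overlay that cover the service or pod CIDRs.
iptables -t nat -S POSTROUTING
```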
Weave
Weave was originally erroneously started with `--docker-endpoint=unix:///var/run/weave/weave.sock`, which similarly caused unexpected behavior. This flag is extraneous and has to be omitted when used with CNI.
Final Configuration
Image
Centos7 source_ami: ami-bec022de
Dependencies
SELinux disabled.
Yum installed:
kubernetes_version: 1.4.0-alpha.3 (b44b716965db2d54c8c7dfcdbcb1d54792ab8559)
weave_version: 1.6.1
1 Master (172.20.0.78)
Gist of journalctl output shows it boots fine: docker, etcd, kube-apiserver, scheduler, and controller all start. The minion registers successfully.
1 minion (172.20.0.18)
Pod created (not sluggishly). Multiple pods can ping each other.
Weave
Weave and weaveproxy are up and running just fine.
Conclusion
Kubernetes is rapidly evolving with many open issues -- there are now efforts upstream to pin down and document the dependencies along with making errors and warnings more user-friendly in the logs.
As future versions become less opaque, it will become easier to know which open issue is relevant to your setup, whether an obvious dependency is missing, and what a good setup looks like.
The nominal sanity check command that currently exists (`kubectl get componentstatuses`) does not go far enough. It might show everything is healthy. Pods might be successfully created. Services might work. And yet these can all be misleading, as a cluster may still not be entirely healthy.
A useful test I found in the official repo simply tests connectivity (and authentication) to the master. Sluggishness is not tested, and sluggishness, it turns out, is a red flag.
In fact, there's an entire folder of these, but they are not well documented as far as I can tell.
I believe a smoke test that can be deployed against any running cluster and run through a suite of checks and benchmarks (to take into account unexpectedly poor performance) would significantly improve the debugging experience.
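As a rough sketch of the kind of checks I have in mind (the names, image, and thresholds are placeholders; the DNS part reuses the dns-probe pod from the earlier sketch):

```sh
#!/bin/sh
# Minimal cluster smoke test sketch: repeated DNS lookups plus a timed pod start.

# 1. DNS should resolve the kubernetes service consistently, not just once.
for i in $(seq 1 20); do
  kubectl exec dns-probe -- nslookup kubernetes.default >/dev/null 2>&1 \
    || echo "DNS lookup $i failed"
done

# 2. Pod creation should be fast; anything approaching a minute is a red flag.
START=$(date +%s)
kubectl run smoke-pod --image=busybox --restart=Never -- true
while [ "$(kubectl get pod smoke-pod -o jsonpath='{.status.phase}')" != "Succeeded" ]; do
  sleep 1
done
echo "Pod completed in $(( $(date +%s) - START ))s"
```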