grctl node list
Check whether the node has the worker attribute.
If this is Ubuntu 16.04, also check the kernel version.
mkdir -p /sys/fs/cgroup/rdma/kubepods
systemctl restart kubelet
There is no worker attribute.
cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
uname -a
Linux manage01 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
grctl node list
+--------------------------------------+-----------+----------+----------------+----------+----------------------+
| Uid | IP | HostName | NodeRole | NodeMode | Status |
+--------------------------------------+-----------+----------+----------------+----------+----------------------+
| 76b9384f-92ac-4eea-88fe-bbd2fc91b414 | 10.9.85.3 | manage01 | manage,compute | master | running(unhealthy) |
+--------------------------------------+-----------+----------+----------------+----------+----------------------+
grctl node get 76b9384f-92ac-4eea-88fe-bbd2fc91b414
@php-cpm Check which service is unhealthy.
grctl node get 76b9384f-92ac-4eea-88fe-bbd2fc91b414
-------------------Node information-----------------------
status running
unschedulable false
alived true
uuid 76b9384f-92ac-4eea-88fe-bbd2fc91b414
host_name manage01
create_time 2018-10-25 10:27:23.770326548 +0800 CST
internal_ip 10.9.85.3
external_ip 10.9.85.3
role manage,compute
mode master
available_memory 0
available_cpu 0
pid 874
version
up 2018-10-25 10:38:16.01292159 +0800 CST
down 0001-01-01 00:00:00 +0000 UTC
connected false
-------------------service health-----------------------
+-------------------------+---------+----------------------+
| Title | Result | Message |
+-------------------------+---------+----------------------+
| Ready | false | |
| NodeInit | true | |
| rbd-db | true | |
| kube-controller-manager | true | |
| kube-scheduler | true | |
| rbd-hub | true | |
| kubelet | false | Tcp connection error |
| rbd-entrance | true | |
| kube-apiserver | true | |
| rbd-lb | true | |
| rbd-api | true | |
| rbd-webcli | true | |
| rbd-eventlog | true | |
| rbd-worker | true | |
| etcd | true | |
| rbd-chaos | true | |
| rbd-mq | true | |
| rbd-monitor | true | |
| docker | true | |
| local-dns | true | |
| rbd-app-ui | true | |
| storage | true | |
| rbd-dns | true | |
| calico | true | |
| rbd-repo | true | |
+-------------------------+---------+----------------------+
You can check the kubelet logs:
journalctl -fu kubelet
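If the follow output scrolls too fast, the last failure should also show up in the unit status, for example:
systemctl status kubelet -l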
Thanks, I'm not familiar with these commands. Now I can see the problem.
journalctl -fu kubelet
Oct 25 10:45:42 manage01 kubelet[15286]: F1025 10:45:42.666546 15286 server.go:190] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
Oct 25 10:45:42 manage01 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Oct 25 10:45:42 manage01 systemd[1]: Unit kubelet.service entered failed state.
Oct 25 10:45:42 manage01 systemd[1]: kubelet.service failed.
Oct 25 10:45:52 manage01 systemd[1]: kubelet.service holdoff time over, scheduling restart.
Oct 25 10:45:52 manage01 systemd[1]: Starting kubelet...
Oct 25 10:45:52 manage01 systemd[1]: Started kubelet.
The file /var/lib/kubelet/config.yaml does not exist, so I created it manually:
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
evictionHard:
  memory.available: "200Mi"
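For reference, creating the file and restarting the service from the shell could look like this:
cat > /var/lib/kubelet/config.yaml <<'EOF'
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
evictionHard:
  memory.available: "200Mi"
EOF
systemctl restart kubelet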
Then the log reported another error:
failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
I copied /root/.kube/config to /etc/kubernetes/bootstrap-kubelet.conf.
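In shell form that was just:
cp /root/.kube/config /etc/kubernetes/bootstrap-kubelet.conf
(systemd then restarted kubelet on its own, as in the holdoff/restart lines above.)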
The service started, but grctl node get 76b9384f-92ac-4eea-88fe-bbd2fc91b414 still shows kubelet as false.
kubectl get csr
NAME AGE REQUESTOR CONDITION
node-csr-sBjotqjjLkOaKUM3xrlxtwYokYshL7JisiNktFinoC4 6m admin Pending
kubectl certificate approve node-csr-sBjotqjjLkOaKUM3xrlxtwYokYshL7JisiNktFinoC4
certificatesigningrequest "node-csr-sBjotqjjLkOaKUM3xrlxtwYokYshL7JisiNktFinoC4" approved
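Once the CSR is approved the kubelet should be able to register itself; the quickest way to see whether the apiserver lists the node yet is:
kubectl get nodes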
Logs:
Oct 25 11:54:26 manage01 kubelet[31760]: F1025 11:54:26.276082 31760 server.go:262] failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename Type Size Used Priority /swapfile file 524284 1032 -1]
swapoff -a
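swapoff -a only disables swap until the next reboot; to keep it off, the /swapfile entry shown in /proc/swaps above also has to be removed or commented out in /etc/fstab, for example:
sed -i '/\/swapfile/ s/^/#/' /etc/fstab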
grctl cluster
                 Used/Total    Use of
CPU              0/8           0%
Memory           0/15885       0%
DistributedDisk  12Gb/19Gb     63.63%
+-------------------------+-----------------------+---------------------------------------------+
| Service | HealthyQuantity/Total | Message |
+-------------------------+-----------------------+---------------------------------------------+
| ClusterStatus | unhealthy | There is a service exception in the cluster |
| kube-controller-manager | 1/1 | |
| local-dns | 1/1 | |
| Ready | 0/1 | manage01:/ |
| rbd-api | 1/1 | |
| etcd | 1/1 | |
| rbd-monitor | 1/1 | |
| rbd-webcli | 1/1 | |
| rbd-repo | 1/1 | |
| rbd-mq | 1/1 | |
| kubelet | 1/1 | |
| OutOfDisk | 0/1 | manage01:Kubelet never posted node status./ |
| rbd-entrance | 1/1 | |
| calico | 1/1 | |
| MemoryPressure | 0/1 | manage01:Kubelet never posted node status./ |
| rbd-hub | 1/1 | |
| docker | 1/1 | |
| storage | 1/1 | |
| rbd-dns | 1/1 | |
| NodeInit | 1/1 | |
| rbd-db | 1/1 | |
| rbd-eventlog | 1/1 | |
| rbd-worker | 1/1 | |
| kube-apiserver | 1/1 | |
| rbd-app-ui | 1/1 | |
| DiskPressure | 0/1 | manage01:Kubelet never posted node status./ |
| rbd-chaos | 1/1 | |
| kube-scheduler | 1/1 | |
| rbd-lb | 1/1 | |
+-------------------------+-----------------------+---------------------------------------------+
+--------------------------------------+-----------+----------+----------------+----------+----------------------+
| Uid | IP | HostName | NodeRole | NodeMode | Status |
+--------------------------------------+-----------+----------+----------------+----------+----------------------+
| 76b9384f-92ac-4eea-88fe-bbd2fc91b414 | 10.9.85.3 | manage01 | manage,compute | master | running(unhealthy) |
+--------------------------------------+-----------+----------+----------------+----------+----------------------+
That kubelet isn't our customized build, is it? Which kubelet version are you running?
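You can print the running build with:
kubelet --version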
The machine probably had another k8s version installed on it before. I'll try reinstalling in a clean environment.
After reinstalling in a clean environment kubelet still fails to start, but the error is different now.
CentOS 7.2
Oct 30 18:10:53 manage01 systemd[1]: kubelet.service holdoff time over, scheduling restart.
Oct 30 18:10:53 manage01 systemd[1]: Starting kubelet...
Oct 30 18:10:53 manage01 systemd[1]: Started kubelet.
Oct 30 18:10:53 manage01 bash[3779]: Flag --maximum-dead-containers-per-container has been deprecated, Use --eviction-hard or --eviction-soft instead. Will be removed in a future version.
Oct 30 18:10:53 manage01 bash[3779]: I1030 18:10:53.644099 3779 feature_gate.go:144] feature gates: map[]
Oct 30 18:10:53 manage01 bash[3779]: I1030 18:10:53.654520 3779 docker.go:364] Connecting to docker on unix:///var/run/docker.sock
Oct 30 18:10:53 manage01 bash[3779]: I1030 18:10:53.654549 3779 docker.go:384] Start docker client with request timeout=2m0s
Oct 30 18:10:53 manage01 bash[3779]: W1030 18:10:53.655453 3779 cni.go:158] Unable to update cni config: No networks found in /opt/rainbond/etc/cni/
Oct 30 18:10:53 manage01 bash[3779]: I1030 18:10:53.665263 3779 manager.go:143] cAdvisor running in container: "/system.slice/kubelet.service"
Oct 30 18:10:53 manage01 bash[3779]: W1030 18:10:53.690133 3779 manager.go:151] unable to connect to Rkt api service: rkt: cannot tcp Dial rkt api service: dial tcp 127.0.0.1:15441: getsockopt: connection refused
Oct 30 18:10:53 manage01 bash[3779]: I1030 18:10:53.708847 3779 fs.go:117] Filesystem partitions: map[/dev/vda1:{mountpoint:/ major:253 minor:1 fsType:xfs blockSize:0} /dev/vdb:{mountpoint:/data major:253 minor:16 fsType:xfs blockSize:0}]
Oct 30 18:10:53 manage01 bash[3779]: I1030 18:10:53.713714 3779 manager.go:198] Machine: {NumCores:8 CpuFrequency:2194916 MemoryCapacity:16656801792 MachineID:f180f4f45ab34cdc85a5a7b5b599b20e SystemUUID:76B9384F-92AC-4EEA-88FE-BBD2FC91B414 BootID:1e03cf22-6785-483e-9c2d-5403dedaf25c Filesystems:[{Device:/dev/vda1 Capacity:21463302144 Type:vfs Inodes:20970496 HasInodes:true} {Device:/dev/vdb Capacity:107321753600 Type:vfs Inodes:52428800 HasInodes:true}] DiskMap:map[252:1:{Name:dm-1 Major:252 Minor:1 Size:10737418240 Scheduler:none} 252:16:{Name:dm-16 Major:252 Minor:16 Size:10737418240 Scheduler:none} 252:19:{Name:dm-19 Major:252 Minor:19 Size:10737418240 Scheduler:none} 252:2:{Name:dm-2 Major:252 Minor:2 Size:10737418240 Scheduler:none} 252:4:{Name:dm-4 Major:252 Minor:4 Size:10737418240 Scheduler:none} 252:7:{Name:dm-7 Major:252 Minor:7 Size:10737418240 Scheduler:none} 252:0:{Name:dm-0 Major:252 Minor:0 Size:107374182400 Scheduler:none} 253:0:{Name:vda Major:253 Minor:0 Size:21474836480 Scheduler:mq-deadline} 253:16:{Name:vdb Major:253 Minor:16 Size:107374182400 Scheduler:mq-deadline} 252:10:{Name:dm-10 Major:252 Minor:10 Size:10737418240 Scheduler:none} 252:17:{Name:dm-17 Major:252 Minor:17 Size:10737418240 Scheduler:none} 252:5:{Name:dm-5 Major:252 Minor:5 Size:10737418240 Scheduler:none} 252:9:{Name:dm-9 Major:252 Minor:9 Size:10737418240 Scheduler:none} 252:6:{Name:dm-6 Major:252 Minor:6 Size:10737418240 Scheduler:none} 252:11:{Name:dm-11 Major:252 Minor:11 Size:10737418240 Scheduler:none} 252:12:{Name:dm-12 Major:252 Minor:12 Size:10737418240 Scheduler:none} 252:13:{Name:dm-13 Major:252 Minor:13 Size:10737418240 Scheduler:none} 252:14:{Name:dm-14 Major:252 Minor:14 Size:10737418240 Scheduler:none} 252:15:{Name:dm-15 Major:252 Minor:15 Size:10737418240 Scheduler:none} 252:18:{Name:dm-18 Major:252 Minor:18 Size:10737418240 Scheduler:none} 252:3:{Name:dm-3 Major:252 Minor:3 Size:10737418240 Scheduler:none} 252:8:{Name:dm-8 Major:252 Minor:8 Size:10737418240 Scheduler:none}] NetworkDevices:[{Name:eth0 MacAddress:52:54:00:b2:5a:8a Speed:0 Mtu:1454} {Name:tunl0 MacAddress:00:00:00:00 Speed:0 Mtu:1440}] Topology:[{Id:0 Memory:17179332608 Cores:[{Id:0 Threads:[0] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:4194304 Type:Unified Level:2}]}] Caches:[]} {Id:1 Memory:0 Cores:[{Id:0 Threads:[1] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:4194304 Type:Unified Level:2}]}] Caches:[]} {Id:2 Memory:0 Cores:[{Id:0 Threads:[2] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:4194304 Type:Unified Level:2}]}] Caches:[]} {Id:3 Memory:0 Cores:[{Id:0 Threads:[3] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:4194304 Type:Unified Level:2}]}] Caches:[]} {Id:4 Memory:0 Cores:[{Id:0 Threads:[4] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:4194304 Type:Unified Level:2}]}] Caches:[]} {Id:5 Memory:0 Cores:[{Id:0 Threads:[5] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:4194304 Type:Unified Level:2}]}] Caches:[]} {Id:6 Memory:0 Cores:[{Id:0 Threads:[6] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:4194304 Type:Unified Level:2}]}] Caches:[]} {Id:7 Memory:0 Cores:[{Id:0 Threads:[7] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:4194304 Type:Unified Level:2}]}] Caches:[]}] CloudProvider:Unknown InstanceType:Unknown 
InstanceID:None}
Oct 30 18:10:53 manage01 bash[3779]: I1030 18:10:53.714436 3779 manager.go:204] Version: {KernelVersion:3.10.0-862.9.1.el7.x86_64 ContainerOsVersion:CentOS Linux 7 (Core) DockerVersion:1.12.6 CadvisorVersion: CadvisorRevision:}
Oct 30 18:10:53 manage01 bash[3779]: I1030 18:10:53.715201 3779 server.go:513] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /
Oct 30 18:10:53 manage01 bash[3779]: W1030 18:10:53.717129 3779 container_manager_linux.go:218] Running with swap on is not supported, please disable swap! This will be a fatal error by default starting in K8s v1.6! In the meantime, you can opt-in to making this a fatal error by enabling --experimental-fail-swap-on.
Oct 30 18:10:53 manage01 bash[3779]: I1030 18:10:53.717266 3779 container_manager_linux.go:245] container manager verified user specified cgroup-root exists: /
Oct 30 18:10:53 manage01 bash[3779]: I1030 18:10:53.717279 3779 container_manager_linux.go:250] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs ProtectKernelDefaults:false EnableCRI:true NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>}]} ExperimentalQOSReserved:map[]}
Oct 30 18:10:53 manage01 bash[3779]: I1030 18:10:53.717430 3779 server.go:805] Using root directory: /var/lib/kubelet
Oct 30 18:10:53 manage01 bash[3779]: I1030 18:10:53.717461 3779 kubelet.go:265] Watching apiserver
Oct 30 18:10:53 manage01 bash[3779]: W1030 18:10:53.720807 3779 kubelet_network.go:70] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
Oct 30 18:10:53 manage01 bash[3779]: I1030 18:10:53.720834 3779 kubelet.go:494] Hairpin mode set to "hairpin-veth"
Oct 30 18:10:53 manage01 bash[3779]: W1030 18:10:53.720939 3779 cni.go:158] Unable to update cni config: No networks found in /opt/rainbond/etc/cni/
Oct 30 18:10:53 manage01 bash[3779]: I1030 18:10:53.720954 3779 plugins.go:196] Loaded network plugin "cni"
Oct 30 18:10:53 manage01 bash[3779]: panic: runtime error: invalid memory address or nil pointer dereference
Oct 30 18:10:53 manage01 bash[3779]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x565539]
Oct 30 18:10:53 manage01 bash[3779]: goroutine 1 [running]:
Oct 30 18:10:53 manage01 bash[3779]: panic(0x2e5be00, 0xc420012060)
Oct 30 18:10:53 manage01 bash[3779]: /usr/local/go/src/runtime/panic.go:500 +0x1a1
Oct 30 18:10:53 manage01 bash[3779]: k8s.io/kubernetes/pkg/kubelet/network/cni.(*cniNetworkPlugin).PluginType(0xc420aec510, 0x3418372, 0x18)
Oct 30 18:10:53 manage01 bash[3779]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/network/cni/cni.go:188 +0x9
Oct 30 18:10:53 manage01 bash[3779]: k8s.io/kubernetes/pkg/kubelet/network.InitNetworkPlugin(0xc42099c460, 0x2, 0x2, 0x7fffd4627e79, 0x3, 0x4e9c880, 0xc42020a230, 0x33ebbac, 0xc, 0x33e5c76, ...)
Oct 30 18:10:53 manage01 bash[3779]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/network/plugins.go:202 +0x868
Oct 30 18:10:53 manage01 bash[3779]: k8s.io/kubernetes/pkg/kubelet.NewMainKubelet(0xc42066cd00, 0xc420aea500, 0x3419000, 0x18, 0xc4207fb100, 0x1)
Oct 30 18:10:53 manage01 bash[3779]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/kubelet.go:496 +0x1c02
Oct 30 18:10:53 manage01 bash[3779]: k8s.io/kubernetes/cmd/kubelet/app.CreateAndInitKubelet(0xc42066cd00, 0xc420aea500, 0x0, 0xc4207fb100, 0x1, 0x1, 0x3)
Oct 30 18:10:53 manage01 bash[3779]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubelet/app/server.go:898 +0x42
Oct 30 18:10:53 manage01 bash[3779]: k8s.io/kubernetes/cmd/kubelet/app.RunKubelet(0xc42066cd00, 0xc420aea500, 0x0, 0x0, 0x0)
Oct 30 18:10:53 manage01 bash[3779]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubelet/app/server.go:814 +0x451
Oct 30 18:10:53 manage01 bash[3779]: k8s.io/kubernetes/cmd/kubelet/app.run(0xc42066cd00, 0xc420aea500, 0x28, 0xc4207fbe01)
Oct 30 18:10:53 manage01 bash[3779]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubelet/app/server.go:581 +0x3b0
Oct 30 18:10:53 manage01 bash[3779]: k8s.io/kubernetes/cmd/kubelet/app.Run(0xc42066cd00, 0x0, 0x22, 0xc4202aecf0)
Oct 30 18:10:53 manage01 bash[3779]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubelet/app/server.go:290 +0x10a
Oct 30 18:10:53 manage01 bash[3779]: main.main()
Oct 30 18:10:53 manage01 bash[3779]: /go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubelet/kubelet.go:48 +0x92
Oct 30 18:10:53 manage01 systemd[1]: kubelet.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Oct 30 18:10:53 manage01 systemd[1]: Unit kubelet.service entered failed state.
Oct 30 18:10:53 manage01 systemd[1]: kubelet.service failed.
@php-cpm The log shows there is no CNI config. Did you install following our documentation? Was the system cleaned up completely beforehand?
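The warning above ("No networks found in /opt/rainbond/etc/cni/") suggests the CNI directory is empty; a quick check:
ls -l /opt/rainbond/etc/cni/
After a normal install that directory should not be empty.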
After cleaning up completely and reinstalling, it started successfully.
grctl version
Rainbond grctl 3.7.2-349e6fb-2018-10-16-16
grctl cluster
+-------------------------+-----------------------+---------------------------------------------+
| Service | HealthyQuantity/Total | Message |
+-------------------------+-----------------------+---------------------------------------------+
| ClusterStatus | unhealthy | There is a service exception in the cluster |
| rbd-dns | 1/1 | |
| kube-controller-manager | 1/1 | |
| kube-scheduler | 1/1 | |
| etcd | 1/1 | |
| NodeInit | 1/1 | |
| rbd-eventlog | 1/1 | |
| kube-apiserver | 1/1 | |
| calico | 1/1 | |
| rbd-webcli | 1/1 | |
| rbd-db | 1/1 | |
| local-dns | 1/1 | |
| rbd-app-ui | 1/1 | |
| rbd-chaos | 1/1 | |
| kubelet | 0/1 | manage01:Tcp connection error/ |
| docker | 1/1 | |
| rbd-lb | 1/1 | |
| rbd-entrance | 1/1 | |
| rbd-repo | 1/1 | |
| rbd-api | 1/1 | |
| rbd-monitor | 1/1 | |
| storage | 1/1 | |
| Ready | 0/1 | manage01:/ |
| rbd-worker | 1/1 | |
| rbd-mq | 1/1 | |
| rbd-hub | 1/1 | |
+-------------------------+-----------------------+---------------------------------------------+