gluster / gluster-kubernetes

GlusterFS Native Storage Service for Kubernetes
Apache License 2.0

Pods are not created #461

Closed vovkats closed 6 years ago

vovkats commented 6 years ago

When I run the command ./gk-deploy -g I get this result:

Using Kubernetes CLI.
Using namespace "default".
Checking for pre-existing resources...
  GlusterFS pods ... not found.
  deploy-heketi pod ... not found.
  heketi pod ... not found.
  gluster-s3 pod ... not found.
Creating initial resources ... serviceaccount "heketi-service-account" created
clusterrolebinding "heketi-sa-view" created
clusterrolebinding "heketi-sa-view" labeled
OK
node "node-1" labeled
node "node-2" labeled
daemonset "glusterfs" created
Waiting for GlusterFS pods to start ...

I see the following info in my dashboard:

Node didn't have enough resource: cpu, requested: 100, used: 885, capacity: 900

But I logged into the node, and it has enough resources.

jarrpa commented 6 years ago

This doesn't seem like a gluster-kubernetes problem off-hand... is there a way you can view what's using resources on the given node? If not, can you at least see which pods are running on that node and how many resources they are consuming?
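
For what it's worth, a minimal sketch of how one might check that, assuming the node name node-3 that appears later in this thread:

kubectl describe node node-3                             # shows per-pod CPU/memory requests and the allocated totals
kubectl get pods --all-namespaces -o wide | grep node-3  # lists the pods scheduled on that node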

vovkats commented 6 years ago

@jarrpa Using the command kubectl describe nodes node-3 I get the following info:

Name:               node-3
Roles:              node
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=node-3
                    node-role.kubernetes.io/node=true
                    storagenode=glusterfs
Annotations:        flannel.alpha.coreos.com/backend-data={"VtepMAC":"ca:23:ef:fa:d7:d5"}
                    flannel.alpha.coreos.com/backend-type=vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager=true
                    flannel.alpha.coreos.com/public-ip=192.168.1.10
                    node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             <none>
CreationTimestamp:  Wed, 11 Apr 2018 17:27:24 +0500
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Wed, 11 Apr 2018 18:04:51 +0500   Wed, 11 Apr 2018 17:27:21 +0500   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Wed, 11 Apr 2018 18:04:51 +0500   Wed, 11 Apr 2018 17:27:21 +0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 11 Apr 2018 18:04:51 +0500   Wed, 11 Apr 2018 17:27:21 +0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  Ready            True    Wed, 11 Apr 2018 18:04:51 +0500   Wed, 11 Apr 2018 17:29:04 +0500   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.1.10
  Hostname:    node-3
Capacity:
 cpu:     1
 memory:  3881776Ki
 pods:    110
Allocatable:
 cpu:     900m
 memory:  3529376Ki
 pods:    110
System Info:
 Machine ID:                 609bbd29e32a4898e604f49bff82a88c
 System UUID:                D96D099F-0FEC-40CF-986D-9A5FB06AB29A
 Boot ID:                    cd2c8e64-2217-4470-b5ae-b0a9d6641b67
 Kernel Version:             3.10.0-693.11.6.el7.x86_64
 OS Image:                   CentOS Linux 7 (Core)
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://17.3.2
 Kubelet Version:            v1.9.5
 Kube-Proxy Version:         v1.9.5
PodCIDR:                     10.233.65.0/24
ExternalID:                  node-3
Non-terminated Pods:         (8 in total)
  Namespace                  Name                                         CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                                         ------------  ----------  ---------------  -------------
  default                    netchecker-agent-hostnet-s8p95               15m (1%)      30m (3%)    64M (1%)         100M (2%)
  default                    netchecker-agent-ph72v                       15m (1%)      30m (3%)    64M (1%)         100M (2%)
  kube-system                elasticsearch-logging-v1-776b8b856c-rnd4n    100m (11%)    1 (111%)    0 (0%)           0 (0%)
  kube-system                fluentd-es-v1.22-5vqb2                       100m (11%)    0 (0%)      200Mi (5%)       200Mi (5%)
  kube-system                kube-dns-79d99cdcd5-6lxbm                    260m (28%)    0 (0%)      110Mi (3%)       170Mi (4%)
  kube-system                kube-flannel-wk88b                           150m (16%)    300m (33%)  64M (1%)         500M (13%)
  kube-system                kube-proxy-node-3                            150m (16%)    500m (55%)  64M (1%)         2G (55%)
  kube-system                nginx-proxy-node-3                           25m (2%)      300m (33%)  32M (0%)         512M (14%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits    Memory Requests  Memory Limits
  ------------  ----------    ---------------  -------------
  815m (90%)    2160m (240%)  613058560 (16%)  3599973120 (99%)
Events:
  Type    Reason                   Age                From             Message
  ----    ------                   ----               ----             -------
  Normal  Starting                 37m                kubelet, node-3  Starting kubelet.
  Normal  NodeAllocatableEnforced  37m                kubelet, node-3  Updated Node Allocatable limit across pods
  Normal  NodeHasSufficientDisk    37m (x8 over 37m)  kubelet, node-3  Node node-3 status is now: NodeHasSufficientDisk
  Normal  NodeHasSufficientMemory  37m (x8 over 37m)  kubelet, node-3  Node node-3 status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    37m (x7 over 37m)  kubelet, node-3  Node node-3 status is now: NodeHasNoDiskPressure
jarrpa commented 6 years ago

Looks pretty self-explanatory, then: there is already so much CPU requested on the node that there is not enough left for the CPU request of the GlusterFS pod. Your options are:

  1. Get more CPU
  2. Remove pods from those nodes
  3. Edit the glusterfs-daemonset.yml manifest to specify a different (or none) CPU request.
vovkats commented 6 years ago

@jarrpa I want to clarify: is it required to have a minimum of 3 nodes to install gluster?

jarrpa commented 6 years ago

Yes. For testing/hacking purposes, you can run gk-deploy with the --single-node argument to remove this restriction.
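
A minimal sketch of that, combining the flag with the command used earlier in this thread:

./gk-deploy -g --single-node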

vovkats commented 6 years ago

@jarrpa Can you explain the item "Edit the glusterfs-daemonset.yml manifest to specify a different (or none) CPU request."?

jarrpa commented 6 years ago

This line.
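
For illustration, a rough sketch of the relevant resources stanza in glusterfs-daemonset.yml; the exact values and surrounding fields depend on the copy of the template you are using, so treat this as an example rather than the literal file:

resources:
  requests:
    cpu: 100m   # lower this value, or remove the requests block entirely,
                # if the node cannot fit another 100m CPU request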

vovkats commented 6 years ago

@jarrpa thanks

vovkats commented 6 years ago

@jarrpa After changing the configuration I got this error:

Creating node node-3 ... Unable to create node: Unable to execute command on glusterfs-8kfm2: peer probe: failed: Probe returned with Transport endpoint is not connected
Error loading the cluster topology.

But the node is available.

SaravanaStorageNetwork commented 6 years ago

@vovkats

Could you check that all the required ports are open? Refer to this guide: https://github.com/gluster/gluster-kubernetes/blob/master/docs/setup-guide.md

Also, this comment may help: https://github.com/gluster/gluster-kubernetes/issues/250#issuecomment-296355028
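
As a rough sketch, the ports listed in the setup guide linked above (2222 for the pod's sshd, 24007/24008 for GlusterFS, and 49152-49251 for bricks) could be opened on each storage node with something like the following; adjust the chain and rule position to fit your existing firewall rules:

iptables -I INPUT -p tcp -m multiport --dports 2222,24007,24008,49152:49251 -j ACCEPT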

vovkats commented 6 years ago

@SaravanaStorageNetwork ports 1-50000 are open. I also ran sudo iptables -I INPUT -p all -j ACCEPT, but it did not help.

SaravanaStorageNetwork commented 6 years ago

@vovkats

Check kubectl get nodes and verify that everything is fine there. Check kubectl get pods and make sure all pods, especially the gluster pods, are running fine.

If there is any issue, check kubectl describe pod <podname>.

Check whether connectivity between the nodes works fine.

Additionally, you can abort the entire setup using gk-deploy --abort and try running it again.
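
A compact version of those checks, using node-3 from earlier in the thread as an example:

kubectl get nodes                          # all nodes Ready?
kubectl get pods --all-namespaces -o wide  # are the gluster pods Running, and on which nodes?
kubectl describe pod <podname>             # the Events section usually shows why a pod is stuck
ping node-3                                # basic connectivity check between nodes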

vovkats commented 6 years ago

@SaravanaStorageNetwork @jarrpa I've updated heketi, and after running the command ./gk-deploy -gvy I get this error:

Checking status of pods matching '--selector=deploy-heketi=pod':
deploy-heketi-7c4898d9cd-dwhhs   0/1       Error     6         5m

or

Checking status of pods matching '--selector=deploy-heketi=pod':
deploy-heketi-7c4898d9cd-dwhhs   0/1       CrashLoopBackOff   6         10m

The command kubectl logs deploy-heketi-7c4898d9cd-dwhhs returns this result: standard_init_linux.go:178: exec user process caused "exec format error"

Also, when I try to re-run the command ./gk-deploy -gvy I get this error: Can't open /dev/vdc exclusively. Mounted filesystem?

jarrpa commented 6 years ago

You can't run gk-deploy more than once for any given deployment. You have to do gk-deploy --abort and wipe all the storage devices before running it again. You should also check the kubectl describe output for the pod and see if it shows anything useful.
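
A rough sketch of that abort-and-wipe cycle, assuming /dev/vdc from the error above is the device listed in your topology; wipefs is a generic suggestion rather than part of gk-deploy, so double-check the device name first:

./gk-deploy --abort   # tear down the heketi/gluster resources
wipefs -a /dev/vdc    # on each storage node: clear leftover LVM/heketi signatures from the device
./gk-deploy -gvy      # then redeploy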

mjschmidt commented 6 years ago

@vovkats do you happen to be working in a closed environment?

jarrpa commented 6 years ago

Given the silence of the OP, closing this issue. We can reopen it if the OP returns.

fontv2 commented 5 years ago

Not sure if relevant, but I managed to get this working ONLY when I forced the heketi deployment to run on the master node and used the kube-system namespace. Hope that helps.
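
A sketch of that workaround, assuming gk-deploy's -n/--namespace option and a hypothetical nodeSelector edit to the heketi deployment template (neither is confirmed in this thread as the intended approach):

./gk-deploy -g -n kube-system
# and, in the heketi deployment template, pin the pod to the master node, e.g.:
#   nodeSelector:
#     kubernetes.io/hostname: <master-node-name>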