gluster / gluster-kubernetes

GlusterFS Native Storage Service for Kubernetes
Apache License 2.0

Unable to create node: New Node doesn't have glusterd running #512

Closed: dimthe closed this issue 6 years ago

dimthe commented 6 years ago

Hello, I'm trying to set this up using the deployment script on 3 physical nodes (not VMs). Below is the log from the script when it fails, followed by my topology.json. What is wrong?

```
Do you wish to proceed with deployment?

Using Kubernetes CLI.

Checking status of namespace matching 'default':
default   Active   113d
Using namespace "default".
Checking for pre-existing resources...
GlusterFS pods ...
Checking status of pods matching '--selector=glusterfs=pod':

Timed out waiting for pods matching '--selector=glusterfs=pod'.
not found.
deploy-heketi pod ...
Checking status of pods matching '--selector=deploy-heketi=pod':

Timed out waiting for pods matching '--selector=deploy-heketi=pod'.
not found.
heketi pod ...
Checking status of pods matching '--selector=heketi=pod':

Timed out waiting for pods matching '--selector=heketi=pod'.
not found.
gluster-s3 pod ...
Checking status of pods matching '--selector=glusterfs=s3-pod':

Timed out waiting for pods matching '--selector=glusterfs=s3-pod'.
not found.
Creating initial resources ...
/usr/local/bin/kubectl -n default create -f /home/admin/dimtheo/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml 2>&1
serviceaccount "heketi-service-account" created
/usr/local/bin/kubectl -n default create clusterrolebinding heketi-sa-view --clusterrole=edit --serviceaccount=default:heketi-service-account 2>&1
clusterrolebinding "heketi-sa-view" created
/usr/local/bin/kubectl -n default label --overwrite clusterrolebinding heketi-sa-view glusterfs=heketi-sa-view heketi=sa-view
clusterrolebinding "heketi-sa-view" labeled
OK
/usr/local/bin/kubectl -n default create secret generic heketi-config-secret --from-file=private_key=/dev/null --from-file=./heketi.json --from-file=topology.json=topology.json
secret "heketi-config-secret" created
/usr/local/bin/kubectl -n default label --overwrite secret heketi-config-secret glusterfs=heketi-config-secret heketi=config-secret
secret "heketi-config-secret" labeled
sed -e 's/\${HEKETI_EXECUTOR}/kubernetes/' -e 's#\${HEKETI_FSTAB}#/var/lib/heketi/fstab#' -e 's/\${HEKETI_ADMIN_KEY}//' -e 's/\${HEKETI_USER_KEY}//' /home/admin/dimtheo/gluster-kubernetes/deploy/kube-templates/deploy-heketi-deployment.yaml | /usr/local/bin/kubectl -n default create -f - 2>&1
service "deploy-heketi" created
deployment "deploy-heketi" created
Waiting for deploy-heketi pod to start ...
Checking status of pods matching '--selector=deploy-heketi=pod':
deploy-heketi-7c4898d9cd-hk5vv   1/1   Running   0   11s
OK
Determining heketi service URL ... OK
/usr/local/bin/kubectl -n default exec -i deploy-heketi-7c4898d9cd-hk5vv -- heketi-cli -s http://localhost:8080 --user admin --secret '' topology load --json=/etc/heketi/topology.json 2>&1
Creating cluster ... ID: 4dbd5f803c83fa376614fb6f38923cbb
Allowing file volumes on cluster.
Allowing block volumes on cluster.
Creating node node1 ... Unable to create node: New Node doesn't have glusterd running
Creating node node3 ... Unable to create node: New Node doesn't have glusterd running
Creating node node4 ... Unable to create node: New Node doesn't have glusterd running
Error loading the cluster topology.
Please check the failed node or device and rerun this script.
[admin@node1 deploy]$ cat topology.json
{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": [ "node1" ],
              "storage": [ "10.100.1.69" ]
            },
            "zone": 1
          },
          "devices": [ "/dev/loop0" ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [ "node3" ],
              "storage": [ "10.100.1.71" ]
            },
            "zone": 1
          },
          "devices": [ "/dev/loop0" ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [ "node4" ],
              "storage": [ "10.100.1.72" ]
            },
            "zone": 1
          },
          "devices": [ "/dev/loop0" ]
        }
      ]
    }
  ]
}
[admin@node1 deploy]$
```

phlogistonjohn commented 6 years ago

While it's not clear to me exactly what failed in your case, there are a couple of things I think you should check. First, it's hard for me to tell whether you are deploying to 3 dedicated gluster nodes or whether you just mean 3 hardware nodes running k8s. If you are using dedicated gluster nodes, you need to verify that heketi can ssh into the nodes and that the glusterd service is started. In that deployment approach, gk-deploy and heketi expect that you've already set up the gluster components.
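For the dedicated-node case, a first sanity check is whether each `storage` address in topology.json accepts TCP connections on the SSH port and on glusterd's management port (24007 by default). The following is a minimal Python sketch, assuming it runs from the machine heketi runs on and that both services use their default ports. The helper names `port_open` and `check_nodes` are hypothetical and are not part of heketi or gk-deploy. A successful connect only shows that something is listening, so this complements, rather than replaces, checking `systemctl status glusterd` on each node.

```python
import socket

GLUSTERD_PORT = 24007  # glusterd's default management port (assumed unchanged)
SSH_PORT = 22          # default SSH port, relevant when heketi uses its ssh executor


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and name-resolution failures all land here.
        return False


def check_nodes(topology: dict, ports=(SSH_PORT, GLUSTERD_PORT), timeout: float = 3.0) -> dict:
    """Map every storage address in a heketi topology to {port: reachable}."""
    results = {}
    for cluster in topology.get("clusters", []):
        for entry in cluster.get("nodes", []):
            for host in entry["node"]["hostnames"]["storage"]:
                results[host] = {port: port_open(host, port, timeout) for port in ports}
    return results
```

After `import json`, calling `check_nodes(json.load(open("topology.json")))` returns a dict that maps each storage address to `{port: reachable}`. A `False` under 24007 suggests that glusterd is not running on that node or is blocked by a firewall.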

If you are running 3 hardware nodes with kubernetes, it appears that you may need to pass the -g option to gk-deploy so that the script deploys gluster pods to your nodes. If you are already specifying that switch, you'll need to do some deeper debugging into why the pods didn't deploy (perhaps a labeling issue).
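On the labeling point: the `-g` run further down prints `node "node2" labeled` and so on. This suggests that gk-deploy labels the Kubernetes nodes named by the `manage` hostnames in topology.json. If that assumption holds, every `manage` entry must exactly match a Kubernetes node name. Otherwise that node is presumably never labeled and never gets a GlusterFS pod. Below is a minimal Python sketch of the cross-check, assuming its input is the text printed by `kubectl get nodes -o name`. The function names are hypothetical.

```python
def manage_hostnames(topology: dict) -> list:
    """Collect the 'manage' hostnames from a heketi topology, in file order."""
    names = []
    for cluster in topology.get("clusters", []):
        for entry in cluster.get("nodes", []):
            names.extend(entry["node"]["hostnames"]["manage"])
    return names


def unmatched_manage_hosts(topology: dict, kubectl_nodes_output: str) -> list:
    """Return the manage hostnames that do not name any Kubernetes node.

    kubectl_nodes_output is the text printed by `kubectl get nodes -o name`,
    one '<resource>/<name>' entry per line.
    """
    k8s_nodes = set()
    for line in kubectl_nodes_output.splitlines():
        line = line.strip()
        if line:
            # Keep only the part after the resource prefix, e.g. 'node/node2' -> 'node2'.
            k8s_nodes.add(line.split("/", 1)[-1])
    return [name for name in manage_hostnames(topology) if name not in k8s_nodes]
```

First save the node list with `kubectl get nodes -o name > nodes.txt`. Then call `unmatched_manage_hosts(json.load(open("topology.json")), open("nodes.txt").read())`. Any name it returns is a `manage` hostname that has no matching Kubernetes node.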

I hope that's enough to get you started.

dimthe commented 6 years ago

Using the -g switch, I now get this:

```
Do you wish to proceed with deployment?

Using Kubernetes CLI.
Using namespace "default".
Checking for pre-existing resources...
GlusterFS pods ... not found.
deploy-heketi pod ... not found.
heketi pod ... not found.
gluster-s3 pod ... not found.
Creating initial resources ... serviceaccount "heketi-service-account" created
clusterrolebinding "heketi-sa-view" created
clusterrolebinding "heketi-sa-view" labeled
OK
node "node2" labeled
node "node3" labeled
node "node4" labeled
daemonset "glusterfs" created
Waiting for GlusterFS pods to start ...

pods not found.
```

dimthe commented 6 years ago

These are Kubernetes nodes; I'm trying to install GlusterFS on them.

dimthe commented 6 years ago

OK, I managed to complete all the steps above. I have no idea what was wrong, but a reboot solved those issues, and the GlusterFS pods are now up and running. However, the deploy-heketi pod keeps crashing with this error:

```
ERROR: Unable to open config file /etc/heketi/heketi.json: open /etc/heketi/heketi.json: no such file or directory
Heketi v7.0.0-5-gc10cbd1-release-7
[admin@node1 ~]$
```

By default it was using the heketi:dev image. I have tried several other tags of this image with no better results.

dimthe commented 6 years ago

@jarrpa @phlogistonjohn

I don't think the above error has anything to do with the image version; it's something else. What can I check to see why the heketi.json file is not being copied?

```
kubectl describe pod deploy-heketi-ddc749c87-wmzwq
Name:           deploy-heketi-ddc749c87-wmzwq
Namespace:      default
Node:           node2/10.100.1.70
Start Time:     Tue, 28 Aug 2018 21:51:39 +0000
Labels:         deploy-heketi=pod
                glusterfs=heketi-pod
                pod-template-hash=887305743
Annotations:
Status:         Running
IP:             10.233.75.18
Controlled By:  ReplicaSet/deploy-heketi-ddc749c87
Containers:
  deploy-heketi:
    Container ID:   docker://96d0af07509fc945ee1eedcfeacd534c9065dc078f720f630f2c22ecabf71937
    Image:          heketi/heketi:2
    Image ID:       docker-pullable://heketi/heketi@sha256:fdb0ad11b3998b6f2a769831cf15eebaeaca0c60c6d8c89f7359a401a072161c
    Port:           8080/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 28 Aug 2018 23:09:54 +0000
      Finished:     Tue, 28 Aug 2018 23:09:54 +0000
    Ready:          False
    Restart Count:  20
    Liveness:       http-get http://:8080/hello delay=30s timeout=3s period=10s #success=1 #failure=3
    Readiness:      http-get http://:8080/hello delay=3s timeout=3s period=10s #success=1 #failure=3
    Environment:
      HEKETI_USER_KEY:
      HEKETI_ADMIN_KEY:
      HEKETI_EXECUTOR:                 kubernetes
      HEKETI_FSTAB:                    /var/lib/heketi/fstab
      HEKETI_SNAPSHOT_LIMIT:           14
      HEKETI_KUBE_GLUSTER_DAEMONSET:   y
      HEKETI_IGNORE_STALE_OPERATIONS:  true
    Mounts:
      /etc/heketi from config (rw)
      /var/lib/heketi from db (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from heketi-service-account-token-nrp8x (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  db:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heketi-config-secret
    Optional:    false
  heketi-service-account-token-nrp8x:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heketi-service-account-token-nrp8x
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:
Events:
  Type     Reason   Age                From            Message
  Warning  BackOff  2m (x372 over 1h)  kubelet, node2  Back-off restarting failed container
[admin@node1 deploy]$
```

```
[admin@node1 deploy]$ kubectl logs deploy-heketi-ddc749c87-wmzwq
ERROR: Unable to open config file /etc/heketi/heketi.json: open /etc/heketi/heketi.json: no such file or directory
Heketi v2.0.6-2-g704c08a-release-2
[admin@node1 deploy]$
```
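In the `describe` output above, `/etc/heketi` is backed by the `config` volume, which is the `heketi-config-secret` Secret. In the first log, gk-deploy created that Secret with `--from-file=./heketi.json` and `--from-file=topology.json=topology.json`. Kubernetes exposes each key of a mounted Secret as a file in the mount directory. A useful next step is therefore to confirm whether the Secret that actually exists in the namespace contains a `heketi.json` key. Below is a minimal Python sketch, assuming the input is the output of `kubectl get secret heketi-config-secret -o json`. The function names and the `EXPECTED_KEYS` tuple are hypothetical.

```python
import base64
import json

# Hypothetical list: the keys this thread's gk-deploy log passes via --from-file
# (private_key is left out because it comes from /dev/null and is empty).
EXPECTED_KEYS = ("heketi.json", "topology.json")


def missing_secret_keys(secret_json: str, expected=EXPECTED_KEYS) -> list:
    """Return the expected keys that are absent from the Secret's .data map."""
    data = json.loads(secret_json).get("data") or {}
    return [key for key in expected if key not in data]


def decoded_value(secret_json: str, key: str) -> str:
    """Base64-decode one entry of the Secret's .data map as UTF-8 text."""
    data = json.loads(secret_json).get("data") or {}
    return base64.b64decode(data[key]).decode("utf-8")
```

For example, run `kubectl -n default get secret heketi-config-secret -o json > secret.json`, then `missing_secret_keys(open("secret.json").read())`. An empty list means both files are in the Secret, and `decoded_value(..., "heketi.json")` shows their contents. A non-empty list means the Secret itself is incomplete, independent of the heketi image tag.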

phlogistonjohn commented 6 years ago

@dimthe you appear to have closed the issue, but your previous comment indicates that you had not solved the problem. Were you able to figure out what had gone wrong?