gluster / gluster-kubernetes

GlusterFS Native Storage Service for Kubernetes
Apache License 2.0

gk-deploy stops forever at setup-openshift-heketi-storage - #513

Closed · dimthe closed this issue 6 years ago

dimthe commented 6 years ago

Devices have been created on each node as shown below:

dd if=/dev/zero of=disk.img bs=1024k seek=25600 count=0
[admin@node2 ~]$ sudo losetup -f disk.img
[admin@node2 ~]$ sudo losetup -a
/dev/loop0: [64769]:176168645 (/home/admin/disk.img)
[admin@node2 ~]$ sudo fdisk /dev/loop0
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them. Be careful before using the write command.

Device does not contain a recognized partition table
Building a new DOS disklabel with disk identifier 0xe34df948.

Command (m for help): p

Disk /dev/loop0: 26.8 GB, 26843545600 bytes, 52428800 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0xe34df948

  Device Boot      Start         End      Blocks   Id  System

Command (m for help): q
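One caveat with this loop-device approach, worth keeping in mind given the reboots mentioned later in this thread: losetup mappings do not survive a reboot, so /dev/loop0 vanishes on restart and the device path heketi recorded goes stale. A minimal sketch of re-attaching the backing file, which would need to run at boot on every storage node (the file path is the one from the session above):

# Re-attach the backing file to the same loop device after a reboot;
# losetup mappings are not persistent across restarts.
sudo losetup /dev/loop0 /home/admin/disk.img

# Verify the mapping is back before gluster/heketi pods start using it
sudo losetup -a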

All firewall ports are open and kernel modules are loaded per the script instructions.

This is my topology:

[admin@node1 deploy]$ cat topology.json
{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": [ "node2" ],
              "storage": [ "10.100.1.70" ]
            },
            "zone": 1
          },
          "devices": [ "/dev/loop0" ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [ "node3" ],
              "storage": [ "10.100.1.71" ]
            },
            "zone": 1
          },
          "devices": [ "/dev/loop0" ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [ "node4" ],
              "storage": [ "10.100.1.72" ]
            },
            "zone": 1
          },
          "devices": [ "/dev/loop0" ]
        }
      ]
    }
  ]
}

[admin@node1 deploy]$
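As a side note: with the kubernetes executor, heketi uses each node's "manage" hostname to locate the GlusterFS pod running on that node, so those entries must match the node names Kubernetes reports. A quick sanity check (a sketch, not from the original thread):

# The "manage" hostnames must match the Kubernetes node names exactly
kubectl get nodes -o name    # expect node/node2, node/node3, node/node4

# Each "storage" IP should be reachable from the other storage nodes
ping -c1 10.100.1.70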

And here are the script logs:

Do you wish to proceed with deployment?

Using Kubernetes CLI.

Checking status of namespace matching 'default':
default   Active   114d
Using namespace "default".
Checking for pre-existing resources...
  GlusterFS pods ...
Checking status of pods matching '--selector=glusterfs=pod':

Timed out waiting for pods matching '--selector=glusterfs=pod'.
not found.
  deploy-heketi pod ...
Checking status of pods matching '--selector=deploy-heketi=pod':

Timed out waiting for pods matching '--selector=deploy-heketi=pod'.
not found.
  heketi pod ...
Checking status of pods matching '--selector=heketi=pod':

Timed out waiting for pods matching '--selector=heketi=pod'.
not found.
  gluster-s3 pod ...
Checking status of pods matching '--selector=glusterfs=s3-pod':

Timed out waiting for pods matching '--selector=glusterfs=s3-pod'.
not found.
Creating initial resources ...
/usr/local/bin/kubectl -n default create -f /home/admin/dimtheo/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml 2>&1
serviceaccount "heketi-service-account" created
/usr/local/bin/kubectl -n default create clusterrolebinding heketi-sa-view --clusterrole=edit --serviceaccount=default:heketi-service-account 2>&1
clusterrolebinding "heketi-sa-view" created
/usr/local/bin/kubectl -n default label --overwrite clusterrolebinding heketi-sa-view glusterfs=heketi-sa-view heketi=sa-view
clusterrolebinding "heketi-sa-view" labeled
OK
Marking 'node2' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes node2 storagenode=glusterfs --overwrite 2>&1
node "node2" labeled
Marking 'node3' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes node3 storagenode=glusterfs --overwrite 2>&1
node "node3" labeled
Marking 'node4' as a GlusterFS node.
/usr/local/bin/kubectl -n default label nodes node4 storagenode=glusterfs --overwrite 2>&1
node "node4" labeled
Deploying GlusterFS pods.
sed -e 's/storagenode\: glusterfs/storagenode\: 'glusterfs'/g' /home/admin/dimtheo/gluster-kubernetes/deploy/kube-templates/glusterfs-daemonset.yaml | /usr/local/bin/kubectl -n default create -f - 2>&1
daemonset "glusterfs" created
Waiting for GlusterFS pods to start ...
Checking status of pods matching '--selector=glusterfs=pod':
glusterfs-4x284   1/1   Running   0   1m
glusterfs-6xc7m   1/1   Running   0   1m
glusterfs-v2w8j   1/1   Running   0   1m
OK
/usr/local/bin/kubectl -n default create secret generic heketi-config-secret --from-file=private_key=/dev/null --from-file=./heketi.json --from-file=topology.json=topology.json
secret "heketi-config-secret" created
/usr/local/bin/kubectl -n default label --overwrite secret heketi-config-secret glusterfs=heketi-config-secret heketi=config-secret
secret "heketi-config-secret" labeled
sed -e 's/\${HEKETI_EXECUTOR}/kubernetes/' -e 's#\${HEKETI_FSTAB}#/var/lib/heketi/fstab#' -e 's/\${HEKETI_ADMIN_KEY}//' -e 's/\${HEKETI_USER_KEY}//' /home/admin/dimtheo/gluster-kubernetes/deploy/kube-templates/deploy-heketi-deployment.yaml | /usr/local/bin/kubectl -n default create -f - 2>&1
service "deploy-heketi" created
deployment "deploy-heketi" created
Waiting for deploy-heketi pod to start ...
Checking status of pods matching '--selector=deploy-heketi=pod':
deploy-heketi-7c4898d9cd-phj54   1/1   Running   0   9s
OK
Determining heketi service URL ... OK
/usr/local/bin/kubectl -n default exec -i deploy-heketi-7c4898d9cd-phj54 -- heketi-cli -s http://localhost:8080 --user admin --secret '' topology load --json=/etc/heketi/topology.json 2>&1
Creating cluster ... ID: f93c33d35bf281b4ea81dc197eadfe66
  Allowing file volumes on cluster.
  Allowing block volumes on cluster.
  Creating node node2 ... ID: d50ec8a88611271bb7879c88a485115d
    Adding device /dev/loop0 ... OK
  Creating node node3 ... ID: 3a4d9a652b959b70e52d458b86e2a4ce
    Adding device /dev/loop0 ... OK
  Creating node node4 ... ID: 58c054899a155eb137a020a64a99f68c
    Adding device /dev/loop0 ... OK
heketi topology loaded.
/usr/local/bin/kubectl -n default exec -i deploy-heketi-7c4898d9cd-phj54 -- heketi-cli -s http://localhost:8080 --user admin --secret '' setup-openshift-heketi-storage --listfile=/tmp/heketi-storage.json 2>&1

What is wrong with it? I have 3 storage nodes; I don't think I need more.
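While the script sits at that last setup-openshift-heketi-storage command, deploy-heketi is trying to create its bootstrap heketidbstorage volume on the three devices. One way to watch what it is doing from a second terminal (a sketch reusing the namespace and pod name from the log above; the pod name will differ between runs):

# Follow the heketi server log while setup-openshift-heketi-storage runs
/usr/local/bin/kubectl -n default logs -f deploy-heketi-7c4898d9cd-phj54

# Check whether heketi has managed to create any volumes so far
/usr/local/bin/kubectl -n default exec -i deploy-heketi-7c4898d9cd-phj54 -- \
    heketi-cli -s http://localhost:8080 --user admin --secret '' volume list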

phlogistonjohn commented 6 years ago

You are correct that you shouldn't need more than 3 nodes. Can you look through the logs generated by the deploy-heketi pod for errors or warnings?

Running 'heketi-cli topology info' from within the heketi pod may also provide some useful information.
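Concretely, those two suggestions would look something like this (a sketch; the pod name is the one from the log above and changes on every deployment):

# Server-side heketi log, where device and brick-creation errors show up
/usr/local/bin/kubectl -n default logs deploy-heketi-7c4898d9cd-phj54

# Topology as heketi sees it: nodes, devices, free space, and any bricks
/usr/local/bin/kubectl -n default exec -i deploy-heketi-7c4898d9cd-phj54 -- \
    heketi-cli -s http://localhost:8080 --user admin --secret '' topology info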

dimthe commented 6 years ago

Hello @phlogistonjohn,

After more tries and reboots, the script went a little further but eventually failed:

/usr/local/bin/kubectl -n default exec -i deploy-heketi-859478d448-gl5sc -- heketi-cli -s http://localhost:8080 --user admin --secret 'auvious' setup-openshift-heketi-storage --listfile=/tmp/heketi-storage.json 2>&1
Saving /tmp/heketi-storage.json
/usr/local/bin/kubectl -n default exec -i deploy-heketi-859478d448-gl5sc -- cat /tmp/heketi-storage.json | /usr/local/bin/kubectl -n default create -f - 2>&1
secret "heketi-storage-secret" created
endpoints "heketi-storage-endpoints" created
service "heketi-storage-endpoints" created
job "heketi-storage-copy-job" created

Checking status of pods matching '--selector=job-name=heketi-storage-copy-job':

Timed out waiting for pods matching '--selector=job-name=heketi-storage-copy-job'.
Error waiting for job 'heketi-storage-copy-job' to complete.

Where are the logs for this? I see nothing useful in /var/log or in /tmp.

phlogistonjohn commented 6 years ago

For Kubernetes primitives like jobs and pods you need to use the kubectl command. Run kubectl logs <podname> to get logs from pods. Run kubectl describe <object> to get details at the k8s level about the object (like its current status, or previous status changes).
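Applied to this failure, that might look like the following (a sketch; the copy-job pod name has a generated suffix and will vary):

# The copy job creates a pod; find it via the same selector the script uses
/usr/local/bin/kubectl -n default get pods --selector=job-name=heketi-storage-copy-job

# Events here often reveal failures mounting the heketidbstorage volume
/usr/local/bin/kubectl -n default describe job heketi-storage-copy-job
/usr/local/bin/kubectl -n default describe pod <copy-job-pod-name>

# And the pod's own output, if it started at all
/usr/local/bin/kubectl -n default logs <copy-job-pod-name>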

dimthe commented 6 years ago

After more tries and reboots, without adding anything new to my setup, the script now stops at this:

job "heketi-storage-copy-job" created

Checking status of pods matching '--selector=job-name=heketi-storage-copy-job':

Timed out waiting for pods matching '--selector=job-name=heketi-storage-copy-job'.
Error waiting for job 'heketi-storage-copy-job' to complete.
[admin@node1 deploy]$
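For what it's worth, re-running gk-deploy on top of a half-finished deployment tends to fail this way, because heketi finds leftover state from the previous attempt. The deploy script ships an abort mode for exactly this; a sketch of a clean retry, assuming the loop devices from the topology above:

# Tear down everything the previous run created
./gk-deploy --abort

# On each storage node, clear leftover LVM/filesystem signatures and state
sudo wipefs -a /dev/loop0
sudo rm -rf /var/lib/heketi /var/lib/glusterd /etc/glusterfs

# Then run the deployment again
./gk-deploy -g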

Logs from the heketi pod are pasted here:

https://pastebin.com/R8WEii4s