gluster / gluster-kubernetes

GlusterFS Native Storage Service for Kubernetes
Apache License 2.0

Deployment on existing AKS cluster nodes #555

Closed maurya-m closed 5 years ago

maurya-m commented 5 years ago

I have an existing AKS cluster with 5 nodes. I am using only 3 of them in my topology, with the internal IPs mapped as described in the setup guide.

After running gk-deploy, I got to the point where it looks up the heketi service, but it then hit this error and the deployment terminated:

Determining heketi service URL ... OK
/c/Users/.azure-kubectl/kubectl -n maurya-dev exec -i deploy-heketi-559446b649-qlnwr -- heketi-cli -s http://localhost:8080 --user admin --secret '' topology load --json=/etc/heketi/topology.json 2>&1
/c/Users/.azure-kubectl/kubectl -n maurya-dev exec -i deploy-heketi-559446b649-qlnwr -- heketi-cli -s http://localhost:8080 --user admin --secret '' topology load --json=/etc/heketi/topology.json 2>&1
Error: Unable to open config file
command terminated with exit code 255
Error loading the cluster topology.

Any ideas what might be causing this? Also, how can I check that the sshd instances on my AKS nodes are able to communicate with each other?

Thanks in Advance.
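The message "Unable to open config file" reads as though heketi-cli could not open /etc/heketi/topology.json inside the pod, so one quick check is to list that path from the same pod. The following is only a sketch: the function name is hypothetical, and the `MSYS_NO_PATHCONV=1` prefix is an assumption based on the `/c/Users/...` kubectl path, which suggests a Git Bash style shell on Windows that can rewrite arguments that look like POSIX paths.

```shell
# Sketch: confirm the topology file is visible inside the deploy-heketi pod.
# check_topology_in_pod is a hypothetical helper name, not part of gk-deploy.
check_topology_in_pod() {
  ns="$1"   # namespace, e.g. maurya-dev
  pod="$2"  # pod name, e.g. deploy-heketi-559446b649-qlnwr
  # MSYS_NO_PATHCONV=1 asks Git for Windows not to rewrite /etc/... into a
  # Windows path (assumption: the shell is Git Bash); it is harmless elsewhere.
  MSYS_NO_PATHCONV=1 kubectl -n "$ns" exec -i "$pod" -- ls -l /etc/heketi/topology.json
}

# Example call with the values from the report above:
# check_topology_in_pod maurya-dev deploy-heketi-559446b649-qlnwr
```

If the file is listed here but the original command still fails, path rewriting by the local shell becomes a more likely suspect than a missing file; if it is not listed, the topology was probably never copied into the pod.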

maurya-m commented 5 years ago

@jarrpa - I know you have been active in replying to most queries, so I wanted to check with you; sorry if I am intruding.

I have followed all the prerequisites, but heketi-cli still fails to create the nodes (I ran wipefs -af on all the devices in my topology).

topology.json:

{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": [ "aks-nodepool1-70391060-0" ],
              "storage": [ "10.240.0.7" ]
            },
            "zone": 1
          },
          "devices": [ "/dev/sdd" ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [ "aks-nodepool1-70391060-1" ],
              "storage": [ " 10.240.0.8" ]
            },
            "zone": 1
          },
          "devices": [ "/dev/sdd" ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [ "aks-nodepool1-70391060-2" ],
              "storage": [ "10.240.0.6" ]
            },
            "zone": 1
          },
          "devices": [ "/dev/sdd" ]
        }
      ]
    }
  ]
}
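One detail worth a second look in the topology above is the storage address of the second node, which appears to carry a leading space inside the quotes. Whether heketi tolerates that is not established in this thread, so the following is just a narrow sanity check, not a diagnosis. The function name is hypothetical, and the pattern only looks at quoted values made of digits and dots.

```shell
# Sketch: print quoted IPv4-looking values that have whitespace padding
# inside the quotes. find_padded_addresses is a hypothetical helper name.
find_padded_addresses() {
  file="$1"
  # First alternative: whitespace right after the opening quote.
  # Second alternative: whitespace right before the closing quote.
  grep -oE '"[[:space:]]+[0-9.]+[[:space:]]*"|"[0-9.]+[[:space:]]+"' "$file"
}

# Example:
# find_padded_addresses topology.json
```

grep exits with a non-zero status when nothing matches, so the function can also be used in an `if` to gate a deployment script.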

maurya-m commented 5 years ago

image

maurya-m commented 5 years ago

@nixpanic, @jarrpa - I tried deploying on a fresh cluster (3 nodes). It failed again on the heketi-cli topology load command (line 833), so I went ahead with the --abort flag. As mentioned, though, that did not delete the gluster resources heketi had already created on my raw disk.

When I run lsblk -l, I see the following under my raw disk /dev/sdc. How do I delete it? I am not sure what vg means here (I am new to storage).

image

Any ideas on resolving this? Thanks in advance.

Collin-Moore commented 5 years ago

You may want to read through issue #385 to get rid of the gluster resources that the --abort flag does not remove. The last comment from abrahamrhoffman is fairly helpful:

Run from anywhere with working admin kubectl:

cd gluster-kubernetes/deploy && ./gk-deploy -gy --abort

This removes everything: GlusterFS pods, labels, Heketi deployments, etc

Next, send this job to all your nodes (I use ansible):

vgremove -ff $(sudo vgdisplay | grep -i "VG Name" | awk '{print $3}')
rm -rf /var/lib/heketi /etc/glusterfs /var/lib/glusterd /var/log/glusterfs 
pvremove /dev/loop0 
wipefs -a -f /dev/loop0

Be sure to change out your loop dev number to whatever you use.

Then:

./gk-deploy -g

And it works!
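Note that the vgremove line quoted above targets every volume group that vgdisplay reports on the node, not only the ones heketi created. A more cautious variant, sketched here under the assumption that the leftover groups carry the vg_ prefix mentioned later in this thread, is to list only those names first and review them before removing anything. The function name is hypothetical.

```shell
# Sketch: list only volume groups whose name starts with "vg_".
# list_heketi_vgs is a hypothetical helper; run it as root on each node.
list_heketi_vgs() {
  # vgs prints one indented VG name per line with these options;
  # awk strips the padding and keeps names with the vg_ prefix.
  vgs --noheadings -o vg_name | awk '$1 ~ /^vg_/ { print $1 }'
}

# Example: review the list, then remove each group explicitly.
# for vg in $(list_heketi_vgs); do vgremove -ff "$vg"; done
```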

maurya-m commented 5 years ago

@Collin-Moore, thanks for replying. I tried the suggestion from #385, but vgdisplay does not list any VG name in my case, so I am not able to remove them.

image

maurya-m commented 5 years ago

@Collin-Moore, I was able to get rid of the vg_ entries created by the earlier deployment. Rebooting the nodes solved it.

@nixpanic - We are using AKS on Azure to create our K8s cluster (3 nodes), so we do not have access to the master node. Can we run the deployment script from one of the nodes?

maurya-m commented 5 years ago

After the gluster and heketi deployments completed, I created a StorageClass and a PVC using the glusterfs provisioner. The PVC is stuck in the Pending state, and its events show a timeout / "server misbehaving" error. Where can I look for heketi logs to resolve this? @nixpanic @jarrpa - can you please share some hints? Thanks.
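For a Pending PVC, two places usually worth checking are the PVC's own events and the heketi pod logs. The sketch below wraps both in small functions; the function names are hypothetical, and `deployment/heketi` is an assumed deployment name that should be replaced with whatever `kubectl get deploy` shows in the namespace.

```shell
# Sketch: show the events recorded against a PVC.
show_pvc_events() {
  ns="$1"
  pvc="$2"
  kubectl -n "$ns" describe pvc "$pvc"
}

# Sketch: show recent heketi logs.
# "deployment/heketi" is an assumption, not confirmed by this thread.
show_heketi_logs() {
  ns="$1"
  kubectl -n "$ns" logs deployment/heketi --tail=100
}

# Example:
# show_pvc_events maurya-dev my-claim
# show_heketi_logs maurya-dev
```

If heketi's log shows no incoming volume-create request at all, the provisioner is probably not reaching the heketi URL configured in the StorageClass, which points at networking or name resolution instead of heketi itself.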

image

maurya-m commented 5 years ago

I am able to curl the endpoints and get a response to the GET request:

image

maurya-m commented 5 years ago

Closing this issue, as the problem lies in the DNS of the AKS cluster: the heketi service cannot be resolved unless an external load balancer is configured to reach it. I will update once I hear back from the Azure team on this.
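If the heketi service is exposed through an external load balancer as described, its address can be read back from the Service status. This is a minimal sketch: the function name is hypothetical, and the service name is passed in because this thread does not confirm what it is called.

```shell
# Sketch: print the external IP assigned to a LoadBalancer-type Service.
# get_service_external_ip is a hypothetical helper name.
get_service_external_ip() {
  ns="$1"
  svc="$2"  # e.g. heketi (assumed name)
  # The field is empty until the cloud provider has assigned an address.
  kubectl -n "$ns" get svc "$svc" -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Example:
# get_service_external_ip maurya-dev heketi
```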