gluster / gluster-kubernetes

GlusterFS Native Storage Service for Kubernetes
Apache License 2.0

Having issues with gk-deploy script #434

Closed mjschmidt closed 6 years ago

mjschmidt commented 6 years ago

Hi I am running kubernetes 1.8.5 on centos7.

I tried the script, but I get an error that glusterd is not running. I seem to be having issues enabling the service. I checked some of the other issues on here and noticed that some other people were having similar problems, so I went to abort my gk-deploy, but the gluster pods did not delete.

Question 1: Is that an issue, or can I keep trying to re-run the gk-deploy script even though my gluster pods are still up? If it is an issue, how can I delete my gluster daemonset without causing any possible kubernetes caching problems?

Question 2: Why is it complaining about the glusterd service not being started? Is that why I read that you need to get rid of everything yum-installed except for the gluster client?

Question 3: @jarrpa halp me. I see you answer a lot of these questions. Getting exact error logs is tough because I am in an offline environment, but I will transcribe the important-looking parts.

jarrpa commented 6 years ago

Correct. We are discussing ways to do that, but it's still some time out.

mjschmidt commented 6 years ago

Okay can you give an example of the heketi node add command?

Is there a way to hook my terminal to the heketi container if I install the heketi-cli on it, or do I have to bash into the container in order to do these commands?

jarrpa commented 6 years ago

You need to open a shell in the container to use the built-in heketi-cli, yes. You can also install heketi-cli on any machine that can send HTTP requests to the heketi service. So if curl http://<heketi_url>/hello works, you can use heketi-cli from there. Regardless, you can run heketi-cli -s http://localhost:8080 --user admin [--key <SECRET>] node add --help to get the usage message. Note that the SECRET is only required if you configured one.
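For example, something like this rough sketch (the URL, secret, cluster ID, and hostnames are placeholders, and the exact flag names can vary between heketi-cli versions, so check node add --help first):

# Check that the heketi service is reachable from this machine
curl http://<heketi_url>/hello

# Print the usage message for node add
heketi-cli -s http://<heketi_url> --user admin --key <SECRET> node add --help

# Typical invocation (all values are placeholders)
heketi-cli -s http://<heketi_url> --user admin --key <SECRET> node add \
    --cluster=<cluster_id> --zone=1 \
    --management-host-name=<node_fqdn> --storage-host-name=<node_ip>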

mjschmidt commented 6 years ago

Okay, so I have the issue where a node contains devices, but the node is dead, so I need to delete it with the heketi-cli. But I can't delete it from the heketi-cli because it has pvs on it, and I also can't clean up any of my pvs that are deleted because one of the peers is missing. This leads to me running out of space on my kubernetes cluster.

How do I deal with this? Is there a way to force-remove a node from heketi with the cli?

mjschmidt commented 6 years ago

When I try to disable and remove the node, that fails because the node is dead/gone. @jarrpa correction: the node reports as online even after my attempt to disable it, it also reports as not there when I attempt to remove it, and when I attempt to delete it, that fails because it says there is still stuff on there.

I want to remove the node because it is already dead and removed from the cluster

Is there a way to remove all bricks from a node?
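For reference, the normal decommission flow I have been attempting looks roughly like this (a sketch; the node ID and secret are placeholders, and the remove step is what fails for me because the bricks on the dead node cannot be migrated off):

# Stop new bricks landing on the node
heketi-cli -s http://localhost:8080 --user admin --key <SECRET> node disable <node_id>
# Migrate existing bricks off the node (this is what fails when the node is dead)
heketi-cli -s http://localhost:8080 --user admin --key <SECRET> node remove <node_id>
# Only succeeds once the node has no devices/bricks left
heketi-cli -s http://localhost:8080 --user admin --key <SECRET> node delete <node_id>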

mjschmidt commented 6 years ago

@nixpanic any idea? Any idea of who I can ask?

mjschmidt commented 6 years ago

last try (and tomorrow morn) to reach @jarrpa or @nixpanic or anyone else before I try opening a new ticket (this could potentially be a new ticket anyway).

I have glusterfs pretty much up and running great, and I have learned a lot about troubleshooting kubernetes nodes to bring them back to life. However, I want to know how to deal with node death in this setup, because heketi doesn't want to remove nodes that it thinks have bricks on them, but kubernetes doesn't want to clean up pvcs unless the whole gluster cluster is present, in turn causing the k8s cluster to run out of storage. What do I do if I am running gluster in k8s and a node literally blows up and is disconnected from the cluster with no hope of ever coming back? Let's also say in this hypothetical instance that I have 12 gluster nodes, so I would still be above the 3-node minimum requirement.

jarrpa commented 6 years ago

Looks like this conversation continued in https://github.com/heketi/heketi/issues/1120.

mjschmidt commented 6 years ago

It is. I am still waiting on an answer from them on some stuff. Is there a command I can run on gluster itself to see if it is out of space or not? My total pvcs do not take up all the space I provided, so I think it may be an issue with stuff not being cleaned up correctly.

jarrpa commented 6 years ago

From gluster, no. From the GlusterFS pods you can run standard LVM commands to inspect the configuration and usage. The ones likely to be most relevant are pvs, vgs, pvdisplay, and vgdisplay.
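For example, something along these lines (a rough sketch; the namespace and pod name are placeholders for whatever your deployment uses):

# Find the GlusterFS pods
kubectl get pods -n <namespace> -o wide | grep glusterfs

# Inspect the LVM state inside one of them
kubectl exec -it <glusterfs-pod> -n <namespace> -- pvs
kubectl exec -it <glusterfs-pod> -n <namespace> -- vgs
kubectl exec -it <glusterfs-pod> -n <namespace> -- vgdisplay
kubectl exec -it <glusterfs-pod> -n <namespace> -- pvdisplay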

mjschmidt commented 6 years ago

I did vgdisplay on the gluster node and I see what I think is the relevant volume group (~100 GiB):

VG Size: 99.87 GiB
PE Size: 4.00 MiB
Total PE: 25567
Alloc PE / Size: 23437 / 91.55 GiB
Free PE / Size: 2130 / 8.32 GiB
VG UUID: AcF6yadayadayada

mjschmidt commented 6 years ago

Okay, something between heketi and gluster isn't working right. In heketi I see my 14 volumes, but when I look at the available space on the node I see no free space. This happens whenever I have gluster and heketi up for an extended period of time, making me believe that heketi isn't actually cleaning up the space it says it is. How do I ask gluster these questions, for example: what are all the volume IDs you have? Because then I could compare volume IDs in gluster with volume IDs in heketi and see which volumes in gluster need destroying.

jarrpa commented 6 years ago

You may be hitting another heketi issue. You want to compare against the output of gluster volume list. gluster volume status will also tell you the names of each volume's bricks.
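Something like this (a rough sketch; pod names, namespace, and the auth flags are placeholders) lets you diff heketi's view against gluster's:

# Volumes heketi thinks exist
kubectl exec -it <heketi-pod> -n <namespace> -- heketi-cli -s http://localhost:8080 \
    --user admin --key <SECRET> volume list

# Volumes gluster actually has, plus their bricks
kubectl exec -it <glusterfs-pod> -n <namespace> -- gluster volume list
kubectl exec -it <glusterfs-pod> -n <namespace> -- gluster volume status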

mjschmidt commented 6 years ago

great thanks @jarrpa

mjschmidt commented 6 years ago

I NOW KNOW WHAT THE PROBLEM IS! Wow, that was a long road. Long story short, this is a bug between gluster and heketi running in containers. My second pvc claim was for 1 brick with 3x replication according to heketi-cli, but somewhere in translation gluster is creating 8 bricks x 3 replication for a total of 24 bricks. So that is at least one bug.

There is also a disconnect between the volumes heketi says I have and the amount of space available on each node: my total heketi claims for volumes add up to 153 GiB, however the space used when adding up all the nodes from heketi-cli node info is 486 GiB.

I assume heketi is finding the space available in gluster by doing a gluster command I don't yet know about. Something about the way heketi and gluster are communicating is off and over time it is costing space. I don't know how to debug whether it is gluster or heketi, but it seems to be a combination of two bugs?

Here is the long story if interested; I didn't edit it super well and it is somewhat note-like because I have the short story above. Okay, so I found something very interesting that probably explains my space problems... I started out by checking the gluster volume list and then checking the volume info on each of the volumes. I found one that was creating 24 bricks! This pvc happened to be 11 GiB in size. That would explain my space problem. There is also a second set of pvcs associated with the same stateful set deployment, but that set of pvcs does not have any problems associated with it, so we ignore them for now and just know that I should be taking up, and regaining, a total of (11 GiB for the first claim per container x 3 for replication x 3 containers each making a pvc claim) + (6 GiB for the second claim per container x 3 for replication x 3 containers each making a pvc claim) = 153 GiB total for the stateful set. Since in this stateful set deployment only two containers deployed/attached pvcs, I only had 101 GiB that should be returning to a heketi that believes it is full.

So there is a disconnect between heketi and gluster somewhere. When I look at the volume in heketi using heketi-cli volume info, here are the important rows:

Size: 11
Mount: :vol_
Durability type: replicate
Distributed+Replica: 3

When I bash into a gluster container and do gluster volume info I see:

Type: Distributed-Replicate
Volume ID: 6ba8blah-blah-blah-blah12aab7a3
Status: Started
Snapshot Count: 0
Number of Bricks: 8 x 3 = 24

This is where I believe the problem lies, a replica pod that has an identical pvc claim is only getting 1 x 3

I cleared out those pvcs from kubernetes to see what would happen and to make sure my stuff would get cleaned up. Here is what I found:

  1. I take down the statefulset that had the associated pvc claims.
  2. After the stateful set is gone, I clean up the PVCs with kubectl delete pvc <pvc 1> <pvc 2> <pvc 3>.
  3. I check the heketi-cli to make sure the 11 GiB volumes I expect to be deleted are deleted.
  4. I check gluster to make sure that the associated volumes are deleted, and they were in fact deleted.
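For reference, the checks in steps 3 and 4 were roughly these (pod names, namespace, and auth flags are placeholders):

kubectl delete pvc <pvc 1> <pvc 2> <pvc 3>
kubectl get pv      # the released PVs should disappear once cleanup completes
kubectl exec -it <heketi-pod> -n <namespace> -- heketi-cli -s http://localhost:8080 \
    --user admin --key <SECRET> volume list
kubectl exec -it <glusterfs-pod> -n <namespace> -- gluster volume list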

I then went back to heketi-cli and did some volume/node info to make sure I gained back my space, but since there is a disconnect between what gluster thinks is happening and what heketi thinks is happening, I only have 108 GiB of space total even though I should have gotten back 264 GiB of space for the one volume that... Heketi believes that when it cleared out the pvcs it regained...

mjschmidt commented 6 years ago

Also posted here: https://github.com/heketi/heketi/issues/1127

jarrpa commented 6 years ago

....dang. That's really weird, I've never heard of heketi creating an 8x3 volume from a dynamic provisioning request. Are you able to reproduce this behavior consistently?

We have also been aware of heketi's tendency to get out of sync with the state of Gluster underneath it. The most recent release of heketi does a lot to address these issues. Check if you're running heketi v6.

mjschmidt commented 6 years ago

I was not before. I just switched to heketi:6 today on a separate cluster for testing, but now I am having deploy issues I am working through for that. I will update on that front when I figure out the nitty gritty details. My question on that front is: can I run heketi 6 with older gluster containers (whatever the latest was when we first started this thread and weren't best friends)? Yes, the heketi database seems to be getting out of sync with gluster underneath. I saw the stuff about heketi 6, and I really hope that fixes things. In general, am I able to resync the database with gluster? How do I do that?

jarrpa commented 6 years ago

You should be able to run heketi:6 with older GlusterFS images, yes. And yes, it is possible with some new DB management tools to try and repair an out-of-sync DB. Ask in the heketi project for more info on that.

mjschmidt commented 6 years ago

I am putting this at the top, but it's really the last thing you should read: OH MY GOD I DID ALL THIS TRANSCRIBING TO FIND OUT GLUSTERFS DELETED A VOLUME AND HEKETI DIDN'T KEEP TRACK (I did heketi-cli volume list and got that the volume was not cleaned up, then did a gluster volume list and saw gluster did in fact clean it up... fuq...) To me this looks like a heketi issue, with heketi getting itself into a bad state? I tried to make a new volume and it appears heketi is just frozen up... Check the second set of logs for those details. I am mostly putting this here for people who may have similar issues; I have thrown this to the heketi team in the issue linked below, but if gluster has a workaround or has experienced this please let me know.

Okay, I deployed heketi:6 to our old cluster. The new cluster I was setting up was being done with puppet and there was no way to tell if it was a puppet issue or a heketi issue. I have confirmed it is a puppet issue, so let's ignore that for now and stay consistent by staying with the old cluster. I deployed heketi:6 successfully with "old" (it's not that old) gluster and successfully created a dynamic pvc claim.

However, now when I delete the pvc claim the pv is not cleaning up. I bashed into the heketi container (please keep in mind I have to hand-transcribe these logs, so I am getting the relevant stuff and ignoring things like IDs and times) and I see the following relevant information from the logs:

[negroni] Started DELETE /volumes/4cdaa...
[heketi] INFO 2018/04/25 Loaded simple allocator
[negroni] Completed 202 Accepted in 23.03ms
[asynchttp] INFO <time and date> Started job d6444r8...
[heketi] INFO 2018/04/25 Started async operation: Delete Volume
[negroni] Started GET /queue/d6444r8...
[negroni] Completed 200 OK in 85.9ms
[kubeexec] DEBUG 2018/04/25 /src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker7 Pod: glusterfs-vnzql Command: gluster --mode=script snapshot list vol_4cdaa... --xml
Result: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <cliOutput> <opRet>0</opRet> <opErrno>0</opErrno> <opErrstr/> <snapList> <count>0</count> </snapList> </cliOutput>
[kubeexec] DEBUG 2018/04/25 /src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker7 Pod: glusterfs-vnzql Command: lvs --options=lv_name,thin_count --separator=:
Result: LV:#Thins vol_data: brick_552c...: tp_552c...:1 lv_home: lv_home: lv_root: lv_swap: lv_temp: lv_usr: lv_usr: lv_var: lv_var:
[kubeexec] DEBUG 2018/04/25 /src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker5 Pod: glusterfs-2g16b Command: lvs --options=lv_name,thin_count --separator=:
Result: LV:#Thins vol_data: brick_b9c6...: tp_b9c6..:1 lv_home: lv_home: lv_root: lv_swap: lv_temp: lv_usr: lv_usr: lv_var: lv_var:
[kubeexec] DEBUG 2018/04/25 /src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker4 Pod: glusterfs-pg6ct Command: lvs --options=lv_name,thin_count --separator=:
Result: LV:#Thins vol_data: brick_d0d4...: tp_d0d4..:1 lv_home: lv_home: lv_root: lv_swap: lv_temp: lv_usr: lv_usr: lv_var: lv_var:
[negroni] Started GET /queue/d644...
[negroni] Completed 200 OK in 181ms
[negroni] Started GET /queue/d644...
[negroni] Completed 200 OK in 73ms
(the last pair repeats three more times)
[kubeexec] DEBUG 2018/04/25 /src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker7 Pod: glusterfs-vnzql Command: gluster --mode=script volume stop vol_4cdaa... force
[negroni] Started GET /queue/d644...
[negroni] Completed 200 OK in 73ms
Result: volume delete: vol_4cdaa...: success
[negroni] Started GET /queue/d644...
[negroni] Completed 200 OK
[kubeexec] DEBUG src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker7 Pod: glusterfs-vnzql Command: umount /var/lib/heketi/mounts/vg_7a90.../brick_d0d4... Result:
[kubeexec] DEBUG src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker5 Pod: glusterfs-vnzql Command: umount /var/lib/heketi/mounts/vg_548f.../brick_552c... Result:
[kubeexec] DEBUG src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker4 Pod: glusterfs-vnzql Command: umount /var/lib/heketi/mounts/vg_357e.../brick_b9c6... Result:
[negroni] Started GET /queue/d644
[negroni] Completed 200 OK
(THIS FOR A REALLY LONG TIME UNTIL)
[kubeexec] DEBUG 2018/04/25 src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker7 Pod: glusterfs-vnzql Command: gluster --mode=script snapshot list vol_4cdaa...
Result: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <cliOutput> <opRet>-1</opRet> <opErrno>30806</opErrno> <opErrstr>Volume (vol_4cdaa...) does not exist</opErrstr> </cliOutput>
Result: LV:#Thins vol_data: brick_b9c6...: tp_b9c6..:1 lv_home: lv_home: lv_root: lv_swap: lv_temp: lv_usr: lv_usr: lv_var: lv_var:
[negroni] Started GET /queue/d644
[negroni] Completed 200 OK
(THIS FOR A REALLY LONG TIME UNTIL END OF FILE)

I tried to make a new pvc to test again and it froze up, never filling the pvc claim. I then deleted that pvc claim and tried to make a second one with a slightly different name just in case that was the problem. None of that worked. Here are the logs of me trying that:

[negroni] Completed 200 OK
[negroni] Started GET /queue/d644
[negroni] Completed 200 OK
[negroni] Started GET /queue/b6de
[negroni] Completed 200 OK
[negroni] Started GET /queue/2448
[negroni] Completed 200 OK
[negroni] Started GET /queue/d644
[negroni] Completed 200 OK
[negroni] Started GET /queue/b6de
[negroni] Completed 200 OK
[negroni] Started GET /queue/2448
[negroni] Completed 200 OK
[heketi] INFO Loaded simple allocator
[heketi] INFO brick_num: 0
[negroni] Completed
[asynchttp] INFO asynchttp.go: Started job d9bda...
[heketi] INFO Started async operation: Create Volume
[heketi] Creating brick 7133...
[heketi] Creating brick 8f08...
[heketi] Creating brick e9dc...
[negroni] Started GET /queue/d8bda
[negroni] Completed OK
[kubeexec] DEBUG src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker7 Pod: glusterfs-vnzql Command: mkdir -p /var/lib/heketi/mounts/vg_584f.../brick_8f08... Result:
[negroni] Started GET /queue/d8bda
[negroni] Completed OK
[kubeexec] DEBUG src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker7 Pod: glusterfs-vnzql Command: lvcreate --poolmetadatasize 16384K -c 256K -L 3145728K -T vg_548f.../tp8f08... -V 31457287K -n brick_8f08...
Result: Using default stripesize 64.00 KiB
Thin pool volume with chunk size 256 KiB can address at most 63.25 TiB of data
Logical volume "brick_8f08..." created

I am tired of typing, but the point is that it gets to this point:

[kubeexec] src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker7 Pod: gluster-vnzql Command: mount -o rw,inode64,nouuid /dev/mapper/vg_548f... Result:
[kubeexec] src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker7 Pod: gluster-vnzql Command: mkdir /var/lib/heketi/mounts/vg_548f/brick_8f08 Result:
[kubeexec] src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker7 Pod: gluster-vnzql Command: chown :2001 /var/lib/heketi/mounts/vg_548f/brick_8f08
[kubeexec] src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker7 Pod: gluster-vnzql Command: chmod 2775 /var/lib/heketi/mounts/vg_548f/brick_8f08
[negroni] Started GET /queue/b6de
[negroni] Completed 200 OK
[negroni] Started GET /queue/2448
[negroni] Completed 200 OK
[negroni] Started GET /queue/d644
(A LOOP SIMILAR TO THIS ENDS THE FILE)

mjschmidt commented 6 years ago

Also @jarrpa I aborted the gluster deployment and I am left over with a brick that was not cleaned up.

I am as a result unable to delete my volume because it is still in use by another device.

An lvs --segments reveals:

LV        VG       Attr        #Str  Type       SSize
vol_data  vg01     -wi-ao----     1  linear     297.00g
brick_39  vg_ee..  Vwi-a-tz--     0  thin         2.00g
tp_39...  vg_ee..  twi-aotz--     1  thin_pool    2.00g

I want to destroy this brick without gluster (since the gluster deploy is down) so that I can ultimately destroy my LVM volumes.

jarrpa commented 6 years ago

wipefs -a on the underlying device should do the trick.
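If the leftover LVs are still active you may need to remove them first; roughly (untested sketch, the LV/VG names are just the ones from your lvs output and the device path is a placeholder):

# Remove the leftover brick LV and its thin pool, then the VG
lvremove /dev/vg_ee../brick_39
lvremove /dev/vg_ee../tp_39...
vgremove vg_ee..
# Finally wipe the signatures on the underlying device so it is seen as bare again
wipefs -a /dev/<device>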

mjschmidt commented 6 years ago

Okay, I already deleted my volumes, but if I encounter this again it's good to know how to handle it (:

mjschmidt commented 6 years ago

@jarrpa have you seen this issue with the heketi:6 container? From the logs it looks like heketi starts to do all the right things, then it gets stuck and never retries or continues onward. Thoughts? I don't see any gluster issues from the heketi logs.

jarrpa commented 6 years ago

I'm not sure, I don't track exact heketi versions as well as I should. I know in version 6 there was a lot of work put into improving stability, especially with regard to volume creation and deletion. Is the current issue that you can't get the volumes to delete? Is this via kubectl, heketi-cli, or both?

mjschmidt commented 6 years ago

@jarrpa Do you guys run containerized gluster and heketi? Or is this something that is not really being done?

The problem is that it seems like heketi is getting stuck provisioning volumes in the middle of creation then freezing up.

I see this in the heketi logs

[kubeexec] DEBUG src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker7 Pod: glusterfs-vnzql Command: lvcreate --poolmetadatasize 16384K -c 256K -L 3145728K -T vg_548f.../tp8f08... -V 31457287K -n brick_8f08...
Result: Using default stripesize 64.00 KiB
Thin pool volume with chunk size 256 KiB can address at most 63.25 TiB of data
Logical volume "brick_8f08..." created
I am tired of typing, but the point is that it gets to this point
[kubeexec] src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker7 Pod: gluster-vnzql Command: mount -o rw,inode64,nouuid /dev/mapper/vg_548f...
Result:
[kubeexec] src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker7 Pod: gluster-vnzql Command: mkdir /var/lib/heketi/mounts/vg_548f/brick_8f08
Result:
[kubeexec] src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker7 Pod: gluster-vnzql Command: chown :2001 /var/lib/heketi/mounts/vg_548f/brick_8f08
[kubeexec] src/github.com/heketi/heketi/executors/kubeexec/kubexec.go:244: Host: worker7 Pod: gluster-vnzql Command: chmod 2775 /var/lib/heketi/mounts/vg_548f/brick_8f08

And I would expect to see these steps next:

[cmdexe] INFO Creating volume vol_<id> replica 3
[kubeexec] Debug <statement with the stuff that does the gluster command> to "gluster --mode=script volume create vol_<id> replica 3 <hosts of bricks it is mounting> 
Result: volume create: vol_<id> success: please start volume to access the data"
[kubeexec] <debug statement to the host that has the started volume on it> gluster --mode=script volume start vol_<id> 
Result: volume start: vol_<id>: success
[asynchttp] INFO <stuff here> Completed job <job_id> in ~7 seconds

But they never happen and heketi just stops doing stuff: no error messages to be seen, no retries, but it continues to heartbeat the volumes it has already created.

jarrpa commented 6 years ago

Yes we do, that is the entire point of this project. :) We even run it in enterprise production environments. I thought you had gotten this to work at some point? Also you didn't answer any of my previous questions.

mjschmidt commented 6 years ago

I had heketi working, but the database was getting out of sync and there were disk space leaks, which meant that after deleting and recreating enough pvcs we would have to tear down the gluster deploy, reprovision, etc.

The issue I am having with heketi:6 is that after a while heketi stops deleting and creating volumes. I even tried bashing into the container at one point to use the heketi-cli to delete a volume, but I would time out of the container before it could ever carry out the process. I got the logs from the heketi container to see what was going on (that last answer), and like I said, it just cut out in the middle of a volume create and then never did anything else.

I have also tried deleting the heketi container (I know, a bad idea), but heketi now watches for that and doesn't let a new container come up if its database was in the middle of doing gluster things.

jarrpa commented 6 years ago

Ugh... that is both very strange and outside my realm of expertise. Maybe @phlogistonjohn or @raghavendra-talur can provide some insights here.

phlogistonjohn commented 6 years ago

As of version 6 Heketi generally operates in the following manner:

Because of this there are a few common ways it can get out of sync with the underlying systems.

  1. Heketi process is terminated while it is performing an operation. This can be triggered by a node restart, pod move, etc.
  2. Heketi is executing a command on a node and the communication channel fails in a way that the operation is unable to fail (see below). Typically this would eventually lead into the first item because the server with a blocked operation is eventually going to get restarted.
  3. The gluster nodes that Heketi is working with are being restarted in such a way that the rollback can not complete. An example is a gluster node being restarted between creating a volume and starting the volume. This will fail the create operation but if there are too few nodes to allow deleting the incomplete volume heketi can not do anything more about the situation.

The 2nd point has been discussed recently among the Heketi developers. There is some concern that the kubernetes executor is not as stable as we need and that communicating with the gluster pods through that channel could lead to server hangs.

In cases 1 & 2, the things that heketi is going to process are logged in the db. These items can be viewed from a db dump (heketi-cli db dump) and can be manipulated via the offline command heketi db delete-pending-entries (please review the help prior to use). You can also edit and reload the db dump manually, but beware that this can lead to internal inconsistencies in the device storage counts if you add/remove bricks without making the correct changes to the device storage counters.
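Roughly, that workflow looks like this (a sketch only; flags and file paths vary by heketi release and deployment, so review the help output first):

# Dump the DB to JSON and look at the pending operation entries
heketi-cli -s http://localhost:8080 --user admin --secret <SECRET> db dump > heketi-db.json

# Offline cleanup of stale pending entries, run against the db file while the
# heketi server is stopped (the db path here is an assumption -- use your own)
heketi db delete-pending-entries --help
heketi db delete-pending-entries --dbfile /var/lib/heketi/heketi.db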

Case #3 was discovered after the version 6 release, AFAIR. There should be a fix in master for that condition. It will make #3 behave like 1 & 2 in that if the volume can not be cleaned up it will remain in the db for manual cleanup later.

If your shell sessions into the heketi container are being interrupted, please verify whether the pod/container is being restarted. It may be failing liveness checks (due to the root issue) and then being restarted by kubernetes. I'm not entirely sure about that part myself.
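A quick way to check (namespace and pod name are placeholders):

# A climbing RESTARTS count points at failed liveness probes
kubectl get pods -n <namespace> | grep heketi
kubectl describe pod <heketi-pod> -n <namespace>      # look for "Liveness probe failed" events
kubectl logs <heketi-pod> -n <namespace> --previous    # logs from before the last restart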

To wrap up, I think you are running into the following situation: something is preventing heketi from running operations to successful completion. This then may be causing heketi to be restarted. Restarts lead to stale operations and apparent de-syncs between heketi and the underlying system. While I can't guarantee this is the case it seems like a good place to start.

I would:

I know this is a lot to digest, but without specifics all I can do is outline the general areas I would look into if I were debugging this issue.

mjschmidt commented 6 years ago

I have seen gluster node restarts that I think may have coincided with heketi failures. I am not sure if this also causes heketi restarts or not. I will look into this more.

So I tore it down again for more testing. But I will try to figure out if it is 1 2 or 3 and get back to you all at some point. I appreciate the help.

mjschmidt commented 6 years ago

So my current way forward is to deploy gluster underneath kubernetes rather than in Kubernetes, and only run the heketi container inside of kubernetes, since it seems like gluster container restarts are the issue I am facing. I want to see if deploying gluster outside of containers solves my problems or not and debug from there; gluster is already distributed, so I don't really need it running in a container.

@jarrpa if I am only running heketi in a container for the deployment, what steps with gluster do I need to take ahead of time? Right now I have gluster installed (4.0 for now) on centos7 machines. yum list installed | grep gluster gives me:

centos-release-gluster40.x86_64          1.0-2.el7.centos
glusterfs.x86_64                         4.0.2.el7
glusterfs-api.x86_64                     4.0.2.el7
glusterfs-cli.x86_64                     4.0.2.el7
glusterfs-client-xlators.x86_64          4.0.2.el7
glusterfs-fuse.x86_64                    4.0.2.el7
glusterfs-libs.x86_64                    4.0.2.el7
glusterfs-server.x86_64                  4.0.2.el7
userspace-rcu.x86_64                     4.0.2.el7

I just connected the nodes with gluster peer probe, but I realized I may not want to get ahead of myself since I am not sure which parts I need to do for heketi and which ones heketi takes care of. The docs I found here: https://github.com/gluster/gluster-kubernetes/blob/master/docs/setup-guide.md

and here: https://github.com/heketi/heketi/blob/master/docs/admin/topology.md

Didn't really answer the question of "when do I stop and let heketi take over" from the gluster perspective for me.

mjschmidt commented 6 years ago

Also I spread the cluster out (less space on each node, but more nodes for same total amount of space) to try to eliminate issue #3 that @phlogistonjohn pointed out.

jarrpa commented 6 years ago

You do not want to do a peer probe. You only want the glusterd server running, no configuration done. Also make sure you have the requisite kernel modules loaded, the needed firewall ports opened, and the target devices are all bare (wipefs -a <device> should do it).
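Node prep usually amounts to something like this (a sketch based on the setup guide linked above; the module names and ports are the usual ones, but double-check against your environment, and the device path is a placeholder):

# Kernel modules the brick thin-provisioning relies on
modprobe dm_snapshot
modprobe dm_mirror
modprobe dm_thin_pool

# Firewall ports (firewalld example): 24007/24008 for glusterd/management,
# 49152+ for brick processes; 2222 is only needed for containerized gluster pods
firewall-cmd --permanent --add-port=24007/tcp --add-port=24008/tcp \
    --add-port=2222/tcp --add-port=49152-49251/tcp
firewall-cmd --reload

# glusterd running but unconfigured: no peer probe, no volumes
systemctl enable --now glusterd

# Target devices must be bare
wipefs -a /dev/<device>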

mjschmidt commented 6 years ago

Okay that is easy (: I can uninstall and reinstall just to be safe on that front

mjschmidt commented 6 years ago

Ugh, still working on this. Had FIPS mode on... I don't think this is fully fixed in gluster 4.

mjschmidt commented 6 years ago

Also, if I ever lose a gluster peer, heketi is SUPER mad and won't run. It's like 14 steps to clean up gluster stuff and sometimes it doesn't even work. It would be amazing if there were force options in gluster and heketi to just say "I give up, you are a distributed database: abandon a node and redistribute the now-lost volumes."

Then be able to go clear out the node, bring it back online, and re-add it to heketi at a later date.

jarrpa commented 6 years ago

Agreed, that is a very sore point in the user experience. :(

What's the status at the moment? Have we really not gotten past initial deployment?

mjschmidt commented 6 years ago

So I have gotten past initial deployment both in containers and outside of containers, the issue is that I haven't been able to get any semblance of stability from Heketi and Gluster thus far.

Here is my user evaluation thus far. If this is due to gaps in my understanding, I welcome additional feedback and knowledge that can help me achieve what I am going for, which ultimately would be "cloud native storage for my k8s applications", though I would very much settle for performant/stable storage for my cloud native applications. This is why I wanted to go the Heketi/Gluster route in the first place.

I have been trying to test further with hard-installed Heketi and Gluster providing storage up to Kubernetes on bare metal servers, due to @phlogistonjohn's concerns with the kube-system pods not being stable enough for Heketi's purposes when running in a container on Kubernetes. I was hoping the more external pieces I cut out, the closer I would get to the root of the problems I am having and ultimately a final solution.

So with the hard-install setup, I am seeing this crazy thing happen with Gluster where it crashes the machines and containers because, for some unknown reason, it is spinning up hundreds or thousands of Gluster daemons, eventually crashing the machine. I can only assume that this same thing is happening in containerized gluster, though I have no hard evidence of this. This is when the real problems start, as it causes Heketi to get out of sync with the cluster: it is extremely difficult to clean up Gluster volumes, I also can't remove and re-provision problem nodes and get them back in sync with Heketi due to the stuff discussed above, I can't force Heketi to remove volumes or nodes when dead nodes aren't working, and even if I could, I don't think I have the ability to say "okay, as a system, since I know that force-removed node A had volumes 1 and 2 on it, I now need to redistribute that workload to other parts of the cluster". This functionality works, but only if the cluster is working perfectly and healthy, and I would argue that this functionality is desired mostly when the cluster is unhealthy and unhappy and a sysadmin is attempting to get the cluster back to operating status.

I understand that these are all extremely difficult things to implement and there is a large amount of complexity in this sort of setup. You guys have been extremely helpful in getting me this far; since we were talking status and where we are at, I just wanted to give the full picture. I definitely am not meaning to bash you guys in any way, shape, or form.

jarrpa commented 6 years ago

Thank you for the thorough feedback! Indeed, the failure-scenario user experience leaves much to be desired. However, I'll mention that the latest versions of heketi have the ability to directly edit their DB. While I don't know much about it personally, hopefully @phlogistonjohn can chime in with more.

mjschmidt commented 6 years ago

I mean that would be great because then I could go back and at least edit the db in heketi and have it working.

But that would basically make the memory leak issues that heketi did so much to fix gluster's problem, because if I can't go clean up the gluster volumes (since a node is down), I still have a hard time figuring out how to use this in a prod setting. Suggestions on that front?

mjschmidt commented 6 years ago

So you guys over there really don't see any of these issues running gluster and heketi in containers?

What OS are you guys running? What does your k8s cluster look like? I think you all mentioned you were bare metal? What is doing the install of Kubernetes? What version of K8s? So many questions.

jarrpa commented 6 years ago

Sorry for the delay, I've been traveling and then swamped because of it.

Unfortunately, yes, we have really not seen issues like the ones you're running into. This is across the Red Hat family of distributions (Fedora, RHEL, etc.) in multiple environments including bare metal, VMs, and cloud. I myself have been running this since Kubernetes 1.7 on through 1.10, installed via kubeadm, on CentOS VMs.

mjschmidt commented 6 years ago

I think at this point we can close this issue. We have decided we are going to go the local persistent volume route and deal with managing the applications that need volumes manually until a better solution for local pvcs comes out. A lot of applications in our use case are already distributed, so we don't really use the gluster replication. I really appreciate the help, and I learned a ton about volume management, gluster, heketi, containers, and kubernetes.

jarrpa commented 6 years ago

No problem! It's been a wild ride, sorry that we couldn't make it work out for you, but I hope the local storage solution proves sufficient!