shootkin opened this issue 6 years ago
Hm... Very strange. I checked the status of the heketidbstorage volume, and it turned out that 1 of its 3 bricks was offline. So I stopped the volume and started it back up, then recreated the heketi pod. Now everything works fine, but this is a very unpleasant bug.
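For reference, this is roughly the sequence I ran (the glusterfs and heketi pod names are from my cluster; the stop/start makes the volume briefly unavailable):
kubectl exec -ti glusterfs-c7tpd -- gluster volume status heketidbstorage   # this showed 1 of 3 bricks offline
echo 'y' | kubectl exec -i glusterfs-c7tpd -- gluster volume stop heketidbstorage   # 'y' answers the stop confirmation prompt
kubectl exec -ti glusterfs-c7tpd -- gluster volume start heketidbstorage
kubectl delete pod heketi-7947d8f8b-vxszl   # the deployment recreates the heketi pod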
Having 1 out of 3 bricks offline is not great, but it would not lead to that particular error message from stat, IMO. Even if the volume were read-only, stat should not report "No such file or directory" unless that path didn't exist.
Did the content of the heketi db survive the pod migration from node to node? If not, that's a bigger issue that we should look into.
Otherwise, if you are just seeing that message the first few times the heketi pod starts, it's not a big issue, but we could fix it. I just want to make sure we are not focusing on something small (the error output) versus something bigger, like heketidbstorage not replicating correctly on your setup.
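Something along these lines would tell us (pod names taken from your report; /var/lib/heketi/heketi.db is the usual location of the db inside the heketi container, adjust if your deployment differs):
kubectl exec -ti heketi-7947d8f8b-vxszl -- ls -l /var/lib/heketi/heketi.db   # is the db file still present and non-empty after the pod moved?
kubectl exec -ti glusterfs-c7tpd -- gluster volume heal heketidbstorage info   # any files pending heal on the db volume?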
"Having 1 out 3 bricks offline is not great but it would not lead to that particular error message from stat IMO" - I'm sure that it is an issue with the 1 out of 3 bricks offline, cause all my persistent volumes became unvisible to pods. When I restarted ALL volumes manually by command "for i in $(gluster volume list); do echo 'y' | gluster volume stop $i && gluster volume start $i;" and deleted old pods, new pods started succesfully interacting with persistence volumes. I don't know why I see such behavior. I have 4 nodes in my gluster cluster, with Replica 3, so 1 brick offline mustn't cause an error, but it cause.
Hello! I have a heketi pod on my kubernetes cluster. Today the node it was running on was rebooted, so the heketi pod was re-assigned to another node. After that, if I run the command:
kubectl logs heketi-7947d8f8b-vxszl
I see errors in the output. If I run the commands:
kubectl get pods | grep gluster
and
kubectl exec -ti glusterfs-c7tpd gluster peer status
I see that all glusterfs pods are running and all nodes are in a live state:
glusterfs-c7tpd 1/1 Running 2 49d
glusterfs-gs69n 1/1 Running 1 28d
glusterfs-lvnkt 1/1 Running 2 40d
glusterfs-mzb84 1/1 Running 0 49d
All volumes are in live state too.
So the question is: how can I fix this bug?
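For what it's worth, this is how the volume state can be checked from one of the glusterfs pods (using a pod name from the list above):
kubectl exec -ti glusterfs-c7tpd -- gluster volume info   # lists every volume and whether it is Started
kubectl exec -ti glusterfs-c7tpd -- gluster volume status heketidbstorage   # shows which bricks are actually online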