Closed mjschmidt closed 6 years ago
3rd one is admittedly not a question.
Hey there! Sorry to hear about your troubles. Let's see...
Run kubectl delete ds on the DaemonSet; that should also remove the pods. There shouldn't be any Kube caching problems. Be sure to delete any remaining GlusterFS configuration directories (e.g. /var/lib/glusterfs) on the host. You can get a list of these directories from the hostPath mounts in the GlusterFS DaemonSet template.
Try a few of my recommendations above. If nothing works, then I'm going to want to look at:
- the output of gk-deploy with -v around when it fails
- kubectl describe po on one of the GlusterFS pods
- the error (E) lines from the tail end of /var/log/glusterfs/glusterd.log on any of the nodes running GlusterFS pods.
How can I check that the correct kernel modules exist?
I will not be able to take screenshots, but I can do my best to get you the most info that I can.
I found the hostPath mounts, so I will clear those out. For some weird reason the script deployed two identical DaemonSets to Kubernetes. It still only deployed a single set of pods, but that may have been why the script failed to take down the DaemonSet on --abort.
If you ran the script more than once, that would account for why there is more than one DaemonSet... that's the only thing I can think of, anyway. But the presence of another set of GlusterFS pods would account for some of these issues. Make sure the DaemonSets are removed and any associated node labels are also removed.
For kernel modules, lsmod | grep <name> will show you if a given module is present, and modprobe <name> will load a given module. Creating a file in /etc/modprobe.d/ to load those modules at startup would also be a good idea.
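For the modules gk-deploy asks about (dm_snapshot, dm_mirror, and dm_thin_pool, per the list later in this thread), the check-and-load steps boil down to something like this (a minimal sketch; run as root on each storage node):
# Check whether the device-mapper modules GlusterFS needs are currently loaded
lsmod | grep -E 'dm_snapshot|dm_mirror|dm_thin_pool'
# Load any that are missing (takes effect immediately, but not across reboots)
modprobe dm_snapshot
modprobe dm_mirror
modprobe dm_thin_pool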
Sorry, I am looking at the setup guide and I don't see the modules that I need there.
For the host paths, should I delete the folders as well? For example, should I remove the whole heketi folder?
I noticed in /run/lvm/ there are two files; I went to look at them and got permission denied even as root. Is that lvm folder created by the gluster deployment (and therefore safe to delete), or could it be created by something else, in which case I shouldn't just recursively delete it?
In the folder I see a lvmetad.socket and a lvmpolld.socket file
No no no, do not delete any system directories! Only delete the ones specific to GlusterFS and heketi (they should be obvious by name). And yes, you can remove the directories themselves.
If you don't skip it, the intro text to gk-deploy tells you what you need to have available: https://github.com/gluster/gluster-kubernetes/blob/master/deploy/gk-deploy#L528-L560
I already did... just kidding ( : So I plan to delete these folders: /var/lib/heketi, /etc/glusterfs, /var/log/glusterfs (though I can't see why logs would matter), /var/lib/glusterd, /var/lib/misc/glusterfsd
and not delete: /run/lvm, /dev (obviously), /sys/fs/cgroup, /etc/ssl
Ah, I see it now: the kernel modules are in the gk-deploy intro text that I skipped. I can look at those next. Let me know if the proposed delete/do-not-delete list looks reasonable. Thanks!
A reasonable list. Go ahead!
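For reference, the cleanup between failed deployments discussed above amounts to roughly the following (a sketch; node names are placeholders, and the storagenode label key is assumed to be the one gk-deploy applies by default):
# Remove the GlusterFS DaemonSet; this also removes its pods
kubectl delete ds glusterfs
# Remove the node label gk-deploy applied (label key assumed here)
kubectl label node <node-name> storagenode-
# On each storage node, clear the leftover GlusterFS/heketi state
rm -rf /var/lib/heketi /etc/glusterfs /var/log/glusterfs /var/lib/glusterd /var/lib/misc/glusterfsd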
Okay, I got my modules loaded up. I had seen that before but for some reason chose not to explore it further. Those are all loaded up now.
how can I check that mount.glusterfs is available?
Can I just sudo yum install the glusterfs-fuse.x86_64 package on centos7 and assume it is there?
List of packages I propose to yum install on the worker nodes: centos-release-gluster313 (or 312), glusterfs-cli.x86_64, glusterfs-fuse.x86_64
I decided for now I would only get the cli and fuse packages I listed above and give it a try.
The deployment was not successful. Error: unable to create new node: New Node doesn't have glusterd running. How do I turn on glusterd?
I saw that you can enable glusterd.service, but when I tried that I got 'no such file or directory'.
If you're running GlusterFS in pods, heketi is looking for glusterd in the containers and there must be NO glusterd present on the host. You do not "turn on" glusterd in this case; it should be running when the containers come up. The error you're getting from heketi could mean a number of things, but fundamentally it means it can't talk to a glusterd process where it expects it should. Check the outputs of kubectl logs <heketi_pod> and kubectl describe po <gluster_pod>. If nothing shows there, check the GlusterFS logs on the hosts as I mentioned above.
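Concretely, that log gathering looks something like this (pod names are placeholders):
# heketi logs, from the Kubernetes side
kubectl logs <heketi-pod-name>
# events and status for one of the GlusterFS pods
kubectl describe po <glusterfs-pod-name>
# error (E) lines from glusterd on the storage node itself
tail -n 100 /var/log/glusterfs/glusterd.log | grep ' E '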
glusterfs-fuse is the only RPM you need to install to mount GlusterFS volumes.
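On CentOS 7 that amounts to something like (a minimal sketch):
# Install the FUSE client; yum resolves the glusterfs/glusterfs-libs dependencies automatically
yum install -y glusterfs-fuse
# Confirm the mount helper shipped with the package is in place
rpm -ql glusterfs-fuse | grep mount.glusterfs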
Okay, got it. I will remove glusterfs-cli from the nodes then.
By the way, gk-deploy is still creating 2 DaemonSet deployments in Kubernetes, but only one set of pods (so when I go to tear it down I have to delete the DaemonSet manually and clear out the host file folders, which isn't a huge deal).
So, things running: the three kernel modules and glusterfs-fuse.x86_64.
Still getting the unable to create node error from heketi, so I'll grab the logs and transcribe them.
kubectl logs on the heketi pod: Error
kubectl describe po on the gluster pod: I don't know what you would be interested in here?
I see a Tolerations section: node.alpha.kubernetes.io/notReady:NoExecute node.alpha.kubernetes.io/unreachable:NoExecute node.kubernetes.io/disk-pressure:NoSchedule node.kubernetes.io/memory-pressure:NoSchedule
When I describe the gluster pod it says that Initialized, Ready, and PodScheduled are all True.
From the heketi logs I also see:
Started GET /clusters/
Please don't spam multiple messages if you can avoid it.
On the describe I'm looking for Events towards the bottom to see if any of them indicate an error.
Are you sure you have no GlusterFS DaemonSets or Pods running after you clean up a failed deployment? Are you using something other than the default namespace and could there be something conflicting in another namespace?
You never answered an earlier question: do you have the necessary firewall ports open on all GlusterFS nodes?
"Please don't spam multiple messages if you can avoid it." Sure, not a problem.
For firewall ports I have a security group set up that allows: TCP over ports 2222, 24007, 24008, 49149-49251
So that shouldn't be an issue. Also I think with our setup if nodes are in the same region then communication within the region should not be restricted... I just set up the security group to be extra safe that ports would not be an issue.
I am sure that I clean up all the pods. First I delete the DaemonSet with: kubectl delete ds/glusterfs
By the way, on that DaemonSet note, it's still doing the weird thing where it creates two DaemonSets but deploys one set of pods. I have seen that before in other issues, but I don't know if it was ever resolved.
Then I do a kubectl get all to make sure nothing is there, and I can see only the svc/kubernetes service running.
I went to the host machine and noticed, when I did a sudo yum list installed | grep gluster, that a few packages were installed in addition to fuse, as dependencies:
glusterfs.x86_64 3.8.4-18.4.el7.centos @os_centos_base
glusterfs-client-xlators.x86_64 3.8.4-18.4.el7.centos @os_centos_base
glusterfs-fuse.x86_64 3.8.4-18.4.el7.centos @os_centos_base (the one I wanted)
glusterfs-libs.x86_64 3.8.4-18.4.el7.centos @os_centos_base
Is this an issue?
Hmm... yes, the networking you described should be working.
I don't recall other issues with multiple DaemonSets... what are the names of the DaemonSets and are they in the same Namespace?
"all" may not get everything we're interested in. Try kubectl get deployment,ds,po,job,secret,ep,svc,sa --all-namespaces
and see if there's anything with the words "glusterfs" or "heketi" in it.
Those packages are the correct ones. The reason I only specify glusterfs-fuse
is that it takes care of installing all needed dependencies.
Hmmm, interesting. After the abort I did a kubectl get all (in just my namespace) and I see the DaemonSet twice; however, when I do a kubectl get ds (just in my namespace at first) I only see the one DaemonSet. So this may be an issue with kubectl get all. As a result, I think that can be ignored for now.
I ran kubectl get deployment,ds,po,job,secret,ep,svc,sa --all-namespaces | grep gluster
and only saw the pods and the DaemonSet running; then I deleted the DaemonSet with kubectl delete ds glusterfs and that deleted just fine.
After that I ran kubectl get deployment,ds,po,job,secret,ep,svc,sa --all-namespaces | grep gluster and found nothing; kubectl get deployment,ds,po,job,secret,ep,svc,sa --all-namespaces | grep heketi also found nothing.
Other things I can check? Oh, by the way, the names of the DaemonSets when I had two of them (but again only one set of pods) were ds/glusterfs and ds/glusterfs, so they were named the same.
Sorry, I needed to add this: I am running ./gk-deploy -gyv. That also should not be an issue, right?
Okay, yeah, something is screwy with your cluster. You can't have two resources of the same name and type in the same namespace.
The command you listed is definitely fine.
If you're at a point where the heketi and GlusterFS pods are up and running but heketi can't communicate with GlusterFS, check that all pods are showing 1/1 Ready. Next, try going into a GlusterFS container and running systemctl status glusterd to see if the process is running. If all this is fine, you may have something weird in your networking. Note that the GlusterFS pods use host networking, meaning that they assume the network identity of the hosts they're running on. You would have to verify that processes in containers in your cluster can find routes to the host IP addresses.
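A sketch of those checks (the pod name is a placeholder):
# All GlusterFS and heketi pods should show 1/1 READY
kubectl get pods -o wide
# Check glusterd inside one of the GlusterFS containers
kubectl exec -it <glusterfs-pod-name> -- systemctl status glusterd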
glusterd is running in the container
so that is good.
I ran hostname -i from inside the container and that worked and I got the correct IP back from the command.
Any test commands I can run from inside the gluster container that will let me know "yes the cluster is connected"? I did a gluster peer probe to one of the other containers and that worked. Then I did a gluster peer status and saw there was only one peer (the one I had just connected to) so it seems like that isn't quite working correctly?
I went into the heketi container as well and did a hostname -i and that returned command not found. What can I do from that container to see if things are working correctly or not as I debug more?
gluster peer status only lists peers, not the current node. In a two-node cluster only one peer would be shown.
The heketi container is a very stripped down environment. I recommend spinning up something like a busybox pod and trying things like hostname and ping there to see if you can get to the host IPs.
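Something like this works for the busybox test (the pod name netcheck and the host IP are placeholders; syntax for kubectl of that era):
# Start a throwaway busybox pod with an interactive shell
kubectl run -it --rm netcheck --image=busybox --restart=Never -- sh
# ...then, inside the pod:
hostname
ping -c 3 <host-ip>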
I have a 5 node cluster though.
What I am saying is that the gluster pods are not connecting to each other via the script, because when I go into a gluster container, originally the container has 0 peers; however, when I manually add a peer in the container, that works and it connects to a different host IP.
busybox commands and values:
hostname = busybox-
So it looks like my busybox can't ping anything, but my gluster container can ping any machine in the cluster.
I was looking at the gluster yaml file and noticed that there aren't any ports specified in there? Is that expected? How does communication happen if there are no ports?
Update, mostly so people who come after me can follow some sort of documentation
I was able to curl heketi and get heketi to say hello to me from inside the heketi pod going through the kubernetes heketi deployment service ip. the command is curl
I was also able to curl hello heketi from a single gluster container that is on the same node. So there are some networking issues going on.
deploy-heketi's ip is 10.111.174.68 with port 8080 open
I think if port 8080 was not open I wouldn't be able to do this... the question is, why does this service ip only work on the single node...
A few things:
- From pods on other nodes, try reaching the heketi service IP with curl or telnet. This will tell you if cluster networking is functional.
- To see whether pods can reach the hosts themselves, ping will suffice here. This will tell you if the cluster network has access to the network of the hosts.
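From a busybox pod those checks look roughly like this (busybox ships wget rather than curl; the service IP below is the deploy-heketi one mentioned above, so substitute your own):
# Can the cluster network reach the heketi service? heketi answers on /hello
wget -qO- http://10.111.174.68:8080/hello
# Can pods reach the hosts' own network?
ping -c 3 <storage-node-ip>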
@jarrpa I'm having a similar issue to mjschmidt. You mentioned making a file in /etc/modprobe.d/ if I need to use modprobe <name> to load a module at start up; how do I set that file up? Is there an example somewhere I can reference?
@jarrpa Next question, which yaml file sets up a pod that gets labelled "heketi-storage-copy-job"?
An update: I am still working on this. I have learned more about iptables and networking than I anticipated. For anyone interested in a brief overview of iptables' role in Kubernetes, the general gist is:
Kubernetes does a ton with iptables...
@jarrpa By the way, it looks like it was an issue with the way I was deploying my Kubernetes cluster (flannel or Docker networking problems). So the next question: now that my heketi has hooked up with my gluster containers, I see it is trying to do the setup-openshift-heketi-storage.
I am not running on Openshift and don't need this. I am running on centos7 machines. How can I exclude openshift stuff from my deployment?
Right now my volumes are attached to heketi and my gluster containers:
Creating node
I don't need heketi cli ability from my kubectl node, I just need heketi to know how to communicate with Kubernetes.
On Fedora/CentOS/RHEL systems you can drop an executable file under /etc/sysconfig/modules/. An example would be:
#!/bin/sh
#
# Force loading some modules
grep -q -w dm_thin_pool /proc/modules || modprobe dm_thin_pool
Save that file as /etc/sysconfig/modules/gluster-bricks.modules and make it executable. Upon a reboot the script should get executed and the modules get loaded.
The configuration files in /etc/modprobe.d/ can be used to modify the parameters while the module gets loaded. I don't think there is a way to mark modules to get loaded automatically there.
@nixpanic So I am not very familiar with the modules, or /etc/sysconfig.
Does this look like it would be right?
These are the modules I need: dm_snapshot dm_mirror dm_thin_pool
Saving a file here /etc/sysconfig/modules/gluster-bricks.modules that contains:
#!/bin/sh
#
# Force loading some modules
grep -q -w dm_thin_pool /proc/modules || modprobe dm_thin_pool
grep -q -w dm_thin_pool /proc/modules || modprobe dm_mirror
grep -q -w dm_thin_pool /proc/modules || modprobe dm_snapshot
^End file Then I will do a chmod +x /etc/sysconfig/modules/gluster-bricks.modules
Then reboot all the machines that I do this to (in this case the 3 kubernetes nodes that I plan to get this gluster running on).
Then run gk-deploy and profit?
@mjschmidt yes, that looks correct. But you need to update the grep
commands in the script ;-)
But, maybe there is a better way. Sometimes kernels compile certain modules in their main executable, and do not offer to load them. In that case, /proc/modules
will not list them either. To make it less dependent on the kernel/distribution, I should have suggested:
#!/bin/sh
#
# Force loading some modules
[ -d /sys/module/dm_thin_pool ] || modprobe dm_thin_pool
[ -d /sys/module/dm_mirror ] || modprobe dm_mirror
[ -d /sys/module/dm_snapshot ] || modprobe dm_snapshot
Try the script+reboot on one of your nodes, and verify that the modules are loaded (check in /proc/modules or use lsmod).
I will use the lsmod and grep for the desired modules ( : I'll get back to you in a bit.
@nixpanic Okay cool it looks like that works ( : and that wasn't too bad, but what does that have to do with the heketi openshift stuff I was referring to earlier?
I don't think I am understanding that yet.
The setup-openshift-heketi-storage
command executed by Heketi will create a Gluster Volume to store the heketi database on. Creating the LVM/LVs can fail if the dm_*
modules are not loaded. These should get loaded automatically, but that only works if the container can load them (which might not be the case because of several reasons).
@mjschmidt @nixpanic I'll note that the setup-openshift-heketi-storage command is mislabeled: it applies to both Kubernetes and OpenShift. It is currently needed by the configuration used in gk-deploy, as it allows for robust persistent storage for the heketi database.
Supposedly there is a way that you can just specify module names in a flat file so that they get loaded on system startup. See here: https://github.com/mjudeikis/openshift-ansible/blob/7d67afffe217e29c7bb8941343bd017907ffc9df/roles/openshift_storage_glusterfs/tasks/kernel_modules.yml and https://github.com/mjudeikis/openshift-ansible/blob/7d67afffe217e29c7bb8941343bd017907ffc9df/roles/openshift_storage_glusterfs/templates/glusterfs.conf.j2
The format is just the names of modules, one per line.
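On CentOS 7 the equivalent flat file is a systemd modules-load.d drop-in (assuming that is the mechanism the linked role uses); a minimal sketch:
# /etc/modules-load.d/glusterfs.conf : one module name per line, loaded at boot
cat > /etc/modules-load.d/glusterfs.conf <<EOF
dm_snapshot
dm_mirror
dm_thin_pool
EOF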
Okay so I got the modules loaded automatically on node start up and checked to make sure that they were running before I started the script.
I then ran the gk-deploy script knowing what the setup-openshift-heketi-storage was referring to.
This is where I am at now: the Gluster containers come up, heketi comes up, heketi connects the gluster pods to the untouched volumes (WOOOO), and it then gets through the setup-openshift-heketi-storage.
Now I am getting an image pull error on the heketi-storage-copy job. Where does the command to pull that image live? I am in an offline environment, so I have to manually move containers from the open internet to my dev environment.
It's trying to pull heketi/heketi:dev? But in all of my yaml files I have it set to pull a differently named image in my closed environment. I'm lost.
@mjschmidt Yes, the setup-openshift-heketi-storage command by default will always create a YAML manifest for the copy job that specifies heketi/heketi:dev as its image. You can change that by passing the --image option to the command. So you have a few options: pass --image <image_name> to the command here: https://github.com/gluster/gluster-kubernetes/blob/master/deploy/gk-deploy#L859 , or make the heketi/heketi:dev image available in your closed environment.
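As a rough sketch, the first option means editing that line of gk-deploy so the setup command carries your registry's image; the exact wrapper around the command differs in the script, but the key addition is the --image flag (the registry path below is a placeholder):
# In deploy/gk-deploy, around the line linked above, append --image to the setup command, e.g.:
heketi-cli setup-openshift-heketi-storage --image my-registry.local/heketi/heketi:latest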
Hmmm okay, I am going to try that and see if it works tomorrow. Thank you.
@jarrpa @nixpanic WHOOP
So, good news: after hard-coding the image into the openshift-heketi-storage command (a change I very much want to open a new issue for / help out with when this is all said and done) and countless cluster deletes and re-provisions, I was able to get the gk-deploy script to run successfully.
NOW how do I use this storage?
Upon completion the script says "heketi is now running and accessible via http://
etc etc etc
I am trying to have the storage I just set up serve the Kubernetes cluster it is currently running on. Is that where that yaml file comes into play? So, throw the following into a yaml file, then kubectl apply -f whatever_I_call_this_file.yaml:
"---"
apiversion: storage.k8s.io/v1beta
kind: StorageClass
metadata:
name: glusterfs-storage
provisioner: kubernetes.io/glusterfs
parameters:
resturl: "http:
By the way, is this IP generated, or is it statically written and I have to fill in my IP? It looks like the latter but I want to be sure.
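For completeness, once the StorageClass exists a workload consumes it through a PVC that names it; a minimal sketch (the claim name and size are placeholders):
# Create a PVC bound to the glusterfs-storage StorageClass
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gluster-claim
spec:
  storageClassName: glusterfs-storage
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
EOF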
Never mind I just had to rename the volume mount point to the storage class I made. WOOOOOOO IT WORKS! 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉
I'll have to find some time to create a troubleshooting page this weekend for Kubernetes deployment
So can I do the heketi deployment and scale that out if I need to, and have requests come into heketi in, say, a round-robin fashion? Or would that start to break things?
Not to spam the chat, but how do I expand out to more nodes?
WOO!! Great to hear! :D
All right, so, at this point you don't want to scale the current heketi deployment; heketi expects exclusive access to its metadata DB. But there are several ways you can expand this setup:
- To add new nodes to the existing cluster, run heketi-cli node add with the node info, then add the devices on that node.
- To create an additional cluster, use a different clusterID in the node add command. By default heketi will just use that additional cluster in its storage allocation algorithm, but you can now also specify one or more clusterIDs in your StorageClasses to direct volume requests to specific clusters.
So adding gluster nodes to the DaemonSet is a manual process? I know if I label a node with the gluster tag it will spin up another gluster pod, but I guess there isn't anything that will trigger heketi to work with that new gluster pod?
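For reference, the manual expansion steps described above map to heketi-cli roughly like this (the cluster ID, node ID, hostnames, and device path are placeholders; heketi-cli also needs the server URL, e.g. via the HEKETI_CLI_SERVER environment variable):
# Register the new node with heketi (the cluster ID comes from 'heketi-cli cluster list')
heketi-cli node add --zone=1 --cluster=<cluster-id> --management-host-name=<node-hostname> --storage-host-name=<node-ip>
# Then register that node's raw block device(s); the node ID comes from 'heketi-cli node list'
heketi-cli device add --name=/dev/sdb --node=<node-id>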
Hi, I am running Kubernetes 1.8.5 on CentOS 7.
I tried the script, but I get the glusterd not running error. I seem to be having issues enabling the service; I checked some of the other issues on here and noticed that some other people were having similar problems, so I went to abort my gk-deploy and the gluster pods did not delete.
Question 1: Is that an issue, or can I keep trying to re-run the gk-deploy script even though my gluster pods are still up? If it is an issue, how can I delete my gluster DaemonSet without causing any possible Kubernetes caching problems?
Question 2: Why is it complaining about the glusterd service not being started? Is that why I read that you needed to get rid of everything yum installed except for the gluster client?
Question 3: @jarrpa halp me. I see you answer a lot of these questions. Getting exact error logs is tough because I am in an offline environment, but I will transcribe the important-looking parts.