Update: each node now has a 40GB raw device named sdb. I modified my topology to use that device and reinstalled. Now all four nodes show the Failed to initialize IB Device message that I had above.
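For reference, the relevant shape of each node's entry in my topology.json; a trimmed sketch with placeholder hostnames and address, not my exact file:
{
  "node": {
    "hostnames": {
      "manage": ["node1.example.com"],
      "storage": ["10.0.0.1"]
    },
    "zone": 1
  },
  "devices": ["/dev/sdb"]
}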
On two of the nodes, running lsblk gave the following output:
lsblk: dm-0: failed to get device path
lsblk: dm-1: failed to get device path
lsblk: dm-0: failed to get device path
lsblk: dm-1: failed to get device path
lsblk: dm-2: failed to get device path
lsblk: dm-3: failed to get device path
lsblk: dm-4: failed to get device path
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
sda 8:0 0 60G 0 disk
└─sda1 8:1 0 60G 0 part /
sdb 8:16 0 40G 0 disk
sr0 11:0 1 1024M 0 rom
These two only display the partitions after a reboot; I'm not sure why.
One node did not have any partitions in its lsblk output, and the other had normal lsblk output like below:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
sda 8:0 0 60G 0 disk
└─sda1 8:1 0 60G 0 part /
sdb 8:16 0 40G 0 disk
├─vg_523582f49c2350faec18aec9c70dbd7c-tp_f2d04a049a5145724b7b94bac0efd6c5_tmeta 253:0 0 12M 0 lvm
│ └─vg_523582f49c2350faec18aec9c70dbd7c-tp_f2d04a049a5145724b7b94bac0efd6c5-tpool 253:2 0 2G 0 lvm
│ ├─vg_523582f49c2350faec18aec9c70dbd7c-tp_f2d04a049a5145724b7b94bac0efd6c5 253:3 0 2G 0 lvm
│ └─vg_523582f49c2350faec18aec9c70dbd7c-brick_a517663655fb980df6c8ae55f0215a7f 253:4 0 2G 0 lvm
└─vg_523582f49c2350faec18aec9c70dbd7c-tp_f2d04a049a5145724b7b94bac0efd6c5_tdata 253:1 0 2G 0 lvm
└─vg_523582f49c2350faec18aec9c70dbd7c-tp_f2d04a049a5145724b7b94bac0efd6c5-tpool 253:2 0 2G 0 lvm
├─vg_523582f49c2350faec18aec9c70dbd7c-tp_f2d04a049a5145724b7b94bac0efd6c5 253:3 0 2G 0 lvm
└─vg_523582f49c2350faec18aec9c70dbd7c-brick_a517663655fb980df6c8ae55f0215a7f 253:4 0 2G 0 lvm
sr0 11:0 1 1024M 0 rom
In light of these updates, any ideas what is going on?
It sounds like the /dev entries are not in sync with what is available on the hosts. Can you make sure that you are using the most recent container images, and that the daemonset for the glusterfs-server pods has a HOST_DEV_DIR bind-mount?
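For reference, a minimal sketch of what that bind-mount looks like in the daemonset, assuming the conventional /mnt/host-dev target (the volume name here is illustrative):
spec:
  containers:
  - name: glusterfs
    env:
    - name: HOST_DEV_DIR
      value: "/mnt/host-dev"
    volumeMounts:
    # bind-mount the host's /dev so gluster/LVM inside the pod
    # sees the same device nodes as the host
    - name: glusterfs-host-dev
      mountPath: "/mnt/host-dev"
  volumes:
  - name: glusterfs-host-dev
    hostPath:
      path: "/dev"
If a host still shows stale dm-* entries after a teardown, dmsetup ls on the host lists the device-mapper nodes that were left behind.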
@nixpanic I was able to get the nodes re-imaged last Friday and I just reinstalled the cluster a few minutes ago. I am getting the same error on all the nodes, though only one of them does not show the partitions in its lsblk output. Below are the image versions, taken from kubectl, of all the containers running in the cluster, as well as part of my glusterfs-daemonset.yaml. Do note I added the NoSchedule toleration so that we could run on the master nodes. Could this be causing problems?
Image versions
4 coredns/coredns:1.2.6
2 gcr.io/google_containers/cluster-proportional-autoscaler-amd64:1.3.0
4 gcr.io/google-containers/kube-apiserver:v1.12.3
4 gcr.io/google-containers/kube-controller-manager:v1.12.3
8 gcr.io/google-containers/kube-proxy:v1.12.3
2 gcr.io/google_containers/kubernetes-dashboard-amd64:v1.10.0
4 gcr.io/google-containers/kube-scheduler:v1.12.3
8 gluster/gluster-centos:latest
2 heketi/heketi:dev
4 nginx:1.13
2 quay.io/calico/kube-controllers:v3.1.3
8 quay.io/calico/node:v3.1.3
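(For the record, a list like the above can be generated with the jsonpath recipe from the kubectl docs; uniq -c produces the leading counts:)
kubectl get pods --all-namespaces -o jsonpath="{..image}" |
    tr -s '[[:space:]]' '\n' | sort | uniq -c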
glusterfs-daemonset.yaml
spec:
template:
metadata:
name: glusterfs
labels:
glusterfs: pod
glusterfs-node: pod
spec:
tolerations:
- key: "node-role.kubernetes.io/master"
operator: "Exists"
effect: "NoSchedule"
nodeSelector:
storagenode: glusterfs
hostNetwork: true
containers:
- image: gluster/gluster-centos:latest
imagePullPolicy: IfNotPresent
name: glusterfs
env:
# alternative for /dev volumeMount to enable access to *all* devices
- name: HOST_DEV_DIR
value: "/mnt/host-dev"
# set GLUSTER_BLOCKD_STATUS_PROBE_ENABLE to "1" so the
# readiness/liveness probe validate gluster-blockd as well
- name: GLUSTER_BLOCKD_STATUS_PROBE_ENABLE
value: "1"
This all looks pretty good to me. The Failed to initialize IB Device error should not be an issue: rdma is likely not available/configured, and tcp should just work.
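If you want to double-check, the transport a volume uses shows up in gluster volume info; something like this from one of the glusterfs pods (the pod name is a placeholder):
kubectl exec glusterfs-xxxxx -- gluster volume info heketidbstorage | grep -i transport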
Heketi creates the LVM structures on the disks when these are added through the topology.json or with heketi-cli device add .... Devices need to be empty when they are added, otherwise heketi will refuse (overwriting existing data is not nice).
You probably should inspect which devices heketi has configured. If some devices are missing, you might be able to find out in the logs why adding them failed.
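For example (a rough sketch; the deployment name and device id are placeholders):
# inspect what heketi knows about nodes, devices and bricks
heketi-cli topology info
heketi-cli device info <device-id>

# look for failed "device add" operations in the heketi logs
kubectl logs deployment/heketi | grep -i device

# a device must be empty before heketi accepts it; wiping old
# LVM/filesystem signatures on the host makes it re-addable
# (this destroys any data on /dev/sdb!)
wipefs -a /dev/sdb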
@nixpanic thanks for explaining the significance of those messages; it sounds like I may have been worried about the wrong thing. I took a look at things using the heketi-cli and got the device output from each node, which all seem fine, except the one that didn't seem to create any volumes:
Id:5c93098c942855162753f34d8ba3afc9 Name:/dev/sdb State:online Size (GiB):39 Used (GiB):2 Free (GiB):37 Bricks:1
Id:80ff81efa6991f2e76020b6addddc8d7 Name:/dev/sdb State:online Size (GiB):39 Used (GiB):2 Free (GiB):37 Bricks:1
Id:4eac023880d4fca9aead6e2be451da57 Name:/dev/sdb State:online Size (GiB):39 Used (GiB):2 Free (GiB):37 Bricks:1
Id:44851475ce9fd52e14d65b7b2d556a69 Name:/dev/sdb State:online Size (GiB):39 Used (GiB):0 Free (GiB):39 Bricks:0
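(That output came from walking the nodes with something like the following; the node id is a placeholder taken from the first command:)
heketi-cli node list
heketi-cli node info <node-id>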
However, I checked the topology info and noticed that the volume has this field: Replica: 3.
This would explain why heketi has not created any volumes on one of my nodes. Is this something to worry about? I would like to have my volumes available on all 4 nodes. Or does there need to be an odd number so there is a clear majority if a network partition occurs?
Cluster Id: fb5a90fd1633816fca4038088912801b
File: true
Block: true
Volumes:
Name: heketidbstorage
Size: 2
Id: b67b85c3b57642a101d87683969070e0
Cluster Id: fb5a90fd1633816fca4038088912801b
Mount: 137.112.89.104:heketidbstorage
Mount Options: backup-volfile-servers=137.112.89.103,137.112.89.106,137.112.89.105
Durability Type: replicate
Replica: 3
Snapshot: Disabled
Bricks:
Id: 16231ad118a28d5361d16ca9daf1a66c
Path: /var/lib/heketi/mounts/vg_80ff81efa6991f2e76020b6addddc8d7/brick_16231ad118a28d5361d16ca9daf1a66c/brick
Size (GiB): 2
Node: 96551d24e0140b240d0c2ce6160e6230
Device: 80ff81efa6991f2e76020b6addddc8d7
Id: 89e5dd96f6bc4deae1e2f273f8baa918
Path: /var/lib/heketi/mounts/vg_4eac023880d4fca9aead6e2be451da57/brick_89e5dd96f6bc4deae1e2f273f8baa918/brick
Size (GiB): 2
Node: 362b33b567dd07f14bf6e16bb88e694b
Device: 4eac023880d4fca9aead6e2be451da57
Id: fb21d50cd95fb83e3db2f5460ef3628e
Path: /var/lib/heketi/mounts/vg_5c93098c942855162753f34d8ba3afc9/brick_fb21d50cd95fb83e3db2f5460ef3628e/brick
Size (GiB): 2
Node: 96ebc21c66d75c29fcf2ae02a865ffec
Device: 5c93098c942855162753f34d8ba3afc9
The recommendation is to have "replica 3" for volumes. That means the data of the volumes will be replicated on three nodes. Not all volumes will be on the same nodes; the nodes used per volume can differ.
The advantage of having four nodes is that even when a single node is unavailable, everything will continue to work. It will stay possible to create new volumes with replica-3. And, of course, when one node is offline, two others will still have the data, and the contents of the volumes can still be used.
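If you ever want to pin this down on the Kubernetes side, the glusterfs provisioner lets you request the durability in the StorageClass; a minimal sketch (the resturl is a placeholder for your heketi endpoint):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-replicate-3
provisioner: kubernetes.io/glusterfs
parameters:
  # placeholder; point this at your heketi service
  resturl: "http://heketi.default.svc.cluster.local:8080"
  # keep three copies of every brick
  volumetype: "replicate:3"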
HTH, Niels
@nixpanic Thanks for all the help. Everything appears to be working after the re-image. Heketi gave me a persistent volume for mongodb just like it's supposed to.
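(For reference, the PVC that triggered the provisioning was nothing special; a minimal sketch, with the claim name and storage class illustrative:)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: glusterfs-storage
  resources:
    requests:
      storage: 2Gi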
So I'm trying to set up glusterfs within our 4-node kubernetes cluster (2 masters and 2 workers). My team is trying to automate the deployment and teardown of our cluster, so we are at the step of automating the glusterfs setup and teardown. I had a working cluster that could dynamically provision volumes through heketi before this, but we need to be able to do this over and over, so I attempted to clean the VMs of glusterfs and rebuild. In our case this involves recreating our block device as well.
(As an aside, our team does not have full control of our infrastructure, which is why we want an alternative to re-imaging the VMs or restoring from a snapshot, since we cannot start that process whenever we want. This is for an academic project, so keep that in mind as you read; I have a feeling these aren't best practices.)
Before I get into details, here is the environment info:
I also modified the glusterfs-daemonset.yaml file to add the NoSchedule toleration to the PodSpec, so that we could run glusterfs on the master nodes (not a great idea, but it's our current strategy).
The current problem I am running into is that after running gk-deploy to completion, with all the pods running, I check the block devices using lsblk and see that all but one of the two workers (it flips each time I rebuild between worker1 and worker2) look like this:
Whereas on one of my worker nodes I will get this output from lsblk:
Below is the output for vgdisplay on the good nodes:
This is what the bad node looks like:
The issue appears to be that glusterd cannot mount my block device. Here are the first couple of glusterd logs on the problem node:
So the next thing I should probably explain is what my clean-up process looks like, because there may be something I'm leaving out:
1. Run gk-deploy --abort. This seems to get rid of all the files, volume groups, and persistent volumes.
2. Remove the glusterfs.conf file I created in the /etc/modules-load.d/ directory to load dm_thin_pool, dm_mirror, and dm_snapshot at reboot.
3. Detach the loop0 device using sudo losetup -d /dev/loop0.
4. Delete the glusterfs.img file we were using as our "block device" (yes, this is hacky; we'll try to get some virtual block devices soon).
Then I disable and remove the loop0.service.
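Roughly, it is a oneshot unit that attaches the image at boot; a minimal sketch (the /glusterfs.img path is illustrative, not our exact layout):
[Unit]
Description=Attach /dev/loop0 to the glusterfs backing image

[Service]
Type=oneshot
# the path to the backing file is a placeholder
ExecStart=/usr/sbin/losetup /dev/loop0 /glusterfs.img
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target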
My main questions are as follows:
Could the loop0 device be causing this, and should I just stop and get some virtual block devices?
In the meantime I'll keep trying to debug this; any help is much appreciated, and I will do my best to reply quickly and provide additional information as needed.