gluster / gluster-kubernetes

GlusterFS Native Storage Service for Kubernetes
Apache License 2.0

Pod devices for topology get stuck if pods are restarted. #642

Open andresolotowas opened 4 years ago

andresolotowas commented 4 years ago

Hey guys, I've run into the following issue and would appreciate your help!

A few days ago I was deploying GlusterFS into my cluster, so, following the instructions, I created loop devices like this:

mkdir /home/gluster
dd if=/dev/zero of=/home/gluster/image bs=1M count=10000
losetup /dev/loop0 /home/gluster/image
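At that point a quick sanity check (just my own habit, not part of the instructions) looked healthy:

losetup -a          # should list /dev/loop0 with /home/gluster/image as its backing file
lsblk /dev/loop0    # should show a ~9.8G loop device with nothing stacked on it yet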

Given this, I was able to load the topology with the /dev/loop0 devices successfully. Later I removed the Heketi deployment, the Gluster pods, and whatever else had been created, and started again from scratch. Now, when I repeat all the steps I did before, what I see is that on the new Gluster pods the /home/gluster/image files are no longer there. That is expected, since those pods were removed, but as I understand it the loop device was still left behind on the nodes. So in the new Gluster pods I still see /dev/loop0, reported as:

[root@p-4v84 /]# losetup -a
/dev/loop0: [0203]:2235485 (/home/gluster/image)

So it refers to a file that is actually missing, from that old pod I removed. Here is what else I see:

[root@p-4v84 /]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vdb 254:16 0 458K 1 disk
loop0 7:0 0 9.8G 0 loop
├─vg_f581b805ccd1f865b901e290ebaba0ff-tp_edd1f9ff0dc51aa426c6d0e399c011bc_tdata 253:1 0 2G 0 lvm
│ └─vg_f581b805ccd1f865b901e290ebaba0ff-tp_edd1f9ff0dc51aa426c6d0e399c011bc-tpool 253:2 0 2G 0 lvm
│   ├─vg_f581b805ccd1f865b901e290ebaba0ff-brick_edd1f9ff0dc51aa426c6d0e399c011bc 253:4 0 2G 0 lvm /var/lib/heketi/mounts/vg_f581b805ccd1f865b901e290ebaba0ff/brick_edd1f9ff0dc51aa426c6d0e399c011bc
│   └─vg_f581b805ccd1f865b901e290ebaba0ff-tp_edd1f9ff0dc51aa426c6d0e399c011bc 253:3 0 2G 0 lvm
└─vg_f581b805ccd1f865b901e290ebaba0ff-tp_edd1f9ff0dc51aa426c6d0e399c011bc_tmeta 253:0 0 12M 0 lvm
  └─vg_f581b805ccd1f865b901e290ebaba0ff-tp_edd1f9ff0dc51aa426c6d0e399c011bc-tpool 253:2 0 2G 0 lvm
    ├─vg_f581b805ccd1f865b901e290ebaba0ff-brick_edd1f9ff0dc51aa426c6d0e399c011bc 253:4 0 2G 0 lvm /var/lib/heketi/mounts/vg_f581b805ccd1f865b901e290ebaba0ff/brick_edd1f9ff0dc51aa426c6d0e399c011bc
    └─vg_f581b805ccd1f865b901e290ebaba0ff-tp_edd1f9ff0dc51aa426c6d0e399c011bc 253:3 0 2G 0 lvm
vda 254:0 0 50G 0 disk
└─vda1 254:1 0 50G 0 part /var/lib/glusterd

As far as I understand, this is a mount from the node itself, or something like that. And the issue is that I can neither use this /dev/loop0 device again nor get rid of it: umount, losetup -d, rm, wipefs -a, and whatever else I have already tried give no results.
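My assumption is that it is the mounted brick plus the device-mapper stack from the lsblk output above that keep /dev/loop0 busy, so a teardown along these lines should be needed first (an untested sketch; the VG and mount names are taken from the lsblk output, and the exact LVM calls are my guess):

umount /var/lib/heketi/mounts/vg_f581b805ccd1f865b901e290ebaba0ff/brick_edd1f9ff0dc51aa426c6d0e399c011bc
vgchange -an vg_f581b805ccd1f865b901e290ebaba0ff   # deactivate the thin pool and brick LVs
vgremove -f vg_f581b805ccd1f865b901e290ebaba0ff    # drop the VG together with its LVs
pvremove /dev/loop0                                # clear the LVM label from the loop device
losetup -d /dev/loop0                              # only now should the loop device detach

Is that the right order, or am I missing something?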

This is what I get from Heketi:

[root@deploy-heketi-6c687b4b84-llrg9 /]# heketi-cli device add --name=/dev/vda/loop0 --node=3d7bea391ff3e6565f70c8053e8de8b3
Error: Initializing device /dev/vda/loop0 failed (already initialized or contains data?): WARNING: Failed to connect to lvmetad. Falling back to device scanning.
Device /dev/vda/loop0 not found.

Which is mostly the same as what I get locally in the pod:

[root@p-4v84 /]# pvscan
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
No matching physical volumes found
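Oddly, pvscan finds nothing even though lsblk clearly shows the LVM stack, so I assume whatever is holding the device can still be inspected directly through device-mapper (again just my guess at useful diagnostics):

dmsetup ls --tree    # should show the vg_..._tpool stack sitting on top of loop0
fuser -v /dev/loop0  # list any processes keeping the loop device open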

And here is what topology load reports back:

[root@deploy-heketi-6c687b4b84-llrg9 tmp]# heketi-cli topology load --json=topology.json
Found node p-4v84 on cluster b455b24ff6598025c7a04f8b1ec8e4cc
Adding device /dev/loop0 ... Unable to add device: Initializing device /dev/loop0 failed (already initialized or contains data?): WARNING: Failed to connect to lvmetad. Falling back to device scanning.
Can't open /dev/loop0 exclusively. Mounted filesystem?
Can't open /dev/loop0 exclusively. Mounted filesystem?
Found node p-4v8h on cluster b455b24ff6598025c7a04f8b1ec8e4cc
Adding device /dev/loop0 ... Unable to add device: Initializing device /dev/loop0 failed (already initialized or contains data?): WARNING: Failed to connect to lvmetad. Falling back to device scanning.
Can't open /dev/loop0 exclusively. Mounted filesystem?
Can't open /dev/loop0 exclusively. Mounted filesystem?
Found node p-4v8i on cluster b455b24ff6598025c7a04f8b1ec8e4cc
Adding device /dev/loop0 ... Unable to add device: Initializing device /dev/loop0 failed (already initialized or contains data?): WARNING: Failed to connect to lvmetad. Falling back to device scanning.
Can't open /dev/loop0 exclusively. Mounted filesystem?
Can't open /dev/loop0 exclusively. Mounted filesystem?

Do you have any clues as to how I could reuse this device? Or maybe I should create a new one (but then what about the disk space, if I leave this one still alive)?
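If reuse turns out to be impossible, my fallback idea is simply to attach a fresh backing file as a second loop device and feed that into the topology instead (the image2/loop1 names here are placeholders):

dd if=/dev/zero of=/home/gluster/image2 bs=1M count=10000
losetup /dev/loop1 /home/gluster/image2    # or: losetup -f --show /home/gluster/image2

But the concern about the disk space still held by the old device remains.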

I would really appreciate any ideas around this.

PS. One more thing that is unclear and would be nice to have clarified: why does the GlusterFS pod come without a device ready to go into the topology? I don't get one out of the box, and I don't see one being created in the gk-deploy script. Why do I have to create it by hand?
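For reference, this is roughly the shape of the device entries in my topology.json (abridged; the storage IP here is a placeholder):

{
  "clusters": [ {
    "nodes": [ {
      "node": {
        "hostnames": { "manage": [ "p-4v84" ], "storage": [ "10.0.0.1" ] },
        "zone": 1
      },
      "devices": [ "/dev/loop0" ]
    } ]
  } ]
}

As far as I can tell, gk-deploy only consumes this file and never provisions the devices themselves, hence my question.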

Thank you in advance!