gluster / gluster-kubernetes

GlusterFS Native Storage Service for Kubernetes
Apache License 2.0

Disaster recovery, backup and restore #528

Closed: Webgardener closed this issue 6 years ago

Webgardener commented 6 years ago

Hi,

After losing one (or more) physical gluster nodes, I tried to repair the cluster with the following steps:

  1. remove the failed node with heketi-cli;
  2. remove the failed node with kubectl;
  3. instantiate a new physical gluster node, install GlusterFS, add the node to the K8s cluster, generate a new heketi topology file, and run gk-deploy;
  4. rebalance the data among the servers.

I am stuck at step one, since with heketi-cli it is impossible to delete a node that contains devices, and impossible to delete devices that contain bricks.

What should I do?

phlogistonjohn commented 6 years ago

Using heketi-cli you must first "remove" the node or devices. For example, run heketi-cli node disable NODEID followed by heketi-cli node remove NODEID. This will remove all the bricks from the given node. Once the devices are emptied you can then use heketi-cli device delete and heketi-cli node delete to fully remove the items. Note that in order to remove the node there must be sufficient space on the other nodes' devices. If there is not, you need to add a new good node before running node remove.
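
Roughly, the full sequence looks like this (NODEID and DEVICEID are placeholders for the IDs that heketi-cli node info reports):

$ heketi-cli node disable NODEID      # stop placing new bricks on the node
$ heketi-cli node remove NODEID       # migrate its existing bricks elsewhere
$ heketi-cli device delete DEVICEID   # delete each now-empty device
$ heketi-cli node delete NODEID       # finally delete the empty node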

As for the rest of your steps, I don't fully understand the need to generate a new topology file and re-run gk-deploy. I would simply add nodes and devices with heketi-cli (use node add and device add respectively).
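
Something along these lines, as a sketch (the cluster ID, hostname, IP, and device path are placeholders for your environment):

$ heketi-cli node add --zone=2 --cluster=CLUSTERID \
      --management-host-name=NEW_NODE_HOSTNAME --storage-host-name=NEW_NODE_IP
$ heketi-cli device add --name=/dev/cinder/gluster --node=NEWNODEID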

Webgardener commented 6 years ago

Thank you for your answer. I had already tried to remove the node but it did not work; the bricks cannot be removed. There is plenty of space on the other nodes' devices though. There is something I must be missing.

About the rest of the steps, the cluster is composed of three nodes. When a node is lost physically, I would like to replace it with another node (same hostname, different IP).

phlogistonjohn commented 6 years ago

> Thank you for your answer. I had already tried to remove the node but it did not work; the bricks cannot be removed. There is plenty of space on the other nodes' devices though. There is something I must be missing.

In that case you might just be hitting a bug or some other subtle issue rather than an architectural problem. Please feel free to post the exact error message you got and any relevant logs from heketi, and we will see what we can do about helping you debug this issue.

> About the rest of the steps, the cluster is composed of three nodes. When a node is lost physically, I would like to replace it with another node (same hostname, different IP).

OK. In that case I will reiterate that IMO the right approach is to use heketi-cli to add the new replacement node and devices. Don't bother updating the topology file; its only real use is loading the initial cluster. To get the glusterfs pod running on the node, simply add the appropriate labels to the node and the daemonset should do the rest.
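
For example, assuming the storagenode=glusterfs label that the gk-deploy daemonset selects on (the node name is a placeholder):

$ kubectl label node NEW_NODE_NAME storagenode=glusterfs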

Webgardener commented 6 years ago

OK. Here is some config info and the error message.

Before a node crashes:

$ openstack server list | grep gluster
gluster-1         | ACTIVE | EXTENDED=172.50.0.194; IP=192.168.0.221
gluster-2         | ACTIVE | EXTEND=172.50.0.187;  IP=192.168.0.225
gluster-3         | ACTIVE | EXTENDED=172.50.0.195; IP=192.168.0.222

$ openstack volume list | grep gluster
gluster-1-volume-docker          | in use    |   20 | Attached to gluster-2 on /dev/sdc
gluster-1-volume-glusterfs       | in-use    |   30 | Attached to gluster-2 on /dev/sdb                                              
gluster-2-volume-glusterfs       | in-use    |   30 | Attached to gluster-2 on /dev/sdb          
gluster-2-volume-docker          | in-use    |   20 | Attached to gluster-2 on /dev/sdc          
gluster-3-volume-glusterfs       | in-use    |   30 | Attached to gluster-3 on /dev/sdb          
gluster-3-volume-docker          | in-use    |   20 | Attached to gluster-3 on /dev/sdc          

Storage Class:

apiVersion: v1
items:
- apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"name":"storage-test-gluster"},"parameters":{"resturl":"http://172.20.32.30:8080","volumetype":"replicate:3"},"provisioner":"kubernetes.io/glusterfs"}
    creationTimestamp: 2018-10-19T12:42:27Z
    name: storage-test-gluster
    resourceVersion: "1563397"
    selfLink: /apis/storage.k8s.io/v1/storageclasses/storage-test-gluster
    uid: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  parameters:
    resturl: http://172.20.32.30:8080
    volumetype: replicate:3
  provisioner: kubernetes.io/glusterfs
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Info about K8s cluster state:


$ kubectl -n kube-system get no,ds,po 

node/gluster-1   Ready,SchedulingDisabled  
node/gluster-2   Ready,SchedulingDisabled  
node/gluster-3   Ready,SchedulingDisabled  

daemonset.extensions/glusterfs              3         3         3       3            3           storagenode=glusterfs      19m

pod/glusterfs-7dtw6                            1/1     Running
pod/glusterfs-hdh6z                            1/1     Running
pod/glusterfs-s2mj4                            1/1     Running
pod/heketi-5bd5d976b6-9tcqb                    1/1     Running

$ heketi-cli cluster list
Clusters:
Id:fd2c0e1c86a459f5498e489bd226fe5d [file][block]

$ heketi-cli cluster info fd2c0e1c86a459f5498e489bd226fe5d
Nodes:
0c0be6bac1705c0685dd016cb843f261
5a1b616651842fccb7d8a30e10c51423
6b970fb24c73d9eea3bbef4c0dfa9a21
Volumes:
18c5483a18baa26b248e68ba07c248aa
e46ba70f01f01678cc7a576ccf14281a
Block: true

$ heketi-cli node info 0c0be6bac1705c0685dd016cb843f261
Node Id: 0c0be6bac1705c0685dd016cb843f261
State: online
Cluster Id: fd2c0e1c86a459f5498e489bd226fe5d
Zone: 2
Management Hostname: gluster-1
Storage Hostname: 172.50.0.194
Devices:
Id:d68c6b1a6a128c35251d214501a8e84d   Name:/dev/cinder/gluster State:online    Size (GiB):29      Used (GiB):3       Free (GiB):26      Bricks:2

$ heketi-cli volume list
Id:18c5483a18baa26b248e68ba07c248aa    Cluster:fd2c0e1c86a459f5498e489bd226fe5d    Name:heketidbstorage
Id:e46ba70f01f01678cc7a576ccf14281a    Cluster:fd2c0e1c86a459f5498e489bd226fe5d    Name:vol_e46ba70f01f01678cc7a576ccf14281a

$ heketi-cli volume info e46ba70f01f01678cc7a576ccf14281a
Name: vol_e46ba70f01f01678cc7a576ccf14281a
Size: 1
Volume Id: e46ba70f01f01678cc7a576ccf14281a
Cluster Id: fd2c0e1c86a459f5498e489bd226fe5d
Mount: 172.50.0.194:vol_e46ba70f01f01678cc7a576ccf14281a
Mount Options: backup-volfile-servers=172.50.0.187,172.50.0.195
Block: false
Free Size: 0
Reserved Size: 0
Block Hosting Restriction: (none)
Block Volumes: []
Durability Type: replicate
Distributed+Replica: 3

Now a disaster occurs: server gluster-1 and its attached volumes are gone for good.


$ openstack server list | grep gluster
gluster-2         | ACTIVE | EXTEND=172.50.0.187;  IP=192.168.0.225
gluster-3         | ACTIVE | EXTENDED=172.50.0.195; IP=192.168.0.222

$ openstack volume list | grep gluster
gluster-2-volume-glusterfs       | in-use    |   30 | Attached to gluster-2 on /dev/sdb          
gluster-2-volume-docker          | in-use    |   20 | Attached to gluster-2 on /dev/sdc          
gluster-3-volume-glusterfs       | in-use    |   30 | Attached to gluster-3 on /dev/sdb          
gluster-3-volume-docker          | in-use    |   20 | Attached to gluster-3 on /dev/sdc  

$ kubectl -n kube-system get no,ds,po | grep gluster

node/gluster-1   NotReady,SchedulingDisabled   
node/gluster-2   Ready,SchedulingDisabled      
node/gluster-3   Ready,SchedulingDisabled     

daemonset.extensions/glusterfs              3         3         2       3            2           storagenode=glusterfs 

pod/glusterfs-7dtw6                            1/1     Running    
pod/glusterfs-hdh6z                            1/1     Running    
pod/glusterfs-s2mj4                            1/1     NodeLost   

Heketi still thinks the dead node is online:


$ heketi-cli node info 0c0be6bac1705c0685dd016cb843f261
Node Id: 0c0be6bac1705c0685dd016cb843f261
State: online
Cluster Id: fd2c0e1c86a459f5498e489bd226fe5d
Zone: 2
Management Hostname: gluster-1
Storage Hostname: 172.50.0.194
Devices:
Id:d68c6b1a6a128c35251d214501a8e84d   Name:/dev/cinder/gluster State:online    Size (GiB):29      Used (GiB):3       Free (GiB):26      Bricks:2

So now I want to repair the cluster by first removing the dead node with heketi-cli:

$ heketi-cli node disable 0c0be6bac1705c0685dd016cb843f261
Node 0c0be6bac1705c0685dd016cb843f261 is now offline

$ heketi-cli node remove 0c0be6bac1705c0685dd016cb843f261
Error: Failed to remove device, error: No Replacement was found for resource requested to be removed
command terminated with exit code 255

$ heketi-cli topology info

    File:  true
    Block: true

    Volumes:

        Name: heketidbstorage
        Size: 2
        Id: 18c5483a18baa26b248e68ba07c248aa
        Cluster Id: fd2c0e1c86a459f5498e489bd226fe5d
        Mount: 172.50.0.194:heketidbstorage
        Mount Options: backup-volfile-servers=172.50.0.187,172.50.0.195
        Durability Type: replicate
        Replica: 3
        Snapshot: Disabled

                Bricks:
                        Id: 62245415b93b84072d78e28b77bc0a05
                        Path: /var/lib/heketi/mounts/vg_d68c6b1a6a128c35251d214501a8e84d/brick_62245415b93b84072d78e28b77bc0a05/brick
                        Size (GiB): 2
                        Node: 0c0be6bac1705c0685dd016cb843f261
                        Device: d68c6b1a6a128c35251d214501a8e84d

                        Id: 9bf29464f080b2b06c0e31cf86037d53
                        Path: /var/lib/heketi/mounts/vg_c3ea2fc610caed5d8372f4930ede201f/brick_9bf29464f080b2b06c0e31cf86037d53/brick
                        Size (GiB): 2
                        Node: 6b970fb24c73d9eea3bbef4c0dfa9a21
                        Device: c3ea2fc610caed5d8372f4930ede201f

                        Id: ffc14c4ec54e696d52cd194f30438870
                        Path: /var/lib/heketi/mounts/vg_f8876f11a1a6e8043ea082339b6dd2df/brick_ffc14c4ec54e696d52cd194f30438870/brick
                        Size (GiB): 2
                        Node: 5a1b616651842fccb7d8a30e10c51423
                        Device: f8876f11a1a6e8043ea082339b6dd2df

        Name: vol_e46ba70f01f01678cc7a576ccf14281a
        Size: 1
        Id: e46ba70f01f01678cc7a576ccf14281a
        Cluster Id: fd2c0e1c86a459f5498e489bd226fe5d
        Mount: 172.50.0.194:vol_e46ba70f01f01678cc7a576ccf14281a
        Mount Options: backup-volfile-servers=172.50.0.187,172.50.0.195
        Durability Type: replicate
        Replica: 3
        Snapshot: Disabled

                Bricks:
                        Id: 6a311aef7b3d43256998f64df8c3fe9a
                        Path: /var/lib/heketi/mounts/vg_f8876f11a1a6e8043ea082339b6dd2df/brick_6a311aef7b3d43256998f64df8c3fe9a/brick
                        Size (GiB): 1
                        Node: 5a1b616651842fccb7d8a30e10c51423
                        Device: f8876f11a1a6e8043ea082339b6dd2df

                        Id: d572b10a7533bce138675fe664b50ebc
                        Path: /var/lib/heketi/mounts/vg_c3ea2fc610caed5d8372f4930ede201f/brick_d572b10a7533bce138675fe664b50ebc/brick
                        Size (GiB): 1
                        Node: 6b970fb24c73d9eea3bbef4c0dfa9a21
                        Device: c3ea2fc610caed5d8372f4930ede201f

                        Id: f3ceded3920a2666fb5c82e8e629dd27
                        Path: /var/lib/heketi/mounts/vg_d68c6b1a6a128c35251d214501a8e84d/brick_f3ceded3920a2666fb5c82e8e629dd27/brick
                        Size (GiB): 1
                        Node: 0c0be6bac1705c0685dd016cb843f261
                        Device: d68c6b1a6a128c35251d214501a8e84d

    Nodes:

        Node Id: 0c0be6bac1705c0685dd016cb843f261
        State: offline
        Cluster Id: fd2c0e1c86a459f5498e489bd226fe5d
        Zone: 2
        Management Hostnames: gluster-1
        Storage Hostnames: 172.50.0.194
        Devices:
                Id:d68c6b1a6a128c35251d214501a8e84d   Name:/dev/cinder/gluster State:online    Size (GiB):29      Used (GiB):3       Free (GiB):26
                        Bricks:
                                Id:62245415b93b84072d78e28b77bc0a05   Size (GiB):2       Path: /var/lib/heketi/mounts/vg_d68c6b1a6a128c35251d214501a8e84d/brick_62245415b93b84072d78e28b77bc0a05/brick
                                Id:f3ceded3920a2666fb5c82e8e629dd27   Size (GiB):1       Path: /var/lib/heketi/mounts/vg_d68c6b1a6a128c35251d214501a8e84d/brick_f3ceded3920a2666fb5c82e8e629dd27/brick

        Node Id: 5a1b616651842fccb7d8a30e10c51423
        State: online
        Cluster Id: fd2c0e1c86a459f5498e489bd226fe5d
        Zone: 1
        Management Hostnames: gluster-2
        Storage Hostnames: 172.50.0.187
        Devices:
                Id:f8876f11a1a6e8043ea082339b6dd2df   Name:/dev/cinder/gluster State:online    Size (GiB):29      Used (GiB):3       Free (GiB):26
                        Bricks:
                                Id:6a311aef7b3d43256998f64df8c3fe9a   Size (GiB):1       Path: /var/lib/heketi/mounts/vg_f8876f11a1a6e8043ea082339b6dd2df/brick_6a311aef7b3d43256998f64df8c3fe9a/brick
                                Id:ffc14c4ec54e696d52cd194f30438870   Size (GiB):2       Path: /var/lib/heketi/mounts/vg_f8876f11a1a6e8043ea082339b6dd2df/brick_ffc14c4ec54e696d52cd194f30438870/brick

        Node Id: 6b970fb24c73d9eea3bbef4c0dfa9a21
        State: online
        Cluster Id: fd2c0e1c86a459f5498e489bd226fe5d
        Zone: 2
        Management Hostnames: gluster-3
        Storage Hostnames: 172.50.0.195
        Devices:
                Id:c3ea2fc610caed5d8372f4930ede201f   Name:/dev/cinder/gluster State:online    Size (GiB):29      Used (GiB):3       Free (GiB):26
                        Bricks:
                                Id:9bf29464f080b2b06c0e31cf86037d53   Size (GiB):2       Path: /var/lib/heketi/mounts/vg_c3ea2fc610caed5d8372f4930ede201f/brick_9bf29464f080b2b06c0e31cf86037d53/brick
                                Id:d572b10a7533bce138675fe664b50ebc   Size (GiB):1       Path: /var/lib/heketi/mounts/vg_c3ea2fc610caed5d8372f4930ede201f/brick_d572b10a7533bce138675fe664b50ebc/brick

Webgardener commented 6 years ago

I tried the same scenario with 4 nodes and replica 3, and it worked:

$ heketi-cli node remove b036c5b98773e0309a8ce94925b46228
Node b036c5b98773e0309a8ce94925b46228 is now removed

But still, this is not a viable solution: an entire zone can also disappear (with 2 nodes in it)... is there a workaround?

Webgardener commented 6 years ago

I still struggle to add a new node.

With replica set to 3 and 4 nodes available, deletion of the dead node now works:

$ heketi-cli node disable b036c5b98773e0309a8ce94925b46228
Node b036c5b98773e0309a8ce94925b46228 is now disabled

$ heketi-cli node remove b036c5b98773e0309a8ce94925b46228
Node b036c5b98773e0309a8ce94925b46228 is now removed

$ heketi-cli device delete 1706deeec069c88fa959fbded7ade4ca
Device 1706deeec069c88fa959fbded7ade4ca deleted

$ heketi-cli node delete b036c5b98773e0309a8ce94925b46228
Node b036c5b98773e0309a8ce94925b46228 deleted

$ heketi-cli node list
Id:247159a12305c992eb0b042a23344dce     Cluster:96f3890df2950b405a8dcf9b7c064ed9
Id:d4d46f2d1eb20bc64f5361bd1acd8b4e     Cluster:96f3890df2950b405a8dcf9b7c064ed9
Id:e67edfce50189dfac1968ad64608b2c1     Cluster:96f3890df2950b405a8dcf9b7c064ed9

Since replica was set to 3, the Gluster cluster still works properly. But I cannot lose another node! So I need to add a worker node to my GlusterFS cluster ASAP.

The strategy is to add a new host (named gluster-2 like the previous one) and configure it so that it can join the GlusterFS cluster.

But as soon as I add the new host with the same hostname as the dead one (gluster-2), before any further configuration, some of the gluster pods enter CrashLoopBackOff.

pod/glusterfs-4jzcw                            0/1     CrashLoopBackOff   0          2d
pod/glusterfs-hlg2c                            1/1     Running            1          2d
pod/glusterfs-kfb8k                            1/1     NodeLost           0          2d
pod/glusterfs-z2r5s                            0/1     CrashLoopBackOff   0          2d

phlogistonjohn commented 6 years ago

> Error: Failed to remove device, error: No Replacement was found for resource requested to be removed
>
> With replica set to 3 and 4 nodes available, deletion of the dead node now works:

Right. So when I wrote, "Note that in order to remove the node there must be sufficient space on the other nodes' devices. If there is not, you need to add a new good node before running node remove," I should have expanded on that more. Replica 3 volumes require that each brick of the replica set be on a different node. If you only have a 3-node cluster, you will not be able to replace any node in that cluster until you add a replacement node first. The fourth node in a four-node cluster acts akin to a hot spare in this scenario.
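
In rough terms, the order of operations for replacing a node in a three-node cluster would be (all IDs and names are placeholders):

$ heketi-cli node add --zone=2 --cluster=CLUSTERID \
      --management-host-name=REPLACEMENT_HOSTNAME --storage-host-name=REPLACEMENT_IP
$ heketi-cli device add --name=/dev/cinder/gluster --node=NEWNODEID
$ heketi-cli node disable DEADNODEID
$ heketi-cli node remove DEADNODEID    # bricks can now migrate to the new node
$ heketi-cli device delete DEADDEVICEID
$ heketi-cli node delete DEADNODEID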

> The strategy is to add a new host (named gluster-2 like the previous one) and configure it so that it can join the GlusterFS cluster.

I don't think that will work. You need to name it something new, like "gluster-5", with its own unique IPs. Gluster (and heketi) use the host's networking, so you can't reuse any existing identifiers until the old node has been completely purged from the cluster.

> But still, this is not a viable solution: an entire zone can also disappear (with 2 nodes in it)... is there a workaround?

If you have more than one node in a single zone, then I can't think of a workaround that uses the tools within heketi. The problem is that even if you add more than one node per zone, heketi's algorithm for placing bricks does not respect zones strongly enough to guarantee that it won't place two of a volume's replica 3 bricks in the same zone (we're aware of this issue but it's difficult to fix directly). You'd have to have exactly three zones with one gluster node each. I'm ignoring any potential gluster solutions that are not supported by heketi, because I am primarily familiar with the heketi code itself. That doesn't mean you shouldn't read up on possibilities like backup/geo-replication/etc.
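
As a rough illustration of that layout, registering one node per zone would look something like this (cluster ID, hostnames, and IPs are placeholders):

$ heketi-cli node add --zone=1 --cluster=CLUSTERID --management-host-name=NODE_A --storage-host-name=IP_A
$ heketi-cli node add --zone=2 --cluster=CLUSTERID --management-host-name=NODE_B --storage-host-name=IP_B
$ heketi-cli node add --zone=3 --cluster=CLUSTERID --management-host-name=NODE_C --storage-host-name=IP_C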

Webgardener commented 6 years ago

Thank you very much for your help. The issue can be closed.