gluster / gluster-kubernetes

GlusterFS Native Storage Service for Kubernetes
Apache License 2.0

Heketi no space error #505

Open bend opened 6 years ago

bend commented 6 years ago

I'm trying to set up a GlusterFS cluster with Kubernetes. I managed to start the glusterd pods on all three nodes and to load the topology successfully; however, when I run

heketi-cli setup-openshift-heketi-storage

I get the following error:

Error: No space

This is the output of

heketi-cli topology load --json=gluster-kubernetes/deploy/topology.json

        Found node vps01 on cluster 1a36667e4275773fc353f2caaaaaaa                                                                                       
                Adding device /dev/loop0 ... OK                                                                                                             
        Found node vps02 on cluster 1a36667e4275773fc353faaaaaaaa                                                                                       
                Found device /dev/loop0                                                                                                                     
        Found node vps04 on cluster 1a36667e4275773fc353faaaaaaa                                                                                      
                Adding device /dev/loop0 ... OK       

Output of

heketi-cli topology info 

Cluster Id: 1a36667e4275773fc353f2caaaaaa                                                                                                         

    File:  true                                                                                                                                             
    Block: true                                                                                                                                             

    Volumes:                                                                                                                                                

    Nodes:                                                                                                                                                  

        Node Id: 1752dcf447c8eb6eaad45aaaa                                                                                                         
        State: online                                                                                                                                       
        Cluster Id: 1a36667e4275773fc353f2caaa                                                                                                    
        Zone: 1                                                                                                                                             
        Management Hostnames: vps01                                                                                                                         
        Storage Hostnames: XX.XX.XX.219                                                                                                                    
        Devices:                                                                                                                                            
                Id:50396d72293c4723504810108bd75d41   Name:/dev/loop0          State:online    Size (GiB):12      Used (GiB):0       Free (GiB):12          
                        Bricks:                                                                                                                             

        Node Id: 56b8c1942b347a863ee73a005758cc27                                                                                                           
        State: online                                                                                                                                       
        Cluster Id: 1a36667e4275773fc353f2c8eb2dd2a3                                                                                                        
        Zone: 1                                                                                                                                             
        Management Hostnames: vps04                                                                                                                         
        Storage Hostnames: XX.XX.XX.227                                                                                                                     
        Devices:                                                                                                                                            
                Id:dc75ad8154234ebcf9174b018d0bc30a   Name:/dev/loop0          State:online    Size (GiB):9       Used (GiB):4       Free (GiB):5           
                        Bricks:                                                                                                                             

        Node Id: f82cb81a026884764d3d953c7c9b6a9f                                                                                                           
        State: online                                                                                                                                       
        Cluster Id: 1a36667e4275773fc353f2c8eb2dd2a3                                                                                                        
        Zone: 1                                                                                                                                             
        Management Hostnames: vps02                                                                                                                         
        Storage Hostnames: XX.XX.XX.157                                                                                                                     
        Devices:                                                                                                                                            
                Id:1914102b7ae395f12797981a0e3cf5a4   Name:/dev/loop0          State:online    Size (GiB):4       Used (GiB):4       Free (GiB):0           
                        Bricks:   

There is no more space on device 1914102b7ae395f12797981a0e3cf5a4; however, I haven't stored anything on the device yet.

For info here is the topology.json file:

{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": [
                "vps01"
              ],
              "storage": [
                "XX.XX.XX.219"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/loop0"
          ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [
                "vps02"
              ],
              "storage": [
                "XX.XX.XX.157"
              ]
            },
            "zone": 1                                                                                                                                       
          },                                                                                                                                                
          "devices": [                                                                                                                                      
            "/dev/loop0"                                                                                                                                    
          ]                                                                                                                                                 
        },                                                                                                                                                  
        {                                                                                                                                                   
          "node": {                                                                                                                                         
            "hostnames": {                                                                                                                                  
              "manage": [                                                                                                                                   
                "vps04"                                                                                                                                     
              ],                                                                                                                                            
              "storage": [                                                                                                                                  
                "XX.XX.XX.227"                                                                                                                              
              ]                                                                                                                                             
            },                                                                                                                                              
            "zone": 1                                                                                                                                       
          },                                                                                                                                                
          "devices": [                                                                                                                                      
            "/dev/loop0"                                                                                                                                    
          ]                                                                                                                                                 
        }                                                                                                                                                   
      ]                                                                                                                                                     
    }                                                                                                                                                       
  ]                                                                                                                                                         
}   
phlogistonjohn commented 6 years ago

The lack of space on 1914102b7ae395f12797981a0e3cf5a4 is almost certainly the cause of the out-of-space error you are seeing. Because the heketidbstorage volume that the command creates is replica 3, heketi needs to place a brick on that device, and it cannot due to the lack of free space.
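A quick way to confirm this is to check the per-device free space heketi reports, along these lines (a sketch, assuming heketi-cli is already pointed at your heketi service):

    # each device line shows Size/Used/Free; a replica-3 volume needs room on all three
    heketi-cli topology info | grep 'Name:/dev'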

I noticed the sizes of the devices are all different. Is this intentional? At any point did you run the heketi-cli device resync command?

Another thing you can try is to run the heketi-cli db dump command and inspect the JSON output. If there are pending operations in the db, it could mean that a volume that was only partially created is using space on the device.
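For example, something like this (a sketch; jq is assumed to be available, and the exact key names can vary between heketi versions):

    # dump the heketi db and look for leftover pending operations
    heketi-cli db dump > heketi-db.json
    jq 'keys' heketi-db.json                # list the db sections
    jq '.pendingoperations' heketi-db.json  # non-empty entries suggest a stuck operation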

Also, you could log into the gluster pods and use LVM commands like lvs to check whether any storage for bricks was carved out of the device VG. (Note: each brick maps to two LVs, the primary LV and a thinpool LV; the primary LV lives "inside" the thinpool.)
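Inside a gluster pod, something like the following shows that pairing (a sketch; heketi conventionally names brick LVs brick_<id> and thinpool LVs tp_<id>):

    # list each LV, the thinpool it lives in, and how full the pool is
    lvs -o lv_name,pool_lv,lv_size,data_percent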

bend commented 6 years ago

@phlogistonjohn I agree with the explanation, but I don't understand why the device is full. I created a loop device that is empty, so how come heketi marks it as full?

bend commented 6 years ago

So I've deleted all the volumes, pods, services, etc., recreated new loop devices 5 GB in size, and run

./gk-deploy -n gluster -w 900 -g -y                                                                                                                                                                       

Using Kubernetes CLI.
Using namespace "gluster".
Checking for pre-existing resources...
  GlusterFS pods ... not found.
  deploy-heketi pod ... not found.
  heketi pod ... not found.
  gluster-s3 pod ... not found.
Creating initial resources ... serviceaccount/heketi-service-account created
clusterrolebinding.rbac.authorization.k8s.io/heketi-sa-view created
clusterrolebinding.rbac.authorization.k8s.io/heketi-sa-view labeled
OK
node/vps01 labeled
node/vps02 labeled
node/vps04 labeled
daemonset.extensions/glusterfs created
Waiting for GlusterFS pods to start ... OK
Error from server (AlreadyExists): secrets "heketi-config-secret" already exists
secret/heketi-config-secret not labeled
service/deploy-heketi created
deployment.extensions/deploy-heketi created
Waiting for deploy-heketi pod to start ... OK
Creating cluster ... ID: cb42bacc3e5c68aaa07d143840a8f64c
Allowing file volumes on cluster.
Allowing block volumes on cluster.
Creating node vps01 ... ID: bf65f800524682813a5b125c319957cd
Adding device /dev/loop0 ... OK
Creating node vps02 ... ID: b6ea2328e2dce54f43e8a9f8ccabbde3
Adding device /dev/loop0 ... OK
Creating node vps04 ... ID: 0b4f3556c2139a98d3383704de072573
Adding device /dev/loop0 ... OK
heketi topology loaded.
Error: No space
command terminated with exit code 255
Failed on setup openshift heketi storage
This may indicate that the storage must be wiped and the GlusterFS nodes must be reset.
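The wipe/reset that message refers to can be done roughly as follows (a sketch; gk-deploy's --abort option and the device path are assumed to match your setup):

    # tear down the partially deployed heketi/gluster resources
    ./gk-deploy -n gluster -g -y --abort
    # then, on each node, clear any LVM/filesystem signatures from the backing device
    wipefs -a /dev/loop0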

And this is the output of

heketi-cli topology info

Cluster Id: cb42bacc3e5c68aaa07d143840a8f64c

    File:  true
    Block: true

    Volumes:

    Nodes:

        Node Id: 0b4f3556c2139a98d3383704de072573
        State: online
        Cluster Id: cb42bacc3e5c68aaa07d143840a8f64c
        Zone: 1
        Management Hostnames: vps04
        Storage Hostnames: XXX.XXX.XXX.227
        Devices:
                Id:a5c7f5ebc4c58c5e84279f195ac1a352   Name:/dev/loop0          State:online    Size (GiB):4       Used (GiB):4       Free (GiB):0
                        Bricks:

        Node Id: b6ea2328e2dce54f43e8a9f8ccabbde3
        State: online
        Cluster Id: cb42bacc3e5c68aaa07d143840a8f64c
        Zone: 1
        Management Hostnames: vps02
        Storage Hostnames: XXX.XXX.XXX.157
        Devices:
                Id:669c53412bc14502ebef9f30dda6c64c   Name:/dev/loop0          State:online    Size (GiB):4       Used (GiB):4       Free (GiB):0
                        Bricks:

        Node Id: bf65f800524682813a5b125c319957cd
        State: online
        Cluster Id: cb42bacc3e5c68aaa07d143840a8f64c
        Zone: 1
        Management Hostnames: vps01
        Storage Hostnames: XXX.XXX.XXX.219
        Devices:
                Id:ab5c466e880855b1bc94a5a90e05f6cb   Name:/dev/loop0          State:online    Size (GiB):4       Used (GiB):0       Free (GiB):4
                        Bricks:
bend commented 6 years ago

I've also tried running heketi-cli device resync for all the devices; after that, heketi-cli topology info shows that all the devices are free. I then reran (without deleting anything) ./gk-deploy -n gluster -w 900 -g -y

And I still get the same error... Is there a minimum required size?

Thanks

phlogistonjohn commented 6 years ago

Yes, the minimum size is 2Gi.
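That means each device needs at least 2 Gi free for its brick, plus a little headroom for LVM thinpool metadata. A minimal sketch of backing a device with a larger sparse file (paths and sizes are illustrative):

    # create a 12 GiB sparse backing file and attach it as a loop device
    truncate -s 12G /srv/heketi-backing.img
    losetup /dev/loop0 /srv/heketi-backing.img
    wipefs -a /dev/loop0   # make sure no stale signatures remain from earlier attempts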

Without up-to-date topology info and heketi logs, or a db dump, I'm afraid there's not much more I can do. Please note that device resync can help in some situations, but it also has bugs, and I've seen it shrink the volume size incorrectly. I don't recommend running it except as a last resort.

bend commented 6 years ago

Here is the topology info:


Cluster Id: cb42bacc3e5c68aaa07d143840a8f64c

    File:  true
    Block: true

    Volumes:

    Nodes:

        Node Id: 0b4f3556c2139a98d3383704de072573
        State: online
        Cluster Id: cb42bacc3e5c68aaa07d143840a8f64c
        Zone: 1
        Management Hostnames: vps04
        Storage Hostnames: 51.68.47.227
        Devices:
                Id:a5c7f5ebc4c58c5e84279f195ac1a352   Name:/dev/loop0          State:online    Size (GiB):4       Used (GiB):4       Free (GiB):0       
                        Bricks:

        Node Id: b6ea2328e2dce54f43e8a9f8ccabbde3
        State: online
        Cluster Id: cb42bacc3e5c68aaa07d143840a8f64c
        Zone: 1
        Management Hostnames: vps02
        Storage Hostnames: 5.196.23.157
        Devices:
                Id:669c53412bc14502ebef9f30dda6c64c   Name:/dev/loop0          State:online    Size (GiB):4       Used (GiB):4       Free (GiB):0       
                        Bricks:

        Node Id: bf65f800524682813a5b125c319957cd
        State: online
        Cluster Id: cb42bacc3e5c68aaa07d143840a8f64c
        Zone: 1
        Management Hostnames: vps01
        Storage Hostnames: 51.68.225.219
        Devices:
                Id:ab5c466e880855b1bc94a5a90e05f6cb   Name:/dev/loop0          State:online    Size (GiB):4       Used (GiB):0       Free (GiB):4       
                        Bricks:

I can't find any logs in the container (nothing with kubectl logs or journalctl, and nothing in /var/log). Where can I find the heketi logs for the container?

phlogistonjohn commented 6 years ago

Getting nothing from the kubectl logs command sounds weird. Typically there will be some logging generated by the server when you make requests to it. Heketi logs to standard output, so either systemd or the container runtime will be capturing the logging.
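For example (a sketch; the deployment and namespace names are taken from your gk-deploy output):

    # logs from the deploy-heketi pod via its deployment
    kubectl logs -n gluster deployment/deploy-heketi
    # or target a pod directly
    kubectl logs -n gluster <deploy-heketi-pod-name>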

This topology output shows free space of 0 for devices 669c53412bc14502ebef9f30dda6c64c and a5c7f5ebc4c58c5e84279f195ac1a352, so I see why you are getting the no space error again.

If you log on to the gluster pods (via kubectl exec, for example), what do the lvs and vgs commands show?

bend commented 6 years ago

If I log in to the glusterfs pods, here is the output:

[root@vps01 /]# lvs                                                                                                                                         
[root@vps01 /]# vgs                                                                                                                                         
  VG                                  #PV #LV #SN Attr   VSize  VFree                                                                                       
  vg_01c148fa8b180ce37e64e42354e93732   1   0   0 wz--n- <4.88g <4.88g   
[root@vps02 /]# lvs
[root@vps02 /]# vgs                                                                                                                                         
  VG                                  #PV #LV #SN Attr   VSize  VFree                                                                                       
  vg_c7bc3aef090bfe32076c8634020330cf   1   0   0 wz--n- <4.88g <4.88g
[root@vps04 /]# lvs
[root@vps04 /]# vgs                                                                                                                                         
  VG                                  #PV #LV #SN Attr   VSize  VFree                                                                                       
  vg_72e918263ddbdb987c5c19943433d823   1   0   0 wz--n- <4.88g <4.88g 

The volume groups seem to be free here. Moreover, I deleted and recreated all the volumes prior to running the command.

Here is the command I run: ./gk-deploy -n gluster -w 900 -g -y

Thanks for your help

phlogistonjohn commented 6 years ago

Very odd indeed. Would you be willing to put a db dump on a pastebin? If so, run heketi-cli db dump from within the pod to get the JSON dump and put it on fpaste.org or a pastebin of your choice.

bend commented 6 years ago

I've provisioned a new node and removed the old one, and now I get past this step. However, I now get the following error:

 ./gk-deploy -n gluster -w 900 -g -y topology.json                                                                                                   
Using Kubernetes CLI.                                                                                                                                                                   
Using namespace "gluster".                                                                                                                                                              
Checking for pre-existing resources...                                                                                                                                                  
  GlusterFS pods ... found.                                                                                                                                                             
  deploy-heketi pod ... found.                                                                                                                                                          
  heketi pod ... not found.                                                                                                                                                             
  gluster-s3 pod ... not found.                                                                                                                                                         
Creating initial resources ... Error from server (AlreadyExists): error when creating "/home/ben/k8s/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml": serviceaccounts "heketi-service-account" already exists
Error from server (AlreadyExists): clusterrolebindings.rbac.authorization.k8s.io "heketi-sa-view" already exists                                                                        
clusterrolebinding.rbac.authorization.k8s.io/heketi-sa-view not labeled                                                                                                                 
OK                                                                                                                                                                                      
Found node vps01 on cluster 596df7e07ab71717092785ce0f4c0c72                                                                                                                            
Found device /dev/loop0                                                                                                                                                                 
Found node vps02 on cluster 596df7e07ab71717092785ce0f4c0c72                                                                                                                            
Found device /dev/loop0                                                                                                                                                                 
Found node vps04 on cluster 596df7e07ab71717092785ce0f4c0c72                                                                                                                            
Found device /dev/loop0                                                                                                                                                                 
heketi topology loaded.                                                                                                                                                                 
Error: Volume heketidbstorage alreay exists                                                                                                                                             
command terminated with exit code 255                                                                                                                                                   
Failed on setup openshift heketi storage                                                                                                                                                
This may indicate that the storage must be wiped and the GlusterFS nodes must be reset. 
[root@deploy-heketi-559446b649-6z9w9 /]# heketi-cli topology info                                                                                                                       
Cluster Id: 07d0a6d37eb03d98081776ecba94ee27                                                                                                                                            
    File:  true                                                                                                                                                                         
    Block: true                                                                                                                                                                         
    Volumes:                                                                                                                                                                            
    Nodes:                                                                                                                                                                              
        Node Id: 5502b48c704c3cd3ca0bd44b45793ad1                                                                                                                                       
        State: online                                                                                                                                                                   
        Cluster Id: 07d0a6d37eb03d98081776ecba94ee27                                                                                                                                    
        Zone: 1                                                                                                                                                                         
        Management Hostnames: vps04                                                                                                                                                     
        Storage Hostnames: 51.68.XX.XX1                                                                                                                                                 
        Devices:                                                                                                                                                                        
                Id:c419affdc56e8cc65cc89109aafe08bf   Name:/dev/loop0          State:online    Size (GiB):12      Used (GiB):2       Free (GiB):10                                      
                        Bricks:                                                                                                                                                         
                                Id:2a72d082a3d4b92b513b92fa99d269ab   Size (GiB):2       Path: /var/lib/heketi/mounts/vg_c419affdc56e8cc65cc89109aafe08bf/brick_2a72d082a3d4b92b513b92fa99d269ab/brick
        Node Id: 91f513210187420b8746d6f4bc05d855                                                                                                                                       
        State: online                                                                                                                                                                   
        Cluster Id: 07d0a6d37eb03d98081776ecba94ee27                                                                                                                                    
        Zone: 1                                                                                                                                                                         
        Management Hostnames: vps01                                                                                                                                                     
        Storage Hostnames: 51.68.XXX.XXX                                                                                                                                               
        Devices:                                                                                                                                                                        
                Id:88e0c894ad70e8e199ad91c7a8925faf   Name:/dev/loop0          State:online    Size (GiB):12      Used (GiB):2       Free (GiB):10                                      
                        Bricks:                                                                                                                                                         
                                Id:f08a791be3155ee5791dfbee31aa6b0e   Size (GiB):2       Path: /var/lib/heketi/mounts/vg_88e0c894ad70e8e199ad91c7a8925faf/brick_f08a791be3155ee5791dfbee31aa6b0e/brick
        Node Id: ca245feedc741e2b1706aecc628e0661                                                                                                                                       
        State: online                                                                                                                                                                   
        Cluster Id: 07d0a6d37eb03d98081776ecba94ee27                                                                                                                                    
        Zone: 1                                                                                                                                                                         
        Management Hostnames: vps02                                                                                                                                                     
        Storage Hostnames: 51.68.X1.XXX                                                                                                                                                
        Devices:                                                                                                                                                                        
                Id:758454435cc0e6cb7fd1a0daafb877ce   Name:/dev/loop0          State:online    Size (GiB):12      Used (GiB):2       Free (GiB):10                                      
                        Bricks:                                                                                                                                                         
                                Id:c3997e9d9ae08802b293e4def686ecbc   Size (GiB):2       Path: /var/lib/heketi/mounts/vg_758454435cc0e6cb7fd1a0daafb877ce/brick_c3997e9d9ae08802b293e4def686ecbc/brick

And the heketi logs can be found here: https://gist.github.com/bend/4d355203c3edab80831c343f9a9210d9

The error is weird:


[heketi] WARNING 2018/08/31 07:33:39 failed to delete volume f0a524cf1265ff8fb27405ac42ef93af via vps01: Unable to delete volume heketidbstorage: Unable to execute command on glusterfs-vhkdb: volume delete: heketidbstorage: failed: Some of the peers are down
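One way to check whether the peers really are down, from inside a gluster pod (a sketch; the pod name is a placeholder):

    # peer and volume health as gluster itself sees them
    kubectl exec -n gluster <glusterfs-pod> -- gluster peer status
    kubectl exec -n gluster <glusterfs-pod> -- gluster volume info heketidbstorage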

Any idea?

fredrik-jansson-se commented 5 years ago

I'm seeing this issue as well; did you ever find a solution?

drungrin commented 4 years ago

In my case it was because I had only 2 nodes in Kubernetes. The setup-openshift-heketi-storage command doesn't use a --replica param reflecting the topology.
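If your heketi version supports it, one workaround on a two-node cluster is to lower the durability of the heketi db volume (a sketch; check heketi-cli setup-openshift-heketi-storage --help first, since flag support varies by release):

    # create the heketi db volume without replication (not suitable for production)
    heketi-cli setup-openshift-heketi-storage --durability=none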