gluster / gluster-kubernetes

GlusterFS Native Storage Service for Kubernetes
Apache License 2.0

Heketi: Unable to open database: timeout #531

Closed: Webgardener closed 5 years ago

Webgardener commented 6 years ago

Hi,

When trying to export heketi.db in JSON format, I encounter the following error:

[root@heketi-5bd5d976b6-vvdjg heketi]# heketi db export --dbfile heketi.db --jsonfile /tmp/q.json
[heketi] ERROR 2018/11/05 11:02:19 /src/github.com/heketi/heketi/apps/glusterfs/dbcommon.go:189: Unable to open database: timeout
failed to dump db: Unable to open database: timeout

Heketi version: Heketi v8.0.0-1-g082b556-release-8

Thanks in advance, Thomas

Webgardener commented 6 years ago

I might have found an explanation: it seems that the db cannot be opened while another process already has it open (https://github.com/boltdb/bolt#opening-a-database).
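One way to test this hypothesis is to try taking the lock yourself. The sketch below assumes that Bolt's file lock is a flock(2)-style lock (which I believe is the case on Linux) and that the util-linux `flock` tool is available where the db file is visible; neither is confirmed in this thread. The function name `db_is_locked` is made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether some other process holds a flock(2) lock on a file.
# Assumptions: util-linux `flock` is installed; the db path is passed as $1.
db_is_locked() {
    # Refuse missing paths, because flock would otherwise create the file.
    [ -f "$1" ] || return 2
    # -n: do not wait for the lock; -x: ask for an exclusive lock.
    if flock -n -x "$1" true; then
        return 1   # lock was acquired and released: nobody else holds it
    fi
    return 0       # lock could not be taken: treated as "held elsewhere"
}

# Usage (path is an assumption):
#   db_is_locked /var/lib/heketi/heketi.db && echo "heketi.db is in use"
```

If this reports the file as in use while the heketi server is running, the timeout from `heketi db export` is the expected behaviour rather than a sign of corruption.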

Webgardener commented 6 years ago

I am trying to perform the actions described in the troubleshooting guide (https://github.com/heketi/heketi/blob/master/docs/troubleshooting.md). How is that possible if the heketi.db file is locked?

phlogistonjohn commented 6 years ago

Hi @Webgardener, it may be that you haven't stopped a previously running heketi process (heketi pod). This command can only be run if no other process is using the db. There are generally two ways to access the heketi db file when the heketi pod is stopped (and the pod does not exist for you to exec commands in):

  1. Use glusterfs to mount the heketidbstorage volume on another host. One convenient place for this is within one of the gluster pods. If using this method, it may be useful to copy the heketi binary into the root of the heketidbstorage volume first (before stopping the heketi pod, run within it: cp /usr/bin/heketi /var/lib/heketi).
  2. After you stop the heketi pod, manually create a k8s job or pod that uses the same image and mountpoints as the heketi dc but does not start the heketi server. Then you can exec into this pod and run commands as you would in the heketi pod, without the heketi server process accessing the db.

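As a rough illustration of option 1, the steps could be scripted as below. This is only a sketch under assumptions: the Deployment is assumed to be named `heketi` (on OpenShift it would be a dc instead), the mount point `/mnt/heketidb` and the helper names `run` and `export_heketi_db` are made up, and the pod names are passed in by the caller. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch of option 1: export the heketi db from inside a gluster pod.
# Assumptions: Deployment named "heketi", current kubectl namespace,
# hypothetical mount point /mnt/heketidb.

# Print each command; only execute it when DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

export_heketi_db() {
    heketi_pod=$1     # e.g. heketi-5bd5d976b6-vvdjg
    gluster_pod=$2    # e.g. glusterfs-bnbzg
    mnt=/mnt/heketidb

    # Keep a copy of the heketi binary on the db volume before stopping the pod.
    run kubectl exec "$heketi_pod" -- cp /usr/bin/heketi /var/lib/heketi
    # Stop the heketi server so that nothing holds the database open.
    run kubectl scale deployment heketi --replicas=0
    # Mount the heketidbstorage volume inside the gluster pod.
    run kubectl exec "$gluster_pod" -- mkdir -p "$mnt"
    run kubectl exec "$gluster_pod" -- mount -t glusterfs localhost:/heketidbstorage "$mnt"
    # Export the db with the copied binary.
    run kubectl exec "$gluster_pod" -- "$mnt/heketi" db export \
        --dbfile "$mnt/heketi.db" --jsonfile /tmp/q.json
}
```

Calling `export_heketi_db heketi-5bd5d976b6-vvdjg glusterfs-bnbzg` prints the five commands for review. Afterwards, unmount the volume and scale the Deployment back to one replica.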
Webgardener commented 5 years ago

Thank you @phlogistonjohn! I figured out how to repair the heketi db. The heketi pod is running again.

But it is still impossible to provision new PVCs.

Describe PVC in Pending status:

Warning ProvisioningFailed 37s persistentvolume-controller Failed to provision volume with StorageClass "storage-test-gluster": failed to create volume: failed to get cluster nodes for volume Name: vol_2b4b4927393ef288cb48d9d7156591d5

Lots of errors in Heketi logs:

[kubeexec] ERROR 2018/11/06 10:21:54 heketi/executors/kubeexec/kubeexec.go:275:kubeexec.(*KubeExecutor).execCommands: Failed to run command [lvremove --autobackup=n -f vg_8455498078a755b8c58abffefa5f07c4/tp_8128f27a8e075cebcdd28f85046fe337] on glusterfs-bnbzg: Err[command terminated with exit code 5]: Stdout []: Stderr [ WARNING: Not using lvmetad because config setting use_lvmetad=0. WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache). Failed to find logical volume "vg_8455498078a755b8c58abffefa5f07c4/tp_8128f27a8e075cebcdd28f85046fe337" ]
[heketi] ERROR 2018/11/06 10:21:53 heketi/apps/glusterfs/operations_manage.go:89:glusterfs.AsyncHttpOperation: Create Volume Build Failed: No space
[negroni] Completed 500 Internal Server Error in 1.298457173s
[kubeexec] ERROR 2018/11/06 10:21:54 heketi/executors/kubeexec/kubeexec.go:275:kubeexec.(*KubeExecutor).execCommands: Failed to run command [lvremove --autobackup=n -f vg_f84afeb845eeb69f74b63c19fb3039a9/tp_44f1cb171ce486c5d3b012de360835af] on glusterfs-v6j26: Err[command terminated with exit code 5]: Stdout []: Stderr [ Failed to find logical volume "vg_f84afeb845eeb69f74b63c19fb3039a9/tp_44f1cb171ce486c5d3b012de360835af"
[cmdexec] ERROR 2018/11/06 10:21:54 heketi/executors/cmdexec/brick.go:280:cmdexec.(*CmdExecutor).BrickDestroy: rmdir: failed to remove '/var/lib/heketi/mounts/vg_f84afeb845eeb69f74b63c19fb3039a9/brick_44f1cb171ce486c5d3b012de360835af': No such file or directory

Webgardener commented 5 years ago

The gluster volume groups are full. That is why PVC creation was failing, which matches the "Create Volume Build Failed: No space" error in the heketi log.
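For anyone hitting the same symptom, a quick way to confirm it is to look at the free space of the volume groups inside each gluster pod. The filter below is a minimal sketch: the function name `low_space_vgs` and the 10 GiB default threshold are arbitrary choices for illustration, and it expects the three-column output of `vgs` with the options shown in the comment.

```shell
#!/bin/sh
# Sketch: list volume groups whose free space is below a threshold (in GiB).
# Reads "name size free" rows on stdin, as printed by:
#   vgs --noheadings --units g --nosuffix -o vg_name,vg_size,vg_free
# $1 is the threshold in GiB (hypothetical default: 10).
low_space_vgs() {
    # Compare the third column (free space) numerically against the threshold
    # and print the VG name with its free space when it is below it.
    awk -v min="${1:-10}" '$3 + 0 < min + 0 { print $1, $3 }'
}

# Usage (pod name taken from the logs above):
#   kubectl exec glusterfs-bnbzg -- vgs --noheadings --units g --nosuffix \
#       -o vg_name,vg_size,vg_free | low_space_vgs 10
```

Any volume group it prints is a candidate for adding a device to heketi or for cleaning up unused volumes.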