gluster / gluster-kubernetes

GlusterFS Native Storage Service for Kubernetes
Apache License 2.0

glusterfs: create volume err: error creating volume . #282

Open th3penguinwhisperer opened 7 years ago

th3penguinwhisperer commented 7 years ago

Hi,

Thanks for writing this nice tool to deploy Gluster on OpenShift. However, I still seem to be stuck with the above error in the logs. The gluster pods are running, as is the heketi pod, and the endpoints, ... are all available.

However every claim I try goes to pending state and stays there.

The gluster cluster is deployed in the default namespace. This is what I see in the logs:

glusterfs: got gid [2000] for PVC test18
glusterfs: create volume of size: 1073741824 bytes and configuration {url:http://172.30.206.46:8080/ user:admin userKey: secretNamespace:default secretName:heketi-secret secretValue:mypassword clusterId: gidMin:2000 gidMax:2147483647 volumeType:{Type:replicate Replicate:{Replica:3} Disperse:{Data:0 Redundancy:0}}}
glusterfs: error creating volume
glusterfs.go:664] glusterfs: create volume err: error creating volume .
pv_controller.go:1306] failed to provision volume for claim "default/test18" with StorageClass "gluster-container": glusterfs: create volume err: error creating volume .
...
glusterfs: create volume of size: 1073741824 bytes and configuration {url:http://172.30.206.46:8080/ user:admin userKey: secretNamespace:default secretName:heketi-secret secretValue:mypassword clusterId: gidMin:2000 gidMax:2147483647 volumeType:{Type:replicate Replicate:{Replica:3} Disperse:{Data:0 Redundancy:0}}}
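
For completeness, the claims I'm testing with look roughly like this (test18 is the name from the log above; on 1.5 the storage class is still set via the beta annotation, if I'm not mistaken):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test18
  annotations:
    volume.beta.kubernetes.io/storage-class: gluster-container
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi   # matches the 1073741824 bytes in the log
```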

When I go onto a gluster pod and run gluster volume info I see one volume, heketidbstorage. I believe heketi is also mounting this volume:

192.168.178.33:heketidbstorage on /var/lib/heketi type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
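
To double-check that this volume is healthy and that heketi answers at all, something along these lines can be run (the address is the one from my setup; /hello is heketi's unauthenticated health-check endpoint, AFAIK):

```sh
# inside one of the gluster pods
gluster volume info heketidbstorage
gluster volume status heketidbstorage

# from the master: does heketi respond at all?
curl http://172.30.206.46:8080/hello
```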

Might this have to do with version differences? This is OpenShift 1.5.1 and the gluster pods seem to run gluster/gluster-centos:latest (nothing I set manually here, AFAIK).

Storageclass is:

apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  creationTimestamp: 2017-07-24T18:09:35Z
  name: gluster-container
  resourceVersion: "59215"
  selfLink: /apis/storage.k8s.io/v1beta1/storageclasses/gluster-container
  uid: 402a3d21-709b-11e7-840b-525400548fa6
parameters:
  resturl: http://172.30.206.46:8080/
  restuser: admin
  secretName: heketi-secret
  secretNamespace: default
provisioner: kubernetes.io/glusterfs

Does anyone have a clue what's going on here? There doesn't seem to be any logging from heketi :s

Thanks in advance.

th3penguinwhisperer commented 7 years ago

I seem to get this error when I use the URL (http://xyz) without a trailing slash in the storageclass; I captured it with tcpdump. Not sure if this is actually my problem or just a side effect of all my fiddling.

HTTP/1.1 500 Internal Server Error
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Mon, 24 Jul 2017 19:35:53 GMT
Content-Length: 68
Set-Cookie: ff1a8205a9a2f52921c45408892389cf=d6838f12e85f29239853163010fa57a7; path=/; HttpOnly

Error calling v.allocBricksInCluster: database is in read-only mode

Error: Error calling v.allocBricksInCluster: database is in read-only mode
th3penguinwhisperer commented 7 years ago

So I can recreate the same empty error message with heketi-cli:

[root@openshift-master ~]# heketi-cli --json=true --user admin --secret mypassword -s http://heketi-default.cloud.xyz volume create --size=1 --persistent-volume-file=pv001.json 
Error: Error calling v.allocBricksInCluster: database is in read-only mode
[root@openshift-master ~]# heketi-cli --json=true --user admin --secret mypassword -s http://heketi-default.cloud.xyz/ volume create --size=1 --persistent-volume-file=pv001.json 
Error: 
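
My guess is that the trailing slash in resturl ends up producing a double slash in the request path, and that's what swallows the real error. Comparing the two directly against the unauthenticated /hello endpoint should show whether the route treats them differently (hostname is from my setup):

```sh
curl -i http://heketi-default.cloud.xyz/hello
curl -i http://heketi-default.cloud.xyz//hello   # what a trailing slash in resturl would produce
```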
jarrpa commented 7 years ago

@th3penguinwhisperer Could you please reformat your comments to have ``` around your snippets of output? :)

th3penguinwhisperer commented 7 years ago

I think I've got a step further:

Using the URL with the trailing slash seems to hide the actual error; removing it shows the DB is in read-only mode. I'm not sure why that happens or how to prevent it in the future, but what I did was kill the heketi process (NOT the pod, as that didn't help). A new process is then spawned and it seems to work (or perhaps I'm just lucky and the pod is now on a node where it works while it fails on the others).

Note that at heketi pod startup I see this message in the log:

[heketi] WARNING 2017/07/24 19:48:00 Unable to open database. Retrying using read only mode

So that's probably what puts it into read-only mode.
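
For reference, the workaround boils down to something like this (the pod name is just an example; whether pkill is available in the container may vary):

```sh
# kill the heketi process inside the pod, NOT the pod itself --
# deleting the pod didn't help
oc get pods | grep heketi
oc exec heketi-1-abcde -- pkill heketi   # a new process is spawned and the DB reopens read-write
```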

th3penguinwhisperer commented 7 years ago

Now that I know what's causing this and how to work around it: perhaps something can be changed in the deploymentconfig that gk-deploy creates so that there's a small delay before a new pod is created, or something similar?

jarrpa commented 7 years ago

So what seems to be the workaround? Where should this delay go?

th3penguinwhisperer commented 7 years ago

I'm not sure such a delay is even possible. But when the existing pod is deleted, there should be a small delay before the new one gets created.

I'll have to verify, though, whether I can retrigger the issue by deleting the heketi pod now that the DB is out of read-only mode. I'll post an update.
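
One idea, untested: a Recreate strategy on the heketi deploymentconfig would at least make rollouts wait until the old pod (and its lock on heketi.db) is gone before starting a new one, though it wouldn't cover the RC replacing a deleted pod:

```yaml
# sketch only: the relevant fragment of the heketi DeploymentConfig
spec:
  strategy:
    type: Recreate
    recreateParams:
      timeoutSeconds: 600   # how long to wait for the old pod to terminate
```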

DanyC97 commented 6 years ago

Have you gotten anywhere with this, @th3penguinwhisperer? Cheers