gluster / gluster-kubernetes

GlusterFS Native Storage Service for Kubernetes
Apache License 2.0

Some bricks are offline after gluster container recreate #536

Open toyangdon opened 5 years ago

toyangdon commented 5 years ago

When I recreate the gluster container, some bricks on this node go offline. I find the following in glusterd.log:

[2018-11-13 02:01:34.895633] I [glusterd-utils.c:5962:glusterd_brick_start] 0-management: starting a fresh brick process for brick /var/lib/heketi/mounts/vg_9c9fdf12cba6212fd6ccb05bfd270c41/brick_1024c2e1e5d792d11db3d178da54cca1/brick
[2018-11-13 02:01:34.899988] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2018-11-13 02:01:34.900198] W [socket.c:3216:socket_connect] 0-management: Error disabling sockopt IPV6_V6ONLY: "Operation not supported"
[2018-11-13 02:01:37.330378] I [glusterd-utils.c:5962:glusterd_brick_start] 0-management: starting a fresh brick process for brick /var/lib/heketi/mounts/vg_9c9fdf12cba6212fd6ccb05bfd270c41/brick_cb20e11db86329774d7366eb5d632ef9/brick
[2018-11-13 02:01:37.333050] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2018-11-13 02:01:39.170395] I [glusterd-utils.c:5868:glusterd_brick_start] 0-management: discovered already-running brick /var/lib/heketi/mounts/vg_9c9fdf12cba6212fd6ccb05bfd270c41/brick_fd34cfc29f41624e4f1e76ed061f9dad/brick
[2018-11-13 02:01:39.170437] I [MSGID: 106143] [glusterd-pmap.c:282:pmap_registry_bind] 0-pmap: adding brick /var/lib/heketi/mounts/vg_9c9fdf12cba6212fd6ccb05bfd270c41/brick_fd34cfc29f41624e4f1e76ed061f9dad/brick on port 49173

Some bricks are reported as an "already-running brick", but in fact they are not running. When I remove the "glusterfs-run" volume mount, all bricks come back up normally after the gluster container is recreated.
What is the purpose of the "glusterfs-run" volume mount in glusterfs-daemonset.yaml? Can I remove it?
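For anyone checking the same thing, here is a rough sketch of how to see which bricks glusterd considers online after the container comes back, and how the glusterfs-run volume is declared in the daemonset. The namespace, pod and daemonset names are placeholders, not taken from this report; adjust them to your deployment.

```sh
# List bricks and their state; offline bricks show "N" under Online and no port.
kubectl -n glusterfs exec glusterfs-xxxxx -- gluster volume status

# Show how the glusterfs-run volume is declared and mounted in the daemonset.
kubectl -n glusterfs get daemonset glusterfs -o yaml | grep -B2 -A3 glusterfs-run
```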

kcao3 commented 5 years ago

We're running into this same problem. Does anyone know the purpose of the glusterfs-run volume with an emptyDir type: https://github.com/gluster/gluster-kubernetes/blob/master/deploy/kube-templates/glusterfs-daemonset.yaml#L101?

bischoje commented 5 years ago

We are also observing this issue, running on gluster 4.1.7.

It claims it found an already-running brick, and that brick is never actually started. The containers that rely on it hang or enter a crash loop. It's pretty easy to reproduce: hard-power off the VM hosting gluster, or use kill -9 on the gluster processes. A rough reproduction sketch is below.
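A minimal sketch of the reproduction described above, assuming a gluster pod managed by the daemonset (namespace and pod name are placeholders, and pkill is used for brevity):

```sh
# Kill the gluster processes hard, simulating a crash or power loss.
kubectl -n glusterfs exec glusterfs-xxxxx -- pkill -9 glusterfsd
kubectl -n glusterfs exec glusterfs-xxxxx -- pkill -9 glusterd

# Once glusterd is running again (restarted by systemd inside the container,
# or by the kubelet restarting the container), check which bricks came back.
# Affected bricks stay at "Online: N" even though glusterd logged
# "discovered already-running brick" for them.
kubectl -n glusterfs exec glusterfs-xxxxx -- gluster volume status
```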

Should this issue be moved to bugzilla? https://bugzilla.redhat.com/

ghost commented 5 years ago

Same for me (gluster 4.1.7). Any news on this issue?

kinsu commented 4 years ago

Could it be that the brick's pidfile contains a PID that now points to some other running process? If this is still reproducible, can you check the PID?
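A sketch of how one could verify that hypothesis inside the gluster container. The pidfile location differs between gluster versions (newer releases keep them under /var/run/gluster, older ones under /var/lib/glusterd/vols/*/run), so treat the paths below as assumptions:

```sh
# Compare each brick pidfile against the process actually holding that PID.
# If the PID exists but belongs to something other than glusterfsd, glusterd
# can wrongly conclude the brick is "already running".
for f in /var/run/gluster/vols/*/*.pid /var/lib/glusterd/vols/*/run/*.pid; do
    [ -e "$f" ] || continue
    pid=$(cat "$f")
    echo "== $f (pid $pid)"
    ps -p "$pid" -o pid,comm,args || echo "   no such process"
done
```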