gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

Filesystems of Virtual Machines with qcow2 files in Gluster are left in a Read-Only FS state after adding brick to the volume #1244

Closed · jcresp21 closed 3 years ago

jcresp21 commented 4 years ago

Description of problem: The volume was originally a replica 2 with an arbiter, with each brick located on a different node. As part of a reinstallation of one of the nodes (it was upgraded to Ubuntu Bionic), the volume was shrunk down to a single brick, and the arbiter had to be removed as well. After reinstalling the node, a new brick from this node was added back to the volume, along with an arbiter on a different node. This caused the filesystems of all VMs with qcow2 files on the Gluster volume to switch to read-only. The VMs had to be rebooted; after that the problem stopped.
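For context, a replica 2 volume with an arbiter corresponds to Gluster's "replica 3 arbiter 1" layout (two data bricks plus one metadata-only arbiter brick). A minimal sketch of how such a volume is typically created is shown below; the host names and brick paths are illustrative, not the ones from this setup:

    # Hypothetical hosts/paths: two data bricks plus one arbiter brick
    gluster volume create gv_example replica 3 arbiter 1 \
        node-1.example.net:/bricks/gv/brick \
        node-2.example.net:/bricks/gv/brick \
        node-3.example.net:/bricks/gv/arbiter
    gluster volume start gv_example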

The exact command to reproduce the issue:

-> Deletion of brick:

  1. Stopped the brick process on the node to be reinstalled: kill -15 <brick_process>
  2. Removed the data brick and the arbiter brick:

     gluster volume remove-brick gv1 replica 1 gluster-2.xcade.net:/mnt/gv_gu2/newbrick gluster-1.xcade.net:/mnt/arbiter/arbiter_brick

     During the deletion of the brick I can see the following error in the logs, but the VMs keep working without problems:

     E [fuse-bridge.c:227:check_and_dump_fuse_W] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x138)[0x7f3f7bf44848] (--> /usr/lib/x86_64-linux-gnu/glusterfs/7.5/xlator/mount/fuse.so(+0x7bda)[0x7f3f79867bda] (--> /usr/lib/x86_64-linux-gnu/glusterfs/7.5/xlator/mount/fuse.so(+0x7d35)[0x7f3f79867d35] (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f3f7b6ae6ba] (--> /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f3f7b3e441d] ))))) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
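Before and after a remove-brick of this kind, the volume state can be sanity-checked with the standard CLI; a minimal sketch (volume name taken from the command above):

    # Before removing bricks: make sure no heals are pending
    gluster volume heal gv1 info
    # After the remove-brick: confirm the volume is down to a single brick
    # and that the remaining brick process is online
    gluster volume info gv1
    gluster volume status gv1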

-> Re-Adding of bricks:

gluster volume add-brick gv2 replica 3 arbiter 1 gluster-2.xcade.net:/mnt/gv_gu2/newbrick gluster-1.xcade.net:/mnt/arbiter/arbiter_brick

The command did not report any failure; it returned SUCCESS.
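Once the add-brick succeeds, the new data brick and the arbiter have to be healed from the surviving brick, and the client-quorum settings decide whether clients may keep writing in the meantime; whether quorum enforcement was the trigger for the read-only switch here is only an assumption. A minimal monitoring sketch with standard CLI commands:

    # Watch the heal of the newly added bricks
    gluster volume heal gv2 info summary
    # Inspect the client-quorum settings that can make the volume
    # read-only for clients when too few bricks are reachable
    gluster volume get gv2 cluster.quorum-type
    gluster volume get gv2 cluster.quorum-count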

- The output of the gluster volume info command:

Volume Name: gv2
Type: Replicate
Volume ID: 8cd2932f-44e4-421c-acec-69de2001f247
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster-3.xcade.net:/mnt/gv_gu2/newbrick
Brick2: gluster-2.xcade.net:/mnt/gv_gu2/newbrick
Brick3: gluster-1.xcade.net:/mnt/arbiter/arbiter_brick (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
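Given the "failed to get the port number for remote subvolume" errors further down in the logs, it is also worth confirming that all three brick processes are actually running and listening; a quick check with the standard CLI:

    # Show per-brick process status, PIDs and ports
    gluster volume status gv2
    # More detail per brick, including free space
    gluster volume status gv2 detail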

- The operating system / glusterfs version:

2 nodes running Ubuntu Xenial 16.04
1 node (Gluster-2) running Ubuntu Bionic 18.04
All nodes running the latest GlusterFS version, 7.5

- Logs of operation:

[2020-05-12 14:31:22.031090] I [glusterfsd.c:2594:daemonize] 0-glusterfs: Pid of current running process is 7479
[2020-05-12 14:31:22.039603] I [MSGID: 114020] [client.c:2436:notify] 0-gv2-client-1: parent translators are ready, attempting connect on transport
[2020-05-12 14:31:22.040260] I [MSGID: 114020] [client.c:2436:notify] 0-gv2-client-2: parent translators are ready, attempting connect on transport
[2020-05-12 14:31:22.040507] I [MSGID: 114020] [client.c:2436:notify] 0-gv2-client-3: parent translators are ready, attempting connect on transport
Final graph:
+------------------------------------------------------------------------------+
  1: volume gv2-client-1
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host gluster-gu3.xcade.net
  5:     option remote-subvolume /mnt/gv_gu2/newbrick
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option transport.socket.ssl-enabled off
  9:     option transport.tcp-user-timeout 0
 10:     option transport.socket.keepalive-time 20
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-count 9
 13:     option send-gids true
 14: end-volume
 15:
 16: volume gv2-client-2
 17:     type protocol/client
 18:     option ping-timeout 42
 19:     option remote-host gluster-gu2.xcade.net
 20:     option remote-subvolume /mnt/gv_gu2/newbrick
 21:     option transport-type socket
 22:     option transport.address-family inet
 23:     option transport.socket.ssl-enabled off
 24:     option transport.tcp-user-timeout 0
 25:     option transport.socket.keepalive-time 20
 26:     option transport.socket.keepalive-interval 2
 27:     option transport.socket.keepalive-count 9
 28:     option send-gids true
 29: end-volume
 30:
 31: volume gv2-client-3
 32:     type protocol/client
 33:     option ping-timeout 42
 34:     option remote-host gluster-gu1.xcade.net
 35:     option remote-subvolume /mnt/arbiter/arbiter_brick
 36:     option transport-type socket
 37:     option transport.address-family inet
 38:     option transport.socket.ssl-enabled off
 39:     option transport.tcp-user-timeout 0
 40:     option transport.socket.keepalive-time 20
 41:     option transport.socket.keepalive-interval 2
 42:     option transport.socket.keepalive-count 9
 43:     option send-gids true
 44: end-volume
 45:
 46: volume gv2-replicate-0
 47:     type cluster/replicate
 48:     option afr-pending-xattr gv2-client-1,gv2-client-2,gv2-client-3
 49:     option arbiter-count 1
 50:     option use-compound-fops off
 51:     subvolumes gv2-client-1 gv2-client-2 gv2-client-3
 52: end-volume
 53:
 54: volume gv2-dht
 55:     type cluster/distribute
 56:     option lock-migration off
 57:     option force-migration off
 58:     subvolumes gv2-replicate-0
 59: end-volume
 60:
 61: volume gv2-utime
 62:     type features/utime
 63:     option noatime on
 64:     subvolumes gv2-dht
 65: end-volume
 66:
 67: volume gv2-write-behind
 68:     type performance/write-behind
 69:     subvolumes gv2-utime
 70: end-volume
 71:
 72: volume gv2-read-ahead
 73:     type performance/read-ahead
 74:     subvolumes gv2-write-behind
 75: end-volume
 76:
 77: volume gv2-readdir-ahead
 78:     type performance/readdir-ahead
 79:     option parallel-readdir off
 80:     option rda-request-size 131072
 81:     option rda-cache-limit 10MB
 82:     subvolumes gv2-read-ahead
 83: end-volume
 84:
 85: volume gv2-io-cache
 86:     type performance/io-cache
 87:     subvolumes gv2-readdir-ahead
 88: end-volume
 89:
 90: volume gv2-open-behind
 91:     type performance/open-behind
 92:     subvolumes gv2-io-cache
 93: end-volume
 94:
 95: volume gv2-quick-read
 96:     type performance/quick-read
 97:     subvolumes gv2-open-behind
 98: end-volume
 99:
100: volume gv2-md-cache
101:     type performance/md-cache
102:     subvolumes gv2-quick-read
103: end-volume
104:
105: volume gv2
106:     type debug/io-stats
107:     option log-level INFO
108:     option threads 16
109:     option latency-measurement off
110:     option count-fop-hits off
111:     option global-threading off
112:     subvolumes gv2-md-cache
113: end-volume
114:
115: volume meta-autoload
116:     type meta
117:     subvolumes gv2
118: end-volume
119:
+------------------------------------------------------------------------------+
[2020-05-12 14:31:22.041236] I [MSGID: 101190] [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2020-05-12 14:31:22.041326] I [MSGID: 101190] [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2020-05-12 14:31:22.041644] E [MSGID: 114058] [client-handshake.c:1455:client_query_portmap_cbk] 0-gv2-client-2: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2020-05-12 14:31:22.041686] I [socket.c:865:__socket_shutdown] 0-gv2-client-2: intentional socket shutdown(12)
[2020-05-12 14:31:22.041710] I [rpc-clnt.c:1963:rpc_clnt_reconfig] 0-gv2-client-1: changing port to 49153 (from 0)
[2020-05-12 14:31:22.041723] I [MSGID: 114018] [client.c:2347:client_rpc_notify] 0-gv2-client-2: disconnected from gv2-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2020-05-12 14:31:22.041729] I [socket.c:865:__socket_shutdown] 0-gv2-client-1: intentional socket shutdown(11)
[2020-05-12 14:31:22.041759] E [MSGID: 108006] [afr-common.c:5360:__afr_handle_child_down_event] 0-gv2-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2020-05-12 14:31:22.041986] E [MSGID: 114058] [client-handshake.c:1455:client_query_portmap_cbk] 0-gv2-client-3: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2020-05-12 14:31:22.042026] I [socket.c:865:__socket_shutdown] 0-gv2-client-3: intentional socket shutdown(13)
[2020-05-12 14:31:22.042050] I [MSGID: 114018] [client.c:2347:client_rpc_notify] 0-gv2-client-3: disconnected from gv2-client-3. Client process will keep trying to connect to glusterd until brick's port is available
[2020-05-12 14:31:22.042069] E [MSGID: 108006] [afr-common.c:5360:__afr_handle_child_down_event] 0-gv2-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2020-05-12 14:31:22.042564] I [MSGID: 114057] [client-handshake.c:1375:select_server_supported_programs] 0-gv2-client-1: Using Program GlusterFS 4.x v1, Num (1298437), Version (400)
[2020-05-12 14:31:22.042748] W [dict.c:999:str_to_data] (-->/usr/lib/x86_64-linux-gnu/glusterfs/7.5/xlator/protocol/client.so(+0x354a1) [0x7f33ca9594a1] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_set_str+0x16) [0x7f33d0349a46] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(str_to_data+0x60) [0x7f33d0346560] ) 0-dict: value is NULL [Invalid argument]
[2020-05-12 14:31:22.042769] I [MSGID: 114006] [client-handshake.c:1236:client_setvolume] 0-gv2-client-1: failed to set process-name in handshake msg
[2020-05-12 14:31:22.043551] I [MSGID: 114046] [client-handshake.c:1105:client_setvolume_cbk] 0-gv2-client-1: Connected to gv2-client-1, attached to remote volume '/mnt/gv_gu2/newbrick'.
[2020-05-12 14:31:22.043581] I [MSGID: 108005] [afr-common.c:5283:__afr_handle_child_up_event] 0-gv2-replicate-0: Subvolume 'gv2-client-1' came back up; going online.
[2020-05-12 14:31:22.044846] I [fuse-bridge.c:5166:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26
[2020-05-12 14:31:22.044876] I [fuse-bridge.c:5777:fuse_graph_sync] 0-fuse: switched to graph 0
[2020-05-12 14:31:22.045714] E [fuse-bridge.c:5235:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
[2020-05-12 14:31:22.047146] I [MSGID: 0] [afr-inode-write.c:1239:_afr_handle_empty_brick] 0-gv2-replicate-0: New brick is : gv2-client-2
[2020-05-12 14:31:22.048076] I [MSGID: 108039] [afr-inode-write.c:1064:afr_emptyb_set_pending_changelog_cbk] 0-gv2-replicate-0: Set of pending xattr succeeded on gv2-client-1.
[2020-05-12 14:31:22.074927] I [fuse-bridge.c:6083:fuse_thread_proc] 0-fuse: initiating unmount of /tmp/mntSKnRMt
[2020-05-12 14:31:22.049184] I [MSGID: 108039] [afr-inode-write.c:1064:afr_emptyb_set_pending_changelog_cbk] 0-gv2-replicate-0: Set of pending xattr succeeded on gv2-client-1.
[2020-05-12 14:31:22.075142] W [glusterfsd.c:1596:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f33cfaba6db] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xfd) [0x55b585b6febd] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55b585b6fd04] ) 0-: received signum (15), shutting down
[2020-05-12 14:31:22.075173] I [fuse-bridge.c:6898:fini] 0-fuse: Unmounting '/tmp/mntSKnRMt'.
[2020-05-12 14:31:22.075191] I [fuse-bridge.c:6903:fini] 0-fuse: Closing fuse connection to '/tmp/mntSKnRMt'.
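For the guests themselves, a reboot (as done here) clears the read-only state; below is a minimal sketch of how an affected VM might be checked and, if possible, recovered without a full reboot, assuming its root filesystem is the one that was remounted read-only:

    # Inside the affected guest: confirm the kernel remounted the FS read-only
    dmesg | grep -iE 'i/o error|read-only'
    # Once the Gluster volume is healthy again, try remounting read-write
    mount -o remount,rw /
    # If the remount is refused, the filesystem likely needs an fsck first,
    # e.g. from a rescue environment (device name is an assumption):
    # fsck /dev/vda1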

stale[bot] commented 3 years ago

Thank you for your contributions. This issue has not had any activity in the last ~6 months, so we are marking it as stale. It will be closed in 2 weeks if no one responds with a comment here.

stale[bot] commented 3 years ago

Closing this issue as there has been no update since my last update. If this issue is still valid, feel free to reopen it.