gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0
4.76k stars 1.08k forks source link

Glusterfs (fuse) client crashes #2722

Open MarvinTO opened 3 years ago

MarvinTO commented 3 years ago

Description of problem: We have a glusterfs volume used to store some grafana data. There is a sqlite database stored in the volume.

After upgrading the fuse driver to 8.5, the client mount started to crash constantly (using an older version of the fuse driver is working correctly 7.7). I was not able to find any errors on the servers side at the time of the client crash.

The exact command to reproduce the issue: The mount is being handled by an entry in /etc/fstab gfs.prod.com:/grafana /opt/grafana glusterfs defaults 0 0

**The full output of the command that failed**: ``` [2021-08-19 16:48:38.770606] I [MSGID: 100030] [glusterfsd.c:2689:main] 0-/usr/sbin/glusterfs: Started running version [{arg=/usr/sbin/glusterfs}, {version=8.5}, {cmdlinestr=/usr/sbin/glusterfs --process-name fuse --volfile-server=gfs.prod.com --volfile-id=/grafana /opt/grafana}] [2021-08-19 16:48:38.772337] I [glusterfsd.c:2424:daemonize] 0-glusterfs: Pid of current running process is 28076 [2021-08-19 16:48:38.777452] I [socket.c:4252:ssl_setup_connection_params] 0-glusterfs: SSL support for MGMT is ENABLED IO path is ENABLED certificate depth is 3 for peer [2021-08-19 16:48:38.834561] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=0}] [2021-08-19 16:48:38.834620] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=1}] [2021-08-19 16:48:38.847727] I [glusterfsd-mgmt.c:2170:mgmt_getspec_cbk] 0-glusterfs: Received list of available volfile servers: gfs009.prod.com:24007 gfs008.prod.com:24007 [2021-08-19 16:48:38.853408] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=2}] [2021-08-19 16:48:38.853508] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=3}] [2021-08-19 16:48:38.855828] I [MSGID: 114020] [client.c:2315:notify] 0-grafana-client-0: parent translators are ready, attempting connect on transport [] [2021-08-19 16:48:38.860215] I [MSGID: 114020] [client.c:2315:notify] 0-grafana-client-1: parent translators are ready, attempting connect on transport [] [2021-08-19 16:48:38.864363] I [MSGID: 114020] [client.c:2315:notify] 0-grafana-client-2: parent translators are ready, attempting connect on transport [] Final graph: +------------------------------------------------------------------------------+ 1: volume grafana-client-0 2: type protocol/client 3: option ping-timeout 42 4: option remote-host gfs009.prod.com 5: option remote-subvolume /opt/data/grafana 6: option transport-type socket 7: option transport.address-family inet 8: option transport.socket.ssl-enabled off 9: option event-threads 4 10: option transport.tcp-user-timeout 0 11: option transport.socket.keepalive-time 20 12: option transport.socket.keepalive-interval 2 13: option transport.socket.keepalive-count 9 14: option strict-locks off 15: option send-gids true 16: end-volume 17: 18: volume grafana-client-1 19: type protocol/client 20: option ping-timeout 42 21: option remote-host gfs007.prod.com 22: option remote-subvolume /opt/data/grafana 23: option transport-type socket 24: option transport.address-family inet 25: option transport.socket.ssl-enabled off 26: option event-threads 4 27: option transport.tcp-user-timeout 0 28: option transport.socket.keepalive-time 20 29: option transport.socket.keepalive-interval 2 30: option transport.socket.keepalive-count 9 31: option strict-locks off 32: option send-gids true 33: end-volume 34: 35: volume grafana-client-2 36: type protocol/client 37: option ping-timeout 42 38: option remote-host gfs008.prod.com 39: option remote-subvolume /opt/data/grafana 40: option transport-type socket 41: option transport.address-family inet 42: option transport.socket.ssl-enabled off 43: option event-threads 4 44: option transport.tcp-user-timeout 0 45: option transport.socket.keepalive-time 20 46: option transport.socket.keepalive-interval 2 47: option transport.socket.keepalive-count 9 48: option strict-locks off 49: option send-gids true 50: end-volume 51: 52: volume grafana-replicate-0 53: type cluster/replicate 54: option afr-pending-xattr grafana-client-0,grafana-client-1,grafana-client-2 55: option favorite-child-policy majority 56: option use-compound-fops off 57: subvolumes grafana-client-0 grafana-client-1 grafana-client-2 58: end-volume 59: 60: volume grafana-dht 61: type cluster/distribute 62: option lookup-optimize on 63: option readdir-optimize on 64: option lock-migration off 65: option force-migration off 66: subvolumes grafana-replicate-0 67: end-volume 68: 69: volume grafana-utime 70: type features/utime 71: option noatime on 72: subvolumes grafana-dht 73: end-volume 74: 75: volume grafana-write-behind 76: type performance/write-behind 77: option flush-behind on 78: option cache-size 16MB 79: subvolumes grafana-utime 80: end-volume 81: 82: volume grafana-readdir-ahead 83: type performance/readdir-ahead 84: option parallel-readdir off 85: option rda-request-size 131072 86: option rda-cache-limit 10MB 87: subvolumes grafana-write-behind 88: end-volume 89: 90: volume grafana-io-cache 91: type performance/io-cache 92: option cache-size 2GB 93: subvolumes grafana-readdir-ahead 94: end-volume 95: 96: volume grafana-open-behind 97: type performance/open-behind 98: subvolumes grafana-io-cache 99: end-volume 100: 101: volume grafana-quick-read 102: type performance/quick-read 103: option cache-size 2GB 104: subvolumes grafana-open-behind 105: end-volume 106: 107: volume grafana-md-cache 108: type performance/md-cache 109: subvolumes grafana-quick-read 110: end-volume 111: 112: volume grafana 113: type debug/io-stats 114: option log-level INFO 115: option threads 16 116: option latency-measurement off 117: option count-fop-hits off 118: option global-threading off 119: subvolumes grafana-md-cache 120: end-volume 121: 122: volume meta-autoload 123: type meta 124: subvolumes grafana 125: end-volume 126: +------------------------------------------------------------------------------+ [2021-08-19 16:48:38.871921] I [rpc-clnt.c:1975:rpc_clnt_reconfig] 0-grafana-client-0: changing port to 49152 (from 0) [2021-08-19 16:48:38.872086] I [socket.c:849:__socket_shutdown] 0-grafana-client-0: intentional socket shutdown(13) [2021-08-19 16:48:38.877519] I [rpc-clnt.c:1975:rpc_clnt_reconfig] 0-grafana-client-1: changing port to 49152 (from 0) [2021-08-19 16:48:38.877570] I [MSGID: 114057] [client-handshake.c:1128:select_server_supported_programs] 0-grafana-client-0: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}] [2021-08-19 16:48:38.877786] I [socket.c:849:__socket_shutdown] 0-grafana-client-1: intentional socket shutdown(14) [2021-08-19 16:48:38.880255] I [MSGID: 114046] [client-handshake.c:857:client_setvolume_cbk] 0-grafana-client-0: Connected, attached to remote volume [{conn-name=grafana-client-0}, {remote_subvol=/opt/data/grafana}] [2021-08-19 16:48:38.880334] I [MSGID: 108005] [afr-common.c:5993:__afr_handle_child_up_event] 0-grafana-replicate-0: Subvolume 'grafana-client-0' came back up; going online. [2021-08-19 16:48:38.883553] I [MSGID: 114057] [client-handshake.c:1128:select_server_supported_programs] 0-grafana-client-1: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}] [2021-08-19 16:48:38.885425] I [MSGID: 114046] [client-handshake.c:857:client_setvolume_cbk] 0-grafana-client-1: Connected, attached to remote volume [{conn-name=grafana-client-1}, {remote_subvol=/opt/data/grafana}] [2021-08-19 16:48:38.885520] I [MSGID: 108002] [afr-common.c:6363:afr_notify] 0-grafana-replicate-0: Client-quorum is met [2021-08-19 16:48:38.886059] I [rpc-clnt.c:1975:rpc_clnt_reconfig] 0-grafana-client-2: changing port to 49152 (from 0) [2021-08-19 16:48:38.886218] I [socket.c:849:__socket_shutdown] 0-grafana-client-2: intentional socket shutdown(15) [2021-08-19 16:48:38.891974] I [MSGID: 114057] [client-handshake.c:1128:select_server_supported_programs] 0-grafana-client-2: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}] [2021-08-19 16:48:38.901301] I [MSGID: 114046] [client-handshake.c:857:client_setvolume_cbk] 0-grafana-client-2: Connected, attached to remote volume [{conn-name=grafana-client-2}, {remote_subvol=/opt/data/grafana}] [2021-08-19 16:48:38.904360] I [fuse-bridge.c:5300:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.33 [2021-08-19 16:48:38.904408] I [fuse-bridge.c:5928:fuse_graph_sync] 0-fuse: switched to graph 0 [2021-08-19 16:51:04.399387] W [fuse-bridge.c:1387:fuse_attr_cbk] 0-glusterfs-fuse: 7681: STAT() /grafana.db-journal => -1 (Stale file handle) [2021-08-19 16:51:04.400295] W [MSGID: 101159] [inode.c:1246:__inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfiddb027943-9da2-4215-a366-e604a72e3010}] [2021-08-19 16:51:08.328621] W [MSGID: 101159] [inode.c:1246:__inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfid720826ed-36bb-4e88-9bdb-0dc86157c882}] [2021-08-19 16:52:30.905560] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-grafana-client-0: remote operation failed. [{path=/grafana.db-journal}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=61}, {error=No data available}] [2021-08-19 16:52:31.006928] W [fuse-bridge.c:1387:fuse_attr_cbk] 0-glusterfs-fuse: 16064: STAT() /grafana.db-journal => -1 (Stale file handle) [2021-08-19 16:52:33.328302] W [MSGID: 101159] [inode.c:1246:__inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfideab087b3-0ca3-4064-829f-30cb29796e8d}] [2021-08-19 16:56:06.337099] W [MSGID: 101159] [inode.c:1246:__inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfidf913d531-837b-4f12-972a-cc7f2a9833f3}] [2021-08-19 16:59:13.429520] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-grafana-client-0: remote operation failed. [{path=/grafana.db-journal}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=61}, {error=No data available}] [2021-08-19 16:59:15.333468] W [MSGID: 101159] [inode.c:1246:__inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfid3530b62f-26bd-4328-ba52-82eeaf5e2838}] [2021-08-19 16:59:20.328369] W [MSGID: 101159] [inode.c:1246:__inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfidcae336ae-ee34-4d89-92b1-821a52564895}] [2021-08-19 17:03:26.333398] W [MSGID: 101159] [inode.c:1246:__inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfidba24faf1-d7c6-4f19-8b71-3d3eecd589d5}] [2021-08-19 17:21:09.787909] E [MSGID: 126004] [open-behind.c:587:ob_writev] 0-grafana-open-behind: Failed to submit fop [{fop=writev}, {errno=116}, {error=Stale file handle}] [2021-08-19 17:21:09.788000] W [fuse-bridge.c:3049:fuse_writev_cbk] 0-glusterfs-fuse: 84876: WRITE => -1 gfid=628bfe1d-e406-4bf5-a31d-4bcc61693e09 fd=0x7f07100d4248 (Stale file handle) [2021-08-19 17:21:09.788168] E [MSGID: 126004] [open-behind.c:618:ob_flush] 0-grafana-open-behind: Failed to submit fop [{fop=flush}, {errno=116}, {error=Stale file handle}] [2021-08-19 17:21:09.788204] W [fuse-bridge.c:1952:fuse_err_cbk] 0-glusterfs-fuse: 84877: FLUSH() ERR => -1 (Stale file handle) [2021-08-19 17:21:09.792144] W [fuse-bridge.c:2036:fuse_unlink_cbk] 0-glusterfs-fuse: 84878: UNLINK() /grafana.db-journal => -1 (No such file or directory) pending frames: frame : type(1) op(LK) patchset: git://git.gluster.org/glusterfs.git signal received: 11 time of crash: 2021-08-19 17:21:09 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 8.5 /lib64/libglusterfs.so.0(+0x27e3f)[0x7f073b57ce3f] /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f073b587ce4] /lib64/libc.so.6(+0x36400)[0x7f0739bc6400] /usr/lib64/glusterfs/8.5/xlator/protocol/client.so(+0x3b41b)[0x7f072c7e041b] /usr/lib64/glusterfs/8.5/xlator/protocol/client.so(+0x3c790)[0x7f072c7e1790] /usr/lib64/glusterfs/8.5/xlator/protocol/client.so(+0x587fd)[0x7f072c7fd7fd] /lib64/libgfrpc.so.0(+0xf101)[0x7f073b328101] /lib64/libgfrpc.so.0(+0xf467)[0x7f073b328467] /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f073b324a93] /usr/lib64/glusterfs/8.5/rpc-transport/socket.so(+0x4278)[0x7f072f6f4278] /usr/lib64/glusterfs/8.5/rpc-transport/socket.so(+0x9a91)[0x7f072f6f9a91] /lib64/libglusterfs.so.0(+0x8c91c)[0x7f073b5e191c] /lib64/libpthread.so.0(+0x7ea5)[0x7f073a3c8ea5] /lib64/libc.so.6(clone+0x6d)[0x7f0739c8e9fd] --------- ``` **Expected results:** Fuse mount to be up & running **Mandatory info:** **- The output of the `gluster volume info` command**: Volume Name: grafana Type: Replicate Volume ID: 004549c4-5299-471c-b2f4-339f3437f844 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: gfs009.prod.com:/opt/data/grafana Brick2: gfs007.prod.com:/opt/data/grafana Brick3: gfs008.prod.com:/opt/data/grafana Options Reconfigured: server.event-threads: 4 performance.write-behind-window-size: 16MB performance.readdir-ahead: enable performance.read-ahead: disable performance.io-thread-count: 32 performance.io-cache: on performance.flush-behind: on performance.client-io-threads: off performance.cache-size: 2GB cluster.lookup-optimize: on client.event-threads: 4 cluster.readdir-optimize: on transport.address-family: inet storage.fips-mode-rchecksum: on nfs.disable: on cluster.favorite-child-policy: majority **- The output of the `gluster volume status` command**: Status of volume: grafana Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick gfs009.prod.com:/opt/data/grafana 49152 0 Y 9013 Brick gfs007.prod.com:/opt/data/grafana 49152 0 Y 8888 Brick gfs008.prod.com:/opt/data/grafana 49152 0 Y 8958 Self-heal Daemon on localhost N/A N/A Y 8896 Self-heal Daemon on gfs009.prod.com N/A N/A Y 9024 Self-heal Daemon on gfs008.prod.com N/A N/A Y 8969 Task Status of Volume grafana ------------------------------------------------------------------------------ There are no active volume tasks **- The output of the `gluster volume heal` command**: Brick gfs009.prod.com:/opt/data/grafana Status: Connected Number of entries: 0 Brick gfs007.prod.com:/opt/data/grafana Status: Connected Number of entries: 0 Brick gfs008.prod.com:/opt/data/grafana Status: Connected Number of entries: 0 **Additional info:**

- The operating system / glusterfs version: centos79 / glusterfs 8.5

mykaul commented 3 years ago

Looks like https://github.com/gluster/glusterfs/issues/1966 ?

On Thu, 19 Aug 2021, 21:17 MarvinTO @.***> wrote:

Description of problem: We have a glusterfs volume used to store some grafana data. There is a sqlite database stored in the volume.

After upgrading the fuse driver to 8.5, the client mount started to crash constantly (using an older version of the fuse driver is working correctly 7.7). I was not able to find any errors on the servers side at the time of the client crash.

The exact command to reproduce the issue: The mount is being handled by an entry in /etc/fstab gfs.prod.com:/grafana /opt/grafana glusterfs defaults 0 0

The full output of the command that failed:

[2021-08-19 16:48:38.770606] I [MSGID: 100030] [glusterfsd.c:2689:main] 0-/usr/sbin/glusterfs: Started running version [{arg=/usr/sbin/glusterfs}, {version=8.5}, {cmdlinestr=/usr/sbin/glusterfs --process-name fuse --volfile-server=gfs.prod.com --volfile-id=/grafana /opt/grafana}] [2021-08-19 16:48:38.772337] I [glusterfsd.c:2424:daemonize] 0-glusterfs: Pid of current running process is 28076 [2021-08-19 16:48:38.777452] I [socket.c:4252:ssl_setup_connection_params] 0-glusterfs: SSL support for MGMT is ENABLED IO path is ENABLED certificate depth is 3 for peer [2021-08-19 16:48:38.834561] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=0}] [2021-08-19 16:48:38.834620] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=1}] [2021-08-19 16:48:38.847727] I [glusterfsd-mgmt.c:2170:mgmt_getspec_cbk] 0-glusterfs: Received list of available volfile servers: gfs009.prod.com:24007 gfs008.prod.com:24007 [2021-08-19 16:48:38.853408] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=2}] [2021-08-19 16:48:38.853508] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=3}] [2021-08-19 16:48:38.855828] I [MSGID: 114020] [client.c:2315:notify] 0-grafana-client-0: parent translators are ready, attempting connect on transport [] [2021-08-19 16:48:38.860215] I [MSGID: 114020] [client.c:2315:notify] 0-grafana-client-1: parent translators are ready, attempting connect on transport [] [2021-08-19 16:48:38.864363] I [MSGID: 114020] [client.c:2315:notify] 0-grafana-client-2: parent translators are ready, attempting connect on transport [] Final graph: +------------------------------------------------------------------------------+ 1: volume grafana-client-0 2: type protocol/client 3: option ping-timeout 42 4: option remote-host gfs009.prod.com 5: option remote-subvolume /opt/data/grafana 6: option transport-type socket 7: option transport.address-family inet 8: option transport.socket.ssl-enabled off 9: option event-threads 4 10: option transport.tcp-user-timeout 0 11: option transport.socket.keepalive-time 20 12: option transport.socket.keepalive-interval 2 13: option transport.socket.keepalive-count 9 14: option strict-locks off 15: option send-gids true 16: end-volume 17: 18: volume grafana-client-1 19: type protocol/client 20: option ping-timeout 42 21: option remote-host gfs007.prod.com 22: option remote-subvolume /opt/data/grafana 23: option transport-type socket 24: option transport.address-family inet 25: option transport.socket.ssl-enabled off 26: option event-threads 4 27: option transport.tcp-user-timeout 0 28: option transport.socket.keepalive-time 20 29: option transport.socket.keepalive-interval 2 30: option transport.socket.keepalive-count 9 31: option strict-locks off 32: option send-gids true 33: end-volume 34: 35: volume grafana-client-2 36: type protocol/client 37: option ping-timeout 42 38: option remote-host gfs008.prod.com 39: option remote-subvolume /opt/data/grafana 40: option transport-type socket 41: option transport.address-family inet 42: option transport.socket.ssl-enabled off 43: option event-threads 4 44: option transport.tcp-user-timeout 0 45: option transport.socket.keepalive-time 20 46: option transport.socket.keepalive-interval 2 47: option transport.socket.keepalive-count 9 48: option strict-locks off 49: option send-gids true 50: end-volume 51: 52: volume grafana-replicate-0 53: type cluster/replicate 54: option afr-pending-xattr grafana-client-0,grafana-client-1,grafana-client-2 55: option favorite-child-policy majority 56: option use-compound-fops off 57: subvolumes grafana-client-0 grafana-client-1 grafana-client-2 58: end-volume 59: 60: volume grafana-dht 61: type cluster/distribute 62: option lookup-optimize on 63: option readdir-optimize on 64: option lock-migration off 65: option force-migration off 66: subvolumes grafana-replicate-0 67: end-volume 68: 69: volume grafana-utime 70: type features/utime 71: option noatime on 72: subvolumes grafana-dht 73: end-volume 74: 75: volume grafana-write-behind 76: type performance/write-behind 77: option flush-behind on 78: option cache-size 16MB 79: subvolumes grafana-utime 80: end-volume 81: 82: volume grafana-readdir-ahead 83: type performance/readdir-ahead 84: option parallel-readdir off 85: option rda-request-size 131072 86: option rda-cache-limit 10MB 87: subvolumes grafana-write-behind 88: end-volume 89: 90: volume grafana-io-cache 91: type performance/io-cache 92: option cache-size 2GB 93: subvolumes grafana-readdir-ahead 94: end-volume 95: 96: volume grafana-open-behind 97: type performance/open-behind 98: subvolumes grafana-io-cache 99: end-volume 100: 101: volume grafana-quick-read 102: type performance/quick-read 103: option cache-size 2GB 104: subvolumes grafana-open-behind 105: end-volume 106: 107: volume grafana-md-cache 108: type performance/md-cache 109: subvolumes grafana-quick-read 110: end-volume 111: 112: volume grafana 113: type debug/io-stats 114: option log-level INFO 115: option threads 16 116: option latency-measurement off 117: option count-fop-hits off 118: option global-threading off 119: subvolumes grafana-md-cache 120: end-volume 121: 122: volume meta-autoload 123: type meta 124: subvolumes grafana 125: end-volume 126: +------------------------------------------------------------------------------+ [2021-08-19 16:48:38.871921] I [rpc-clnt.c:1975:rpc_clnt_reconfig] 0-grafana-client-0: changing port to 49152 (from 0) [2021-08-19 16:48:38.872086] I [socket.c:849:socket_shutdown] 0-grafana-client-0: intentional socket shutdown(13) [2021-08-19 16:48:38.877519] I [rpc-clnt.c:1975:rpc_clnt_reconfig] 0-grafana-client-1: changing port to 49152 (from 0) [2021-08-19 16:48:38.877570] I [MSGID: 114057] [client-handshake.c:1128:select_server_supported_programs] 0-grafana-client-0: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}] [2021-08-19 16:48:38.877786] I [socket.c:849:socket_shutdown] 0-grafana-client-1: intentional socket shutdown(14) [2021-08-19 16:48:38.880255] I [MSGID: 114046] [client-handshake.c:857:client_setvolume_cbk] 0-grafana-client-0: Connected, attached to remote volume [{conn-name=grafana-client-0}, {remote_subvol=/opt/data/grafana}] [2021-08-19 16:48:38.880334] I [MSGID: 108005] [afr-common.c:5993:afr_handle_child_up_event] 0-grafana-replicate-0: Subvolume 'grafana-client-0' came back up; going online. [2021-08-19 16:48:38.883553] I [MSGID: 114057] [client-handshake.c:1128:select_server_supported_programs] 0-grafana-client-1: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}] [2021-08-19 16:48:38.885425] I [MSGID: 114046] [client-handshake.c:857:client_setvolume_cbk] 0-grafana-client-1: Connected, attached to remote volume [{conn-name=grafana-client-1}, {remote_subvol=/opt/data/grafana}] [2021-08-19 16:48:38.885520] I [MSGID: 108002] [afr-common.c:6363:afr_notify] 0-grafana-replicate-0: Client-quorum is met [2021-08-19 16:48:38.886059] I [rpc-clnt.c:1975:rpc_clnt_reconfig] 0-grafana-client-2: changing port to 49152 (from 0) [2021-08-19 16:48:38.886218] I [socket.c:849:socket_shutdown] 0-grafana-client-2: intentional socket shutdown(15) [2021-08-19 16:48:38.891974] I [MSGID: 114057] [client-handshake.c:1128:select_server_supported_programs] 0-grafana-client-2: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}] [2021-08-19 16:48:38.901301] I [MSGID: 114046] [client-handshake.c:857:client_setvolume_cbk] 0-grafana-client-2: Connected, attached to remote volume [{conn-name=grafana-client-2}, {remote_subvol=/opt/data/grafana}] [2021-08-19 16:48:38.904360] I [fuse-bridge.c:5300:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.33 [2021-08-19 16:48:38.904408] I [fuse-bridge.c:5928:fuse_graph_sync] 0-fuse: switched to graph 0 [2021-08-19 16:51:04.399387] W [fuse-bridge.c:1387:fuse_attr_cbk] 0-glusterfs-fuse: 7681: STAT() /grafana.db-journal => -1 (Stale file handle) [2021-08-19 16:51:04.400295] W [MSGID: 101159] [inode.c:1246:inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfiddb027943-9da2-4215-a366-e604a72e3010}] [2021-08-19 16:51:08.328621] W [MSGID: 101159] [inode.c:1246:inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfid720826ed-36bb-4e88-9bdb-0dc86157c882}] [2021-08-19 16:52:30.905560] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-grafana-client-0: remote operation failed. [{path=/grafana.db-journal}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=61}, {error=No data available}] [2021-08-19 16:52:31.006928] W [fuse-bridge.c:1387:fuse_attr_cbk] 0-glusterfs-fuse: 16064: STAT() /grafana.db-journal => -1 (Stale file handle) [2021-08-19 16:52:33.328302] W [MSGID: 101159] [inode.c:1246:inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfideab087b3-0ca3-4064-829f-30cb29796e8d}] [2021-08-19 16:56:06.337099] W [MSGID: 101159] [inode.c:1246:__inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfidf913d531-837b-4f12-972a-cc7f2a9833f3}] [2021-08-19 16:59:13.429520] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-grafana-client-0: remote operation failed. [{path=/grafana.db-journal}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=61}, {error=No data available}] [2021-08-19 16:59:15.333468] W [MSGID: 101159] [inode.c:1246:inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfid3530b62f-26bd-4328-ba52-82eeaf5e2838}] [2021-08-19 16:59:20.328369] W [MSGID: 101159] [inode.c:1246:__inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfidcae336ae-ee34-4d89-92b1-821a52564895}] [2021-08-19 17:03:26.333398] W [MSGID: 101159] [inode.c:1246:__inode_unlink] 0-inode: dentry not found [{parent-gfid=00000000-0000-0000-0000-000000000001 name=grafana.db-journal gfidba24faf1-d7c6-4f19-8b71-3d3eecd589d5}] [2021-08-19 17:21:09.787909] E [MSGID: 126004] [open-behind.c:587:ob_writev] 0-grafana-open-behind: Failed to submit fop [{fop=writev}, {errno=116}, {error=Stale file handle}] [2021-08-19 17:21:09.788000] W [fuse-bridge.c:3049:fuse_writev_cbk] 0-glusterfs-fuse: 84876: WRITE => -1 gfid=628bfe1d-e406-4bf5-a31d-4bcc61693e09 fd=0x7f07100d4248 (Stale file handle) [2021-08-19 17:21:09.788168] E [MSGID: 126004] [open-behind.c:618:ob_flush] 0-grafana-open-behind: Failed to submit fop [{fop=flush}, {errno=116}, {error=Stale file handle}] [2021-08-19 17:21:09.788204] W [fuse-bridge.c:1952:fuse_err_cbk] 0-glusterfs-fuse: 84877: FLUSH() ERR => -1 (Stale file handle) [2021-08-19 17:21:09.792144] W [fuse-bridge.c:2036:fuse_unlink_cbk] 0-glusterfs-fuse: 84878: UNLINK() /grafana.db-journal => -1 (No such file or directory) pending frames: frame : type(1) op(LK) patchset: git://git.gluster.org/glusterfs.git signal received: 11 time of crash: 2021-08-19 17:21:09 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 8.5 /lib64/libglusterfs.so.0(+0x27e3f)[0x7f073b57ce3f] /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f073b587ce4] /lib64/libc.so.6(+0x36400)[0x7f0739bc6400] /usr/lib64/glusterfs/8.5/xlator/protocol/client.so(+0x3b41b)[0x7f072c7e041b] /usr/lib64/glusterfs/8.5/xlator/protocol/client.so(+0x3c790)[0x7f072c7e1790] /usr/lib64/glusterfs/8.5/xlator/protocol/client.so(+0x587fd)[0x7f072c7fd7fd] /lib64/libgfrpc.so.0(+0xf101)[0x7f073b328101] /lib64/libgfrpc.so.0(+0xf467)[0x7f073b328467] /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f073b324a93] /usr/lib64/glusterfs/8.5/rpc-transport/socket.so(+0x4278)[0x7f072f6f4278] /usr/lib64/glusterfs/8.5/rpc-transport/socket.so(+0x9a91)[0x7f072f6f9a91] /lib64/libglusterfs.so.0(+0x8c91c)[0x7f073b5e191c] /lib64/libpthread.so.0(+0x7ea5)[0x7f073a3c8ea5] /lib64/libc.so.6(clone+0x6d)[0x7f0739c8e9fd]

Expected results: Fuse mount to be up & running

Mandatory info: - The output of the gluster volume info command: Volume Name: grafana Type: Replicate Volume ID: 004549c4-5299-471c-b2f4-339f3437f844 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: gfs009.prod.com:/opt/data/grafana Brick2: gfs007.prod.com:/opt/data/grafana Brick3: gfs008.prod.com:/opt/data/grafana Options Reconfigured: server.event-threads: 4 performance.write-behind-window-size: 16MB performance.readdir-ahead: enable performance.read-ahead: disable performance.io-thread-count: 32 performance.io-cache: on performance.flush-behind: on performance.client-io-threads: off performance.cache-size: 2GB cluster.lookup-optimize: on client.event-threads: 4 cluster.readdir-optimize: on transport.address-family: inet storage.fips-mode-rchecksum: on nfs.disable: on cluster.favorite-child-policy: majority - The output of the gluster volume status command: Status of volume: grafana Gluster process TCP Port RDMA Port Online Pid

Brick gfs009.prod.com:/opt/data/grafana 49152 0 Y 9013 Brick gfs007.prod.com:/opt/data/grafana 49152 0 Y 8888 Brick gfs008.prod.com:/opt/data/grafana 49152 0 Y 8958 Self-heal Daemon on localhost N/A N/A Y 8896 Self-heal Daemon on gfs009.prod.com N/A N/A Y 9024 Self-heal Daemon on gfs008.prod.com N/A N/A Y 8969 Task Status of Volume grafana

There are no active volume tasks

- The output of the gluster volume heal command: Brick gfs009.prod.com:/opt/data/grafana Status: Connected Number of entries: 0

Brick gfs007.prod.com:/opt/data/grafana Status: Connected Number of entries: 0

Brick gfs008.prod.com:/opt/data/grafana Status: Connected Number of entries: 0

Additional info:

- The operating system / glusterfs version: centos79 / glusterfs 8.5

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/gluster/glusterfs/issues/2722, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABDQT2PBXD5IUAJ6IEDEAILT5VDDFANCNFSM5COYE2XQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

pranithk commented 3 years ago

Description of problem: We have a glusterfs volume used to store some grafana data. There is a sqlite database stored in the volume.

After upgrading the fuse driver to 8.5, the client mount started to crash constantly (using an older version of the fuse driver is working correctly 7.7). I was not able to find any errors on the servers side at the time of the client crash.

The exact command to reproduce the issue: The mount is being handled by an entry in /etc/fstab gfs.prod.com:/grafana /opt/grafana glusterfs defaults 0 0

- The operating system / glusterfs version: centos79 / glusterfs 8.5

Could you post output of thread apply all bt of the core file?

MarvinTO commented 3 years ago

Hello @pranithk , I was trying to create the backtrace file, but I got a "Missing separate debuginfos" message. Do I need to install a specific package in order to get this working?

` Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-8.5-1.el7.x86_64

debuginfo-install glusterfs-fuse-8.5-1.el7.x86_64 Loaded plugins: fastestmirror, remove-with-leaves Loading mirror speeds from cached hostfile Could not find debuginfo for main pkg: glusterfs-fuse-8.5-1.el7.x86_64 Could not find debuginfo pkg for dependency package glibc-2.17-324.el7_9.x86_64 Could not find debuginfo pkg for dependency package 1:openssl-libs-1.0.2k-21.el7_9.x86_64 Could not find debuginfo pkg for dependency package libgfrpc0-8.5-1.el7.x86_64 Could not find debuginfo pkg for dependency package libgfxdr0-8.5-1.el7.x86_64 Could not find debuginfo pkg for dependency package libglusterfs0-8.5-1.el7.x86_64 Could not find debuginfo pkg for dependency package libuuid-2.23.2-65.el7_9.1.x86_64 Could not find debuginfo pkg for dependency package zlib-1.2.7-19.el7_9.x86_64 No debuginfo packages available to install `

pranithk commented 3 years ago

@tshacked I see that the release notes is done by you for 8.5, could you help find the rpms?

@MarvinTO It looks similar to https://github.com/gluster/glusterfs/issues/2443 which is fixed in 8.6 release.

pranithk commented 3 years ago

@tshacked I see that the release notes is done by you for 8.5, could you help find the rpms?

@MarvinTO It looks similar to #2443 which is fixed in 8.6 release.

Need the backtrace to confirm

tshacked commented 3 years ago

@tshacked I see that the release notes is done by you for 8.5, could you help find the rpms?

I can see that glusterfs-fuse-debuginfo-8.5-1 was generated for el8: https://cbs.centos.org/koji/buildinfo?buildID=32879 But not for el7 (it is also missing in prev el7 vers): https://cbs.centos.org/koji/buildinfo?buildID=32885 I didn't generate the actual build.
@kalebskeithley, @Shwetha-Acharya, Looks that debuginfo RPMs are not created on centos-7 builds. could you please check it?

Shwetha-Acharya commented 3 years ago

@tshacked I see that the release notes is done by you for 8.5, could you help find the rpms?

I can see that glusterfs-fuse-debuginfo-8.5-1 was generated for el8: https://cbs.centos.org/koji/buildinfo?buildID=32879 But not for el7 (it is also missing in prev el7 vers): https://cbs.centos.org/koji/buildinfo?buildID=32885 I didn't generate the actual build. @kalebskeithley, @Shwetha-Acharya, Looks that debuginfo RPMs are not created on centos-7 builds. could you please check it? @nixpanic

nixpanic commented 3 years ago

The packages are there? Search for glusterfs-debuginfo-8.5-1.el7 on the page, el7 does not split the debuginfo per sub-package like el8 does.

MarvinTO commented 3 years ago

@nixpanic I found glusterfs-debuginfo-8.5-1.el7 in http://debuginfo.centos.org/centos/7/storage/x86_64/

Now I'm wondering if I need glusterfs-debuginfo-8.5-1.el7 or glusterfs-fuse-debuginfo-8.5-1.el7 in order to generate the backtrace.

I could not find glusterfs-fuse-debuginfo-8.5-1.el7 it in the repo and also could not find it in https://cbs.centos.org/koji/buildinfo?buildID=32885

nixpanic commented 3 years ago

On Wed, Aug 25, 2021 at 03:58:55PM -0700, MarvinTO wrote:

@nixpanic I found glusterfs-debuginfo-8.5-1.el7 in http://debuginfo.centos.org/centos/7/storage/x86_64/

Now I'm wondering if I need glusterfs-debuginfo-8.5-1.el7 or glusterfs-fuse-debuginfo-8.5-1.el7 in order to generate the backtrace.

I could not find glusterfs-fuse-debuginfo-8.5-1.el7 it in the repo and also could not find it in https://cbs.centos.org/koji/buildinfo?buildID=32885

el7 generates a single debuginfo package per source RPM, glusterfs-debuginfo should contain everything.

el8 and newer generates a debuginfo package per sub-package, so there you can find glusterfs-fuse-debuginfo.

The CentOS Storage SIG repositories should support installing required debuginfo packages like this:

# debuginfo-install /usr/bin/glusterfs-fuse

There should not be a need to go searching for the RPMs (and its dependencies) yourself.

webash commented 2 years ago

@MarvinTO did you get a resolution to this issue? I'm experiencing issues with sqlite crashing the fuse mount on gluster 9.5

MarvinTO commented 2 years ago

sorry for the late response, no I didn't get a resolution to this issue. However, it was because I didn't follow up this thread.

The clients for this volume decided to replace this volume for a Google Regional Disk, so I was not able to reproduce this issue again (since the environment was changed).

stale[bot] commented 2 years ago

Thank you for your contributions. Noticed that this issue is not having any activity in last ~6 months! We are marking this issue as stale because it has not had recent activity. It will be closed in 2 weeks if no one responds with a comment here.

webash commented 2 years ago

Not stale. Don't close.

bambigoreng commented 1 year ago

Same problem with glusterfs-client 9.2-1 (arm64 running on raspberry pi 4).

I put a vaultwarden data directory onto my gv1 volume which gets mounted in a docker container. The data directory contains a db.sqlite3 file. While the gv1 volume was running very stable before, I am experiencing frequent crashes recently. The volume is shared between multiple docker containers, none of them is usable afterwards.

Output from systemd on client-side:

● retry-glusterfs-vol1-mount.service - Retry glusterfs mount
     Loaded: loaded (/etc/systemd/system/retry-glusterfs-vol1-mount.service; enabled; vendor preset: enabled)
     Active: active (exited) since Thu 2023-01-19 01:24:08 CET; 18h ago
    Process: 1690 ExecStart=/usr/bin/mount -t glusterfs pi2.home:/gv1 /srv/glusterfs/vol1 (code=exited, status=0/S>
   Main PID: 1690 (code=exited, status=0/SUCCESS)
      Tasks: 0 (limit: 4164)
        CPU: 36min 56.656s
     CGroup: /system.slice/retry-glusterfs-vol1-mount.service

Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: dlfcn 1
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: libpthread 1
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: llistxattr 1
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: setfsid 1
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: spinlock 1
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: epoll.h 1
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: xattr.h 1
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: st_atim.tv_nsec 1
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: package-string: glusterfs 9.2
Jan 19 19:07:16 pi1 srv-glusterfs-vol1[1814]: ---------
stale[bot] commented 1 year ago

Thank you for your contributions. Noticed that this issue is not having any activity in last ~6 months! We are marking this issue as stale because it has not had recent activity. It will be closed in 2 weeks if no one responds with a comment here.