gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

Unexpected Gluster Client Crash - 6.5 (read-ahead) #831

Closed: dannylee- closed this issue 3 years ago

dannylee- commented 4 years ago

Description of problem: Looks very similar to https://github.com/gluster/glusterfs/issues/784 and https://github.com/gluster/glusterfs/issues/783, but with a different stack trace (read-ahead instead of open-behind).

The exact command to reproduce the issue: Could not reproduce, but there were a lot of files being read before it crashed.

The stacktrace:

[2020-02-27 15:57:41.059088] W [fuse-bridge.c:1506:fuse_fd_cbk] 0-glusterfs-fuse: 1668556410: OPEN() /somelocation/somefile.l.gz => -1 (Stale file handle)
pending frames:
frame : type(1) op(UNLINK)
frame : type(1) op(OPEN)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 2020-02-27 15:57:41
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 6.5
The message "W [MSGID: 114031] [client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-company-client-0: remote operation failed" repeated 12333 times between [2020-02-27 15:56:36.703301] and [2020-02-27 15:57:41.721945]
The message "E [MSGID: 148002] [utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-company-utime: dict set of key for set-ctime-mdata failed" repeated 12333 times between [2020-02-27 15:56:36.703320] and [2020-02-27 15:57:41.721948]
pending frames:
frame : type(1) op(UNLINK)
frame : type(1) op(OPEN)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 2020-02-27 15:57:41
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 6.5
/lib64/libglusterfs.so.0(+0x27130)[0x7f3910c72130]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f3910c7cb34]
/lib64/libc.so.6(+0x363b0)[0x7f390f2af3b0]
/lib64/libuuid.so.1(+0x25b0)[0x7f39103d65b0]
/lib64/libuuid.so.1(+0x2646)[0x7f39103d6646]
/lib64/libglusterfs.so.0(uuid_utoa+0x1c)[0x7f3910c7bcac]
/usr/lib64/glusterfs/6.5/xlator/performance/io-cache.so(+0x5e55)[0x7f39039cce55]
/usr/lib64/glusterfs/6.5/xlator/performance/read-ahead.so(+0x1c16)[0x7f3903df0c16]
/usr/lib64/glusterfs/6.5/xlator/features/utime.so(+0x39ab)[0x7f39083149ab]
/usr/lib64/glusterfs/6.5/xlator/protocol/client.so(+0x73523)[0x7f390884c523]
/lib64/libgfrpc.so.0(+0xf021)[0x7f3910a1c021]
/lib64/libgfrpc.so.0(+0xf387)[0x7f3910a1c387]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f3910a189f3]
/usr/lib64/glusterfs/6.5/rpc-transport/socket.so(+0xa875)[0x7f390b326875]
/lib64/libglusterfs.so.0(+0x8b806)[0x7f3910cd6806]
/lib64/libpthread.so.0(+0x7e65)[0x7f390fab1e65]
/lib64/libc.so.6(clone+0x6d)[0x7f390f37788d]
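For anyone triaging this, a minimal sketch of how the anonymous translator frame offsets in the backtrace could be resolved to function names, assuming the matching glusterfs-debuginfo package for 6.5 is installed on CentOS 7 (the core-file path below is hypothetical):

# Install debug symbols matching the exact glusterfs build
debuginfo-install glusterfs

# Resolve the translator frames seen in the trace
addr2line -f -e /usr/lib64/glusterfs/6.5/xlator/performance/read-ahead.so 0x1c16
addr2line -f -e /usr/lib64/glusterfs/6.5/xlator/performance/io-cache.so 0x5e55

# Or, if a core dump was captured, get a fully symbolized backtrace
gdb /usr/sbin/glusterfs /path/to/core -ex 'bt full' -ex quit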

Expected results: The client does not crash

Additional info: Before the crash, there were numerous (~4,000) warnings about a "Stale file handle". Something like "W [fuse-bridge.c:1506:fuse_fd_cbk] 0-glusterfs-fuse: 1668523616: OPEN() /somefolder/somefile.l.gz (Stale file handle)". These warning log entries occurred for about 13 minutes right before the crash.

The output of the gluster volume info command:

Volume Name: company
Type: Replicate
Volume ID: 321e775a-d600-448c-9c0b-ef1a2340d1a9
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.125.10.251:/somelocation
Brick2: 10.125.9.13:/somelocation
Brick3: 10.125.11.44:/somelocation
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: true
transport.address-family: inet
performance.io-thread-count: 64
diagnostics.brick-log-level: WARNING
storage.fips-mode-rchecksum: on

The operating system / glusterfs version:

OS: CentOS 7.7.1908 (Core)
GlusterFS Version: 6.5
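Since the crashing frames sit in the read-ahead and io-cache translators, one possible mitigation to test (a sketch only, not confirmed as a fix in this thread) would be to disable those translators with the standard volume-set options; "company" is the volume name from the info output above:

# Possible mitigation sketch, not a verified fix for this crash
gluster volume set company performance.read-ahead off
gluster volume set company performance.io-cache off

Clients normally pick up the regenerated volume graph automatically; remounting them afterwards is a safe fallback.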

pasikarkkainen commented 4 years ago

Did you try newer versions of GlusterFS? Many bugs have been fixed in newer releases, so maybe try 6.8?
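For reference, a minimal upgrade sketch on CentOS 7, assuming the CentOS Storage SIG packages are used (package names below are the standard SIG ones, not taken from this thread):

# Enable the GlusterFS 6.x repository from the CentOS Storage SIG
yum install centos-release-gluster6

# Update the client packages to the latest 6.x build, then remount the volume
yum update glusterfs glusterfs-fuse glusterfs-libs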

dannylee- commented 4 years ago

After a few days of load testing I was unable to find a way to reliably reproduce the issue, so I can't confirm whether an upgrade to 6.8 would fix this bug. Some of the bug fixes that looked potentially related to this issue concern the rebalancing feature, which we aren't using.

stale[bot] commented 4 years ago

Thank you for your contributions. We noticed that this issue has not had any activity in the last ~6 months, so we are marking it as stale. It will be closed in 2 weeks if no one responds with a comment here.

stale[bot] commented 3 years ago

Closing this issue, as there has been no update since my last comment. If the issue is still valid, feel free to reopen it.