gluster / glusterfs

Gluster Filesystem: Build your distributed storage in minutes
https://www.gluster.org

GlusterFS Critical Error: Negative fsal_fd_global_counter Freezes NFS Shares #4268

Open · JannisDev opened this issue 6 months ago

JannisDev commented 6 months ago

Description of problem: We are encountering recurring NFS share freezes: NFS-Ganesha logs the critical error "fsal_fd_global_counter is negative" and the ganesha.nfsd process aborts (see the backtrace below), leading to prolonged production downtime.
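For anyone triaging the same symptom, a quick way to confirm it is to search the Ganesha log for the counter error (a sketch, assuming the default log path shown further down in this report):

# Count occurrences of the critical counter error
grep -c 'fsal_fd_global_counter is negative' /var/log/ganesha/ganesha.log

# Show each occurrence with its timestamp, to correlate with the freezes
grep 'fsal_fd_global_counter' /var/log/ganesha/ganesha.log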

Mandatory info:
- The output of the gluster volume info command:

Volume Name: swarm-utils
Type: Replicate
Volume ID: 59294449-a062-4804-b530-87cc3c7bb378
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: swarm-fs02.srv.dhw.de:/swarm/utils
Brick2: swarm-fs03.srv.dhw.de:/swarm/utils
Brick3: swarm-fs01.srv.dhw.de:/swarm/utils
Options Reconfigured:
storage.owner-uid: 65534
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off

Volume Name: swarm-volumes
Type: Replicate
Volume ID: 3671e28c-0cde-44ed-86f0-f41ffa793051
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: swarm-fs02.srv.dhw.de:/swarm/volumes
Brick2: swarm-fs03.srv.dhw.de:/swarm/volumes
Brick3: swarm-fs01.srv.dhw.de:/swarm/volumes
Options Reconfigured:
storage.owner-uid: 65534
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
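For reference, the "Options Reconfigured" entries above map onto the standard gluster volume set / volume get commands; a sketch using the volume name swarm-utils from the output above:

# Inspect the effective value of a single option
gluster volume get swarm-utils nfs.disable

# List all options (defaults plus reconfigured) for the volume
gluster volume get swarm-utils all

# How an option such as the ones above would have been set
gluster volume set swarm-utils performance.client-io-threads off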

- The output of the gluster volume status command:

Status of volume: swarm-utils
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick swarm-fs02.srv.dhw.de:/swarm/utils    58650     0          Y       733
Brick swarm-fs03.srv.dhw.de:/swarm/utils    50427     0          Y       733
Brick swarm-fs01.srv.dhw.de:/swarm/utils    55932     0          Y       792
Self-heal Daemon on localhost               N/A       N/A        Y       826
Self-heal Daemon on 10.15.29.13             N/A       N/A        Y       803
Self-heal Daemon on 10.15.29.12             N/A       N/A        Y       5987

Task Status of Volume swarm-utils
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: swarm-volumes
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick swarm-fs02.srv.dhw.de:/swarm/volumes  52760     0          Y       766
Brick swarm-fs03.srv.dhw.de:/swarm/volumes  52048     0          Y       787
Brick swarm-fs01.srv.dhw.de:/swarm/volumes  53120     0          Y       807
Self-heal Daemon on localhost               N/A       N/A        Y       826
Self-heal Daemon on 10.15.29.13             N/A       N/A        Y       803
Self-heal Daemon on 10.15.29.12             N/A       N/A        Y       5987

Task Status of Volume swarm-volumes
------------------------------------------------------------------------------
There are no active volume tasks
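Since both volumes are 1 x 3 replicas, pending self-heal is worth ruling out as a contributor to the stalls; a sketch using the volume names above:

# List entries still pending heal per brick (empty output when healthy)
gluster volume heal swarm-utils info
gluster volume heal swarm-volumes info

# Per-brick summary counts only
gluster volume heal swarm-utils info summary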

- Logs from the client and server nodes (/var/log/glusterfs/glusterd.log):

The message "I [MSGID: 106496] [glusterd-handshake.c:954:__server_getspec] 0-management: Received mount request for volume shd/swarm-utils" repeated 5 times between [2023-11-21 23:00:26.893764 +0000] and [2023-11-21 23:00:27.104694 +0000]
The message "I [MSGID: 106496] [glusterd-handshake.c:954:__server_getspec] 0-management: Received mount request for volume shd/swarm-volumes" repeated 5 times between [2023-11-21 23:00:26.894494 +0000] and [2023-11-21 23:00:27.104767 +0000]
[2023-11-22 07:00:27.057181 +0000] I [MSGID: 106061] [glusterd-utils.c:10724:glusterd_volume_status_copy_to_op_ctx_dict] 0-management: Dict get failed [{Key=count}]
[2023-11-22 07:00:27.057639 +0000] I [MSGID: 106499] [glusterd-handler.c:4372:__glusterd_handle_status_volume] 0-management: Received status volume req for volume swarm-utils
[2023-11-22 07:00:27.060479 +0000] I [MSGID: 106499] [glusterd-handler.c:4372:__glusterd_handle_status_volume] 0-management: Received status volume req for volume swarm-volumes

- Is there any crash? Provide the backtrace and coredump (/var/log/ganesha/ganesha.log):

21/11/2023 11:21:47 : epoch 655c845f : swarm-fs01 : ganesha.nfsd-685[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
21/11/2023 11:21:47 : epoch 655c845f : swarm-fs01 : ganesha.nfsd-685[reaper] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE
21/11/2023 16:03:34 : epoch 655c845f : swarm-fs01 : ganesha.nfsd-685[svc_391] remove_fd_lru :FSAL :CRIT :fsal_fd_global_counter is negative: -1
21/11/2023 16:03:34 : epoch 655c845f : swarm-fs01 : ganesha.nfsd-685[svc_391] gsh_backtrace :NFS STARTUP :MAJ :stack backtrace follows:
/usr/lib/x86_64-linux-gnu/libganesha_nfsd.so.5.7(+0x845d8)[0x7f258f8df5d8]
/usr/lib/x86_64-linux-gnu/libganesha_nfsd.so.5.7(+0x5e34c)[0x7f258f8b934c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f258f84c420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f258f68900b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f258f668859]
/usr/lib/x86_64-linux-gnu/libganesha_nfsd.so.5.7(close_fsal_fd+0x53a)[0x7f258f88776a]
/usr/lib/x86_64-linux-gnu/ganesha/libfsalgluster.so(+0x8176)[0x7f258cecc176]
/usr/lib/x86_64-linux-gnu/libganesha_nfsd.so.5.7(+0x134618)[0x7f258f98f618]
/usr/lib/x86_64-linux-gnu/libganesha_nfsd.so.5.7(fsal_remove+0xff)[0x7f258f895a9f]
/usr/lib/x86_64-linux-gnu/libganesha_nfsd.so.5.7(+0x11c9ea)[0x7f258f9779ea]
/usr/lib/x86_64-linux-gnu/libganesha_nfsd.so.5.7(+0x59845)[0x7f258f8b4845]
/lib/x86_64-linux-gnu/libntirpc.so.5.0(+0x23c7f)[0x7f258f620c7f]
/lib/x86_64-linux-gnu/libntirpc.so.5.0(+0x26850)[0x7f258f623850]
/lib/x86_64-linux-gnu/libntirpc.so.5.0(+0x2150a)[0x7f258f61e50a]
/lib/x86_64-linux-gnu/libntirpc.so.5.0(+0x21f16)[0x7f258f61ef16]
/lib/x86_64-linux-gnu/libntirpc.so.5.0(+0x2d850)[0x7f258f62a850]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f258f840609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f258f765133]
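The abort visible in the stack (gsignal/abort called from close_fsal_fd) means a coredump should exist. A sketch for extracting a full backtrace from it, assuming systemd-coredump is capturing cores (on stock Ubuntu 20.04, apport may intercept them instead):

# Locate the most recent ganesha.nfsd core and write it to a file
coredumpctl list ganesha.nfsd
coredumpctl dump ganesha.nfsd -o /tmp/ganesha.core

# Full backtrace of all threads; /usr/bin/ganesha.nfsd is the Debian/Ubuntu binary path
gdb /usr/bin/ganesha.nfsd /tmp/ganesha.core -batch -ex 'thread apply all bt full'

Resolving the libganesha_nfsd.so offsets above into function names requires the matching debug-symbol packages to be installed.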

Additional info:

- The operating system / glusterfs version: Ubuntu 20.04 LTS, GlusterFS 10.5, NFS-Ganesha 5.7
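Those details can be re-collected on each node with the following (a sketch; the package filter assumes Debian/Ubuntu packaging):

# OS release
lsb_release -ds

# GlusterFS version
gluster --version | head -n1

# Installed NFS-Ganesha packages and versions
dpkg -l | grep -i ganesha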