gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

Infinite recursion segmentation fault involving `inode_unref()` and `xlators/features/bit-rot/src/stub/bit-rot-stub.c` #4295

Closed: Deltik closed this issue 3 months ago

Deltik commented 4 months ago

Bug Description

There is a stack overflow that crashes the GlusterFS brick process, glusterfsd, with SIGSEGV.

pl_readdirp() calls posix_acl_readdirp(), which calls br_stub_readdirp(), which calls posix_readdirp(), which calls posix_do_readdir(), which calls gf_dirent_free(), which calls gf_dirent_entry_free(), which calls inode_unref().

inode_unref() then repeatedly nests calls to itself via inode_table_prune() → __inode_destroy() → inode_unref() until the program is killed and the brick goes offline.
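
For illustration, here is a heavily simplified sketch of how I read the cycle from the backtrace and `libglusterfs/src/inode.c`. The names mirror the real functions, but the bodies are reduced to just the parts relevant to the recursion (locking, list handling, and memory management are omitted):

```c
typedef struct inode inode_t;
typedef struct inode_table inode_table_t;

struct inode_table {
    inode_t *purge;           /* inodes waiting to be destroyed */
};

struct inode {
    int ref;
    inode_t *ns_inode;        /* namespace inode */
    inode_t *next_purge;      /* stand-in for the real list_head */
    inode_table_t *table;
};

static inode_t *inode_unref(inode_t *inode);

static void
__inode_destroy(inode_t *inode)
{
    /* inode.c:354 in the backtrace: dropping the namespace-inode
     * reference goes back through inode_unref(). */
    inode_unref(inode->ns_inode);
    /* ...context teardown and the final free happen here... */
}

static void
inode_table_prune(inode_table_t *table)
{
    /* inode.c:1628: destroy every inode sitting on the purge list. */
    while (table->purge) {
        inode_t *entry = table->purge;
        table->purge = entry->next_purge;
        __inode_destroy(entry);   /* re-enters inode_unref() above */
    }
}

static inode_t *
inode_unref(inode_t *inode)
{
    if (!inode)
        return NULL;
    if (--inode->ref == 0) {
        /* the real code moves the inode onto the table's purge list */
    }
    /* inode.c:587: the unref finishes with a prune of the table, so each
     * entry destroyed during a prune triggers another prune one stack
     * frame deeper, until the purge list drains or the stack overflows. */
    inode_table_prune(inode->table);
    return inode;
}
```

With a long purge list, every frame of `inode_table_prune()` costs another `__inode_destroy()` → `inode_unref()` pair, which is exactly the repeating pattern in the backtrace below.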

How to Reproduce

It is not known how to trigger this crash, but the core dump suggests that the bitrot daemon and readdir() are involved. Oddly, the bitrot daemon is not enabled (features.bitrot: off on the volume).

The cmdline of the brick process is as follows:

```
/usr/sbin/glusterfsd -s files2.0 --volfile-id data.files2.0.mnt-glusterfs -p /var/run/gluster/vols/data/files2.0-mnt-glusterfs.pid -S /var/run/gluster/285a953519cb0d10.socket --brick-name /mnt/glusterfs -l /var/log/glusterfs/bricks/mnt-glusterfs.log --xlator-option *-posix.glusterd-uuid=6ed2c58f-4c22-4dd3-916e-c7a1443a12f4 --process-name brick --brick-port 51702 --xlator-option data-server.listen-port=51702
```

Failure Output

Segmentation fault (core dumped)

Excerpt from the backtrace at the moment of the crash:

```
#0  0x00007f2616ded596 in __inode_unref (inode=0x7f25d49baaf8, clear=clear@entry=false) at /glusterfs/libglusterfs/src/inode.c:457
#1  0x00007f2616dedb5c in __dentry_unset (dentry=0x7f25e01ae7d8) at /glusterfs/libglusterfs/src/inode.c:218
#2  __inode_retire (inode=0x7f25e006ce98) at /glusterfs/libglusterfs/src/inode.c:429
#3  0x00007f2616e6bee3 in inode_table_prune.isra.0 (table=0x7f26080058f8) at /glusterfs/libglusterfs/src/inode.c:1594
#4  0x00007f2616dedbb6 in inode_unref (inode=0x7f2608005a08) at /glusterfs/libglusterfs/src/inode.c:587
#5  0x00007f2616e6be04 in __inode_destroy (inode=0x7f25d0035bf8) at /glusterfs/libglusterfs/src/inode.c:354
#6  inode_table_prune.isra.0 (table=<optimized out>) at /glusterfs/libglusterfs/src/inode.c:1628
#7  0x00007f2616dedbb6 in inode_unref (inode=0x7f2608005a08) at /glusterfs/libglusterfs/src/inode.c:587
#8  0x00007f2616e6be04 in __inode_destroy (inode=0x7f25d4157f68) at /glusterfs/libglusterfs/src/inode.c:354
#9  inode_table_prune.isra.0 (table=<optimized out>) at /glusterfs/libglusterfs/src/inode.c:1628
#10 0x00007f2616dedbb6 in inode_unref (inode=0x7f2608005a08) at /glusterfs/libglusterfs/src/inode.c:587
#11 0x00007f2616e6be04 in __inode_destroy (inode=0x7f25fc6a5448) at /glusterfs/libglusterfs/src/inode.c:354
#12 inode_table_prune.isra.0 (table=<optimized out>) at /glusterfs/libglusterfs/src/inode.c:1628
#13 0x00007f2616dedbb6 in inode_unref (inode=0x7f2608005a08) at /glusterfs/libglusterfs/src/inode.c:587
… [TRUNCATED] …
#4705 0x00007f2616dedbb6 in inode_unref (inode=0x7f2608005a08) at /glusterfs/libglusterfs/src/inode.c:587
#4706 0x00007f2616e6be04 in __inode_destroy (inode=0x7f25fc29ab98) at /glusterfs/libglusterfs/src/inode.c:354
#4707 inode_table_prune.isra.0 (table=<optimized out>) at /glusterfs/libglusterfs/src/inode.c:1628
#4708 0x00007f2616dedbb6 in inode_unref (inode=0x7f2608005a08) at /glusterfs/libglusterfs/src/inode.c:587
#4709 0x00007f2616e6be04 in __inode_destroy (inode=0x7f25d070fa38) at /glusterfs/libglusterfs/src/inode.c:354
#4710 inode_table_prune.isra.0 (table=<optimized out>) at /glusterfs/libglusterfs/src/inode.c:1628
#4711 0x00007f2616dedbb6 in inode_unref (inode=0x7f2608005a08) at /glusterfs/libglusterfs/src/inode.c:587
#4712 0x00007f2616e6be04 in __inode_destroy (inode=0x7f25d46c53b8) at /glusterfs/libglusterfs/src/inode.c:354
#4713 inode_table_prune.isra.0 (table=<optimized out>) at /glusterfs/libglusterfs/src/inode.c:1628
#4714 0x00007f2616dedbb6 in inode_unref (inode=0x7f25b81a3428) at /glusterfs/libglusterfs/src/inode.c:587
#4715 0x00007f2616e08fab in gf_dirent_entry_free (entry=0x7f25b811cf78) at /glusterfs/libglusterfs/src/gf-dirent.c:188
#4716 0x00007f2616e09005 in gf_dirent_free (entries=entries@entry=0x7f261015f760) at /glusterfs/libglusterfs/src/gf-dirent.c:208
#4717 0x00007f261088a1a8 in posix_do_readdir.isra.0 (frame=<optimized out>, this=<optimized out>, fd=<optimized out>, size=<optimized out>, off=<optimized out>, whichop=<optimized out>, dict=<optimized out>) at /glusterfs/xlators/storage/posix/src/posix-inode-fd-ops.c:5988
#4718 0x00007f2610880396 in posix_readdirp (frame=frame@entry=0x7f25b8157ba8, this=this@entry=0x7f2604018598, fd=fd@entry=0x7f25f8356fb8, size=size@entry=8192, off=off@entry=3361271667900255181, dict=dict@entry=0x7f25f803ef98)
    at /glusterfs/xlators/storage/posix/src/posix-inode-fd-ops.c:6029
#4719 0x00007f2616e63275 in default_readdirp (frame=frame@entry=0x7f25b8157ba8, this=this@entry=0x7f260401c058, fd=fd@entry=0x7f25f8356fb8, size=size@entry=8192, off=off@entry=3361271667900255181, xdata=xdata@entry=0x7f25f803ef98) at /glusterfs/libglusterfs/src/defaults.c:2965
#4720 0x00007f2616e63275 in default_readdirp (frame=frame@entry=0x7f25b8157ba8, this=<optimized out>, fd=fd@entry=0x7f25f8356fb8, size=size@entry=8192, off=off@entry=3361271667900255181, xdata=xdata@entry=0x7f25f803ef98) at /glusterfs/libglusterfs/src/defaults.c:2965
#4721 0x00007f26107c4db1 in br_stub_readdirp (frame=frame@entry=0x7f25b8113438, this=0x7f260401f738, fd=fd@entry=0x7f25f8356fb8, size=size@entry=8192, offset=offset@entry=3361271667900255181, dict=dict@entry=0x7f25f803ef98)
    at /glusterfs/xlators/features/bit-rot/src/stub/bit-rot-stub.c:2854
#4722 0x00007f26107ae56c in posix_acl_readdirp (frame=frame@entry=0x7f25b81586a8, this=0x7f2604021218, fd=fd@entry=0x7f25f8356fb8, size=size@entry=8192, offset=offset@entry=3361271667900255181, dict=<optimized out>, dict@entry=0x7f25f803ef98)
    at /glusterfs/xlators/system/posix-acl/src/posix-acl.c:1646
#4723 0x00007f2610773082 in pl_readdirp (frame=frame@entry=0x7f25b80f3048, this=this@entry=0x7f2604022b88, fd=fd@entry=0x7f25f8356fb8, size=size@entry=8192, offset=offset@entry=3361271667900255181, xdata=xdata@entry=0x7f25f803ef98)
    at /glusterfs/xlators/features/locks/src/posix.c:3103
#4724 0x00007f2616e63275 in default_readdirp (frame=frame@entry=0x7f25b80f3048, this=this@entry=0x7f26040245a8, fd=fd@entry=0x7f25f8356fb8, size=size@entry=8192, off=off@entry=3361271667900255181, xdata=xdata@entry=0x7f25f803ef98) at /glusterfs/libglusterfs/src/defaults.c:2965
#4725 0x00007f2616e63275 in default_readdirp (frame=frame@entry=0x7f25b80f3048, this=this@entry=0x7f2604026128, fd=fd@entry=0x7f25f8356fb8, size=size@entry=8192, off=off@entry=3361271667900255181, xdata=xdata@entry=0x7f25f803ef98) at /glusterfs/libglusterfs/src/defaults.c:2965
#4726 0x00007f2616e63275 in default_readdirp (frame=frame@entry=0x7f25b80f3048, this=this@entry=0x7f2604027cb8, fd=fd@entry=0x7f25f8356fb8, size=size@entry=8192, off=off@entry=3361271667900255181, xdata=xdata@entry=0x7f25f803ef98) at /glusterfs/libglusterfs/src/defaults.c:2965
#4727 0x00007f2616e63275 in default_readdirp (frame=frame@entry=0x7f25b80f3048, this=<optimized out>, fd=fd@entry=0x7f25f8356fb8, size=size@entry=8192, off=off@entry=3361271667900255181, xdata=xdata@entry=0x7f25f803ef98) at /glusterfs/libglusterfs/src/defaults.c:2965
#4728 0x00007f2610710c56 in up_readdirp (frame=frame@entry=0x7f25b815dba8, this=0x7f260402b068, fd=fd@entry=0x7f25f8356fb8, size=size@entry=8192, off=off@entry=3361271667900255181, dict=dict@entry=0x7f25f803ef98) at /glusterfs/xlators/features/upcall/src/upcall.c:1298
#4729 0x00007f2616e66f24 in default_readdirp_resume (frame=0x7f25f80db798, this=0x7f260402ca28, fd=0x7f25f8356fb8, size=8192, off=3361271667900255181, xdata=0x7f25f803ef98) at /glusterfs/libglusterfs/src/defaults.c:2168
#4730 0x00007f2616e0090d in call_resume (stub=0x7f25f82f1258) at /glusterfs/libglusterfs/src/call-stub.c:2390
#4731 0x00007f26106f6e18 in iot_worker (data=0x7f260404ee98) at /glusterfs/xlators/performance/io-threads/src/io-threads.c:223
#4732 0x00007f2616c17044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#4733 0x00007f2616c9761c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
```

Expected Behavior

No crash of glusterfsd, especially when the bitrot daemon is disabled

**Mandatory info:**

**- The output of the `gluster volume info` command:**

```
root@files2.0:/root# gluster volume info
Volume Name: data
Type: Distribute
Volume ID: 03bd2ace-080c-48d5-8287-80b80e969df8
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: files2.0:/mnt/glusterfs
Options Reconfigured:
storage.reserve: 10MB
storage.build-pgfid: on
features.cache-invalidation: on
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.favorite-child-policy: majority
features.scrub: Inactive
features.bitrot: off
nfs.addr-namelookup: off
nfs.rpc-auth-allow: *
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: off
performance.client-io-threads: off
```

**- The output of the `gluster volume status` command:**

```
root@files2.0:/root# gluster volume status
Status of volume: data
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick files2.0:/mnt/glusterfs               51702     0          Y       23154
NFS Server on localhost                     2049      0          Y       23187

Task Status of Volume data
------------------------------------------------------------------------------
There are no active volume tasks
```

**- The output of the `gluster volume heal` command:**

```
root@files2.0:/root# gluster volume heal data
Launching heal operation to perform index self heal on volume data has been unsuccessful:
Self-heal-daemon is disabled. Heal will not be triggered on volume data
root@files2.0:/root# gluster volume heal data info
This command is supported for only volumes of replicate/disperse type. Volume data is not of type replicate/disperse
Volume heal failed.
```

**- Provide logs present on following locations of client and server nodes:**

- [Archive of `/var/log/glusterfs`](https://github.com/gluster/glusterfs/files/13773917/var-log-glusterfs.tar.gz)

**- Is there any crash? Provide the backtrace and coredump:**

Download the core dump here:

- [Uncompressed](https://web.archive.org/web/20231226223237if_/https://keep-sh.nyc3.digitaloceanspaces.com/1/Fh6wtxvzQUU1ZwC4/core-files2.0-glfs_iotwr001-1703171195-169?X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=Q7JO64AWYT56RHKZGOE3%2F20231226%2Fnyc3%2Fs3%2Faws4_request&X-Amz-Date=20231226T223237Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Signature=2a0ba6d749c9879bd4c9038ea8e1419f9a92221bc52ac5342a9495a28d0a009f)
- [Compressed with Zstandard](https://web.archive.org/web/20231226223423if_/https://keep-sh.nyc3.digitaloceanspaces.com/1/BnzbV7GFHnPA9fgD/core-files2.0-glfs_iotwr001-1703171195-169.zst?X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=Q7JO64AWYT56RHKZGOE3%2F20231226%2Fnyc3%2Fs3%2Faws4_request&X-Amz-Date=20231226T223423Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Signature=51bbdc6a732bd9d79a4de2afe7df7815982134d2d83c7819f8c947680ffba476)

Backtrace (truncated for brevity): see the excerpt under Failure Output above.

**Additional info:** The bitrot daemon was previously enabled but was subsequently disabled after we encountered the same crash the first time.

**- The operating system / glusterfs version:** Debian 12 running GlusterFS 11.1

mohit84 commented 4 months ago

It seems the brick process is hitting a stack overflow during the unref of the namespace inode; the ns_inode was introduced by the patch https://github.com/gluster/glusterfs/pull/1763/files. @amarts, can you please share your view on this?

Deltik commented 4 months ago

@mohit84 or @amarts: Is there a workaround to avoid the crash? And do you need any more information from me to debug this one?

The brick keeps crashing and causing shared storage outages. I'm hoping to offer a remedy to the customer before their scheduled go-live next week.

mohit84 commented 4 months ago

Other than applying a patch, I don't think there is any other solution/workaround: either you revert the patch (https://github.com/gluster/glusterfs/pull/1763/files) or you apply a patch that fixes the crash. I will try to share a patch for the fix; let me know if you are interested in testing it in your environment.

Deltik commented 4 months ago

Thanks for the reply, @mohit84. https://github.com/gluster/glusterfs/pull/1763 doesn't revert cleanly, so I'm willing to test a patch from you to fix the crash.

mohit84 commented 4 months ago

Can you please try applying the patch below in your environment and share the result?


```
diff --git a/libglusterfs/src/inode.c b/libglusterfs/src/inode.c
index 64ea78c6b2..59d7be9ffe 100644
--- a/libglusterfs/src/inode.c
+++ b/libglusterfs/src/inode.c
@@ -351,7 +351,17 @@ __inode_ctx_free(inode_t *inode)
 static void
 __inode_destroy(inode_t *inode)
 {
-    inode_unref(inode->ns_inode);
+    inode_table_t *table = NULL;
+    inode_t *ns_inode = inode->ns_inode;
+
+    if (ns_inode) {
+        table = ns_inode->table;
+        pthread_mutex_lock(&table->lock);
+        {
+            __inode_unref(ns_inode, false);
+        }
+        pthread_mutex_unlock(&table->lock);
+    }
     __inode_ctx_free(inode);
```
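
To explain the change briefly: instead of calling `inode_unref()` on the ns_inode, which would run `inode_table_prune()` again and re-enter `__inode_destroy()`, the reference is dropped with `__inode_unref()` directly while holding the table lock, so the recursive cycle seen in the backtrace cannot start from here.
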
Deltik commented 4 months ago

Thank you for the patch. It'll take some time to roll out on my end. I'll report back in a week or two on whether the brick seems stable once it's been running for a while.

mohit84 commented 4 months ago

Thanks for the confirmation; I will wait for your response.

Deltik commented 3 months ago

@mohit84: The patch appears to stabilize glusterfsd. There have been no reports of the brick crashing since deploying the fix on 19 January 2024, 5 days ago.

mohit84 commented 3 months ago

Thanks for confirming it; let's wait one more week. I will upload a patch next week.

edrock200 commented 3 months ago

> Thanks for confirming it; let's wait one more week. I will upload a patch next week.

I don't know enough about the underlying code, so pardon my ignorance on this one, but since the code appears to be related to inodes, will this patch resolve the infinite "inode path not completely resolved. Asking for full path" log entries in the brick logs, or is this unrelated?

Also, when you say upload a patch, I assume that means it has to be applied manually, not added to the repos for updates via apt? Thanks in advance.

mohit84 commented 3 months ago

> > Thanks for confirming it; let's wait one more week. I will upload a patch next week.
>
> I don't know enough about the underlying code, so pardon my ignorance on this one, but since the code appears to be related to inodes, will this patch resolve the infinite "inode path not completely resolved. Asking for full path" log entries in the brick logs, or is this unrelated?
>
> Also, when you say upload a patch, I assume that means it has to be applied manually, not added to the repos for updates via apt? Thanks in advance.

I don't think it is related to this; that message is logged by the brick only during GFID-based lookups. Ideally it should be a DEBUG message, but it was implemented as an INFO message. As for the patch, it is already merged in the devel branch, and a pull request has already been opened to backport it to release-11.

Regarding the crash, can you please share the `thread apply all bt full` output after attaching the core with gdb? I asked for this in the past as well but did not get an update, so it is difficult to work out the root cause (RCA).
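
For example (the binary and core paths below are just placeholders; adjust them to your environment):

```
gdb /usr/sbin/glusterfsd /path/to/core-file
(gdb) thread apply all bt full
```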

Deltik commented 3 months ago

> Regarding the crash, can you please share the `thread apply all bt full` output after attaching the core with gdb? I asked for this in the past as well but did not get an update, so it is difficult to work out the root cause (RCA).

Are you asking me for another backtrace?

mohit84 commented 3 months ago

> > Regarding the crash, can you please share the `thread apply all bt full` output after attaching the core with gdb? I asked for this in the past as well but did not get an update, so it is difficult to work out the root cause (RCA).
>
> Are you asking me for another backtrace?

Not from you; I was asking @edrock200 to share the backtrace.

edrock200 commented 3 months ago

> > > Thanks for confirming it; let's wait one more week. I will upload a patch next week.
> >
> > I don't know enough about the underlying code, so pardon my ignorance on this one, but since the code appears to be related to inodes, will this patch resolve the infinite "inode path not completely resolved. Asking for full path" log entries in the brick logs, or is this unrelated? Also, when you say upload a patch, I assume that means it has to be applied manually, not added to the repos for updates via apt? Thanks in advance.
>
> I don't think it is related to this; that message is logged by the brick only during GFID-based lookups. Ideally it should be a DEBUG message, but it was implemented as an INFO message. As for the patch, it is already merged in the devel branch, and a pull request has already been opened to backport it to release-11.
>
> Regarding the crash, can you please share the `thread apply all bt full` output after attaching the core with gdb? I asked for this in the past as well but did not get an update, so it is difficult to work out the root cause (RCA).

My apologies, @mohit84. At the time you asked, I didn't know how to carry out such a task; I wasn't intentionally ignoring your request. I believe I know how to do it now. That said, yesterday I turned off nl-cache on the volume in question, and the errors appear to have dissipated. It's too soon to tell, so I will let it burn in for 48 hours or so. For what it's worth, the nl-cache setting also seems to prevent heals from commencing when a brick is replaced. Apologies as well for hijacking the thread: if the issue resurfaces I will open a new issue; if not I will update here.
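
In case it helps anyone else, I disabled it with something like `gluster volume set <VOLNAME> performance.nl-cache off` (the volume name is a placeholder).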

nick-oconnor commented 1 week ago

I've filed https://bugs.launchpad.net/ubuntu/+source/glusterfs/+bug/2064843 to try and get this patched as it now affects Ubuntu 24.04. @mohit84 would it be possible to cut a release containing #4302?

mohit84 commented 1 week ago

> I've filed https://bugs.launchpad.net/ubuntu/+source/glusterfs/+bug/2064843 to try and get this patched as it now affects Ubuntu 24.04. @mohit84 would it be possible to cut a release containing #4302?

@aravindavk can confirm about the release. Red Hat is no longer maintaining GlusterFS, so I am not sure about the next release.