gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

Gluster crashing frequently #4418

Open wiza opened 1 month ago

wiza commented 1 month ago

Description of problem: Glusterd crashes frequently

Expected results: To not crash

Mandatory info:

- The output of the gluster volume info command:

Volume Name: var-data
Type: Replicate
Volume ID: f4aa5185-e286-4903-a6f1-67458f0c2541
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: XXXX-1:/data/brick/var-data
Brick2: XXXX-2:/data/brick/var-data
Brick3: XXXXX:/data/brick (arbiter)
Options Reconfigured:
cluster.favorite-child-policy: size
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.client-io-threads: off
cluster.data-self-heal: on
cluster.metadata-self-heal: on
cluster.entry-self-heal: on
cluster.self-heal-daemon: enable
cluster.shd-max-threads: 4
disperse.shd-wait-qlength: 2048
cluster.shd-wait-qlength: 2048

- The output of the gluster volume status command:

Status of volume: var-data
Gluster process                      TCP Port  RDMA Port  Online  Pid
Brick XXXX-1:/data/brick/var-data    49788     0          Y       3262015
Brick XXXX-2:/data/brick/var-data    57637     0          Y       2016530
Brick XXXXX:/data/brick              52467     0          Y       5671
Self-heal Daemon on localhost        N/A       N/A        Y       2016548
Self-heal Daemon on XXXX-1           N/A       N/A        Y       3262033
Self-heal Daemon on XXXXX            N/A       N/A        Y       5242

Task Status of Volume var-data
There are no active volume tasks

- The output of the gluster volume heal command: (it's running so summary)

Brick XXXX-1:/data/brick/var-data
Status: Connected
Total Number of entries: 150
Number of entries in heal pending: 150
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick XXXX-2:/data/brick/var-data
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick XXXXX:/data/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

- Provide logs present on following locations of client and server nodes - /var/log/glusterfs/

Lots of logs, will provide if needed.

- Is there any crash ? Provide the backtrace and coredump

Oct 01 04:29:57 XXXX-2 systemd-coredump[2003711]: [🡕] Process 682965 (glusterfsd) of user 0 dumped core.

                                               Stack trace of thread 690421:
                                               #0  0x00007eff4763e715 __inode_unref.lto_priv.0 (libglusterfs.so.0 + 0x31715)
                                               #1  0x00007eff4763ed1b __inode_retire.lto_priv.0 (libglusterfs.so.0 + 0x31d1b)
                                               #2  0x00007eff476bbad3 inode_table_prune.isra.0 (libglusterfs.so.0 + 0xaead3)
                                               #3  0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #4  0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #5  0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #6  0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #7  0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #8  0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #9  0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #10 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #11 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #12 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #13 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #14 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #15 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #16 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #17 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #18 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #19 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #20 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #21 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #22 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #23 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #24 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #25 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #26 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #27 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #28 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #29 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #30 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #31 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #32 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #33 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #34 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #35 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #36 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #37 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #38 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #39 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #40 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #41 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #42 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #43 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #44 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #45 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #46 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #47 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #48 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #49 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #50 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #51 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #52 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #53 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #54 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #55 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #56 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #57 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #58 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #59 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #60 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #61 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
                                               #62 0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
                                               #63 0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)

                                               Stack trace of thread 682969:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff472892a4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x892a4)
                                               #2  0x00007eff476654f2 syncenv_processor (libglusterfs.so.0 + 0x584f2)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 682968:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff472892a4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x892a4)
                                               #2  0x00007eff476654f2 syncenv_processor (libglusterfs.so.0 + 0x584f2)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 682970:
                                               #0  0x00007eff4730422d __select (libc.so.6 + 0x10422d)
                                               #1  0x00007eff47690aee runner (libglusterfs.so.0 + 0x83aee)
                                               #2  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #3  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 682966:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff472892a4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x892a4)
                                               #2  0x00007eff47643eb8 gf_timer_proc (libglusterfs.so.0 + 0x36eb8)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 682965:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff4728b6d3 __pthread_clockjoin_ex (libc.so.6 + 0x8b6d3)
                                               #2  0x00007eff47686f97 event_dispatch_epoll.lto_priv.0 (libglusterfs.so.0 + 0x79f97)
                                               #3  0x00007eff4769aa6c gf_io_run.part.0 (libglusterfs.so.0 + 0x8da6c)
                                               #4  0x000055c479db7e08 main (glusterfsd + 0x8e08)
                                               #5  0x00007eff47229590 __libc_start_call_main (libc.so.6 + 0x29590)
                                               #6  0x00007eff47229640 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29640)
                                               #7  0x000055c479db8f55 _start (glusterfsd + 0x9f55)

                                               Stack trace of thread 682967:
                                               #0  0x00007eff4723f3da __sigtimedwait (libc.so.6 + 0x3f3da)
                                               #1  0x00007eff4723ea7c sigwait (libc.so.6 + 0x3ea7c)
                                               #2  0x000055c479dc200b glusterfs_sigwaiter (glusterfsd + 0x1300b)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 682973:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff47288fa0 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x88fa0)
                                               #2  0x00007eff41a37384 index_worker (index.so + 0x4384)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 682977:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff47288fa0 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x88fa0)
                                               #2  0x00007eff41bde35b posix_ctx_janitor_thread_proc.lto_priv.0 (posix.so + 0x835b)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 682971:
                                               #0  0x00007eff4730e21e epoll_wait (libc.so.6 + 0x10e21e)
                                               #1  0x00007eff476853dc event_dispatch_epoll_worker (libglusterfs.so.0 + 0x783dc)
                                               #2  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #3  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 683470:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff472892a4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x892a4)
                                               #2  0x00007eff41a8b5a4 iot_worker (io-threads.so + 0x65a4)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 682978:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff47288fa0 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x88fa0)
                                               #2  0x00007eff41be199b posix_fsyncer (posix.so + 0xb99b)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 682975:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff472892a4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x892a4)
                                               #2  0x00007eff41be274e posix_ctx_disk_thread_proc (posix.so + 0xc74e)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 682974:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff472892a4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x892a4)
                                               #2  0x00007eff41a8b5a4 iot_worker (io-threads.so + 0x65a4)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 2198913:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff472892a4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x892a4)
                                               #2  0x00007eff41a8b5a4 iot_worker (io-threads.so + 0x65a4)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 682998:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff47288fa0 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x88fa0)
                                               #2  0x00007eff475ddca8 rpcsvc_request_handler (libgfrpc.so.0 + 0xbca8)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 682976:
                                               #0  0x00007eff472d4075 clock_nanosleep@GLIBC_2.2.5 (libc.so.6 + 0xd4075)
                                               #1  0x00007eff472d8c87 __nanosleep (libc.so.6 + 0xd8c87)
                                               #2  0x00007eff472d8bbe sleep (libc.so.6 + 0xd8bbe)
                                               #3  0x00007eff41bde900 posix_health_check_thread_proc (posix.so + 0x8900)
                                               #4  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #5  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 682995:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff47288fa0 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x88fa0)
                                               #2  0x00007eff475ddca8 rpcsvc_request_handler (libgfrpc.so.0 + 0xbca8)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 682997:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff472892a4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x892a4)
                                               #2  0x00007eff41a8b5a4 iot_worker (io-threads.so + 0x65a4)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 683028:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff472892a4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x892a4)
                                               #2  0x00007eff41a8b5a4 iot_worker (io-threads.so + 0x65a4)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 682972:
                                               #0  0x00007eff4730e21e epoll_wait (libc.so.6 + 0x10e21e)
                                               #1  0x00007eff476853dc event_dispatch_epoll_worker (libglusterfs.so.0 + 0x783dc)
                                               #2  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #3  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 683029:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff472892a4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x892a4)
                                               #2  0x00007eff41a8b5a4 iot_worker (io-threads.so + 0x65a4)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 2198912:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff472892a4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x892a4)
                                               #2  0x00007eff41a8b5a4 iot_worker (io-threads.so + 0x65a4)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)

                                               Stack trace of thread 2198915:
                                               #0  0x00007eff4728679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                               #1  0x00007eff472892a4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x892a4)
                                               #2  0x00007eff41a8b5a4 iot_worker (io-threads.so + 0x65a4)
                                               #3  0x00007eff47289c02 start_thread (libc.so.6 + 0x89c02)
                                               #4  0x00007eff4730ec40 __clone3 (libc.so.6 + 0x10ec40)
                                               ELF object binary architecture: AMD x86-64
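
A frame tally can make a trace this long easier to read. In the crashing thread (690421), every captured frame from #3 through #63 alternates between inode_unref and inode_table_prune.isra.0. That is consistent with deep recursion, although the trace alone does not prove it. Below is a minimal Python sketch that counts frames per symbol, assuming the systemd-coredump frame layout shown above. The names `FRAME_RE` and `count_frames` are hypothetical helpers for illustration, not part of GlusterFS or systemd, and the sample contains only the first six frames of the trace.

```python
import re
from collections import Counter

# Matches frame lines in the layout used above, for example:
#   #3  0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
FRAME_RE = re.compile(
    r"#(\d+)\s+0x[0-9a-f]+\s+(\S+)\s+\((\S+) \+ 0x[0-9a-f]+\)"
)


def count_frames(trace_text):
    """Count how often each (symbol, module) pair appears in a trace."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            counts[(match.group(2), match.group(3))] += 1
    return counts


# First six frames of thread 690421, copied from the trace above.
sample = """\
#0  0x00007eff4763e715 __inode_unref.lto_priv.0 (libglusterfs.so.0 + 0x31715)
#1  0x00007eff4763ed1b __inode_retire.lto_priv.0 (libglusterfs.so.0 + 0x31d1b)
#2  0x00007eff476bbad3 inode_table_prune.isra.0 (libglusterfs.so.0 + 0xaead3)
#3  0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
#4  0x00007eff476bb9fa inode_table_prune.isra.0 (libglusterfs.so.0 + 0xae9fa)
#5  0x00007eff4763ed79 inode_unref (libglusterfs.so.0 + 0x31d79)
"""

counts = count_frames(sample)
for (symbol, module), n in counts.most_common():
    print(f"{n:3d}  {symbol}  ({module})")
```

Feeding the full trace of one thread into `count_frames` gives a quick view of which symbols dominate it. A symbol pair that accounts for nearly all frames is a hint worth mentioning when comparing the crash against a candidate fix.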

Additional info:

- The operating system / glusterfs version: Rocky Linux 9.4 / GlusterFS 11.1

Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration

mykaul commented 1 month ago

Could it be https://github.com/gluster/glusterfs/pull/4302 ?

wiza commented 1 month ago

That might be it. I have no experience building GlusterFS on RHEL/Rocky 9, but if a new release is not coming in the near future, I'll take a look and try that patch.