ESNewmanium closed this issue 3 years ago
@ESNewmanium Crashes generally leave core files. Would it be possible to install the debuginfo RPMs on one of the clients, load the core in gdb, and post the output of `thread apply all bt`? It would be even better if we could get our hands on the core file itself.
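For anyone collecting the same data across many clients, the request above can be scripted. The following is a minimal sketch, not an official GlusterFS procedure: it assumes `gdb` is installed, that the debuginfo packages are already present (on CentOS 7 these are typically installed with `debuginfo-install` from yum-utils), and that the client binary lives at `/usr/sbin/glusterfs`. The core file path and both function names are hypothetical.

```python
import shlex
import subprocess


def gdb_backtrace_command(binary, core):
    """Build a non-interactive gdb invocation that dumps every thread's backtrace."""
    return [
        "gdb", "-batch",               # run the -ex commands, then exit
        "-ex", "set pagination off",   # harmless in batch mode, avoids prompts otherwise
        "-ex", "thread apply all bt",  # the output requested in this thread
        binary, core,
    ]


def collect_backtrace(binary, core):
    """Run gdb against a core file and return the text it prints."""
    result = subprocess.run(
        gdb_backtrace_command(binary, core),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Both paths are assumptions; substitute the real binary and core file.
    cmd = gdb_backtrace_command("/usr/sbin/glusterfs", "/path/to/core")
    print(shlex.join(cmd))
```

Building the argument list separately from running it makes it easy to print the command for review before executing it on a production client.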
@pranithk, I appreciate the response. Yes, I can provide the `thread apply all bt` output, but I'm not sure I can provide the core itself. Is the output below at all helpful?
Thread 23 (Thread 0x7f681bbe34c0 (LWP 20846)):
#0 0x00007f681a525017 in pthread_join (threadid=140084883552000, thread_return=thread_return@entry=0x0) at pthread_join.c:90
#1 0x00007f681b7465b8 in event_dispatch_epoll (event_pool=0x55bfc9047510) at event-epoll.c:848
#2 0x000055bfc828da01 in main (argc=6, argv=<optimized out>) at glusterfsd.c:2917
Thread 22 (Thread 0x7f681126a700 (LWP 20850)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f681b721a90 in syncenv_task (proc=proc@entry=0x55bfc9061f70) at syncop.c:524
#2 0x00007f681b722940 in syncenv_processor (thdata=0x55bfc9061f70) at syncop.c:591
#3 0x00007f681a523ea5 in start_thread (arg=0x7f681126a700) at pthread_create.c:307
#4 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 21 (Thread 0x7f680db8f700 (LWP 20853)):
#0 0x00007f6819de9f43 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f681b747110 in event_dispatch_epoll_worker (data=0x55bfc90a9490) at event-epoll.c:753
#2 0x00007f681a523ea5 in start_thread (arg=0x7f680db8f700) at pthread_create.c:307
#3 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 20 (Thread 0x7f6810268700 (LWP 20852)):
#0 0x00007f6819de09a3 in select () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f681b761914 in runner (arg=0x55bfc90660f0) at ../../contrib/timer-wheel/timer-wheel.c:186
#2 0x00007f681a523ea5 in start_thread (arg=0x7f6810268700) at pthread_create.c:307
#3 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 19 (Thread 0x7f6810a69700 (LWP 20851)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f681b721a90 in syncenv_task (proc=proc@entry=0x55bfc9062330) at syncop.c:524
#2 0x00007f681b722940 in syncenv_processor (thdata=0x55bfc9062330) at syncop.c:591
#3 0x00007f681a523ea5 in start_thread (arg=0x7f6810a69700) at pthread_create.c:307
#4 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 18 (Thread 0x7f680d38e700 (LWP 20854)):
#0 0x00007f6819de9f43 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f681b747110 in event_dispatch_epoll_worker (data=0x55bfc90a9770) at event-epoll.c:753
#2 0x00007f681a523ea5 in start_thread (arg=0x7f680d38e700) at pthread_create.c:307
#3 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 17 (Thread 0x7f6805c2a700 (LWP 20857)):
#0 0x00007f6819de9f43 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f681b747110 in event_dispatch_epoll_worker (data=0x7f680804a010) at event-epoll.c:753
#2 0x00007f681a523ea5 in start_thread (arg=0x7f6805c2a700) at pthread_create.c:307
#3 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 16 (Thread 0x7f681ba80700 (LWP 20855)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f6807179d5d in iot_worker (data=0x7f680802c830) at io-threads.c:197
#2 0x00007f681a523ea5 in start_thread (arg=0x7f681ba80700) at pthread_create.c:307
#3 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 15 (Thread 0x7f681226c700 (LWP 20848)):
#0 0x00007f681a52b3c1 in do_sigwait (sig=0x7f681226b0dc, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:60
#1 __sigwait (set=set@entry=0x7f681226b0e0, sig=sig@entry=0x7f681226b0dc) at ../sysdeps/unix/sysv/linux/sigwait.c:95
#2 0x000055bfc82915db in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2414
#3 0x00007f681a523ea5 in start_thread (arg=0x7f681226c700) at pthread_create.c:307
#4 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 14 (Thread 0x7f680642b700 (LWP 20856)):
#0 0x00007f6819de9f43 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f681b747110 in event_dispatch_epoll_worker (data=0x7f6808049d30) at event-epoll.c:753
#2 0x00007f681a523ea5 in start_thread (arg=0x7f680642b700) at pthread_create.c:307
#3 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 13 (Thread 0x7f67f41bd700 (LWP 25053)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f6807179d5d in iot_worker (data=0x7f680802c830) at io-threads.c:197
#2 0x00007f681a523ea5 in start_thread (arg=0x7f67f41bd700) at pthread_create.c:307
#3 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 12 (Thread 0x7f6811a6b700 (LWP 20849)):
#0 0x00007f6819db085d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f6819db06f4 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2 0x00007f681b70c273 in pool_sweeper (arg=<optimized out>) at mem-pool.c:444
#3 0x00007f681a523ea5 in start_thread (arg=0x7f6811a6b700) at pthread_create.c:307
#4 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 11 (Thread 0x7f6812a6d700 (LWP 20847)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f681b6efa19 in gf_timer_proc (data=0x55bfc90616a0) at timer.c:141
#2 0x00007f681a523ea5 in start_thread (arg=0x7f6812a6d700) at pthread_create.c:307
#3 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 10 (Thread 0x7f67f41fe700 (LWP 25052)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f6807179d5d in iot_worker (data=0x7f680802c830) at io-threads.c:197
#2 0x00007f681a523ea5 in start_thread (arg=0x7f67f41fe700) at pthread_create.c:307
#3 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 9 (Thread 0x7f67f417c700 (LWP 25054)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f6807179d5d in iot_worker (data=0x7f680802c830) at io-threads.c:197
#2 0x00007f681a523ea5 in start_thread (arg=0x7f67f417c700) at pthread_create.c:307
#3 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 8 (Thread 0x7f6804b27700 (LWP 20861)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f6812a76e3b in timed_response_loop (data=<optimized out>) at fuse-bridge.c:5013
#2 0x00007f681a523ea5 in start_thread (arg=0x7f6804b27700) at pthread_create.c:307
#3 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 7 (Thread 0x7f67f7fff700 (LWP 20862)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f6812a77d5b in notify_kernel_loop (data=<optimized out>) at fuse-bridge.c:4928
#2 0x00007f681a523ea5 in start_thread (arg=0x7f67f7fff700) at pthread_create.c:307
#3 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 6 (Thread 0x7f68040a4700 (LWP 25047)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f6807179d5d in iot_worker (data=0x7f680802c830) at io-threads.c:197
#2 0x00007f681a523ea5 in start_thread (arg=0x7f68040a4700) at pthread_create.c:307
#3 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 5 (Thread 0x7f6804063700 (LWP 25048)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f6807179d5d in iot_worker (data=0x7f680802c830) at io-threads.c:197
#2 0x00007f681a523ea5 in start_thread (arg=0x7f6804063700) at pthread_create.c:307
#3 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 4 (Thread 0x7f68040e5700 (LWP 22683)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f6807179d5d in iot_worker (data=0x7f680802c830) at io-threads.c:197
#2 0x00007f681a523ea5 in start_thread (arg=0x7f68040e5700) at pthread_create.c:307
#3 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 3 (Thread 0x7f680c065700 (LWP 20988)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f6807179d5d in iot_worker (data=0x7f680802c830) at io-threads.c:197
#2 0x00007f681a523ea5 in start_thread (arg=0x7f680c065700) at pthread_create.c:307
#3 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 2 (Thread 0x7f6804326700 (LWP 22026)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f6807179d5d in iot_worker (data=0x7f680802c830) at io-threads.c:197
#2 0x00007f681a523ea5 in start_thread (arg=0x7f6804326700) at pthread_create.c:307
#3 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 1 (Thread 0x7f6805328700 (LWP 20860)):
#0 __GI___pthread_mutex_lock (mutex=mutex@entry=0x6) at ../nptl/pthread_mutex_lock.c:65
#1 0x00007f681b6f1969 in inode_ref (inode=inode@entry=0x7f67f1ecd268) at inode.c:586
#2 0x00007f6812a73a9f in fuse_ino_to_inode (ino=140084417188456, fuse=<optimized out>) at fuse-helpers.c:386
#3 0x00007f6812a751b5 in fuse_resolve_inode_init (state=state@entry=0x7f67f1c1a2f0, resolve=resolve@entry=0x7f67f1c1a428, ino=<optimized out>)
at fuse-resolve.c:588
#4 0x00007f6812a8c6be in fuse_opendir (this=0x55bfc904fc90, finh=0x7f67f07cbf00, msg=<optimized out>, iobuf=<optimized out>) at fuse-bridge.c:3499
#5 0x00007f6812a76b19 in fuse_dispatch (xl=xl@entry=0x55bfc904fc90, async=<optimized out>) at fuse-bridge.c:5955
#6 0x00007f6812a915ec in gf_async (cbk=0x7f6812a76af0 <fuse_dispatch>, xl=0x55bfc904fc90, async=<optimized out>)
at ../../../../libglusterfs/src/glusterfs/async.h:189
#7 fuse_thread_proc (data=0x55bfc904fc90) at fuse-bridge.c:6176
#8 0x00007f681a523ea5 in start_thread (arg=0x7f6805328700) at pthread_create.c:307
#9 0x00007f6819de996d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
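In the trace above, every thread except Thread 1 is parked in a wait, sleep, or poll call. Thread 1 is in `__GI___pthread_mutex_lock` with `mutex=0x6`, which is not a plausible pointer and suggests that `inode_ref` was handed a stale or corrupted inode. A small script can make that kind of outlier easier to spot in a long dump. This is a rough sketch that assumes the gdb output format shown above; the function names are illustrative.

```python
import re
from collections import Counter

# "Thread 1 (Thread 0x7f6805328700 (LWP 20860)):"
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)\):")
# "#0 0x00007f... in epoll_wait () at ..." or "#0 pthread_cond_timedwait@@... () at ..."
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \(")


def innermost_frames(text):
    """Map each gdb thread number to the function name in its frame #0."""
    frames = {}
    current = None
    for line in text.splitlines():
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and frame.group(1) == "0":
            frames[current] = frame.group(2)
    return frames


def summarize(text):
    """Count how many threads share each innermost function."""
    return Counter(innermost_frames(text).values())
```

Functions that appear only once in the summary are the first candidates to inspect; in this dump that includes `__GI___pthread_mutex_lock` on Thread 1.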
Duplicate of #2286
Description of problem: We have around 200 gluster clients, each using the FUSE driver to mount two gluster bricks. Random clients crash once every week or two, and we frequently see this error message on the client near the time of the crash:
From another client:
[2021-03-15 23:06:39.551311] E [inode.c:726:inode_forget_atomic] (-->/usr/lib64/glusterfs/7.9/xlator/mount/fuse.so(+0x8d3d) [0x7f6812a76d3d] -->/lib64/libglusterfs.so.0(inode_forget_with_unref+0x29) [0x7f681b6f26b9] -->/lib64/libglusterfs.so.0(+0x3731e) [0x7f681b6f031e] ) 0-: Assertion failed: inode_lookup >= nlookup
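With roughly 200 clients, it can help to scan the client logs for this assertion and line the timestamps up against crash times. The following is a minimal sketch that assumes log lines shaped like the one above (timestamp, one-letter level, `[file:line:function]`, then the rest); the regular expression and function name are illustrative and not part of GlusterFS.

```python
import re

LOG_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<function>[^\]]+)\] "
    r"(?P<rest>.*)$"
)


def parse_client_log_line(line):
    """Split one client log line into fields; return None if it does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    # Keep the failed condition separately when the line is an assertion failure.
    _, marker, condition = fields["rest"].partition("Assertion failed: ")
    fields["assertion"] = condition if marker else None
    return fields
```

Filtering parsed lines on `fields["assertion"] is not None` gives a per-client list of timestamps that can be compared with the time each core file was written.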
The exact command to reproduce the issue: Can't reliably reproduce :(
The full output of the command that failed: Can't reliably reproduce :(
Expected results: Can't reliably reproduce :(
Mandatory info:
- The output of the `gluster volume info` command:
- The output of the `gluster volume status` command:
Task Status of Volume gluster-ssd-volume
There are no active volume tasks
- Provide the logs present at the following locations on the client and server nodes: /var/log/glusterfs/
- Is there any crash? Provide the backtrace and coredump: One client exited today with a core dump. Unfortunately I can't post the dump publicly, but I can post the backtrace:
From one of the clients that ended up getting noticed by hung_task_timeout:
Additional info:
- The operating system / glusterfs version:
Client OS: CentOS 7.9.2009
Gluster OS: CentOS 7.9.2009
Gluster server version: glusterfs-server-7.9-1.el7.x86_64
Gluster FUSE client version: glusterfs-fuse-7.9-1.el7.x86_64