gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

One brick offline with signal received: 11 #1699

Closed Bockeman closed 4 years ago

Bockeman commented 4 years ago

Description of problem: One brick on one server is offline and all attempts to bring it back online have failed. The corresponding brick on the other (of a replica 2) server is ok. Other bricks are ok.

The following do not clear the problem:

The problem appears to be similar to https://github.com/gluster/glusterfs/issues/1531 but the cause is different, and the number of volumes and bricks is different. (I note the observation comment regarding "replica 2" and split-brain, but the cost (time/effort) to recover from split-brain is manageable and usually due to external causes, such as a power cut.)

My urgent need is to find a way out of the current situation and bring back online brick06 on the second server. Not so urgent is the need for gluster to handle this condition in a graceful way and report to the user/admin what is the real cause of the problem and how to fix it (if it cannot be fixed automatically).
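For reference, the usual first attempt for a brick whose glusterfsd process has died (quite possibly one of the attempts already listed above) is a forced volume start, which only respawns bricks that are down; a minimal sketch using the volume name from this report:

```
# Respawn only the brick processes that are currently down; healthy bricks are untouched
gluster volume start gluvol1 force

# Confirm the brick is back online and note its new PID
gluster volume status gluvol1
```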

The exact command to reproduce the issue: Not sure what actually caused this situation to arise, but activity at the time was:

- Multiple clients, all active, but with minimal activity.
- Intense activity from one client (actually one of the two gluster servers): a scripted "chown" on over a million files, which had been running for over 5 hours and was 83% complete.
- An edit or "sed -i" on a 500MB script file (which should not have tipped over the 22GB RAM + 8GB swap).

The full output of the command that failed:

```
gluster volume status gluvol1 \
  2>&1 | awk '{print " " $0}'; date +\ \ %F\ %T%n
  Status of volume: gluvol1
  Gluster process                             TCP Port  RDMA Port  Online  Pid
  ------------------------------------------------------------------------------
  Brick veriicon:/srv/brick06                 49162     0          Y       2065834
  Brick verijolt:/srv/brick06                 N/A       N/A        N       N/A
  Brick veriicon:/srv/brick05                 49163     0          Y       2065859
  Brick verijolt:/srv/brick05                 49161     0          Y       4775
  Brick veriicon:/srv/brick07                 49164     0          Y       2065887
  Brick verijolt:/srv/brick07                 49162     0          Y       4797
  Self-heal Daemon on localhost               N/A       N/A        Y       1969
  Quota Daemon on localhost                   N/A       N/A        Y       4867
  Bitrot Daemon on localhost                  N/A       N/A        Y       4882
  Scrubber Daemon on localhost                N/A       N/A        Y       4938
  Self-heal Daemon on veriicon                N/A       N/A        Y       2063499
  Quota Daemon on veriicon                    N/A       N/A        Y       2304107
  Bitrot Daemon on veriicon                   N/A       N/A        Y       2304118
  Scrubber Daemon on veriicon                 N/A       N/A        Y       2304144

  Task Status of Volume gluvol1
  ------------------------------------------------------------------------------
  There are no active volume tasks

  2020-10-22 23:35:11
```

Expected results: Some way to bring that brick back online.

- The output of the gluster volume info command:

```
gluster volume info gluvol1 \
  2>&1 | awk '{print " " $0}'; date +\ \ %F\ %T%n
  Volume Name: gluvol1
  Type: Distributed-Replicate
  Volume ID: 5af1e1c9-afbd-493d-a567-d7989cf3b9ea
  Status: Started
  Snapshot Count: 0
  Number of Bricks: 3 x 2 = 6
  Transport-type: tcp
  Bricks:
  Brick1: veriicon:/srv/brick06
  Brick2: verijolt:/srv/brick06
  Brick3: veriicon:/srv/brick05
  Brick4: verijolt:/srv/brick05
  Brick5: veriicon:/srv/brick07
  Brick6: verijolt:/srv/brick07
  Options Reconfigured:
  storage.max-hardlinks: 10000000
  features.quota-deem-statfs: on
  features.scrub-freq: monthly
  features.inode-quota: on
  features.quota: on
  features.scrub: Active
  features.bitrot: on
  transport.address-family: inet
  nfs.disable: on
  performance.client-io-threads: off
  cluster.self-heal-daemon: on
  cluster.min-free-inodes: 5%
  2020-10-22 23:34:01
```

- The operating system / glusterfs version: Fedora F32

```
Linux veriicon 5.8.15-201.fc32.x86_64 #1 SMP Thu Oct 15 15:56:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Linux verijolt 5.8.15-201.fc32.x86_64 #1 SMP Thu Oct 15 15:56:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
glusterfs 7.8
```

Additional info:

on veriicon (1st server) ``` systemctl status glusterd o glusterd.service - GlusterFS, a clustered file-system server Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled) Active: active (running) since Sun 2020-10-18 11:20:39 BST; 4 days ago Docs: man:glusterd(8) Process: 833 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS) Main PID: 838 (glusterd) Tasks: 314 (limit: 23818) Memory: 11.9G CPU: 2d 13h 26min 1.990s CGroup: /system.slice/glusterd.service 838 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO 2063406 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol0.veriicon.srv-brick04 -p /var/run/gluster/vols/gluvol0/veriicon-srv-brick> 2063429 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol0.veriicon.srv-brick00 -p /var/run/gluster/vols/gluvol0/veriicon-srv-brick> 2063452 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol0.veriicon.srv-brick01 -p /var/run/gluster/vols/gluvol0/veriicon-srv-brick> 2063499 /usr/sbin/glusterfs -s localhost --volfile-id shd/gluvol0 -p /var/run/gluster/shd/gluvol0/gluvol0-shd.pid -l /var/log/glusterf> 2065834 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol1.veriicon.srv-brick06 -p /var/run/gluster/vols/gluvol1/veriicon-srv-brick> 2065859 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol1.veriicon.srv-brick05 -p /var/run/gluster/vols/gluvol1/veriicon-srv-brick> 2065887 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol1.veriicon.srv-brick07 -p /var/run/gluster/vols/gluvol1/veriicon-srv-brick> 2304107 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quota> 2304118 /usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid -l /var/log/glusterfs/bitd.log -S> 2304144 /usr/sbin/glusterfs -s localhost --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid -l /var/log/glusterfs/scrub.lo> ``` on verijolt (2nd server) ``` systemctl status glusterd o glusterd.service - GlusterFS, a clustered file-system server Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled) Active: active (running) since Thu 2020-10-22 18:00:21 BST; 21min ago Docs: man:glusterd(8) Process: 823 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS) Main PID: 847 (glusterd) Tasks: 262 (limit: 26239) Memory: 3.4G CPU: 49min 14.474s CGroup: /system.slice/glusterd.service 847 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO 1701 /usr/sbin/glusterfs -s localhost --volfile-id rebalance/gluvol1 --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *dht.readdir-opti> 1969 /usr/sbin/glusterfs -s localhost --volfile-id shd/gluvol0 -p /var/run/gluster/shd/gluvol0/gluvol0-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/361019a1040168a1.socket --xlator-option *replicate*.node-uu> 4575 /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol0.verijolt.srv-brick04 -p /var/run/gluster/vols/gluvol0/verijolt-srv-brick04.pid -S /var/run/gluster/106ca420dff7312a.socket --brick-name /srv/brick04 -l /var/log/gluster> 4598 /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol0.verijolt.srv-brick00 -p /var/run/gluster/vols/gluvol0/verijolt-srv-brick00.pid -S /var/run/gluster/9978a7a4ac3bc9dd.socket --brick-name /srv/brick00 -l /var/log/gluster> 4621 
/usr/sbin/glusterfsd -s verijolt --volfile-id gluvol0.verijolt.srv-brick01 -p /var/run/gluster/vols/gluvol0/verijolt-srv-brick01.pid -S /var/run/gluster/c64e4259882d3f4d.socket --brick-name /srv/brick01 -l /var/log/gluster> 4775 /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol1.verijolt.srv-brick05 -p /var/run/gluster/vols/gluvol1/verijolt-srv-brick05.pid -S /var/run/gluster/0135293616f7a351.socket --brick-name /srv/brick05 -l /var/log/gluster> 4797 /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol1.verijolt.srv-brick07 -p /var/run/gluster/vols/gluvol1/verijolt-srv-brick07.pid -S /var/run/gluster/b7d8c6bd3c8b2992.socket --brick-name /srv/brick07 -l /var/log/gluster> 4867 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/739fa467313dc700.socket --process-name quotad 4882 /usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid -l /var/log/glusterfs/bitd.log -S /var/run/gluster/a646064795a52ac2.socket --global-timer-wheel 4938 /usr/sbin/glusterfs -s localhost --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid -l /var/log/glusterfs/scrub.log -S /var/run/gluster/903362075a5b27cc.socket --global-timer-wheel Oct 22 18:04:34 verijolt srv-brick06[4753]: dlfcn 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: libpthread 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: llistxattr 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: setfsid 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: spinlock 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: epoll.h 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: xattr.h 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: st_atim.tv_nsec 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: package-string: glusterfs 7.8 Oct 22 18:04:34 verijolt srv-brick06[4753]: --------- ``` There's a problem with brick06 on server verijolt.

snippet from /var/log/messages

``` Oct 22 18:03:13 verijolt systemd[2266]: Finished Mark boot as successful. Oct 22 18:04:34 verijolt srv-brick06[4753]: pending frames: Oct 22 18:04:34 verijolt audit[4753]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 pid=4753 comm="glfs_iotwr002" exe="/usr/sbin/glusterfsd" sig=11 res=1 Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(0) op(0) Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(0) op(0) Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(1) op(GETXATTR) Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(1) op(LOOKUP) Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(1) op(LOOKUP) Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(1) op(LOOKUP) Oct 22 18:04:34 verijolt srv-brick06[4753]: patchset: git://git.gluster.org/glusterfs.git Oct 22 18:04:34 verijolt srv-brick06[4753]: signal received: 11 Oct 22 18:04:34 verijolt srv-brick06[4753]: time of crash: Oct 22 18:04:34 verijolt srv-brick06[4753]: 2020-10-22 17:04:34 Oct 22 18:04:34 verijolt srv-brick06[4753]: configuration details: Oct 22 18:04:34 verijolt srv-brick06[4753]: argp 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: backtrace 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: dlfcn 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: libpthread 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: llistxattr 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: setfsid 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: spinlock 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: epoll.h 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: xattr.h 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: st_atim.tv_nsec 1 Oct 22 18:04:34 verijolt srv-brick06[4753]: package-string: glusterfs 7.8 Oct 22 18:04:34 verijolt srv-brick06[4753]: --------- Oct 22 18:04:34 verijolt systemd[1]: Created slice system-systemd\x2dcoredump.slice. Oct 22 18:04:34 verijolt audit: BPF prog-id=45 op=LOAD Oct 22 18:04:34 verijolt audit: BPF prog-id=46 op=LOAD Oct 22 18:04:34 verijolt audit: BPF prog-id=47 op=LOAD Oct 22 18:04:34 verijolt systemd[1]: Started Process Core Dump (PID 4906/UID 0). Oct 22 18:04:34 verijolt audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-4906-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? 
res=success' Oct 22 18:04:34 verijolt systemd-coredump[4907]: Process 4753 (glusterfsd) of user 0 dumped core.#012#012Stack trace of thread 4845:#012#0 0x00007f812ba19729 posix_get_ancestry_non_directory (posix.so + 0x31729)#012#1 0x00007f812ba19c9f posix_get_ancestry (posix.so + 0x31c9f)#012#2 0x00007f812ba22c20 posix_readdirp (posix.so + 0x3ac20)#012#3 0x00007f8130cad90b default_readdirp (libglusterfs.so.0 + 0xbb90b)#012#4 0x00007f8130cad90b default_readdirp (libglusterfs.so.0 + 0xbb90b)#012#5 0x00007f812b92f929 br_stub_readdirp (bitrot-stub.so + 0x9929)#012#6 0x00007f812b9187d2 posix_acl_readdirp (access-control.so + 0x77d2)#012#7 0x00007f812b8d76d0 pl_readdirp (locks.so + 0xd6d0)#012#8 0x00007f8130cad90b default_readdirp (libglusterfs.so.0 + 0xbb90b)#012#9 0x00007f8130cad90b default_readdirp (libglusterfs.so.0 + 0xbb90b)#012#10 0x00007f8130cad90b default_readdirp (libglusterfs.so.0 + 0xbb90b)#012#11 0x00007f812b8861f1 up_readdirp (upcall.so + 0xd1f1)#012#12 0x00007f8130cc61bd default_readdirp_resume (libglusterfs.so.0 + 0xd41bd)#012#13 0x00007f8130c44035 call_resume (libglusterfs.so.0 + 0x52035)#012#14 0x00007f812b86e128 iot_worker (io-threads.so + 0x7128)#012#15 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#16 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4755:#012#0 0x00007f81309ab962 __sigtimedwait (libc.so.6 + 0x3d962)#012#1 0x00007f8130b4e5bc sigwait (libpthread.so.0 + 0x145bc)#012#2 0x000055ca5c22081b glusterfs_sigwaiter (glusterfsd + 0x981b)#012#3 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4759:#012#0 0x00007f8130a66fcb __select (libc.so.6 + 0xf8fcb)#012#1 0x00007f8130c95b79 runner (libglusterfs.so.0 + 0xa3b79)#012#2 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4756:#012#0 0x00007f8130a36801 clock_nanosleep@@GLIBC_2.17 (libc.so.6 + 0xc8801)#012#1 0x00007f8130a3c157 __nanosleep (libc.so.6 + 0xce157)#012#2 0x00007f8130a3c08e sleep (libc.so.6 + 0xce08e)#012#3 0x00007f8130c470e5 pool_sweeper (libglusterfs.so.0 + 0x550e5)#012#4 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#5 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4760:#012#0 0x00007f8130a6fc5e epoll_wait (libc.so.6 + 0x101c5e)#012#1 0x00007f8130c816a2 event_dispatch_epoll_worker (libglusterfs.so.0 + 0x8f6a2)#012#2 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4761:#012#0 0x00007f8130a6fc5e epoll_wait (libc.so.6 + 0x101c5e)#012#1 0x00007f8130c816a2 event_dispatch_epoll_worker (libglusterfs.so.0 + 0x8f6a2)#012#2 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4770:#012#0 0x00007f8130a36801 clock_nanosleep@@GLIBC_2.17 (libc.so.6 + 0xc8801)#012#1 0x00007f8130a3c157 __nanosleep (libc.so.6 + 0xce157)#012#2 0x00007f8130a3c08e sleep (libc.so.6 + 0xce08e)#012#3 0x00007f812b9f57b0 posix_disk_space_check_thread_proc (posix.so + 0xd7b0)#012#4 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#5 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4771:#012#0 0x00007f8130a36801 clock_nanosleep@@GLIBC_2.17 (libc.so.6 + 0xc8801)#012#1 0x00007f8130a3c157 __nanosleep (libc.so.6 + 0xce157)#012#2 0x00007f8130a3c08e sleep 
(libc.so.6 + 0xce08e)#012#3 0x00007f812b9f5156 posix_health_check_thread_proc (posix.so + 0xd156)#012#4 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#5 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4767:#012#0 0x00007f8130a66fcb __select (libc.so.6 + 0xf8fcb)#012#1 0x00007f812b97d402 changelog_ev_dispatch (changelog.so + 0x1c402)#012#2 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4842:#012#0 0x00007f8130b4d750 __lll_lock_wait (libpthread.so.0 + 0x13750)#012#1 0x00007f8130b45e53 __pthread_mutex_lock (libpthread.so.0 + 0xbe53)#012#2 0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#3 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4772:#012#0 0x00007f8130b4a1b8 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0 + 0x101b8)#012#1 0x00007f812b9f0114 posix_ctx_janitor_thread_proc (posix.so + 0x8114)#012#2 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4850:#012#0 0x00007f8130b4d750 __lll_lock_wait (libpthread.so.0 + 0x13750)#012#1 0x00007f8130b45e53 __pthread_mutex_lock (libpthread.so.0 + 0xbe53)#012#2 0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#3 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4852:#012#0 0x00007f8130b4d750 __lll_lock_wait (libpthread.so.0 + 0x13750)#012#1 0x00007f8130b45e53 __pthread_mutex_lock (libpthread.so.0 + 0xbe53)#012#2 0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#3 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4773:#012#0 0x00007f8130b49e92 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe92)#012#1 0x00007f812b9f59bb posix_fsyncer_pick (posix.so + 0xd9bb)#012#2 0x00007f812b9f5c25 posix_fsyncer (posix.so + 0xdc25)#012#3 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4858:#012#0 0x00007f8130b4d750 __lll_lock_wait (libpthread.so.0 + 0x13750)#012#1 0x00007f8130b45e53 __pthread_mutex_lock (libpthread.so.0 + 0xbe53)#012#2 0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#3 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4860:#012#0 0x00007f8130b4d750 __lll_lock_wait (libpthread.so.0 + 0x13750)#012#1 0x00007f8130b45e53 __pthread_mutex_lock (libpthread.so.0 + 0xbe53)#012#2 0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#3 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4857:#012#0 0x00007f8130b45da3 __pthread_mutex_lock (libpthread.so.0 + 0xbda3)#012#1 0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#2 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4757:#012#0 0x00007f8130b4a1b8 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0 + 0x101b8)#012#1 0x00007f8130c5d3d9 syncenv_task (libglusterfs.so.0 + 0x6b3d9)#012#2 0x00007f8130c5e1a5 syncenv_processor (libglusterfs.so.0 + 0x6c1a5)#012#3 
0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4758:#012#0 0x00007f8130b4a1b8 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0 + 0x101b8)#012#1 0x00007f8130c5d3d9 syncenv_task (libglusterfs.so.0 + 0x6b3d9)#012#2 0x00007f8130c5e1a5 syncenv_processor (libglusterfs.so.0 + 0x6c1a5)#012#3 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4762:#012#0 0x00007f8130b49e92 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe92)#012#1 0x00007f812b81d994 index_worker (index.so + 0x7994)#012#2 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4768:#012#0 0x00007f8130a66fcb __select (libc.so.6 + 0xf8fcb)#012#1 0x00007f812b97d402 changelog_ev_dispatch (changelog.so + 0x1c402 Oct 22 18:04:34 verijolt audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-4906-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Oct 22 18:04:34 verijolt audit: BPF prog-id=47 op=UNLOAD Oct 22 18:04:34 verijolt audit: BPF prog-id=46 op=UNLOAD Oct 22 18:04:34 verijolt audit: BPF prog-id=45 op=UNLOAD Oct 22 18:04:34 verijolt systemd[1]: systemd-coredump@0-4906-0.service: Succeeded. Oct 22 18:04:37 verijolt abrt-server[4960]: Deleting problem directory ccpp-2020-10-22-18:04:35.4544-4753 (dup of ccpp-2020-10-22-14:23:50.21866-346125) Oct 22 18:04:38 verijolt abrt-notification[5020]: Process 346125 (glusterfsd) crashed in posix_get_ancestry_non_directory() ```

snippet from /var/log/glusterfs/bricks/srv-brick06.log

``` [2020-10-22 17:04:34.150577] E [inodelk.c:506:__inode_unlock_lock] 0-gluvol1-locks: Matching lock not found for unlock 0-9223372036854775807, by 181b0088357f0000 on 0x7f8124011df0 [2020-10-22 17:04:34.150614] E [MSGID: 115053] [server-rpc-fops_v2.c:271:server4_inodelk_cbk] 0-gluvol1-server: 55: INODELK (73a42b0e-64d1-4606-8337-fce48490a10b), client: CTX_ID:2784eca2-176b-4f8a-b0b8-0a81d4e811b3-GRAPH_ID:0-PID:1701-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0, error-xlator: gluvol1-locks [Invalid argument] pending frames: frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(GETXATTR) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) patchset: git://git.gluster.org/glusterfs.git signal received: 11 time of crash: 2020-10-22 17:04:34 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 7.8 /lib64/libglusterfs.so.0(+0x2afa4)[0x7f8130c1cfa4] /lib64/libglusterfs.so.0(gf_print_trace+0x333)[0x7f8130c27c93] /lib64/libc.so.6(+0x3ca70)[0x7f81309aaa70] /usr/lib64/glusterfs/7.8/xlator/storage/posix.so(+0x31729)[0x7f812ba19729] /usr/lib64/glusterfs/7.8/xlator/storage/posix.so(+0x31c9f)[0x7f812ba19c9f] /usr/lib64/glusterfs/7.8/xlator/storage/posix.so(+0x3ac20)[0x7f812ba22c20] /lib64/libglusterfs.so.0(default_readdirp+0xdb)[0x7f8130cad90b] /lib64/libglusterfs.so.0(default_readdirp+0xdb)[0x7f8130cad90b] /usr/lib64/glusterfs/7.8/xlator/features/bitrot-stub.so(+0x9929)[0x7f812b92f929] /usr/lib64/glusterfs/7.8/xlator/features/access-control.so(+0x77d2)[0x7f812b9187d2] /usr/lib64/glusterfs/7.8/xlator/features/locks.so(+0xd6d0)[0x7f812b8d76d0] /lib64/libglusterfs.so.0(default_readdirp+0xdb)[0x7f8130cad90b] /lib64/libglusterfs.so.0(default_readdirp+0xdb)[0x7f8130cad90b] /lib64/libglusterfs.so.0(default_readdirp+0xdb)[0x7f8130cad90b] /usr/lib64/glusterfs/7.8/xlator/features/upcall.so(+0xd1f1)[0x7f812b8861f1] /lib64/libglusterfs.so.0(default_readdirp_resume+0x21d)[0x7f8130cc61bd] /lib64/libglusterfs.so.0(call_resume+0x75)[0x7f8130c44035] /usr/lib64/glusterfs/7.8/xlator/performance/io-threads.so(+0x7128)[0x7f812b86e128] /lib64/libpthread.so.0(+0x9432)[0x7f8130b43432] /lib64/libc.so.6(clone+0x43)[0x7f8130a6f913] --------- ```
Bockeman commented 4 years ago

Urgent problem no longer an issue. I repeated the volume stop/start again today and, voila, brick06 on server verijolt came back online.

``` export start_date=`date +%F\ %T`; \ gluster volume stop gluvol1; \ echo " ${start_date}" && date +\ \ %F\ %T%n Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y volume stop: gluvol1: success 2020-10-23 12:43:21 2020-10-23 12:43:44 export start_date=`date +%F\ %T`; \ gluster volume start gluvol1 \ 2>&1 | awk '{print " " $0;fflush()}'; echo " ${start_date}" && date +\ \ %F\ %T%n volume start: gluvol1: success 2020-10-23 12:44:20 2020-10-23 12:44:34 export start_date=`date +%F\ %T`; \ gluster volume status \ 2>&1 | awk '{print " " $0;fflush()}'; echo " ${start_date}" && date +\ \ %F\ %T%n Status of volume: gluvol0 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick veriicon:/srv/brick04 49156 0 Y 2063406 Brick verijolt:/srv/brick04 49157 0 Y 4575 Brick veriicon:/srv/brick00 49157 0 Y 2063429 Brick verijolt:/srv/brick00 49158 0 Y 4598 Brick veriicon:/srv/brick01 49158 0 Y 2063452 Brick verijolt:/srv/brick01 49159 0 Y 4621 Self-heal Daemon on localhost N/A N/A Y 1969 Quota Daemon on localhost N/A N/A Y 53485 Bitrot Daemon on localhost N/A N/A Y 53517 Scrubber Daemon on localhost N/A N/A Y 53546 Self-heal Daemon on veriicon N/A N/A Y 2063499 Quota Daemon on veriicon N/A N/A Y 2355307 Bitrot Daemon on veriicon N/A N/A Y 2355350 Scrubber Daemon on veriicon N/A N/A Y 2355380 Task Status of Volume gluvol0 ------------------------------------------------------------------------------ Task : Rebalance ID : c683596c-82b0-41b9-9d7e-10d4208bb440 Status : completed Status of volume: gluvol1 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick veriicon:/srv/brick06 49159 0 Y 2355232 Brick verijolt:/srv/brick06 49160 0 Y 53399 Brick veriicon:/srv/brick05 49160 0 Y 2355254 Brick verijolt:/srv/brick05 49163 0 Y 53421 Brick veriicon:/srv/brick07 49161 0 Y 2355276 Brick verijolt:/srv/brick07 49164 0 Y 53443 Self-heal Daemon on localhost N/A N/A Y 1969 Quota Daemon on localhost N/A N/A Y 53485 Bitrot Daemon on localhost N/A N/A Y 53517 Scrubber Daemon on localhost N/A N/A Y 53546 Self-heal Daemon on veriicon N/A N/A Y 2063499 Quota Daemon on veriicon N/A N/A Y 2355307 Bitrot Daemon on veriicon N/A N/A Y 2355350 Scrubber Daemon on veriicon N/A N/A Y 2355380 Task Status of Volume gluvol1 ------------------------------------------------------------------------------ There are no active volume tasks 2020-10-23 12:46:34 2020-10-23 12:46:34 ```

I don't understand this. My history shows that I did the stop, on this server, yesterday:

history | egrep 'stop|start'
  866  2020-10-22 14:22:08 gluster volume stop gluvol0
  867  2020-10-22 14:22:39 gluster volume stop gluvol1
  868  2020-10-22 14:23:24 gluster volume start gluvol0
  869  2020-10-22 14:23:41 gluster volume start gluvol1

So what got fixed? Why did the repeat today change things when yesterday it had no effect?

There's still something wrong. Of course, now that it is no longer so urgent, do we have any idea what caused this one brick to go offline? Is there more information I might be able to supply?

mohit84 commented 4 years ago

As you mentioned, brick06 crashed; that's why the gluster CLI was showing its status as N/A. After the volume is started again, it is expected that the CLI shows the brick status correctly. Can you please attach the brick core with gdb and share the output of "thread apply all bt full"?

Bockeman commented 4 years ago

Some more info. Just to see if the immediate problem had been fixed by the volume stop/start, I looked at the brick log again. The tail of /var/log/glusterfs/bricks/srv-brick06.log is shown in the details below. Notice that the timestamp is after the start (as logged in the details of the prior comment) and that the log appears to have been truncated.

``` [2020-10-23 11:44:22.347133] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.21" [2020-10-23 11:44:22.347177] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:dce4fcdc-10f5-45de-8000-f4030618b984-GRAPH_ID:4-PID:1013-HOST:verilate-PC_NAME:gluvol1-client-5-RECON_NO:-6 (version: 6.9) with subvol /srv/brick06 [2020-10-23 11:44:22.350754] I [rpcsvc.c:864:rpcsvc_handle_rpc_call] 0-rpc-service: spawned a request handler thread for queue 0 [2020-10-23 11:44:23.384978] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.29" [2020-10-23 11:44:23.384997] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:cead90c8-cf17-4f64-864e-d5fb99e0d21f-GRAPH_ID:4-PID:1025-HOST:veritosh-PC_NAME:gluvol1-client-5-RECON_NO:-9 (version: 6.9) with subvol /srv/brick06 [2020-10-23 11:44:23.386722] I [rpcsvc.c:864:rpcsvc_handle_rpc_call] 0-rpc-service: spawned a request handler thread for queue 1 [2020-10-23 11:44:23.388107] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.27" [2020-10-23 11:44:23.388124] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:c741e6ef-5725-4f23-b4c3-9b7f94be1a01-GRAPH_ID:4-PID:1519-HOST:verirack-PC_NAME:gluvol1-client-5-RECON_NO:-7 (version: 6.9) with subvol /srv/brick06 [2020-10-23 11:44:23.401629] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.19" [2020-10-23 11:44:23.401649] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2 [2020-10-23 11:44:23.401665] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:33c6affe-ce70-4122-9658-289a4ebf0420-GRAPH_ID:0-PID:53485-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 (version: 7.8) with subvol /srv/brick06 [2020-10-23 11:44:23.666142] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.18" [2020-10-23 11:44:23.666179] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2 [2020-10-23 11:44:23.666194] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:76a4bc50-328e-45dd-9cdf-99329aec9ad2-GRAPH_ID:0-PID:1277-HOST:veriicon-PC_NAME:gluvol1-client-5-RECON_NO:-12 (version: 7.8) with subvol /srv/brick06 [2020-10-23 11:44:23.669918] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.19" [2020-10-23 11:44:23.669932] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:e0928c38-c13e-4ab5-8d43-b26e397967ee-GRAPH_ID:0-PID:1303-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-3 (version: 7.8) with subvol /srv/brick06 [2020-10-23 11:44:23.686523] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.10" [2020-10-23 11:44:23.686538] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:fd9a184d-61a0-41ae-bfc5-83c67a7a0497-GRAPH_ID:4-PID:1044-HOST:verialto-PC_NAME:gluvol1-client-5-RECON_NO:-12 (version: 7.7) with subvol /srv/brick06 [2020-10-23 11:44:23.704923] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.12" [2020-10-23 11:44:23.704939] 
I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:36ddaf3e-f7e2-4224-a9b8-5b1a6de94a8e-GRAPH_ID:0-PID:1282-HOST:vericalm-PC_NAME:gluvol1-client-5-RECON_NO:-10 (version: 7.8) with subvol /srv/brick06 [2020-10-23 11:44:23.705368] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.17" [2020-10-23 11:44:23.705385] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:7bee7605-6be2-41b3-8fc4-1fc289bade00-GRAPH_ID:4-PID:1030-HOST:veriheat-PC_NAME:gluvol1-client-5-RECON_NO:-12 (version: 7.7) with subvol /srv/brick06 [2020-10-23 11:44:23.777276] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.11" [2020-10-23 11:44:23.777291] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:eeda6a67-5c53-44f7-ba1a-b366adaf97fe-GRAPH_ID:4-PID:1240-HOST:veriblob-PC_NAME:gluvol1-client-5-RECON_NO:-10 (version: 7.7) with subvol /srv/brick06 [2020-10-23 11:44:25.426054] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.19" [2020-10-23 11:44:25.426068] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2 [2020-10-23 11:44:25.426076] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:21d955c2-e8fd-4a41-847a-38729647bd8e-GRAPH_ID:4-PID:1969-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 (version: 7.8) with subvol /srv/brick06 [2020-10-23 11:44:25.434554] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.19" [2020-10-23 11:44:25.434571] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2 [2020-10-23 11:44:25.434579] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:cff7c9fe-fbdc-445d-b1c8-c068493de96b-GRAPH_ID:0-PID:53517-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 (version: 7.8) with subvol /srv/brick06 [2020-10-23 11:44:27.440119] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "*", received addr = "192.168.0.19" [2020-10-23 11:44:27.440211] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2 [2020-10-23 11:44:27.440221] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:35535c81-dd54-497c-bfe1-a5784f1c0e65-GRAPH_ID:0-PID:53546-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 (version: 7.8) with subvol /srv/brick06 pending frames: frame : type(1) op(FSETXATTR) frame : type(1) op(LOOKUP) frame : type(1) op(OPEN) frame : type(1) op(READ) frame : type(0) op(0) frame : type(1) op(READ) patchset: git://git.gluster.org/glusterfs.git signal received: 11 ```
mohit84 commented 4 years ago

It seems the brick process has crashed again. Can you attach the core with gdb and share the output of "thread apply all bt full"?

Bockeman commented 4 years ago

@mohit84 thanks for picking this up.

I understand that the CLI status for brick06 on server verijolt is showing "N" under "online" when the brick has crashed.

At the moment, brick06 on verijolt is showing "Y" despite the truncated brick log showing a potential crash (signal received: 11). However, verijolt is now working flat out (100% on 4 CPUs) attempting to recover, I assume.

The tail of /var/log/glusterfs/glfsheal-gluvol1.log shows

``` [2020-10-23 12:36:16.652483] I [MSGID: 108031] [afr-common.c:2531:afr_local_discovery_cbk] 0-gluvol1-replicate-1: selecting local read_child gluvol1-client-7 [2020-10-23 12:36:16.653235] I [MSGID: 108031] [afr-common.c:2531:afr_local_discovery_cbk] 0-gluvol1-replicate-2: selecting local read_child gluvol1-client-9 [2020-10-23 12:36:16.653396] I [MSGID: 104041] [glfs-resolve.c:974:__glfs_active_subvol] 0-gluvol1: switched to graph 76657269-6a6f-6c74-2d35-363332312d32 (0) [2020-10-23 12:36:48.660675] C [rpc-clnt-ping.c:152:rpc_clnt_ping_timer_expired] 0-gluvol1-client-5: server 192.168.0.19:49160 has not responded in the last 42 seconds, disconnecting. [2020-10-23 12:36:48.660744] I [MSGID: 114018] [client.c:2341:client_rpc_notify] 0-gluvol1-client-5: disconnected from gluvol1-client-5. Client process will keep trying to connect to glusterd until brick's port is available [2020-10-23 12:36:48.661038] E [rpc-clnt.c:338:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x199)[0x7f56d83ee239] (--> /lib64/libgfrpc.so.0(+0xe1f4)[0x7f56d7b561f4] (--> /lib64/libgfrpc.so.0(+0xe322)[0x7f56d7b56322] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc6)[0x7f56d7b572c6] (--> /lib64/libgfrpc.so.0(+0xfed8)[0x7f56d7b57ed8] ))))) 0-gluvol1-client-5: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2020-10-23 12:36:06.660617 (xid=0x5) [2020-10-23 12:36:48.661051] W [MSGID: 114032] [client-handshake.c:1552:client_dump_version_cbk] 0-gluvol1-client-5: received RPC status error [Transport endpoint is not connected] [2020-10-23 12:36:48.661111] E [rpc-clnt.c:338:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x199)[0x7f56d83ee239] (--> /lib64/libgfrpc.so.0(+0xe1f4)[0x7f56d7b561f4] (--> /lib64/libgfrpc.so.0(+0xe322)[0x7f56d7b56322] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc6)[0x7f56d7b572c6] (--> /lib64/libgfrpc.so.0(+0xfed8)[0x7f56d7b57ed8] ))))) 0-gluvol1-client-5: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2020-10-23 12:36:06.660626 (xid=0x6) [2020-10-23 12:36:48.661120] W [rpc-clnt-ping.c:210:rpc_clnt_ping_cbk] 0-gluvol1-client-5: socket disconnected [2020-10-23 12:36:51.663231] I [rpc-clnt.c:1962:rpc_clnt_reconfig] 0-gluvol1-client-5: changing port to 49160 (from 0) [2020-10-23 12:37:33.667582] C [rpc-clnt-ping.c:152:rpc_clnt_ping_timer_expired] 0-gluvol1-client-5: server 192.168.0.19:49160 has not responded in the last 42 seconds, disconnecting. [2020-10-23 12:37:33.667663] I [MSGID: 114018] [client.c:2341:client_rpc_notify] 0-gluvol1-client-5: disconnected from gluvol1-client-5. 
Client process will keep trying to connect to glusterd until brick's port is available [2020-10-23 12:37:33.667862] E [rpc-clnt.c:338:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x199)[0x7f56d83ee239] (--> /lib64/libgfrpc.so.0(+0xe1f4)[0x7f56d7b561f4] (--> /lib64/libgfrpc.so.0(+0xe322)[0x7f56d7b56322] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc6)[0x7f56d7b572c6] (--> /lib64/libgfrpc.so.0(+0xfed8)[0x7f56d7b57ed8] ))))) 0-gluvol1-client-5: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2020-10-23 12:36:51.667518 (xid=0xa) [2020-10-23 12:37:33.667891] W [MSGID: 114032] [client-handshake.c:1552:client_dump_version_cbk] 0-gluvol1-client-5: received RPC status error [Transport endpoint is not connected] [2020-10-23 12:37:33.667969] E [rpc-clnt.c:338:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x199)[0x7f56d83ee239] (--> /lib64/libgfrpc.so.0(+0xe1f4)[0x7f56d7b561f4] (--> /lib64/libgfrpc.so.0(+0xe322)[0x7f56d7b56322] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc6)[0x7f56d7b572c6] (--> /lib64/libgfrpc.so.0(+0xfed8)[0x7f56d7b57ed8] ))))) 0-gluvol1-client-5: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2020-10-23 12:36:51.667528 (xid=0xb) [2020-10-23 12:37:33.667980] W [rpc-clnt-ping.c:210:rpc_clnt_ping_cbk] 0-gluvol1-client-5: socket disconnected [2020-10-23 12:37:36.670101] I [rpc-clnt.c:1962:rpc_clnt_reconfig] 0-gluvol1-client-5: changing port to 49160 (from 0) [2020-10-23 12:38:18.675590] C [rpc-clnt-ping.c:152:rpc_clnt_ping_timer_expired] 0-gluvol1-client-5: server 192.168.0.19:49160 has not responded in the last 42 seconds, disconnecting. [2020-10-23 12:38:18.675664] I [MSGID: 114018] [client.c:2341:client_rpc_notify] 0-gluvol1-client-5: disconnected from gluvol1-client-5. Client process will keep trying to connect to glusterd until brick's port is available [2020-10-23 12:38:18.675764] E [rpc-clnt.c:338:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x199)[0x7f56d83ee239] (--> /lib64/libgfrpc.so.0(+0xe1f4)[0x7f56d7b561f4] (--> /lib64/libgfrpc.so.0(+0xe322)[0x7f56d7b56322] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc6)[0x7f56d7b572c6] (--> /lib64/libgfrpc.so.0(+0xfed8)[0x7f56d7b57ed8] ))))) 0-gluvol1-client-5: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2020-10-23 12:37:36.675526 (xid=0xf) [2020-10-23 12:38:18.675775] W [MSGID: 114032] [client-handshake.c:1552:client_dump_version_cbk] 0-gluvol1-client-5: received RPC status error [Transport endpoint is not connected] [2020-10-23 12:38:18.675833] E [rpc-clnt.c:338:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x199)[0x7f56d83ee239] (--> /lib64/libgfrpc.so.0(+0xe1f4)[0x7f56d7b561f4] (--> /lib64/libgfrpc.so.0(+0xe322)[0x7f56d7b56322] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc6)[0x7f56d7b572c6] (--> /lib64/libgfrpc.so.0(+0xfed8)[0x7f56d7b57ed8] ))))) 0-gluvol1-client-5: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2020-10-23 12:37:36.675538 (xid=0x10) [2020-10-23 12:38:18.675843] W [rpc-clnt-ping.c:210:rpc_clnt_ping_cbk] 0-gluvol1-client-5: socket disconnected [2020-10-23 12:38:21.677950] I [rpc-clnt.c:1962:rpc_clnt_reconfig] 0-gluvol1-client-5: changing port to 49160 (from 0) ```
mohit84 commented 4 years ago

Is the CLI showing a different pid from the brick pid that crashed? Can you check the pid status with ps -aef | grep <brick_pid>? Please share the dump of /var/log/glusterfs from all the nodes.

Bockeman commented 4 years ago

@mohit84 Please could you give me a bit more help in running debugger etc., it's been a while since I last did that.

Can you please attach the brick core with gdb and share the output of "thread apply all bt full"?

Could you tell me the commands to use?

Bockeman commented 4 years ago
ps -aef | grep 53399
root       53399       1 99 12:44 ?        04:53:46 /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol1.verijolt.srv-brick06 -p /var/run/gluster/vols/gluvol1/verijolt-srv-brick06.pid -S /var/run/gluster/e81ffbefdc82
mohit84 commented 4 years ago

It means the brick is running, and the pid (53399) is the same as the one shown by the CLI, right?

Before attaching the core with gdb, please install the debug package of glusterfs. 1) gdb /usr/sbin/glusterfsd -c <brick_core> 2) Run "thread apply all bt full" and share the output.
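For anyone following along, a minimal sketch of those two steps on Fedora (the debuginfo package name matches what is used later in this thread; the core path is a placeholder, since the real location depends on /proc/sys/kernel/core_pattern):

```
# Install debug symbols so the backtrace has function names and line numbers
dnf debuginfo-install glusterfs-server

# Open the saved brick core (placeholder path) against the glusterfsd binary
gdb /usr/sbin/glusterfsd -c /path/to/brick_core

# Inside gdb: log the full per-thread backtrace to a file and share it
(gdb) set logging file brick06_bt.log
(gdb) set logging on
(gdb) thread apply all bt full
```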

Bockeman commented 4 years ago

Please share the dump of /var/log/glusterfs from all the nodes.

I'm not sure what you mean by dump and all the nodes. This directory is large (~11GB) as this du shows:

export start_date=`date +%F\ %T`; \
du -xsmc /var/log/glusterfs/* \
2>&1 | sort -n | awk '{printf("  %8d\t%s\n",$1,substr($0,index($0,$2)))}'; \
echo "  ${start_date}" && date +\ \ %F\ %T
         1      /var/log/glusterfs/cli.log
         1      /var/log/glusterfs/cmd_history.log
         1      /var/log/glusterfs/gfproxy
         1      /var/log/glusterfs/gluvol0-rebalance.log
         1      /var/log/glusterfs/gluvol0-replace-brick-mount.log
         1      /var/log/glusterfs/gluvol1-replace-brick-mount.log
         1      /var/log/glusterfs/quota_crawl
         1      /var/log/glusterfs/snaps
         2      /var/log/glusterfs/glfsheal-gluvol0.log
         2      /var/log/glusterfs/glfsheal-gluvol1.log
         2      /var/log/glusterfs/srv-gluvol0.log
         3      /var/log/glusterfs/glusterd.log
         4      /var/log/glusterfs/scrub.log
         7      /var/log/glusterfs/quotad.log
        36      /var/log/glusterfs/gluvol1-rebalance.log_201004_0005.gz
        64      /var/log/glusterfs/srv-gluvol1.log
       134      /var/log/glusterfs/glustershd.log
       303      /var/log/glusterfs/gluvol1-rebalance.log
       409      /var/log/glusterfs/bitd.log_201004_0005.gz
       473      /var/log/glusterfs/bricks
      9762      /var/log/glusterfs/bitd.log
     11195      total
  2020-10-23 14:09:43
  2020-10-23 14:09:43
mohit84 commented 4 years ago

For now, you can share glusterd.log and the brick logs from all the nodes.
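One way to collect just those files on each server (a sketch; the archive name is arbitrary):

```
# Run on each gluster server, then attach the resulting archive to the issue
tar -czf /tmp/$(hostname)-gluster-logs.tar.gz \
    /var/log/glusterfs/glusterd.log \
    /var/log/glusterfs/bricks/*.log
```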

Bockeman commented 4 years ago

It means the brick is running, and the pid (53399) is the same as the one shown by the CLI, right?

Agreed, it looks as though the brick process is running, but I don't think this brick is in a working state. It should be healing, but the pending count is not going down.

  # Healing
  server        brick:  00  01  02  03  04  05  06  07
  veriicon    pending:   0   0   0   0   0   0  101653   0
  verijolt    pending:   0   0   0   0   0   0   0   0
  veriicon      split:   0   0   0   0   0   0   0   0
  verijolt      split:   0   0   0   0   0   0   0   0
  veriicon    healing:   0   0   0   0   0   0   0   0
  verijolt    healing:   0   0   0   0   0   0   0   0
  2020-10-23 14:26:07

Script used to collate healing status

```
echo " # Healing"; \
gluster volume heal gluvol0 info summary > gluster_heal_summary.out; \
gluster volume heal gluvol1 info summary >> gluster_heal_summary.out; \
awk '/Brick/{brick=substr($2,length($2)-1)+0; server=substr($2,1,index($2,":")-1); \
  if((server in server_a)==0){server_a[server]=++servers;server_n[servers]=server}} \
  /pending/{if(brick>bricks)bricks=brick; pending[server_a[server]"_"brick]=$NF} \
  /split/{if(brick>bricks)bricks=brick; splitbrain[server_a[server]"_"brick]=$NF} \
  /healing/{if(brick>bricks)bricks=brick; healing[server_a[server]"_"brick]=$NF} \
  END{printf(" %-10s %8s:","server","brick");for(i=0;i<=bricks;i++)printf(" %02d",i);printf("\n"); \
  for(j=1;j<=servers;j++){printf(" %-10s %8s:",server_n[j],"pending");for(i=0;i<=bricks;i++) \
    printf(" %2d",pending[server_a[server_n[j]]"_"i]);printf("\n");} \
  for(j=1;j<=servers;j++){printf(" %-10s %8s:",server_n[j],"split");for(i=0;i<=bricks;i++) \
    printf(" %2d",splitbrain[server_a[server_n[j]]"_"i]);printf("\n");} \
  for(j=1;j<=servers;j++){printf(" %-10s %8s:",server_n[j],"healing");for(i=0;i<=bricks;i++) \
    printf(" %2d",healing[server_a[server_n[j]]"_"i]);printf("\n");} \
  }' gluster_heal_summary.out; date +\ \ %F\ %T; \
echo ""
```
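When the pending count stays flat like this, the self-heal daemon can be nudged and monitored with the standard heal commands (a sketch only; it will not help if the brick process itself keeps crashing):

```
# Trigger an index heal (only the entries already recorded as needing heal)
gluster volume heal gluvol1

# Or force a full sweep of the volume
gluster volume heal gluvol1 full

# Watch the backlog drain
gluster volume heal gluvol1 statistics heal-count
gluster volume heal gluvol1 info summary
```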
Bockeman commented 4 years ago

Before attach a core with gdb please install debug package of glusterfs.

gdb /usr/sbin/glusterfsd -c <brick_core>
Run thread apply all bt full and share the output.

On Fedora, I installed the -devel versions, assuming that is what you meant by debug package:

dnf install -y glusterfs-api-devel glusterfs-devel

Where is the <brick_core> located?

mohit84 commented 4 years ago

The core is saved at the location configured in /proc/sys/kernel/core_pattern; you need to install the glusterfs-debuginfo package.

Bockeman commented 4 years ago

glusterd.log

Bockeman commented 4 years ago

The core is saved at the location configured in /proc/sys/kernel/core_pattern; you need to install the glusterfs-debuginfo package.

ls -l /proc/sys/kernel/core_pattern
-rw-r--r-- 1 root root 0 2020-10-23 14:50 /proc/sys/kernel/core_pattern
dnf info glusterfs-debuginfo 
Last metadata expiration check: 0:37:38 ago on Fri 23 Oct 2020 14:14:36 BST.
Error: No matching Packages to list
mohit84 commented 4 years ago

cat /proc/sys/kernel/core_pattern

Bockeman commented 4 years ago

The full logs exceed the 10MB limit, so I pruned them a bit.

srv-brick05_201023.log srv-brick06_201023.log srv-brick07_201023.log

Bockeman commented 4 years ago

cat /proc/sys/kernel/core_pattern

cat /proc/sys/kernel/core_pattern 
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
ls -l /usr/lib/systemd/systemd-coredump
-rwxr-xr-x 1 root root 61776 2020-09-21 08:47 /usr/lib/systemd/systemd-coredump

The date on this file does not correspond to yesterday's brick crash.
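Note that the timestamp on the systemd-coredump binary is just its package build/install date and says nothing about the crash. Since core_pattern pipes cores to systemd-coredump, the dumps themselves (if they have not been cleaned up or handed off to abrt) can be listed and opened with coredumpctl; a sketch, with <pid> as a placeholder:

```
# List cores captured by systemd-coredump for the brick binary
coredumpctl list /usr/sbin/glusterfsd

# Inspect one entry, open it in gdb, or export the core to a file
coredumpctl info <pid>
coredumpctl gdb <pid>
coredumpctl dump <pid> -o /tmp/glusterfsd.core
```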

mohit84 commented 4 years ago

Below are the latest brick logs for brick06; I am not seeing any issue in the brick logs.

[2020-10-23 11:44:20.574310] I [MSGID: 100030] [glusterfsd.c:2865:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 7.8 (args: /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol1.verijolt.srv-brick06 -p /var/run/gluster/vols/gluvol1/verijolt-srv-brick06.pid -S /var/run/gluster/e81ffbefdc824bb9.socket --brick-name /srv/brick06 -l /var/log/glusterfs/bricks/srv-brick06.log --xlator-option -posix.glusterd-uuid=04eb8fdd-ebb8-44c9-9064-5578f43e55b8 --process-name brick --brick-port 49160 --xlator-option gluvol1-server.listen-port=49160) [2020-10-23 11:44:20.598225] I [glusterfsd.c:2593:daemonize] 0-glusterfs: Pid of current running process is 53399 [2020-10-23 11:44:20.602226] I [socket.c:957:__socket_server_bind] 0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 9 [2020-10-23 11:44:20.604864] I [MSGID: 101190] [event-epoll.c:679:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0 [2020-10-23 11:44:20.604898] I [MSGID: 101190] [event-epoll.c:679:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2020-10-23 11:44:20.721821] I [rpcsvc.c:2689:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64 [2020-10-23 11:44:20.722754] W [socket.c:4161:reconfigure] 0-gluvol1-quota: disabling non-blocking IO [2020-10-23 11:44:20.762663] I [socket.c:957:__socket_server_bind] 0-socket.gluvol1-changelog: closing (AF_UNIX) reuse check socket 15 [2020-10-23 11:44:20.763042] I [trash.c:2449:init] 0-gluvol1-trash: no option specified for 'eliminate', using NULL [2020-10-23 11:44:22.347133] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "", received addr = "192.168.0.21" [2020-10-23 11:44:22.347177] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:dce4fcdc-10f5-45de-8000-f4030618b984-GRAPH_ID:4-PID:1013-HOST:verilate-PC_NAME:gluvol1-client-5-RECON_NO:-6 (version: 6.9) with subvol /srv/brick06 [2020-10-23 11:44:22.350754] I [rpcsvc.c:864:rpcsvc_handle_rpc_call] 0-rpc-service: spawned a request handler thread for queue 0 [2020-10-23 11:44:23.384978] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "", received addr = "192.168.0.29" [2020-10-23 11:44:23.384997] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:cead90c8-cf17-4f64-864e-d5fb99e0d21f-GRAPH_ID:4-PID:1025-HOST:veritosh-PC_NAME:gluvol1-client-5-RECON_NO:-9 (version: 6.9) with subvol /srv/brick06 [2020-10-23 11:44:23.386722] I [rpcsvc.c:864:rpcsvc_handle_rpc_call] 0-rpc-service: spawned a request handler thread for queue 1 [2020-10-23 11:44:23.388107] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "", received addr = "192.168.0.27" [2020-10-23 11:44:23.388124] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:c741e6ef-5725-4f23-b4c3-9b7f94be1a01-GRAPH_ID:4-PID:1519-HOST:verirack-PC_NAME:gluvol1-client-5-RECON_NO:-7 (version: 6.9) with subvol /srv/brick06 [2020-10-23 11:44:23.401629] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "", received addr = "192.168.0.19" [2020-10-23 11:44:23.401649] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2 [2020-10-23 11:44:23.401665] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:33c6affe-ce70-4122-9658-289a4ebf0420-GRAPH_ID:0-PID:53485-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 
(version: 7.8) with subvol /srv/brick06 [2020-10-23 11:44:23.666142] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "", received addr = "192.168.0.18" [2020-10-23 11:44:23.666179] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2 [2020-10-23 11:44:23.666194] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:76a4bc50-328e-45dd-9cdf-99329aec9ad2-GRAPH_ID:0-PID:1277-HOST:veriicon-PC_NAME:gluvol1-client-5-RECON_NO:-12 (version: 7.8) with subvol /srv/brick06 [2020-10-23 11:44:23.669918] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "", received addr = "192.168.0.19" [2020-10-23 11:44:23.669932] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:e0928c38-c13e-4ab5-8d43-b26e397967ee-GRAPH_ID:0-PID:1303-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-3 (version: 7.8) with subvol /srv/brick06 [2020-10-23 11:44:23.686523] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "", received addr = "192.168.0.10" [2020-10-23 11:44:23.686538] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:fd9a184d-61a0-41ae-bfc5-83c67a7a0497-GRAPH_ID:4-PID:1044-HOST:verialto-PC_NAME:gluvol1-client-5-RECON_NO:-12 (version: 7.7) with subvol /srv/brick06 [2020-10-23 11:44:23.704923] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "", received addr = "192.168.0.12" [2020-10-23 11:44:23.704939] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:36ddaf3e-f7e2-4224-a9b8-5b1a6de94a8e-GRAPH_ID:0-PID:1282-HOST:vericalm-PC_NAME:gluvol1-client-5-RECON_NO:-10 (version: 7.8) with subvol /srv/brick06 [2020-10-23 11:44:23.705368] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "", received addr = "192.168.0.17" [2020-10-23 11:44:23.705385] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:7bee7605-6be2-41b3-8fc4-1fc289bade00-GRAPH_ID:4-PID:1030-HOST:veriheat-PC_NAME:gluvol1-client-5-RECON_NO:-12 (version: 7.7) with subvol /srv/brick06 [2020-10-23 11:44:23.777276] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "", received addr = "192.168.0.11" [2020-10-23 11:44:23.777291] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:eeda6a67-5c53-44f7-ba1a-b366adaf97fe-GRAPH_ID:4-PID:1240-HOST:veriblob-PC_NAME:gluvol1-client-5-RECON_NO:-10 (version: 7.7) with subvol /srv/brick06 [2020-10-23 11:44:25.426054] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "", received addr = "192.168.0.19" [2020-10-23 11:44:25.426068] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2 [2020-10-23 11:44:25.426076] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:21d955c2-e8fd-4a41-847a-38729647bd8e-GRAPH_ID:4-PID:1969-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 (version: 7.8) with subvol /srv/brick06 [2020-10-23 11:44:25.434554] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "", received addr = "192.168.0.19" [2020-10-23 11:44:25.434571] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2 [2020-10-23 11:44:25.434579] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from 
CTX_ID:cff7c9fe-fbdc-445d-b1c8-c068493de96b-GRAPH_ID:0-PID:53517-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 (version: 7.8) with subvol /srv/brick06 [2020-10-23 11:44:27.440119] I [addr.c:52:compare_addr_and_update] 0-/srv/brick06: allowed = "", received addr = "192.168.0.19" [2020-10-23 11:44:27.440211] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2 [2020-10-23 11:44:27.440221] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:35535c81-dd54-497c-bfe1-a5784f1c0e65-GRAPH_ID:0-PID:53546-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 (version: 7.8) with subvol /srv/brick06

Specifically for heal, we need to check glustershd.log and the heal log glfsheal-gluvol1.log.
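A quick way to pull only the warning/error/critical records out of those two logs (gluster log lines carry a single-letter severity, as in the excerpts above):

```
# Show the most recent non-informational entries from the self-heal daemon and heal-info logs
grep -E '\] (W|E|C) \[' /var/log/glusterfs/glustershd.log       | tail -n 50
grep -E '\] (W|E|C) \[' /var/log/glusterfs/glfsheal-gluvol1.log | tail -n 50
```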

Bockeman commented 4 years ago

@mohit84 thanks for continuing to try and help me out here. There are two things:

  1. understanding what has happened and getting information to identify the bug
  2. get my gluster system back to a working state (it is holding up at the moment using brick06 (10TB) on veriicon, but I have no resilience, I do not have a full backup for all this data -- I had been relying on gluster to avoid that).

Is there anything you can suggest that I might try for the latter?

Bockeman commented 4 years ago

Below are the latest brick logs for brick06, i am not seeing any issue in brick logs.

My fault. These log files are too big for the 10MB limit for attaching to this ticket. I stripped lines by date, not realising the crash dumps are not prefixed with a date:

  [2020-10-23 11:44:27.440211] I [login.c:109:gf_auth] 0-auth/login: allowed user names: 6599e942-63ee-451c-8c31-766bb05ac0c2
  [2020-10-23 11:44:27.440221] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gluvol1-server: accepted client from CTX_ID:35535c81-dd54-497c-bfe1-a5784f1c0e65-GRAPH_ID:0-PID:53546-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0 (version: 7.8) with subvol /srv/brick06 
  pending frames:
  frame : type(1) op(FSETXATTR)
  frame : type(1) op(LOOKUP)
  frame : type(1) op(OPEN)
  frame : type(1) op(READ)
  frame : type(0) op(0)
  frame : type(1) op(READ)
  patchset: git://git.gluster.org/glusterfs.git
  signal received: 11

I'll see if I can attach a more complete tail of that log file.

Bockeman commented 4 years ago

srv-brick06_201023.log

Can you explain why brick06 appears to be online, yet this log shows it has crashed? Any suggestions for moving this to a working state?
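One way to cross-check whether the PID the CLI reports is really a live brick, and whether it started before or after the crash in the log, is to compare it against the brick's pidfile (path as shown earlier in this thread):

```
# PID glusterd recorded for this brick
cat /var/run/gluster/vols/gluvol1/verijolt-srv-brick06.pid

# Is that process alive, and when did it start?
ps -o pid,lstart,cmd -p "$(cat /var/run/gluster/vols/gluvol1/verijolt-srv-brick06.pid)"
```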

Bockeman commented 4 years ago

For specific to heal we need to check glustershd.log and heal logs glfsheal-gluvol1.log

glfsheal-gluvol1.log glustershd_201023.log

xhernandez commented 4 years ago
dnf info glusterfs-debuginfo 
Last metadata expiration check: 0:37:38 ago on Fri 23 Oct 2020 14:14:36 BST.
Error: No matching Packages to list

You need to run this to install the debug information (assuming you are running the latest version):

dnf debuginfo-install glusterfs-server-7.8-1.fc32.x86_64
Bockeman commented 4 years ago

Before attach a core with gdb please install debug package of glusterfs.

  1. gdb /usr/sbin/glusterfsd -c <brick_core>
  2. Run thread apply all bt full and share the output.

I had to do a load of installs:

dnf debuginfo-install glusterfs-server
dnf install -y gdb
dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/23/63b517080071fe0d9871c7c1534df99fd7f970.debug
dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/39/2de5e09ed27bf2fe1722c0198295777db75ef5.debug
dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/63/debfea3b4768cdcfb082e38cd754688642b1ec.debug
dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/99/1df1f4a01379a1fd494b9a8fc104c0f02e2a5e.debug
dnf debuginfo-install glibc-2.31-4.fc32.x86_64 keyutils-libs-1.6-4.fc32.x86_64 krb5-libs-1.18.2-22.fc32.x86_64 libacl-2.2.53-5.fc32.x86_64 libaio-0.3.111-7.fc32.x86_64 libattr-2.4.48-8.fc32.x86_64 libcom_err-1.45.5-3.fc32.x86_64 libgcc-10.2.1-5.fc32.x86_64 libselinux-3.0-5.fc32.x86_64 libtirpc-1.2.6-1.rc4.fc32.x86_64 libuuid-2.35.2-1.fc32.x86_64 openssl-libs-1.1.1g-1.fc32.x86_64 pcre2-10.35-7.fc32.x86_64 sssd-client-2.4.0-1.fc32.x86_64 zlib-1.2.11-21.fc32.x86_64

I think I found the core dump:

```
ls -l /var/spool/abrt/ccpp-2020-10-22-14:23:50.21866-346125 \
2>&1 | awk '{print " " $0}'; date +\ \ %F\ %T%n
total 196960
-rw-r----- 1 root abrt 6 2020-10-22 14:23 abrt_version
-rw-r----- 1 root abrt 17 2020-10-22 14:23 analyzer
-rw-r----- 1 root abrt 6 2020-10-22 14:23 architecture
-rw-r----- 1 root abrt 34 2020-10-22 14:23 cgroup
-rw-r----- 1 root abrt 411 2020-10-22 14:23 cmdline
-rw-r----- 1 root abrt 9 2020-10-22 14:23 component
-rw-r----- 1 root abrt 40928 2020-10-22 14:23 core_backtrace
-rw-r----- 1 root abrt 201433088 2020-10-22 14:23 coredump
-rw-r----- 1 root abrt 2 2020-10-23 01:29 count
-rw-r----- 1 root abrt 2216 2020-10-22 14:23 cpuinfo
-rw-r----- 1 root abrt 32 2020-10-22 14:23 crash_function
-rw-r----- 1 root abrt 6348 2020-10-22 14:23 dso_list
-rw-r----- 1 root abrt 281 2020-10-22 14:23 environ
-rw-r----- 1 root abrt 20 2020-10-22 14:23 executable
-rw-r----- 1 root abrt 82 2020-10-22 14:23 exploitable
-rw-r----- 1 root abrt 8 2020-10-22 14:23 hostname
-rw-r----- 1 root abrt 126 2020-10-22 14:23 journald_cursor
-rw-r----- 1 root abrt 22 2020-10-22 14:23 kernel
-rw-r----- 1 root abrt 10 2020-10-23 01:29 last_occurrence
-rw-r----- 1 root abrt 1323 2020-10-22 14:23 limits
-rw-r----- 1 root abrt 35739 2020-10-22 14:23 maps
-rw-r----- 1 root abrt 3809 2020-10-22 14:23 mountinfo
-rw-r----- 1 root abrt 4003 2020-10-22 14:23 open_fds
-rw-r----- 1 root abrt 691 2020-10-22 14:23 os_info
-rw-r----- 1 root abrt 30 2020-10-22 14:23 os_release
-rw-r----- 1 root abrt 25 2020-10-22 14:23 package
-rw-r----- 1 root abrt 6 2020-10-22 14:23 pid
-rw-r----- 1 root abrt 6 2020-10-22 14:23 pkg_arch
-rw-r----- 1 root abrt 1 2020-10-22 14:23 pkg_epoch
-rw-r----- 1 root abrt 19 2020-10-22 14:23 pkg_fingerprint
-rw-r----- 1 root abrt 14 2020-10-22 14:23 pkg_name
-rw-r----- 1 root abrt 6 2020-10-22 14:23 pkg_release
-rw-r----- 1 root abrt 14 2020-10-22 14:23 pkg_vendor
-rw-r----- 1 root abrt 3 2020-10-22 14:23 pkg_version
-rw-r----- 1 root abrt 1326 2020-10-22 14:23 proc_pid_status
-rw-r----- 1 root abrt 1 2020-10-22 14:23 pwd
-rw-r----- 1 root abrt 28 2020-10-22 14:23 reason
-rw-r----- 1 root abrt 1 2020-10-22 14:23 rootdir
-rw-r----- 1 root abrt 4 2020-10-22 14:23 runlevel
-rw-r----- 1 root abrt 10 2020-10-22 14:23 time
-rw-r----- 1 root abrt 4 2020-10-22 14:23 type
-rw-r----- 1 root abrt 1 2020-10-22 14:23 uid
-rw-r----- 1 root abrt 5 2020-10-22 14:23 username
-rw-r----- 1 root abrt 40 2020-10-22 14:23 uuid
2020-10-23 16:29:37
```

I then ran

gdb /usr/sbin/glusterfsd -c /var/spool/abrt/ccpp-2020-10-22-14:23:50.21866-346125/coredump \
-ex "set logging file gdb_glusterfsd_gluvol1_verijolt_srv-brick06.log" -ex "set logging on"

giving the following on-screen output:

``` GNU gdb (GDB) Fedora 9.1-6.fc32 Copyright (C) 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: . Find the GDB manual and other documentation resources online at: . For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/sbin/glusterfsd... Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd-7.8-1.fc32.x86_64.debug... [New LWP 346228] [New LWP 346125] [New LWP 346129] [New LWP 346128] [New LWP 346132] [New LWP 346138] [New LWP 346141] [New LWP 346139] [New LWP 346144] [New LWP 346227] [New LWP 346287] [New LWP 346134] [New LWP 346135] [New LWP 346127] [New LWP 346131] [New LWP 346137] [New LWP 346140] [New LWP 346142] [New LWP 346143] [New LWP 346145] [New LWP 346133] [New LWP 346136] [New LWP 346126] [New LWP 346130] warning: .dynamic section for "/lib64/libnss_sss.so.2" is not at the expected address (wrong library or version mismatch?) [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `/usr/sbin/glusterfsd -s verijolt --volfile-id gluvol1.verijolt.srv-brick06 -p /'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007f8814458729 in posix_get_ancestry_non_directory (this=this@entry=0x7f8804008930, leaf_inode=, head=head@entry=0x7f880c682b50, path=path@entry=0x0, type=type@entry=2, op_errno=op_errno@entry=0x7f880c682b4c, xdata=) at posix-inode-fd-ops.c:3218 3218 list = alloca(size); [Current thread is 1 (Thread 0x7f880c684700 (LWP 346228))] Copying output to gdb_glusterfsd_gluvol1_verijolt_srv-brick06.log. Copying debug output to gdb_glusterfsd_gluvol1_verijolt_srv-brick06.log. (gdb) thread apply all bt full --Type for more, q to quit, c to continue without paging--c ```

where I entered thread apply all bt full

giving gdb_glusterfsd_gluvol1_verijolt_srv-brick06.log
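
(For reference, the same backtrace can also be captured non-interactively; this is only a sketch, reusing the core path found above, and the output file name is arbitrary.)

```
# Sketch: dump all thread backtraces to a file without an interactive gdb session.
gdb --batch -ex "thread apply all bt full" \
    /usr/sbin/glusterfsd \
    /var/spool/abrt/ccpp-2020-10-22-14:23:50.21866-346125/coredump \
    > gdb_bt_brick06_verijolt.log 2>&1
```

This avoids the interactive pager prompts seen above.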

Bockeman commented 4 years ago

Thanks @mohit84 and @xhernandez, I hope someone can make sense of the coredump info from gdb.

Meanwhile, does anyone have any suggestions for nudging my brick06 back into life? (The HDD itself looks fine, no SMART errors, and disk usage looks comparable with the working brick06 on veriicon:

```
export start_date=`date +%F\ %T`; \
du -smc /srv/brick06/* \
2>&1 | sort -n | awk '{printf(" %8d\t%s\n",$1,substr($0,index($0,$2)))}'; echo " ${start_date}" && date +\ \ %F\ %T
0 /srv/brick06/snap
1 /srv/brick06/ftp
1 /srv/brick06/mounted.txt
3587 /srv/brick06/root
11851 /srv/brick06/var
8091137 /srv/brick06/vault
8106574 total
2020-08-23 15:36:31
2020-08-23 16:28:17
on veriicon
0 /srv/brick06/snap
1 /srv/brick06/ftp
1 /srv/brick06/mounted.txt
3587 /srv/brick06/root
11851 /srv/brick06/var
8091131 /srv/brick06/vault
8106568 total
2020-08-23 15:38:00
2020-08-23 16:21:56
```

)

mohit84 commented 4 years ago

Hi, thanks for sharing the coredump. It seems the file has huge xattrs on the backend, which is why the size shows as 263507 (257k), and the function is called from an iot_worker thread; the iot_worker thread stack size is 256k, which is why the brick process crashed.

Any idea why you created so many xattrs on the backend?

Thread 1 (Thread 0x7f880c684700 (LWP 346228)):
#0  0x00007f8814458729 in posix_get_ancestry_non_directory (this=this@entry=0x7f8804008930, leaf_inode=<optimized out>, head=head@entry=0x7f880c682b50, path=path@entry=0x0, type=type@entry=2, op_errno=op_errno@entry=0x7f880c682b4c, xdata=<optimized out>) at posix-inode-fd-ops.c:3218
        remaining_size = 0
        op_ret = <optimized out>
        pathlen = -1
        handle_size = 0
        pgfid = '\000' <repeats 15 times>
        nlink_samepgfid = 0
        stbuf = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0, st_size = 0, st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 0, tv_nsec = 0}, __glibc_reserved = {0, 0, 0}}
        list = 0x0
        list_offset = 0
        priv = <optimized out>
        size = 263507
        parent = 0x0
        loc = 0x7f87f407ed20
        leaf_path = <optimized out>
        key = '\000' <repeats 4095 times>
        dirpath = '\000' <repeats 4095 times>
        pgfidstr = '\000' <repeats 36 times>
        len = <optimized out>
        __FUNCTION__ = "posix_get_ancestry_non_directory"
#1  0x00007f8814458c9f in posix_get_ancestry (this=this@entry=0x7f8804008930, leaf_inode=<optimized out>, head=head@entry=0x7f880c682b50, path=path@entry=0x0, type=type@entry=2, op_errno=op_errno@entry=0x7f880c682b4c, xdata=0x7f87f4082778) at posix-inode-fd-ops.c:3316
        ret = -1
        priv = <optimized out>
#2  0x00007f8814461c20 in posix_readdirp (frame=0x7f87f407d0f8, this=0x7f8804008930, fd=0x7f87f4081ca8, size=140222300433232, off=0, dict=0x7f87f4082778) at posix-inode-fd-ops.c:5630
        entries = {{list = {next = 0x7f880c682b50, prev = 0x7f880c682b50}, {next = 0x7f880c682b50, prev = 0x7f880c682b50}}, d_ino = 2, d_off = 0, d_len = 3623899760, d_type = 32647, d_stat = {ia_flags = 140222300433296, ia_ino = 140222159465296, ia_dev = 140222518552358, ia_rdev = 206158430232, ia_size = 10846431729114805760, ia_nlink = 208153600, ia_uid = 32648, ia_gid = 3597268480, ia_blksize = 2525381680, ia_blocks = 140221891468232, ia_atime = 0, ia_mtime = 140221891408544, ia_ctime = 140222518550629, ia_btime = 25, ia_atime_nsec = 426807549, ia_mtime_nsec = 32648, ia_ctime_nsec = 4093651256, ia_btime_nsec = 32647, ia_attributes = 10846431729114805760, ia_attributes_mask = 140221890963032, ia_gfid = "8)\000\364\207\177\000\000\001\000\000\000\000\000\000", ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}}, dict = 0x7f8804008930, inode = 0x7f87f407d0f8, d_name = 0x7f880c682c20 "\250\034\b\364\207\177"}
        op_ret = -1
        op_errno = 0
        entry = 0x0
        __FUNCTION__ = "posix_readdirp"
#3  0x00007f88196ec90b in default_readdirp (frame=0x7f87f407d0f8, this=<optimized out>, fd=0x7f87f4081ca8, size=0, off=0, xdata=0x7f87f4082778) at defaults.c:2966
        old_THIS = 0x7f880400c350
        next_xl = 0x7f8804008930
        next_xl_fn = <optimized out>
        __FUNCTION__ = "default_readdirp"
#4  0x00007f88196ec90b in default_readdirp (frame=frame@entry=0x7f87f407d0f8, this=<optimized out>, fd=fd@entry=0x7f87f4081ca8, size=size@entry=0, off=off@entry=0, xdata=xdata@entry=0x7f87f4082778) at defaults.c:2966
        old_THIS = 0x7f880400e3f0
        next_xl = 0x7f880400c350
        next_xl_fn = <optimized out>
        __FUNCTION__ = "default_readdirp"
#5  0x00007f881436e929 in br_stub_readdirp (frame=frame@entry=0x7f87f407dbc8, this=0x7f8804010c30, fd=fd@entry=0x7f87f4081ca8, size=size@entry=0, offset=offset@entry=0, dict=dict@entry=0x7f87f4082778) at bit-rot-stub.c:2898
        _new = 0x7f87f407d0f8
        old_THIS = 0x7f8804010c30
        next_xl_fn = 0x7f88196ec830 <default_readdirp>
        tmp_cbk = 0x7f88143730d0 <br_stub_readdirp_cbk>
        ret = <optimized out>
        op_errno = <optimized out>
        xref = <optimized out>
        priv = <optimized out>
        __FUNCTION__ = "br_stub_readdirp"
#6  0x00007f88143577d2 in posix_acl_readdirp (frame=frame@entry=0x7f87f407e698, this=0x7f8804012b50, fd=fd@entry=0x7f87f4081ca8, size=size@entry=0, offset=offset@entry=0, dict=dict@entry=0x7f87f4082778) at posix-acl.c:1648
        _new = 0x7f87f407dbc8
        old_THIS = 0x7f8804012b50
        next_xl_fn = 0x7f881436e530 <br_stub_readdirp>
        tmp_cbk = 0x7f881435af30 <posix_acl_readdirp_cbk>
        ret = <optimized out>
        alloc_dict = <optimized out>
        __FUNCTION__ = "posix_acl_readdirp"
#7  0x00007f88143166d0 in pl_readdirp (frame=0x7f87f407f168, this=0x7f8804014790, fd=0x7f87f4081ca8, size=0, offset=0, xdata=0x7f87f4082778) at posix.c:3046
        _new = 0x7f87f407e698
        old_THIS = <optimized out>
        next_xl_fn = 0x7f8814357560 <posix_acl_readdirp>
        tmp_cbk = 0x7f881431ef70 <pl_readdirp_cbk>
        __FUNCTION__ = "pl_readdirp"
#8  0x00007f88196ec90b in default_readdirp (frame=0x7f87f407f168, this=<optimized out>, fd=0x7f87f4081ca8, size=0, off=0, xdata=0x7f87f4082778) at defaults.c:2966
        old_THIS = 0x7f88040163b0
        next_xl = 0x7f8804014790
        next_xl_fn = <optimized out>
        __FUNCTION__ = "default_readdirp"
#9  0x00007f88196ec90b in default_readdirp (frame=0x7f87f407f168, this=<optimized out>, fd=0x7f87f4081ca8, size=0, off=0, xdata=0x7f87f4082778) at defaults.c:2966
        old_THIS = 0x7f8804018490
        next_xl = 0x7f88040163b0
        next_xl_fn = <optimized out>
        __FUNCTION__ = "default_readdirp"
#10 0x00007f88196ec90b in default_readdirp (frame=frame@entry=0x7f87f407f168, this=<optimized out>, fd=fd@entry=0x7f87f4081ca8, size=size@entry=0, off=off@entry=0, xdata=xdata@entry=0x7f87f4082778) at defaults.c:2966
        old_THIS = 0x7f880401a130
        next_xl = 0x7f8804018490
        next_xl_fn = <optimized out>
        __FUNCTION__ = "default_readdirp"
#11 0x00007f88142c51f1 in up_readdirp (frame=frame@entry=0x7f87f4089e68, this=0x7f880401be00, fd=fd@entry=0x7f87f4081ca8, size=size@entry=0, off=off@entry=0, dict=dict@entry=0x7f87f4082778) at upcall.c:1324
        _new = 0x7f87f407f168
        old_THIS = 0x7f880401be00
        next_xl_fn = 0x7f88196ec830 <default_readdirp>
        tmp_cbk = 0x7f88142bc820 <up_readdirp_cbk>
        local = <optimized out>
        __FUNCTION__ = "up_readdirp"
#12 0x00007f88197051bd in default_readdirp_resume (frame=0x7f87f407fc38, this=0x7f880401da80, fd=0x7f87f4081ca8, size=0, off=0, xdata=0x7f87f4082778) at defaults.c:2169
        _new = 0x7f87f4089e68
        old_THIS = <optimized out>
        next_xl_fn = 0x7f88142c4fc0 <up_readdirp>
        tmp_cbk = 0x7f88196e6760 <default_readdirp_cbk>
        __FUNCTION__ = "default_readdirp_resume"
#13 0x00007f8819683035 in call_resume (stub=0x7f87f409cea8) at call-stub.c:2555
        old_THIS = 0x7f880401da80
        __FUNCTION__ = "call_resume"
#14 0x00007f88142ad128 in iot_worker (data=0x7f8804056060) at io-threads.c:232
        conf = 0x7f8804056060
        this = 0x7f880401da80
        stub = 0x7f87f409cea8
        sleep_till = {tv_sec = 1603373148, tv_nsec = 770933280}
        ret = <optimized out>
        pri = 0
        bye = <optimized out>
        __FUNCTION__ = "iot_worker"
#15 0x00007f8819582432 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140222300440320, 3244609149766524837, 140222428397518, 140222428397519, 140222159741384, 140222300440320, -3308120404764074075, -3308148579319488603}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#16 0x00007f88194ae913 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Bockeman commented 4 years ago

Any idea why you created so many xattrs on the backend?

A 12TB RAID5 box died (either/both the controller and power supply), but the 5 HDDs are ok. I am painstakingly restoring the data from the HDDs onto a gluster volume. I am confident that I am getting this right because of good parity across the HDDs and consistent checksums on a file-by-file basis. The data on this box was an rsnapshot backup, so it contains a lot of hard links. I have got to the stage where the data files, including all the hard links (millions of them), are restored ok, but the permissions, ownership and timestamps of the dirs and files restored onto gluster are incorrect. Hence I have scripted changing these.

It is conceivable that these scripts, full of chmod, chown and touch for each file in turn, place a burden on gluster. I stated in the first (submission) comment at the top that this was a possible cause.

If running such a script does "create many xattrs on the backend" then this is a likely cause. _(I thought xattrs were to do with the gluster management, not the normal file attributes for permissions, ownership and timestamps. And by backend do you mean the gluster server?)_

Why has only one brick crashed? Why was it fine for 5 hours or so?

If this is the cause, then once my gluster volume is back to normal (brick06 on verijolt properly online), I can break up my restore into more manageable chunks. This is a one-off exercise; I will not and do not want to be doing this again!

Given you have a clue as to the cause, how would you suggest I bring brick06 on verijolt back to life?

xhernandez commented 4 years ago

The fastest way I see to fix this is to identify the file that has so many extended attributes and remove/clear them.

To do that, inside gdb, can you execute this:

x/16bx loc->gfid

This will return 16 hexadecimal numbers, like this:

0x7f24df07b034: 0xXX    0xYY    0x--    0x--    0x--    0x--    0x--    0x--
0x7f24df07b03c: 0x--    0x--    0x--    0x--    0x--    0x--    0x--    0x--

You need to take the first two values and go to this directory in server verijolt:

/srv/brick06/.glusterfs/XX/YY

You should find a file there that has all the 16 numbers returned by gdb (with some '-' in the middle).

Once you identify the file, you need to execute this command:

# getfattr -m. -e hex -d <file>

Depending on what this returns, we'll decide how to proceed.
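
(As a convenience, the lookup can be scripted; this is only a sketch, and the byte values shown are the ones reported later in this thread, so substitute your own.)

```
# Sketch: rebuild the .glusterfs path from the 16 bytes printed by "x/16bx loc->gfid"
# (0x prefixes from the gdb output stripped by hand).
BRICK=/srv/brick06
BYTES="eb c3 f6 06 bc 70 4e 67 b3 c0 cd 5e 95 01 ae 2c"   # paste your own 16 bytes here
set -- $BYTES
GFID="$1$2$3$4-$5$6-$7$8-$9${10}-${11}${12}${13}${14}${15}${16}"
ls -l "$BRICK/.glusterfs/$1/$2/$GFID"
getfattr -m . -e hex -d "$BRICK/.glusterfs/$1/$2/$GFID"
```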

Bockeman commented 4 years ago

@xhernandez, very clear instructions, thank you.

x/16bx loc->gfid
  0x7f87f407ed40: 0xeb    0xc3    0xf6    0x06    0xbc    0x70    0x4e    0x67
  0x7f87f407ed48: 0xb3    0xc0    0xcd    0x5e    0x95    0x01    0xae    0x2c

ls -l /srv/brick06/.glusterfs/eb/c3/ebc3f606*
  -rwxr-xr-x 70 bobw root 43571 2019-08-02 09:15 /srv/brick06/.glusterfs/eb/c3/ebc3f606-bc70-4e67-b3c0-cd5e9501ae2c

getfattr -m . -e hex -d /srv/brick06/.glusterfs/eb/c3/ebc3f606-bc70-4e67-b3c0-cd5e9501ae2c
  /srv/brick06/.glusterfs/eb/c3/ebc3f606-bc70-4e67-b3c0-cd5e9501ae2c: Argument list too long

ls -l /srv/brick06/.glusterfs/eb/c3/ebc3f606-bc70-4e67-b3c0-cd5e9501ae2c
  -rwxr-xr-x 70 bobw root 43571 2019-08-02 09:15 /srv/brick06/.glusterfs/eb/c3/ebc3f606-bc70-4e67-b3c0-cd5e9501ae2c

getfattr '/srv/brick06/.glusterfs/eb/c3/ebc3f606-bc70-4e67-b3c0-cd5e9501ae2c'
  /srv/brick06/.glusterfs/eb/c3/ebc3f606-bc70-4e67-b3c0-cd5e9501ae2c: Argument list too long

file /srv/brick06/.glusterfs/eb/c3/ebc3f606-bc70-4e67-b3c0-cd5e9501ae2c
  /srv/brick06/.glusterfs/eb/c3/ebc3f606-bc70-4e67-b3c0-cd5e9501ae2c: Lua bytecode, version 5.1

My interpretation of this response "Argument list too long" is not that the argument list passed to getfattr is too long, but rather that the number of attributes, or something similar, is too large for getfattr to handle.
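
(One way to check that interpretation, sketched here on the assumption that strace is available: "Argument list too long" is errno E2BIG, and it should show up on the xattr list/get syscalls themselves rather than on the command line.)

```
# Sketch: confirm that the xattr syscalls, not the argv, return E2BIG for this file.
strace -f -e trace=listxattr,llistxattr,getxattr,lgetxattr \
    getfattr -m . -e hex -d /srv/brick06/.glusterfs/eb/c3/ebc3f606-bc70-4e67-b3c0-cd5e9501ae2c
```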

The response from a random other file in this directory appears fine

getfattr -m . -e hex -d /srv/brick06/.glusterfs/eb/c3/ebc31332-4198-488e-baca-b59b291fbdd7
  getfattr: Removing leading '/' from absolute path names
  # file: srv/brick06/.glusterfs/eb/c3/ebc31332-4198-488e-baca-b59b291fbdd7
  trusted.afr.dirty=0x000000000000000000000000
  trusted.bit-rot.version=0x12000000000000005f8c1ab4000ebf2a
  trusted.gfid=0xebc313324198488ebacab59b291fbdd7
  trusted.gfid2path.caf84fba69486ea6=0x38626639666662342d613637312d343039612d623238652d6439386261343962656338352f77646d6175642e737973
  trusted.glusterfs.mdata=0x010000000000000000000000005e6e6ef3000000000fe52c49000000003ef209600000000000000000000000005e6e6ef3000000000fa47fa7
  trusted.glusterfs.quota.8bf9ffb4-a671-409a-b28e-d98ba49bec85.contri.2=0x00000000000122000000000000000001
  trusted.pgfid.8bf9ffb4-a671-409a-b28e-d98ba49bec85=0x00000001

I see that "Argument list too long" is sometimes associated with "The maximum size extended attribute the Linux kernel can support is 64kb."

So clearly there is something wrong with this file. What could gluster (or something else) have done that could have corrupted this file?

What is the next step you might propose?

I am also interested in the possibility that the crash was caused by one file, as opposed to a backlog accumulation of changes required to many files. @mohit84 wrote

Any idea why you created so many xattrs on the backend?

and I think I may have misunderstood or been distracted by "the backend". What does "backend" refer to? Is it just one file in the .glusterfs/XX/YY directory, or is it a collective term for something else?

xhernandez commented 4 years ago

I see that the file has 70 hardlinks. Gluster keeps special xattrs per hardlink, but they don't use so much space.

If you are using XFS for the bricks, can you execute this command?

xfs_db -r $(df /srv/brick06/ | awk '{ print $1; }' | tail -1) -c "inode $(ls -i /srv/brick06/.glusterfs/eb/c3/ebc31332-4198-488e-baca-b59b291fbdd7 | awk '{ print $1; }')" -c "p" -c "ablock 0" -c "p" -c "btdump"

This should show all extended attributes.
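
(Broken out for readability, the one-liner is roughly the following; a sketch that only applies to XFS bricks and runs xfs_db read-only.)

```
# Rough breakdown of the xfs_db one-liner above (XFS only, read-only).
DEV=$(df /srv/brick06/ | awk '{ print $1; }' | tail -1)   # block device backing the brick
INO=$(ls -i /srv/brick06/.glusterfs/eb/c3/ebc31332-4198-488e-baca-b59b291fbdd7 | awk '{ print $1; }')
xfs_db -r "$DEV" -c "inode $INO" -c "p" -c "ablock 0" -c "p" -c "btdump"
```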

You can also check the same file in the other server (veriicon) and see if it has good data or not.

xhernandez commented 4 years ago

For now the only way to prevent a crash is to remove that file (assuming there aren't other files in the same condition), but to do so you will also need to identify and delete all hardlinks.

You can do that using this command:

find /srv/brick06/ -inum $(ls -i /srv/brick06/.glusterfs/eb/c3/ebc31332-4198-488e-baca-b59b291fbdd7 | awk '{ print $1; }')

Once listed, verify that the file is healthy in the other server (otherwise you will lose it). If it's ok in the other server, you can delete all entries returned by the find command (or move them elsewhere).
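
(A sketch of the "move them elsewhere" option; the quarantine directory is an assumption, and the gfid used here is the one that failed getfattr earlier.)

```
# Sketch: move every hardlink of the offending inode out of the brick instead of deleting.
BRICK=/srv/brick06
QUARANTINE=/root/brick06-quarantine           # assumption: any location outside the brick
INO=$(ls -i "$BRICK/.glusterfs/eb/c3/ebc3f606-bc70-4e67-b3c0-cd5e9501ae2c" | awk '{ print $1 }')
mkdir -p "$QUARANTINE"
find "$BRICK" -inum "$INO" | while read -r f; do
    rel="${f#$BRICK/}"
    mkdir -p "$QUARANTINE/$(dirname "$rel")"
    mv "$f" "$QUARANTINE/$rel"
done
```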

Once the entries are not present in the brick, you can try restarting it and see if it doesn't crash this time. If it crashes, you will need to identify which file is failing now by repeating the steps from previous comments.

Bockeman commented 4 years ago

@xhernandez thanks for your continued support.

I use btrfs rather than xfs.

I have a more pressing problem right now: files are going missing from the gluster view. This is the HA disaster that I thought gluster would avoid. Panic!

I need to take some drastic steps, like shutting down one of the servers, but I don't know how to handle this situation. Grrr.

Bockeman commented 4 years ago

My gluster system has not been at all happy. It looks like files are going missing (as viewed from a client). Consequently clients crashed, locked up, or exhibited undesirable behaviour. All my notes, journalling, scripts etc., are held on gluster, so when gluster goes down things get very difficult.

Fortunately, gluster volume start gluvol1 force has been enough to get some sort of normality, but I am extremely nervous and I would really appreciate your help in bringing my gluster system back to a stable and safe state.

It seems that the brick log file records when something catastrophic has gone wrong, so I searched all the log files

on verijolt
awk '/signal received/{sig=$0;l=1;next}(l){l++}(l==3)&&/^20/{printf("  %s %s  %s\n",$0,sig,FILENAME)} \
' /var/log/glusterfs/bricks/srv-brick??.log
  2020-10-22 12:16:28 signal received: 11  /var/log/glusterfs/bricks/srv-brick06.log
  2020-10-22 14:09:37 signal received: 11  /var/log/glusterfs/bricks/srv-brick06.log
  2020-10-22 14:21:23 signal received: 11  /var/log/glusterfs/bricks/srv-brick06.log
  2020-10-22 14:24:19 signal received: 11  /var/log/glusterfs/bricks/srv-brick06.log
  2020-10-22 14:27:16 signal received: 11  /var/log/glusterfs/bricks/srv-brick06.log
  2020-10-22 14:39:00 signal received: 11  /var/log/glusterfs/bricks/srv-brick06.log
  2020-10-22 16:25:11 signal received: 11  /var/log/glusterfs/bricks/srv-brick06.log
  2020-10-22 17:04:34 signal received: 11  /var/log/glusterfs/bricks/srv-brick06.log
  2020-10-23 00:25:06 signal received: 11  /var/log/glusterfs/bricks/srv-brick06.log
  2020-10-23 00:29:13 signal received: 11  /var/log/glusterfs/bricks/srv-brick06.log
  2020-10-23 22:55:11 signal received: 11  /var/log/glusterfs/bricks/srv-brick06.log
  2020-10-23 23:11:06 signal received: 11  /var/log/glusterfs/bricks/srv-brick06.log
  2020-10-24 11:47:37 signal received: 11  /var/log/glusterfs/bricks/srv-brick06.log
  2020-10-23 22:00:52 signal received: 11  /var/log/glusterfs/bricks/srv-brick07.log
on veriicon
  2020-10-23 22:00:50 signal received: 11  /var/log/glusterfs/bricks/srv-brick06.log
  2020-10-23 23:06:03 signal received: 11  /var/log/glusterfs/bricks/srv-brick06.log
  2020-10-23 22:00:53 signal received: 11  /var/log/glusterfs/bricks/srv-brick07.log

As you can see, brick06 and brick07 on both veriicon and verijolt (the two replica 2 servers) have been affected.

The failure mechanism appears to be different from the huge xattr problem analysed above, so I am adding gdb output from each of the servers.

Bockeman commented 4 years ago

Failure on veriicon brick06

``` GNU gdb (GDB) Fedora 9.1-6.fc32 Copyright (C) 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: . Find the GDB manual and other documentation resources online at: . For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/sbin/glusterfsd... Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd-7.8-1.fc32.x86_64.debug... [New LWP 2355323] [New LWP 2355276] [New LWP 2355277] [New LWP 2355278] [New LWP 2355282] [New LWP 2355285] [New LWP 2355279] [New LWP 2355328] [New LWP 2355289] [New LWP 2355288] [New LWP 2355301] [New LWP 2355284] [New LWP 2355290] [New LWP 2355280] [New LWP 2355283] [New LWP 2355298] [New LWP 2355292] [New LWP 2355294] [New LWP 2355333] [New LWP 2355327] [New LWP 2355295] [New LWP 2355299] [New LWP 2355322] [New LWP 2355326] [New LWP 2356971] [New LWP 2355296] [New LWP 2355281] [New LWP 2355332] [New LWP 2355293] [New LWP 2355300] [New LWP 2355474] [New LWP 2355291] [New LWP 2355325] [New LWP 2355476] [New LWP 2355287] [New LWP 2355286] [New LWP 2355329] [New LWP 2355297] [New LWP 2355331] [New LWP 2355330] [New LWP 2355324] [New LWP 2360751] [New LWP 2355473] [New LWP 2359423] Warning: couldn't activate thread debugging using libthread_db: Cannot find new threads: generic error warning: File "/usr/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load". To enable execution of this file add add-auto-load-safe-path /usr/lib64/libthread_db-1.0.so line to your configuration file "/root/.gdbinit". To completely disable this security protection add set auto-load safe-path / --Type for more, q to quit, c to continue without paging--c line to your configuration file "/root/.gdbinit". For more information about this security protection see the "Auto-loading safe path" section in the GDB manual. E.g., run from the shell: info "(gdb)Auto-loading safe path" warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available. Warning: couldn't activate thread debugging using libthread_db: Cannot find new threads: generic error warning: File "/usr/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load". warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available. Core was generated by `/usr/sbin/glusterfsd -s veriicon --volfile-id gluvol1.veriicon.srv-brick07 -p /'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007f517e5c3303 in posix_getxattr (frame=0x7f513021b588, this=0x7f5170008930, loc=, name=, xdata=) at posix-inode-fd-ops.c:3727 3727 list = alloca(size); [Current thread is 1 (LWP 2355323)] Copying output to gdb_ccpp-2020-10-23-23:00:55.980775-2355276.log. Copying debug output to gdb_ccpp-2020-10-23-23:00:55.980775-2355276.log. Missing separate debuginfos, use: dnf debuginfo-install libgcc-10.2.1-1.fc32.x86_64 sssd-client-2.3.1-2.fc32.x86_64 (gdb) thread apply all bt full x/16bx loc->gfid value has been optimized out ```

gdb_ccpp-2020-10-23-23-00-55.980775-2355276.log

Bockeman commented 4 years ago

Failure on verijolt brick06

``` gdb /usr/sbin/glusterfsd -c /var/spool/abrt/ccpp-2020-10-24-12:47:44.957908-93297/coredump \ -ex "set logging file gdb_ccpp-2020-10-24-12:47:44.957908-93297.log" -ex "set logging on" GNU gdb (GDB) Fedora 9.1-6.fc32 Copyright (C) 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: . Find the GDB manual and other documentation resources online at: . For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/sbin/glusterfsd... Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd-7.8-1.fc32.x86_64.debug... [New LWP 93338] [New LWP 93298] [New LWP 93297] [New LWP 93300] [New LWP 93309] [New LWP 93299] [New LWP 93824] [New LWP 93303] [New LWP 127687] [New LWP 94313] [New LWP 93346] [New LWP 93311] [New LWP 94314] [New LWP 93344] [New LWP 93339] [New LWP 93336] [New LWP 93312] [New LWP 93348] [New LWP 93304] [New LWP 93313] [New LWP 93315] [New LWP 93316] [New LWP 93317] [New LWP 94528] [New LWP 93302] [New LWP 93826] [New LWP 93307] [New LWP 93305] [New LWP 93345] [New LWP 93341] [New LWP 93329] [New LWP 93347] [New LWP 93343] [New LWP 93306] [New LWP 93308] [New LWP 93310] [New LWP 93314] [New LWP 93301] [New LWP 93827] [New LWP 119154] [New LWP 94315] [New LWP 93342] [New LWP 93337] Warning: couldn't activate thread debugging using libthread_db: Cannot find new threads: generic error warning: File "/usr/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load". To enable execution of this file add add-auto-load-safe-path /usr/lib64/libthread_db-1.0.so line to your configuration file "/root/.gdbinit". To completely disable this security protection add set auto-load safe-path / line to your configuration file "/root/.gdbinit". For more information about this security protection see the --Type for more, q to quit, c to continue without paging-- "Auto-loading safe path" section in the GDB manual. E.g., run from the shell: info "(gdb)Auto-loading safe path" warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available. Warning: couldn't activate thread debugging using libthread_db: Cannot find new threads: generic error warning: File "/usr/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load". warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available. Core was generated by `/usr/sbin/glusterfsd -s verijolt --volfile-id gluvol1.verijolt.srv-brick06 -p /'. Program terminated with signal SIGSEGV, Segmentation fault. #0 __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:33 33 cmp %RAX_LP, (%rdx) [Current thread is 1 (LWP 93338)] thread apply all bt full x/16bx loc->gfid No symbol "loc" in current context. ```

gdb_ccpp-2020-10-24-12-47-44.957908-93297.log

Bockeman commented 4 years ago

For now the only way to prevent a crash is to remove that file (assuming there aren't other files in the same condition), but to do so you will also need to identify and delete all hardlinks.

There are 70 hardlinks on the offending file:

getfattr /srv/brick06/.glusterfs/eb/c3/ebc3f606-bc70-4e67-b3c0-cd5e9501ae2c \
2>&1 | awk '{print "  " $0}'; date +\ \ %F\ %T%n
  /srv/brick06/.glusterfs/eb/c3/ebc3f606-bc70-4e67-b3c0-cd5e9501ae2c: Argument list too long
  2020-10-24 09:52:56
ls -li /srv/brick06/.glusterfs/eb/c3/ebc3f606-bc70-4e67-b3c0-cd5e9501ae2c \
2>&1 | awk '{print "  " $0}'; date +\ \ %F\ %T%n
  12698422 -rwxr-xr-x 70 bobw root 43571 2019-08-02 09:15 /srv/brick06/.glusterfs/eb/c3/ebc3f606-bc70-4e67-b3c0-cd5e9501ae2c
  2020-10-24 09:55:09

export start_date=`date +%F\ %T`; \
find /srv/brick06 -inum  12698422 \
2>&1 | awk '{print "  " $0}'; echo "  ${start_date}" && date +\ \ %F\ %T%n
  /srv/brick06/.glusterfs/eb/c3/ebc3f606-bc70-4e67-b3c0-cd5e9501ae2c
  ...
  2020-10-24 09:58:25
  2020-10-24 10:16:56

Notice it took nearly 20 minutes to find them!

This "Argument list too long" response to getfattr is also present on the equivalent file on veriicon.

I elected to delete these 70 files on both servers. I then went through a process of gluster volume start gluvol1 force and rebooting each server in turn. I now have a working gluster system with all bricks online. And I have copied back the 70 files that I had deleted.

Unsurprisingly, there is a massive healing backlog:

  # Healing
  server        brick:  00     01     02     03     04     05     06     07
  veriicon    pending:   0      0      0      0      0      0  80283      0
  verijolt    pending:   0      0      0      0      0      0    223      0
  veriicon      split:   0      0      0      0      0      0     66      0
  verijolt      split:   0      0      0      0      0      0     64      0
  veriicon    healing:   0      0      0      0      0      0      0      0
  verijolt    healing:   0      0      0      0      0      0      0      0
  2020-10-24 18:21:19

I manually cleared the split-brain issues, and several hours later most of the self-healing had completed; I am now left with just one:

  # Healing
  server        brick:     00     01     02     03     04     05     06     07
  veriicon    pending:      0      0      0      0      0      0      1      0
  verijolt    pending:      0      0      0      0      0      0      0      0
  veriicon      split:      0      0      0      0      0      0      0      0
  verijolt      split:      0      0      0      0      0      0      0      0
  veriicon    healing:      0      0      0      0      0      0      0      0
  verijolt    healing:      0      0      0      0      0      0      0      0
  2020-10-25 18:06:46
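
(For reference, the split-brain entries mentioned above are normally inspected and resolved with commands along these lines; the resolution policies shown are only examples, and the source copy should be chosen deliberately.)

```
# Sketch: inspect split-brain entries and resolve them with an explicit policy.
gluster volume heal gluvol1 info split-brain
gluster volume heal gluvol1 split-brain latest-mtime <gfid or path>
gluster volume heal gluvol1 split-brain source-brick veriicon:/srv/brick06 <gfid or path>
```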

Gluster has been attempting to heal this file for over 24 hours:

[2020-10-24 17:03:51.490279] W [MSGID: 114031] [client-rpc-fops_v2.c:915:client4_0_getxattr_cbk] 2-gluvol1-client-4: remote operation failed. Path: <gfid:3088bdfc-7680-4b7c-8ff3-968c0ae6a572> (3088bdfc-7680-4b7c-8ff3-968c0ae6a572). Key: (null) [Argument list too long]
[2020-10-24 18:46:28.885450] W [MSGID: 114031] [client-rpc-fops_v2.c:915:client4_0_getxattr_cbk] 2-gluvol1-client-4: remote operation failed. Path: <gfid:3088bdfc-7680-4b7c-8ff3-968c0ae6a572> (3088bdfc-7680-4b7c-8ff3-968c0ae6a572). Key: (null) [Argument list too long]
[2020-10-24 18:46:37.022954] W [MSGID: 114031] [client-rpc-fops_v2.c:915:client4_0_getxattr_cbk] 2-gluvol1-client-4: remote operation failed. Path: <gfid:3088bdfc-7680-4b7c-8ff3-968c0ae6a572> (3088bdfc-7680-4b7c-8ff3-968c0ae6a572). Key: (null) [Argument list too long]
[2020-10-24 18:56:43.233457] W [MSGID: 114031] [client-rpc-fops_v2.c:915:client4_0_getxattr_cbk] 2-gluvol1-client-4: remote operation failed. Path: <gfid:3088bdfc-7680-4b7c-8ff3-968c0ae6a572> (3088bdfc-7680-4b7c-8ff3-968c0ae6a572). Key: (null) [Argument list too long]
...
[2020-10-25 17:58:36.353516] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 2-gluvol1-replicate-0: performing metadata selfheal on 3088bdfc-7680-4b7c-8ff3-968c0ae6a572 
[2020-10-25 17:58:36.354143] W [MSGID: 114031] [client-rpc-fops_v2.c:915:client4_0_getxattr_cbk] 2-gluvol1-client-4: remote operation failed. Path: <gfid:3088bdfc-7680-4b7c-8ff3-968c0ae6a572> (3088bdfc-7680-4b7c-8ff3-968c0ae6a572). Key: (null) [Argument list too long]
[2020-10-25 18:08:37.100701] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 2-gluvol1-replicate-0: performing metadata selfheal on 3088bdfc-7680-4b7c-8ff3-968c0ae6a572 
[2020-10-25 18:08:37.101187] W [MSGID: 114031] [client-rpc-fops_v2.c:915:client4_0_getxattr_cbk] 2-gluvol1-client-4: remote operation failed. Path: <gfid:3088bdfc-7680-4b7c-8ff3-968c0ae6a572> (3088bdfc-7680-4b7c-8ff3-968c0ae6a572). Key: (null) [Argument list too long]

I think this is a separate and unrelated bug which I believe is caused by the fact that the file(s) being healed has over 800 hardlinks.

export start_date=`date +%F\ %T`; \
gluster volume heal gluvol1 info \
2>&1 | awk '{print "  " $0}'; echo "  ${start_date}" && date +\ \ %F\ %T
  Brick veriicon:/srv/brick06
  <gfid:3088bdfc-7680-4b7c-8ff3-968c0ae6a572> 
  Status: Connected
  Number of entries: 1
  ...

ls -l   /srv/brick06/.glusterfs/30/88/3088bdfc-7680-4b7c-8ff3-968c0ae6a572
-rwxr-xr-x 809 root root 38099 2019-02-21 08:14 /srv/brick06/.glusterfs/30/88/3088bdfc-7680-4b7c-8ff3-968c0ae6a572

Of course, the reboots (or is it the gluster volume start gluvol1 force?) have caused a massive re-signing process, and this is still running:

  # Scanning for unsigned objects
  2020-10-24 16:39:36 00 Crawling brick [/srv/brick00], scanning for unsigned objects 
  2020-10-24 16:39:37 00 Completed crawling brick [/srv/brick00] for      180 objects
  2020-10-24 16:39:38 01 Crawling brick [/srv/brick01], scanning for unsigned objects 
  2020-10-24 16:39:44 01 Completed crawling brick [/srv/brick01] for      607 objects
  2020-10-24 16:39:34 04 Crawling brick [/srv/brick04], scanning for unsigned objects 
  2020-10-24 16:40:01 04 Completed crawling brick [/srv/brick04] for     1364 objects
  2020-10-24 16:39:36 05 Crawling brick [/srv/brick05], scanning for unsigned objects 
  2020-10-25 03:25:37 05 Completed crawling brick [/srv/brick05] for   863130 objects
  2020-10-25 01:00:52 06 Crawling brick [/srv/brick06], scanning for unsigned objects 
  ...                 06 Triggering signing                      for  2414900 objects
  2020-10-24 16:39:38 07 Crawling brick [/srv/brick07], scanning for unsigned objects 
  ...                 07 Triggering signing                      for  1729981 objects
  2020-10-25 17:59:36

Although I have enough of my gluster system to continue normal usage, I feel it is somewhat compromised.

Bockeman commented 4 years ago

Would it be possible to summarise what went wrong that essentially made gluster unstable?

  • Is there a set of preventative measures that could be adopted to prevent reoccurrence?
  • Are there more checks and safeguards that could/should be implemented in the code?
  • Please could you suggest some steps that could be taken to sanitise and discover things likely to cause failure, including
  • files that exist on more than one brick
  • files with properties that break current limits (too many hardlinks, too many attributes, ...)
  • dangling gfid files, where the named file has been deleted directly from a brick, but not the corresponding gfid file


I believe in, support and use open source software. Because of peer review of source code, user feedback and so on, such software should become better than commercial equivalents. Hence I report bugs in the interest of making open source products the best and to prevent others being affected by such bugs. I am always grateful to coders/contributors who analyse and fix bugs.

mohit84 commented 4 years ago

To prevent recurrence of the issue you can configure the option "storage.max-hardlinks" to a lower value so that the client won't be able to create a hardlink once the limit has been crossed.
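
A sketch of that change (the value shown is only an example; it must still cover the largest legitimate hardlink count in the data set):

```
# Example only: lower the per-inode hardlink cap and verify the setting.
gluster volume set gluvol1 storage.max-hardlinks 100
gluster volume get gluvol1 storage.max-hardlinks
```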

amarts commented 4 years ago

To prevent recurrence of the issue you can configure the option "storage.max-hardlinks" to a lower value so that the client won't be able to create a hardlink once the limit has been crossed.

Should we have a more restrictive default value for this option? Say 42 (i.e., a sane random choice). That way, we can prevent the bad experience which Bockeman got into, by prompting an error to the application much earlier. After that, users can decide if the value needs to be increased or not depending on their use case.

My suggestion is, however restrictive, that we should keep default options which prevent borderline issues like this and make sure glusterfs provides good performance and stability. Users can alter the options only when they know what their use case is, and that should be allowed, as they will be responsible for that particular use case.

pranithk commented 4 years ago

To prevent recurrence of the issue you can configure the option "storage.max-hardlinks" to a lower value so that the client won't be able to create a hardlink once the limit has been crossed.

Won't that make the application unusable on glusterfs?

xhernandez commented 4 years ago

I don't see the value of this option. If we set it to a lower value, the application creating hardlinks will fail. If the user increases the value (because they are actually creating more hardlinks) we'll have a crash. And even worse, self-heal is unable to heal those files because the size of the xattrs is too big.

What we need to do in this case is to disable gfid2path and any other feature that requires per-hardlink data (besides fixing the crash, of course). Even if we fix the crash and make it possible to handle hundreds of xattrs, it will be very bad from a performance point of view.

mohit84 commented 4 years ago

The current default value is 100, and on XFS I tried to create 100 hardlinks but was not able to create more than 47; after the hardlink count reaches 47, setxattr throws the error "No space left on device".

{.key = {"max-hardlinks"}, .type = GF_OPTION_TYPE_INT, .min = 0, .default_value = "100", .op_version = {GD_OP_VERSION_4_0_0}, .flags = OPT_FLAG_SETTABLE | OPT_FLAG_DOC, .tags = {"posix"}, .validate = GF_OPT_VALIDATE_MIN, .description = "max number of hardlinks allowed on any one inode.\n" "0 is unlimited, 1 prevents any hardlinking at all."},

I think we need to restrict the maximum value of max-hardlinks; I don't think that after restricting/configuring max-hardlinks the application will be unable to use glusterfs.

xhernandez commented 4 years ago

Would it be possible to summarise what went wrong that essentially made gluster unstable?

The segmentation fault happens because we use a stack-allocated buffer to store the contents of the xattrs. This is done in two steps: first we get the needed size, and then we allocate a buffer of that size to store the data. The problem happens because of the combination of 2 things:

  1. The needed size is not checked against a maximum value; we allow any size.
  2. The size of the stack is 256 KiB

This causes a segmentation fault when trying to allocate more space than available from stack.

  • Is there a set of preventative measures that could be adopted to prevent reoccurrence?

In your particular case I would recommend disabling the gfid2path feature. You also seem to be using quota. Quota works on a per-directory basis, but given that you have multiple hardlinks, I'm not sure it makes sense (to which directory should the quota be accounted?). If not strictly necessary, I would also disable quota.
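
A sketch of those two changes (the option and command names are assumed from the 7.x release and should be verified against gluster volume get gluvol1 all before applying):

```
# Sketch: assumed option/command names; verify before applying to a production volume.
gluster volume set gluvol1 storage.gfid2path off     # stop creating per-hardlink gfid2path xattrs
gluster volume quota gluvol1 disable                 # only if quota is genuinely not needed
```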

This doesn't fix the existing issues. Disabling gfid2path will prevent creating additional xattrs for new files or new hardlinks, but it won't delete existing ones. We should sweep all the files of each brick and remove them. However, standard tools (getfattr) don't seem to support big xattrs either, so I'm not sure how to do that unless btrfs has some specific tool (I don't know btrfs).

  • Are there more checks and safeguards that could/should be implemented in the code?

Sure. We'll need to improve that in the current code, at least to avoid a crash and return an error if the size is too big.

  • Please could you suggest some steps that could be taken to sanitise and discover things likely to cause failure, including

As I've just commented, we should disable some features and clean existing xattrs, but I don't know how to do that on btrfs if getfattr doesn't work.

  • files that exist on more than one brick

You are using a replica. It's expected to have the same file in more than one brick.

  • files with properties that break current limits (too many hardlinks, too many attributes, ...)

Probably any file that returns an error for getfattr will also have issues in Gluster.

  • dangling gfid files, where the named file has been deleted directly from a brick, but not the corresponding gfid file

You should never do this. It can cause more troubles.

To find them, this command should work:

find <brick root>/.glusterfs/ -type f -links 1

Any file returned inside .glusterfs/<xx>/<yy>/ with a single link could be removed (be careful not to do this when the volume has load; otherwise find could incorrectly detect files that are still being created but have not fully completed).

This won't find symbolic links that have been deleted.
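
(A possible complement, offered only as a sketch: directory gfids are stored as symlinks under .glusterfs, so entries whose target directory was removed directly from a brick show up as broken symlinks. The same caution about a loaded volume applies.)

```
# Sketch: list broken gfid symlinks (directories deleted directly from the brick).
find /srv/brick06/.glusterfs -type l ! -exec test -e {} \; -print
```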

xhernandez commented 4 years ago

The current default value is 100, and on XFS I tried to create 100 hardlinks but was not able to create more than 47; after the hardlink count reaches 47, setxattr throws the error "No space left on device".

This is caused because you are using XFS and it limits the xattr size to 64KiB. It's not a limitation on the number of hardlinks. XFS and Gluster can create many more. But when we also add an xattr for each hardlink, the limit becomes the xattr size.

Apparently btrfs doesn't have a 64 KiB limit. That's why the issue has happened without detecting any error.

I think we need to restrict the maximum value of max-hardlinks; I don't think that after restricting/configuring max-hardlinks the application will be unable to use glusterfs.

If the application needs to create 100 hardlinks, it won't work on Gluster if we don't allow more than 47. So the application won't be usable on Gluster.

mohit84 commented 4 years ago

The current default value is 100, and on XFS I tried to create 100 hardlinks but was not able to create more than 47; after the hardlink count reaches 47, setxattr throws the error "No space left on device".

This is caused because you are using XFS and it limits the xattr size to 64KiB. It's not a limitation on the number of hardlinks. XFS and Gluster can create many more. But when we also add an xattr for each hardlink, the limit becomes the xattr size.

Apparently btrfs doesn't have a 64 KiB limit. That's why the issue has happened without detecting any error.

I think we need to restrict the maximum value of max-hardlinks; I don't think that after restricting/configuring max-hardlinks the application will be unable to use glusterfs.

If the application needs to create 100 hardlinks, it won't work on Gluster if we don't allow more than 47. So the application won't be usable on Gluster.

Thanks for clarifying it more.

Bockeman commented 4 years ago

I much appreciate all the comments above from @mohit84 @xhernandez @pranithk and @amarts. I have been beset with issues, most of which are off topic, including local disks filling up (e.g. with a 10GB bitd.log). I intend to digest and respond to all comments.

Bockeman commented 4 years ago

@mohit84 wrote

To prevent recurrence of the issue you can configure the option "storage.max-hardlinks" to a lower value so that the client won't be able to create a hardlink once the limit has been crossed.

More background info: in addition to using gluster to provide HA SAN storage for a variety of purposes, including cloud storage using Nextcloud, there was a requirement to move the contents of several years of rsnapshot backups onto one of the volumes.

Prior to attempting to restore the rsnapshot data:

gluster volume get gluvol1 storage.max-hardlinks \
2>&1 | awk '{print "  " $0}'; date +\ \ %F\ %T
  Option                                  Value                                   
  ------                                  -----                                   
  storage.max-hardlinks                   10000000                                

I had set max-hardlinks to an aggressively high number, based on the number of files to be transferred (i.e. 10,000,000 being more than 5,313,170), because at the time I did not know the maximum number of hard links actually present (9,774).
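
(For reference, the actual maximum could have been measured up front with something like the sketch below; the source path is a placeholder.)

```
# Sketch: report the highest hardlink count present in the source tree (GNU find).
find /path/to/rsnapshot-source -xdev -type f -printf '%n\n' | sort -n | tail -1
```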

Before the crash (brick06 offline, the topic of this issue) all the contents had been transferred onto gluster, apparently successfully because checksums matched the source. However, transferring this data took place over several weeks, not all in one go, and consequently not all of the hardlinks were preserved (not a requirement).

I am still checking that the data on the gluster target volume matches the source. So far I have not found any discrepancy (apart from differences in permissions, ownership and timestamps), so I am assuming that, in general, gluster is handling inodes with over 1,000 hard links. However, some operations are stuck, such as healing one inode/file with 908 hardlinks.

Am I right to assume that "storage.max-hardlinks" being too low is not the cause of the problem and that having a higher value does nothing to prevent recurrence of the issue?

Bockeman commented 4 years ago

@amarts wrote

To prevent recurrence of the issue you can configure the option "storage.max-hardlinks" to a lower value so that the client won't be able to create a hardlink once the limit has been crossed.

Should we have a more restrictive default value for this option? Say 42 (i.e., a sane random choice). That way, we can prevent the bad experience which Bockeman got into, by prompting an error to the application much earlier. After that, users can decide if the value needs to be increased or not depending on their use case.

My suggestion is, however restrictive, that we should keep default options which prevent borderline issues like this and make sure glusterfs provides good performance and stability. Users can alter the options only when they know what their use case is, and that should be allowed, as they will be responsible for that particular use case.

I agree, my data set is unusual, and I do not mind having to set an option to override any default. However, I would like to be assured that, whatever limits there are, they are handled gracefully, such that the user/admin can make adjustments before there is any risk of corruption.

@pranithk wrote

Won't that make the application unusable on glusterfs?

I'm not sure what "application" you are imagining. Gluster is providing "storage", and if some of that storage contains backup or snapshot data, any user can read it, restore it to their own user area, and run whatever application is desired.

My 11TB of data, some of which has more than 100 hardlinks per file/inode, appears to be usable.