gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

One brick offline with signal received: 11 #1699

Closed Bockeman closed 4 years ago

Bockeman commented 4 years ago

Description of problem: One brick on one server is offline and all attempts to bring it back online have failed. The corresponding brick on the other server of the replica 2 pair is ok. Other bricks are ok.

The following do not clear the problem:

The problem appears to be similar to https://github.com/gluster/glusterfs/issues/1531, but the cause is different, and the number of volumes and bricks is different. (I note the comment there regarding "replica 2" and split-brain, but the cost (time/effort) of recovering from split-brain is manageable, and split-brain is usually due to external causes, such as a power cut.)

My urgent need is to find a way out of the current situation and bring brick06 on the second server back online. Less urgent is the need for gluster to handle this condition gracefully and report to the user/admin the real cause of the problem and how to fix it (if it cannot be fixed automatically).

The exact command to reproduce the issue: Not sure what actually caused this situation to arise, but activity at the time was:

- Multiple clients, all active, but with minimal activity.
- Intense activity from one client (actually one of the two gluster servers): a scripted "chown" on over a million files, which had been running for over 5 hours and was 83% complete.
- An edit or "sed -i" on a 500 MB script file (which should not have tipped over the 22 GB Mem + 8 GB Swap).

The full output of the command that failed:

```
gluster volume status gluvol1 \
  2>&1 | awk '{print " " $0}'; date +\ \ %F\ %T%n
  Status of volume: gluvol1
  Gluster process                             TCP Port  RDMA Port  Online  Pid
  ------------------------------------------------------------------------------
  Brick veriicon:/srv/brick06                 49162     0          Y       2065834
  Brick verijolt:/srv/brick06                 N/A       N/A        N       N/A
  Brick veriicon:/srv/brick05                 49163     0          Y       2065859
  Brick verijolt:/srv/brick05                 49161     0          Y       4775
  Brick veriicon:/srv/brick07                 49164     0          Y       2065887
  Brick verijolt:/srv/brick07                 49162     0          Y       4797
  Self-heal Daemon on localhost               N/A       N/A        Y       1969
  Quota Daemon on localhost                   N/A       N/A        Y       4867
  Bitrot Daemon on localhost                  N/A       N/A        Y       4882
  Scrubber Daemon on localhost                N/A       N/A        Y       4938
  Self-heal Daemon on veriicon                N/A       N/A        Y       2063499
  Quota Daemon on veriicon                    N/A       N/A        Y       2304107
  Bitrot Daemon on veriicon                   N/A       N/A        Y       2304118
  Scrubber Daemon on veriicon                 N/A       N/A        Y       2304144

  Task Status of Volume gluvol1
  ------------------------------------------------------------------------------
  There are no active volume tasks

  2020-10-22 23:35:11
```

Expected results: Some way to bring that brick back online.

- The output of the gluster volume info command:

```
gluster volume info gluvol1 \
  2>&1 | awk '{print " " $0}'; date +\ \ %F\ %T%n
  Volume Name: gluvol1
  Type: Distributed-Replicate
  Volume ID: 5af1e1c9-afbd-493d-a567-d7989cf3b9ea
  Status: Started
  Snapshot Count: 0
  Number of Bricks: 3 x 2 = 6
  Transport-type: tcp
  Bricks:
  Brick1: veriicon:/srv/brick06
  Brick2: verijolt:/srv/brick06
  Brick3: veriicon:/srv/brick05
  Brick4: verijolt:/srv/brick05
  Brick5: veriicon:/srv/brick07
  Brick6: verijolt:/srv/brick07
  Options Reconfigured:
  storage.max-hardlinks: 10000000
  features.quota-deem-statfs: on
  features.scrub-freq: monthly
  features.inode-quota: on
  features.quota: on
  features.scrub: Active
  features.bitrot: on
  transport.address-family: inet
  nfs.disable: on
  performance.client-io-threads: off
  cluster.self-heal-daemon: on
  cluster.min-free-inodes: 5%
  2020-10-22 23:34:01
```

- The operating system / glusterfs version:

```
Fedora F32
Linux veriicon 5.8.15-201.fc32.x86_64 #1 SMP Thu Oct 15 15:56:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Linux verijolt 5.8.15-201.fc32.x86_64 #1 SMP Thu Oct 15 15:56:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
glusterfs 7.8
```

Additional info:

on veriicon (1st server)

```
systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
     Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
     Active: active (running) since Sun 2020-10-18 11:20:39 BST; 4 days ago
       Docs: man:glusterd(8)
    Process: 833 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
   Main PID: 838 (glusterd)
      Tasks: 314 (limit: 23818)
     Memory: 11.9G
        CPU: 2d 13h 26min 1.990s
     CGroup: /system.slice/glusterd.service
             838 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
             2063406 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol0.veriicon.srv-brick04 -p /var/run/gluster/vols/gluvol0/veriicon-srv-brick>
             2063429 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol0.veriicon.srv-brick00 -p /var/run/gluster/vols/gluvol0/veriicon-srv-brick>
             2063452 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol0.veriicon.srv-brick01 -p /var/run/gluster/vols/gluvol0/veriicon-srv-brick>
             2063499 /usr/sbin/glusterfs -s localhost --volfile-id shd/gluvol0 -p /var/run/gluster/shd/gluvol0/gluvol0-shd.pid -l /var/log/glusterf>
             2065834 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol1.veriicon.srv-brick06 -p /var/run/gluster/vols/gluvol1/veriicon-srv-brick>
             2065859 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol1.veriicon.srv-brick05 -p /var/run/gluster/vols/gluvol1/veriicon-srv-brick>
             2065887 /usr/sbin/glusterfsd -s veriicon --volfile-id gluvol1.veriicon.srv-brick07 -p /var/run/gluster/vols/gluvol1/veriicon-srv-brick>
             2304107 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quota>
             2304118 /usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid -l /var/log/glusterfs/bitd.log -S>
             2304144 /usr/sbin/glusterfs -s localhost --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid -l /var/log/glusterfs/scrub.lo>
```

on verijolt (2nd server)

```
systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
     Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
     Active: active (running) since Thu 2020-10-22 18:00:21 BST; 21min ago
       Docs: man:glusterd(8)
    Process: 823 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
   Main PID: 847 (glusterd)
      Tasks: 262 (limit: 26239)
     Memory: 3.4G
        CPU: 49min 14.474s
     CGroup: /system.slice/glusterd.service
             847 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
             1701 /usr/sbin/glusterfs -s localhost --volfile-id rebalance/gluvol1 --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *dht.readdir-opti>
             1969 /usr/sbin/glusterfs -s localhost --volfile-id shd/gluvol0 -p /var/run/gluster/shd/gluvol0/gluvol0-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/361019a1040168a1.socket --xlator-option *replicate*.node-uu>
             4575 /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol0.verijolt.srv-brick04 -p /var/run/gluster/vols/gluvol0/verijolt-srv-brick04.pid -S /var/run/gluster/106ca420dff7312a.socket --brick-name /srv/brick04 -l /var/log/gluster>
             4598 /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol0.verijolt.srv-brick00 -p /var/run/gluster/vols/gluvol0/verijolt-srv-brick00.pid -S /var/run/gluster/9978a7a4ac3bc9dd.socket --brick-name /srv/brick00 -l /var/log/gluster>
             4621 /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol0.verijolt.srv-brick01 -p /var/run/gluster/vols/gluvol0/verijolt-srv-brick01.pid -S /var/run/gluster/c64e4259882d3f4d.socket --brick-name /srv/brick01 -l /var/log/gluster>
             4775 /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol1.verijolt.srv-brick05 -p /var/run/gluster/vols/gluvol1/verijolt-srv-brick05.pid -S /var/run/gluster/0135293616f7a351.socket --brick-name /srv/brick05 -l /var/log/gluster>
             4797 /usr/sbin/glusterfsd -s verijolt --volfile-id gluvol1.verijolt.srv-brick07 -p /var/run/gluster/vols/gluvol1/verijolt-srv-brick07.pid -S /var/run/gluster/b7d8c6bd3c8b2992.socket --brick-name /srv/brick07 -l /var/log/gluster>
             4867 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/739fa467313dc700.socket --process-name quotad
             4882 /usr/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid -l /var/log/glusterfs/bitd.log -S /var/run/gluster/a646064795a52ac2.socket --global-timer-wheel
             4938 /usr/sbin/glusterfs -s localhost --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid -l /var/log/glusterfs/scrub.log -S /var/run/gluster/903362075a5b27cc.socket --global-timer-wheel

Oct 22 18:04:34 verijolt srv-brick06[4753]: dlfcn 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: libpthread 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: llistxattr 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: setfsid 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: spinlock 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: epoll.h 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: xattr.h 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: st_atim.tv_nsec 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: package-string: glusterfs 7.8
Oct 22 18:04:34 verijolt srv-brick06[4753]: ---------
```

There's a problem with brick06 on server verijolt.

snippet from /var/log/messages

```
Oct 22 18:03:13 verijolt systemd[2266]: Finished Mark boot as successful.
Oct 22 18:04:34 verijolt srv-brick06[4753]: pending frames:
Oct 22 18:04:34 verijolt audit[4753]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 pid=4753 comm="glfs_iotwr002" exe="/usr/sbin/glusterfsd" sig=11 res=1
Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(0) op(0)
Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(0) op(0)
Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(1) op(GETXATTR)
Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(1) op(LOOKUP)
Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(1) op(LOOKUP)
Oct 22 18:04:34 verijolt srv-brick06[4753]: frame : type(1) op(LOOKUP)
Oct 22 18:04:34 verijolt srv-brick06[4753]: patchset: git://git.gluster.org/glusterfs.git
Oct 22 18:04:34 verijolt srv-brick06[4753]: signal received: 11
Oct 22 18:04:34 verijolt srv-brick06[4753]: time of crash:
Oct 22 18:04:34 verijolt srv-brick06[4753]: 2020-10-22 17:04:34
Oct 22 18:04:34 verijolt srv-brick06[4753]: configuration details:
Oct 22 18:04:34 verijolt srv-brick06[4753]: argp 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: backtrace 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: dlfcn 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: libpthread 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: llistxattr 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: setfsid 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: spinlock 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: epoll.h 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: xattr.h 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: st_atim.tv_nsec 1
Oct 22 18:04:34 verijolt srv-brick06[4753]: package-string: glusterfs 7.8
Oct 22 18:04:34 verijolt srv-brick06[4753]: ---------
Oct 22 18:04:34 verijolt systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Oct 22 18:04:34 verijolt audit: BPF prog-id=45 op=LOAD
Oct 22 18:04:34 verijolt audit: BPF prog-id=46 op=LOAD
Oct 22 18:04:34 verijolt audit: BPF prog-id=47 op=LOAD
Oct 22 18:04:34 verijolt systemd[1]: Started Process Core Dump (PID 4906/UID 0).
Oct 22 18:04:34 verijolt audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-4906-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 22 18:04:34 verijolt systemd-coredump[4907]: Process 4753 (glusterfsd) of user 0 dumped core.#012#012Stack trace of thread 4845:#012#0 0x00007f812ba19729 posix_get_ancestry_non_directory (posix.so + 0x31729)#012#1 0x00007f812ba19c9f posix_get_ancestry (posix.so + 0x31c9f)#012#2 0x00007f812ba22c20 posix_readdirp (posix.so + 0x3ac20)#012#3 0x00007f8130cad90b default_readdirp (libglusterfs.so.0 + 0xbb90b)#012#4 0x00007f8130cad90b default_readdirp (libglusterfs.so.0 + 0xbb90b)#012#5 0x00007f812b92f929 br_stub_readdirp (bitrot-stub.so + 0x9929)#012#6 0x00007f812b9187d2 posix_acl_readdirp (access-control.so + 0x77d2)#012#7 0x00007f812b8d76d0 pl_readdirp (locks.so + 0xd6d0)#012#8 0x00007f8130cad90b default_readdirp (libglusterfs.so.0 + 0xbb90b)#012#9 0x00007f8130cad90b default_readdirp (libglusterfs.so.0 + 0xbb90b)#012#10 0x00007f8130cad90b default_readdirp (libglusterfs.so.0 + 0xbb90b)#012#11 0x00007f812b8861f1 up_readdirp (upcall.so + 0xd1f1)#012#12 0x00007f8130cc61bd default_readdirp_resume (libglusterfs.so.0 + 0xd41bd)#012#13 0x00007f8130c44035 call_resume (libglusterfs.so.0 + 0x52035)#012#14 0x00007f812b86e128 iot_worker (io-threads.so + 0x7128)#012#15 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#16 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4755:#012#0 0x00007f81309ab962 __sigtimedwait (libc.so.6 + 0x3d962)#012#1 0x00007f8130b4e5bc sigwait (libpthread.so.0 + 0x145bc)#012#2 0x000055ca5c22081b glusterfs_sigwaiter (glusterfsd + 0x981b)#012#3 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4759:#012#0 0x00007f8130a66fcb __select (libc.so.6 + 0xf8fcb)#012#1 0x00007f8130c95b79 runner (libglusterfs.so.0 + 0xa3b79)#012#2 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4756:#012#0 0x00007f8130a36801 clock_nanosleep@@GLIBC_2.17 (libc.so.6 + 0xc8801)#012#1 0x00007f8130a3c157 __nanosleep (libc.so.6 + 0xce157)#012#2 0x00007f8130a3c08e sleep (libc.so.6 + 0xce08e)#012#3 0x00007f8130c470e5 pool_sweeper (libglusterfs.so.0 + 0x550e5)#012#4 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#5 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4760:#012#0 0x00007f8130a6fc5e epoll_wait (libc.so.6 + 0x101c5e)#012#1 0x00007f8130c816a2 event_dispatch_epoll_worker (libglusterfs.so.0 + 0x8f6a2)#012#2 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4761:#012#0 0x00007f8130a6fc5e epoll_wait (libc.so.6 + 0x101c5e)#012#1 0x00007f8130c816a2 event_dispatch_epoll_worker (libglusterfs.so.0 + 0x8f6a2)#012#2 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4770:#012#0 0x00007f8130a36801 clock_nanosleep@@GLIBC_2.17 (libc.so.6 + 0xc8801)#012#1 0x00007f8130a3c157 __nanosleep (libc.so.6 + 0xce157)#012#2 0x00007f8130a3c08e sleep (libc.so.6 + 0xce08e)#012#3 0x00007f812b9f57b0 posix_disk_space_check_thread_proc (posix.so + 0xd7b0)#012#4 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#5 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4771:#012#0 0x00007f8130a36801 clock_nanosleep@@GLIBC_2.17 (libc.so.6 + 0xc8801)#012#1 0x00007f8130a3c157 __nanosleep (libc.so.6 + 0xce157)#012#2 0x00007f8130a3c08e sleep (libc.so.6 + 0xce08e)#012#3 0x00007f812b9f5156 posix_health_check_thread_proc (posix.so + 0xd156)#012#4 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#5 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4767:#012#0 0x00007f8130a66fcb __select (libc.so.6 + 0xf8fcb)#012#1 0x00007f812b97d402 changelog_ev_dispatch (changelog.so + 0x1c402)#012#2 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4842:#012#0 0x00007f8130b4d750 __lll_lock_wait (libpthread.so.0 + 0x13750)#012#1 0x00007f8130b45e53 __pthread_mutex_lock (libpthread.so.0 + 0xbe53)#012#2 0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#3 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4772:#012#0 0x00007f8130b4a1b8 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0 + 0x101b8)#012#1 0x00007f812b9f0114 posix_ctx_janitor_thread_proc (posix.so + 0x8114)#012#2 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4850:#012#0 0x00007f8130b4d750 __lll_lock_wait (libpthread.so.0 + 0x13750)#012#1 0x00007f8130b45e53 __pthread_mutex_lock (libpthread.so.0 + 0xbe53)#012#2 0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#3 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4852:#012#0 0x00007f8130b4d750 __lll_lock_wait (libpthread.so.0 + 0x13750)#012#1 0x00007f8130b45e53 __pthread_mutex_lock (libpthread.so.0 + 0xbe53)#012#2 0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#3 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4773:#012#0 0x00007f8130b49e92 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe92)#012#1 0x00007f812b9f59bb posix_fsyncer_pick (posix.so + 0xd9bb)#012#2 0x00007f812b9f5c25 posix_fsyncer (posix.so + 0xdc25)#012#3 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4858:#012#0 0x00007f8130b4d750 __lll_lock_wait (libpthread.so.0 + 0x13750)#012#1 0x00007f8130b45e53 __pthread_mutex_lock (libpthread.so.0 + 0xbe53)#012#2 0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#3 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4860:#012#0 0x00007f8130b4d750 __lll_lock_wait (libpthread.so.0 + 0x13750)#012#1 0x00007f8130b45e53 __pthread_mutex_lock (libpthread.so.0 + 0xbe53)#012#2 0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#3 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4857:#012#0 0x00007f8130b45da3 __pthread_mutex_lock (libpthread.so.0 + 0xbda3)#012#1 0x00007f812b86e028 iot_worker (io-threads.so + 0x7028)#012#2 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4757:#012#0 0x00007f8130b4a1b8 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0 + 0x101b8)#012#1 0x00007f8130c5d3d9 syncenv_task (libglusterfs.so.0 + 0x6b3d9)#012#2 0x00007f8130c5e1a5 syncenv_processor (libglusterfs.so.0 + 0x6c1a5)#012#3 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4758:#012#0 0x00007f8130b4a1b8 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0 + 0x101b8)#012#1 0x00007f8130c5d3d9 syncenv_task (libglusterfs.so.0 + 0x6b3d9)#012#2 0x00007f8130c5e1a5 syncenv_processor (libglusterfs.so.0 + 0x6c1a5)#012#3 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#4 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4762:#012#0 0x00007f8130b49e92 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe92)#012#1 0x00007f812b81d994 index_worker (index.so + 0x7994)#012#2 0x00007f8130b43432 start_thread (libpthread.so.0 + 0x9432)#012#3 0x00007f8130a6f913 __clone (libc.so.6 + 0x101913)#012#012Stack trace of thread 4768:#012#0 0x00007f8130a66fcb __select (libc.so.6 + 0xf8fcb)#012#1 0x00007f812b97d402 changelog_ev_dispatch (changelog.so + 0x1c402
Oct 22 18:04:34 verijolt audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-4906-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 22 18:04:34 verijolt audit: BPF prog-id=47 op=UNLOAD
Oct 22 18:04:34 verijolt audit: BPF prog-id=46 op=UNLOAD
Oct 22 18:04:34 verijolt audit: BPF prog-id=45 op=UNLOAD
Oct 22 18:04:34 verijolt systemd[1]: systemd-coredump@0-4906-0.service: Succeeded.
Oct 22 18:04:37 verijolt abrt-server[4960]: Deleting problem directory ccpp-2020-10-22-18:04:35.4544-4753 (dup of ccpp-2020-10-22-14:23:50.21866-346125)
Oct 22 18:04:38 verijolt abrt-notification[5020]: Process 346125 (glusterfsd) crashed in posix_get_ancestry_non_directory()
```
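Since systemd-coredump captured the crash, the stack trace above can presumably be re-examined on verijolt with coredumpctl (standard on Fedora; 4753 is the crashed brick PID from the log, and glusterfs debuginfo packages would be needed for full symbols). These are suggested inspection commands, not something taken from the original report:

```
# Assumed inspection commands on verijolt; 4753 is the crashed glusterfsd PID
# recorded by systemd-coredump above.
coredumpctl list /usr/sbin/glusterfsd   # list captured glusterfsd core dumps
coredumpctl info 4753                   # metadata plus the stack trace shown above
coredumpctl gdb 4753                    # open the core in gdb for deeper analysis
```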

snippet from /var/log/glusterfs/bricks/srv-brick06.log

```
[2020-10-22 17:04:34.150577] E [inodelk.c:506:__inode_unlock_lock] 0-gluvol1-locks:  Matching lock not found for unlock 0-9223372036854775807, by 181b0088357f0000 on 0x7f8124011df0
[2020-10-22 17:04:34.150614] E [MSGID: 115053] [server-rpc-fops_v2.c:271:server4_inodelk_cbk] 0-gluvol1-server: 55: INODELK (73a42b0e-64d1-4606-8337-fce48490a10b), client: CTX_ID:2784eca2-176b-4f8a-b0b8-0a81d4e811b3-GRAPH_ID:0-PID:1701-HOST:verijolt-PC_NAME:gluvol1-client-5-RECON_NO:-0, error-xlator: gluvol1-locks [Invalid argument]
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2020-10-22 17:04:34
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 7.8
/lib64/libglusterfs.so.0(+0x2afa4)[0x7f8130c1cfa4]
/lib64/libglusterfs.so.0(gf_print_trace+0x333)[0x7f8130c27c93]
/lib64/libc.so.6(+0x3ca70)[0x7f81309aaa70]
/usr/lib64/glusterfs/7.8/xlator/storage/posix.so(+0x31729)[0x7f812ba19729]
/usr/lib64/glusterfs/7.8/xlator/storage/posix.so(+0x31c9f)[0x7f812ba19c9f]
/usr/lib64/glusterfs/7.8/xlator/storage/posix.so(+0x3ac20)[0x7f812ba22c20]
/lib64/libglusterfs.so.0(default_readdirp+0xdb)[0x7f8130cad90b]
/lib64/libglusterfs.so.0(default_readdirp+0xdb)[0x7f8130cad90b]
/usr/lib64/glusterfs/7.8/xlator/features/bitrot-stub.so(+0x9929)[0x7f812b92f929]
/usr/lib64/glusterfs/7.8/xlator/features/access-control.so(+0x77d2)[0x7f812b9187d2]
/usr/lib64/glusterfs/7.8/xlator/features/locks.so(+0xd6d0)[0x7f812b8d76d0]
/lib64/libglusterfs.so.0(default_readdirp+0xdb)[0x7f8130cad90b]
/lib64/libglusterfs.so.0(default_readdirp+0xdb)[0x7f8130cad90b]
/lib64/libglusterfs.so.0(default_readdirp+0xdb)[0x7f8130cad90b]
/usr/lib64/glusterfs/7.8/xlator/features/upcall.so(+0xd1f1)[0x7f812b8861f1]
/lib64/libglusterfs.so.0(default_readdirp_resume+0x21d)[0x7f8130cc61bd]
/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f8130c44035]
/usr/lib64/glusterfs/7.8/xlator/performance/io-threads.so(+0x7128)[0x7f812b86e128]
/lib64/libpthread.so.0(+0x9432)[0x7f8130b43432]
/lib64/libc.so.6(clone+0x43)[0x7f8130a6f913]
---------
```
Bockeman commented 4 years ago

@xhernandez Thanks for your detailed response.

In your particular case I would recommend disabling the gfid2path feature. You also seem to be using quota. Quota works on a per-directory basis, but given that you have multiple hardlinks, I'm not sure it makes sense (to which directory should the quota be accounted?). If not strictly necessary, I would also disable quota.

Please could you tell me how to disable the gfid2path feature? I cannot find it in the documentation.

I disabled quota.
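For reference, a sketch of the commands that would be involved (gluvol1 taken from the volume info above; this is not a transcript of what was actually run):

```
# Assumed commands; gluvol1 is the volume from this issue.
gluster volume quota gluvol1 disable       # turn off quota accounting on the volume
gluster volume get gluvol1 features.quota  # confirm the option now reports 'off'
```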

I deleted all 809 files that were causing the [Argument list too long] errors that self-healing could not handle.

    files that exist on more than one brick

You are using a replica. It's expected to have the same file in more than one brick.

I mean files that exist on more than one brick for a given server. Cases like this result in:

```
ls -l /srv/gluvol1/vault/bobw/files/home/.mozilla/firefox/6qw6a8eb.cockpit \
  2>&1 | awk '/?/{print "  " $0}'; date +\ \ %F\ %T
  -????????? ? ?    ?             ?                ? broadcast-listeners.json
  -????????? ? ?    ?             ?                ? prefs.js
  2020-10-05 16:17:28
```
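A rough sketch (my own, untested; the brick roots are simply the ones from this setup) of how such duplicates could be listed on one server, i.e. the same relative path present on more than one brick of that server:

```
#!/bin/bash
# Sketch only: report relative paths that appear on more than one brick of the
# same server. Note that DHT linkto files (shown as ---------T) may legitimately
# appear on a second brick, so the output needs manual review.
BRICKS=(/srv/brick05 /srv/brick06 /srv/brick07)   # example brick roots on one server

for b in "${BRICKS[@]}"; do
    # Print "relative-path <TAB> brick", skipping gluster's internal .glusterfs tree.
    find "$b" -path "$b/.glusterfs" -prune -o -type f -printf "%P\t$b\n"
done | sort | awk -F'\t' '
    $1 == prev { print "duplicate:", $1, "on", prevbrick, "and", $2 }
    { prev = $1; prevbrick = $2 }'
```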
    dangling gfid files, where the named file has been deleted directly from a brick, but not the corresponding gfid file

You should never do this. It can cause more trouble.

Obviously I do not intentionally delete files directly from bricks, but I have found this is often the only way to resolve certain issues (such as split-brain). However, with manual intervention like this it is always possible that I could make a mistake.

I will attempt to collect getfattr output and the hardlink count for all files on all bricks, but I will need to be careful how I do that. There is no point in running getfattr on each file; it only needs to be done once per inode. Given that a "find" over an 11 TB subset of the data takes over 5 hours, this could take days.
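Something along these lines (a sketch of my own, untested; the brick root and output file are placeholders) would keep it to one getfattr call per inode:

```
#!/bin/bash
# Sketch only: record the hardlink count and xattrs once per inode on a brick,
# rather than running getfattr for every hardlinked path.
BRICK=/srv/brick06                       # placeholder brick root
OUT=/var/tmp/brick06-xattr-survey.txt    # placeholder output file

declare -A seen    # inode numbers already handled

# Skip the .glusterfs housekeeping tree; emit "inode linkcount path" per file.
while read -r inode links path; do
    [[ -n ${seen[$inode]} ]] && continue          # one getfattr per inode
    seen[$inode]=1
    printf '### inode=%s links=%s %s\n' "$inode" "$links" "$path"
    getfattr -d -m . -e hex "$path" 2>/dev/null   # dump all xattrs in hex
done < <(find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -type f -printf '%i %n %p\n') > "$OUT"
```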

mohit84 commented 4 years ago

I don't believe the issue is specific to hardlinks alone. Gluster populates an xattr per hardlink; unless we know which xattrs are populated on the backend, it is difficult to find the reason. As Xavi asked earlier, it would help to share the xattrs, but getfattr is failing with "Argument list too long", which makes that difficult. I am not sure whether btrfs provides a tool to fetch this information.

In gluster the default value of storage.max-hardlinks is 100; unless you have changed that value you can't create more than 100 hardlinks, so I am not sure the issue is specific to hardlinks alone. As you said, quota is enabled; quota also populates some xattrs, but not too many. I am not sure whether an application has created many custom xattrs on the backend.

For the time being you can disable storage.gfid2path as shown below. After disabling it, an application can still create hardlinks, but gluster won't populate a new xattr (gfid2path) for every hardlink. This only restricts the gfid2path xattr; we can't restrict an application that populates custom xattrs on a file.

gluster v set <volname> storage.gfid2path off
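Applied to the volume in this report, that would presumably be the following (gluvol1 taken from the volume info above; the get command just confirms the change):

```
# Assumed example for the gluvol1 volume shown earlier in this issue.
gluster volume set gluvol1 storage.gfid2path off
gluster volume get gluvol1 storage.gfid2path    # should now report 'off'
```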

For the brick crash itself we have to fix the code path: we need to call MALLOC instead of alloca when the xattr size is greater than some limit (e.g. 64k/128k).

Bockeman commented 4 years ago

Disaster struck; see https://github.com/gluster/glusterfs/issues/1729 and https://github.com/gluster/glusterfs/issues/1728. I have now recovered the lost data and am able to resume analysis of my gluster data.

I discovered a large number of files with silly permissions, and reported that on https://github.com/gluster/glusterfs/issues/1731

I am nervous about the integrity of my files; any suggestions are welcome.

I am continuing with:

I will attempt to collect getfattr output and the hardlink count for all files on all bricks, but I will need to be careful how I do that. There is no point in running getfattr on each file; it only needs to be done once per inode. Given that a "find" over an 11 TB subset of the data takes over 5 hours, this could take days.

Bockeman commented 4 years ago

Perhaps someone could explain the value of the hard-link count for each file residing under <brick root>/.glusterfs/XX/YY/.

Also, why is there a mixture of hard-links and symbolic-links?

           1584                63673  1 drwx------ root     root     20-08-21 15:06:34.6387425020 brick07/.glusterfs/ff/ff
             59              4192010  1 lrwxrwxrwx root     root     20-10-10 20:48:09.9794764880 brick07/.glusterfs/ff/fe/fffefb07-795e-4a27-90da-6db78c897c92
             58              3507907  1 lrwxrwxrwx root     root     20-10-08 19:39:32.8287566040 brick07/.glusterfs/ff/fe/fffedcc4-7293-4fe9-ab34-714a6ba015e5
              0              3555211 17 ---------T bobw     warren   20-10-08 20:48:31.1710559470 brick07/.glusterfs/ff/fe/fffea855-c37e-4691-825f-46caf90e9e28
          47108              1488652  2 -rwxr--r-- bobw     warren   07-11-11 15:07:43.8593750000 brick07/.glusterfs/ff/fe/fffea714-5236-45ee-820d-722fa3332694
            987              1732599  2 -r--r--r-- bobw     warren   20-03-15 19:44:06.5220859100 brick07/.glusterfs/ff/fe/fffea64c-b721-46a7-8d44-5980b3b14f8e
             64               556533  1 lrwxrwxrwx root     root     20-08-21 21:14:42.9929821840 brick07/.glusterfs/ff/fe/fffea161-6b61-41b3-b5ab-1cc17e00d321
              0              1767388  2 -rwxrwxr-x bobw     warren   19-03-10 16:28:47.0000000000 brick07/.glusterfs/ff/fe/fffe8f40-2b58-4ca4-8284-fc869de5cb0c
          32768              1960787  2 -rw-r--r-- apache   apache   20-05-09 21:28:54.0000000000 brick07/.glusterfs/ff/fe/fffe800f-c3ff-40d5-8e37-a7cea0726d87
             71              3766279  1 lrwxrwxrwx root     root     20-10-09 02:17:39.5293480090 brick07/.glusterfs/ff/fe/fffe7192-53d1-4718-bad3-9f9ba764354f
             52              2726222  1 lrwxrwxrwx root     root     20-09-12 21:56:00.1328402300 brick07/.glusterfs/ff/fe/fffe526c-3980-4d1b-85b2-08952c60c80b
              0              3391565  9 ---------T bobw     warren   20-10-08 15:25:04.3622390120 brick07/.glusterfs/ff/fe/fffe3a45-439c-40b4-a29b-d317c4c16fd3
         396400              4500352  2 -rw-r--r-- apache   apache   20-10-12 12:44:13.3103199880 brick07/.glusterfs/ff/fe/fffe2d3b-66fe-459b-99ff-fabf8ef7301f
             57               392951  1 lrwxrwxrwx root     root     20-08-21 18:52:10.6153474860 brick07/.glusterfs/ff/fe/fffe1f2a-6094-43cf-a6d2-5ea4e5a21095
             71              4022210  1 lrwxrwxrwx root     root     20-10-09 09:49:12.9389294170 brick07/.glusterfs/ff/fe/fffe1e86-3b01-45ee-8efa-cd198175b6c7
            825              4783638  2 -rwxrwxrwx bobw     warren   20-10-28 21:30:43.9246942800 brick07/.glusterfs/ff/fe/fffe1327-f179-4cfb-b85b-567a17d7143a
           7553               682312  2 -r--r--r-- bobw     warren   19-08-28 10:40:50.2532174380 brick07/.glusterfs/ff/fe/fffe0b07-bf0c-45dd-8e33-c268fa426d8c
        6119845              1622319  2 -rwxr-xr-x bobw     warren   20-04-19 18:46:56.1955260000 brick07/.glusterfs/ff/fe/fffe084e-4c39-453e-9a2c-50de00a677d2

This means that finding "dangling" gfids (i.e. a gfid file with no corresponding actual file) is more difficult than @xhernandez suggests:

To find them, this command should work:

find <brick root>/.glusterfs/ -type f -links 1

Any file returned inside <brick root>/.glusterfs/<xx>/<yy>/ with a single link could be removed (but be careful not to do this while the volume is under load; otherwise find could incorrectly detect files that are still being created and have not been fully completed).

xhernandez commented 4 years ago

Perhaps someone could explain the value of the hard-link count for each file residing under <brick root>/.glusterfs/XX/YY/.

Also, why is there a mixture of hard-links and symbolic-links?

           1584                63673  1 drwx------ root     root     20-08-21 15:06:34.6387425020 brick07/.glusterfs/ff/ff
             59              4192010  1 lrwxrwxrwx root     root     20-10-10 20:48:09.9794764880 brick07/.glusterfs/ff/fe/fffefb07-795e-4a27-90da-6db78c897c92
             58              3507907  1 lrwxrwxrwx root     root     20-10-08 19:39:32.8287566040 brick07/.glusterfs/ff/fe/fffedcc4-7293-4fe9-ab34-714a6ba015e5
              0              3555211 17 ---------T bobw     warren   20-10-08 20:48:31.1710559470 brick07/.glusterfs/ff/fe/fffea855-c37e-4691-825f-46caf90e9e28
          47108              1488652  2 -rwxr--r-- bobw     warren   07-11-11 15:07:43.8593750000 brick07/.glusterfs/ff/fe/fffea714-5236-45ee-820d-722fa3332694
            987              1732599  2 -r--r--r-- bobw     warren   20-03-15 19:44:06.5220859100 brick07/.glusterfs/ff/fe/fffea64c-b721-46a7-8d44-5980b3b14f8e
             64               556533  1 lrwxrwxrwx root     root     20-08-21 21:14:42.9929821840 brick07/.glusterfs/ff/fe/fffea161-6b61-41b3-b5ab-1cc17e00d321
              0              1767388  2 -rwxrwxr-x bobw     warren   19-03-10 16:28:47.0000000000 brick07/.glusterfs/ff/fe/fffe8f40-2b58-4ca4-8284-fc869de5cb0c
          32768              1960787  2 -rw-r--r-- apache   apache   20-05-09 21:28:54.0000000000 brick07/.glusterfs/ff/fe/fffe800f-c3ff-40d5-8e37-a7cea0726d87
             71              3766279  1 lrwxrwxrwx root     root     20-10-09 02:17:39.5293480090 brick07/.glusterfs/ff/fe/fffe7192-53d1-4718-bad3-9f9ba764354f
             52              2726222  1 lrwxrwxrwx root     root     20-09-12 21:56:00.1328402300 brick07/.glusterfs/ff/fe/fffe526c-3980-4d1b-85b2-08952c60c80b
              0              3391565  9 ---------T bobw     warren   20-10-08 15:25:04.3622390120 brick07/.glusterfs/ff/fe/fffe3a45-439c-40b4-a29b-d317c4c16fd3
         396400              4500352  2 -rw-r--r-- apache   apache   20-10-12 12:44:13.3103199880 brick07/.glusterfs/ff/fe/fffe2d3b-66fe-459b-99ff-fabf8ef7301f
             57               392951  1 lrwxrwxrwx root     root     20-08-21 18:52:10.6153474860 brick07/.glusterfs/ff/fe/fffe1f2a-6094-43cf-a6d2-5ea4e5a21095
             71              4022210  1 lrwxrwxrwx root     root     20-10-09 09:49:12.9389294170 brick07/.glusterfs/ff/fe/fffe1e86-3b01-45ee-8efa-cd198175b6c7
            825              4783638  2 -rwxrwxrwx bobw     warren   20-10-28 21:30:43.9246942800 brick07/.glusterfs/ff/fe/fffe1327-f179-4cfb-b85b-567a17d7143a
           7553               682312  2 -r--r--r-- bobw     warren   19-08-28 10:40:50.2532174380 brick07/.glusterfs/ff/fe/fffe0b07-bf0c-45dd-8e33-c268fa426d8c
        6119845              1622319  2 -rwxr-xr-x bobw     warren   20-04-19 18:46:56.1955260000 brick07/.glusterfs/ff/fe/fffe084e-4c39-453e-9a2c-50de00a677d2

This means that finding "dangling" gfids (i.e. a gfid file with no corresponding actual file) is more difficult than @xhernandez suggests:

To find them, this command should work:

find <brick root>/.glusterfs/ -type f -links 1

Any file returned inside <brick root>/.glusterfs/<xx>/<yy>/ with a single link could be removed (but be careful not to do this while the volume is under load; otherwise find could incorrectly detect files that are still being created and have not been fully completed).

As I already said in my comment, this method won't work well if you also have symbolic links. The command only finds regular files with a single hardlink. Since gluster keeps a hardlink between the real file and its gfid entry in .glusterfs/xx/yy, any regular file inside .glusterfs with a single hardlink means that there is no real file associated with it.

The symbolic links inside .glusterfs may represent real symbolic-link files or directories. Differentiating between them is more complex.
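A sketch of what that could look like in practice (my own, untested against this cluster; the brick root is a placeholder, and anything it reports should be reviewed manually rather than deleted blindly):

```
#!/bin/bash
# Sketch only: list candidate dangling gfid entries on a quiet volume. Restrict
# the search to the .glusterfs/<xx>/<yy>/<gfid> layout so housekeeping areas
# such as .glusterfs/indices are not reported.
BRICK=/srv/brick06   # placeholder brick root
GFID_DIRS='.*/[0-9a-f][0-9a-f]/[0-9a-f][0-9a-f]/[^/]*'

# Regular gfid files with a single hardlink: no real file shares the inode,
# so these are dangling candidates.
find "$BRICK/.glusterfs" -mindepth 3 -maxdepth 3 -regex "$GFID_DIRS" -type f -links 1

# Symlink entries (directories or real symlinks): report those whose target no
# longer resolves. These still need manual review, since a user's own symlink
# may legitimately point at a missing target.
find "$BRICK/.glusterfs" -mindepth 3 -maxdepth 3 -regex "$GFID_DIRS" -type l |
while read -r link; do
    if [ ! -e "$link" ]; then    # -e follows the symlink to its target
        echo "broken gfid symlink: $link -> $(readlink "$link")"
    fi
done
```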