Open hunter86bg opened 2 years ago
Gluster version: Master:
hi glusterfs-client 8.5-ubuntu1~bionic1 amd64 clustered file-system (client package)
hi glusterfs-common 8.5-ubuntu1~bionic1 amd64 GlusterFS common libraries and translator modules
hi glusterfs-server 8.5-ubuntu1~bionic1 amd64 clustered file-system (server package)
Slave:
hi glusterfs-client 8.5-ubuntu1~bionic1 amd64 clustered file-system (client package)
hi glusterfs-common 8.5-ubuntu1~bionic1 amd64 GlusterFS common libraries and translator modules
hi glusterfs-server 8.5-ubuntu1~bionic1 amd64 clustered file-system (server package)
Pystack output:
# pystack -v 24892;echo;echo;echo;echo;echo;echo;echo; pystack -v 25678
Standard Output:
[New LWP 24907]
[New LWP 24909]
[New LWP 24977]
[New LWP 24978]
[New LWP 24979]
[New LWP 24980]
[New LWP 24981]
[New LWP 24982]
[New LWP 24983]
[New LWP 24984]
[New LWP 24985]
[New LWP 24986]
[New LWP 24987]
[New LWP 24988]
[New LWP 24989]
[New LWP 24990]
[New LWP 24991]
[New LWP 24992]
[New LWP 24993]
[New LWP 24994]
[New LWP 24995]
[New LWP 24996]
[New LWP 24997]
[New LWP 25027]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f618effce1f in __GI___select (nfds=0, readfds=0x0, writefds=0x0, exceptfds=0x0, timeout=0x7fff22145870) at ../sysdeps/unix/sysv/linux/select.c:41
$1 = (void *) 0x1
Standard Error:
41 ../sysdeps/unix/sysv/linux/select.c: No such file or directory.
Dumping Threads....
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 445, in keep_alive
time.sleep(gap)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 176, in listen
select((self.inf,), (), ())
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 488, in select
return eintr_wrap(oselect.select, oselect.error, *args)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 480, in eintr_wrap
return func(*args)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 773, in tailer
time.sleep(0.5)
---------------
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 325, in <module>
main()
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 317, in main
func(args)
File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 86, in subcmd_worker
local.service_loop(remote)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1316, in service_loop
g2.crawlwrap()
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 607, in crawlwrap
time.sleep(self.sleep_interval)
File "<string>", line 1, in <module>
File "<string>", line 1, in <module>
Standard Output:
[New LWP 25679]
[New LWP 25715]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f2b54b407c6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x7f2b44000e70) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
$1 = (void *) 0x1
Standard Error:
205 ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
Dumping Threads....
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 272, in wmon
slave_host, master, suuid, slavenodes)
File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 245, in monitor
ret = nwait(cpid)
File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 117, in nwait
p2, r = waitpid(p, o)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 492, in waitpid
return eintr_wrap(owaitpid, OSError, *args)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 480, in eintr_wrap
return func(*args)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 769, in tailer
[po.stderr for po in errstore], [], [], 1)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 488, in select
return eintr_wrap(oselect.select, oselect.error, *args)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 480, in eintr_wrap
return func(*args)
---------------
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 325, in <module>
main()
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 317, in main
func(args)
File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 60, in subcmd_monitor
return monitor.monitor(local, remote)
File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 360, in monitor
return Monitor().multiplex(*distribute(local, remote))
File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 296, in multiplex
t.join()
File "/usr/lib/python3.6/threading.py", line 1056, in join
self._wait_for_tstate_lock()
File "/usr/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
File "<string>", line 1, in <module>
File "<string>", line 1, in <module>
@hunter86bg can you please share the brick logs and mnt logs of slave and master at the same time?
Is the issue still present? or is it randomly getting resolved as well?
Usually we force it to move to another node by using iptables to block outgoing traffic to slave nodes. Last time it self resolved and become stuck again /no actions from our side/.
I can say it self resolves rarely.
Edit: It stucks , then self resolved than stucks again. I will check if the issue was "force-resolved" via the iptables method or if it still affects the setup. Edit2: it was "fixed", so we can't debug it right now.
I will provide the logs as soon as I can.
The replication was (at that time) from master2 to slave1.
@Shwetha-Acharya , thanks for taking a look at it. It looks strange when the logs do not provide useful info.
Time of issues (aprox):
# TZ='UTC' date --date='@1638386942' Wed Dec 1 19:29:02 UTC 2021
Nothing significant is seen on the logs. Can you please specify the workload?
# TZ='UTC' date --date='@1638306241' Tue Nov 30 21:04:01 UTC 2021
There were multiple occurances of "No such file or directory" errors around that time.
[2021-11-29 14:24:43.000376] E [MSGID: 117053] [changelog-helpers.c:441:changelog_rollover_changelog] 0-gvol0-changelog: error unlinking empty changelog [{path=/nodirectwritedata/gluster/gvol0/.glusterfs/changelogs/CHANGELOG}, {errno=2}, {error=No such file or directory}]
Either /nodirectwritedata/gluster/gvol0/.glusterfs/changelogs/CHANGELOG was not present or was not accessable for some reason.
Thanks @Shwetha-Acharya ,
I'll monitor that file and see what happens.
@Shwetha-Acharya ,
as you have noticed the first time there was no indication of a problem in the logs (or at least I missed it). Yet, the following puzzles me. Shouldn't all passive nodes have the same content of the CHANGELOG file ?
master1
GlusterFS Changelog | version: v1.2 | encoding : 2
Da9232e02-f536-4ee4-92d9-96559cdf5dcaDea8dd9ce-1e8b-419b-b247-44e9306d96c9Ef545e611-22cf-4dea-bcf6-bf3a66769b16233318810811727651a20-531d-4891-a78e-496804edfd00/voicemail_1638912529_1638912523611980_77028.wavDf545e611-22cf-4dea-bcf6-bf3a66769b16E1d128108-0878-4b1a-880b-69b7c46d38cc2333152108117c01f79a7-22e9-492d-9086-3e36a1fef50f/.lock-99490825E1d128108-0878-4b1a-880b-69b7c46d38cc9c01f79a7-22e9-492d-9086-3e36a1fef50f/.lockE1d128108-0878-4b1a-880b-69b7c46d38cc5c01f79a7-22e9-492d-9086-3e36a1fef50f/.lock-99490825E1d128108-0878-4b1a-880b-69b7c46d38cc5c01f79a7-22e9-492d-9086-3e36a1fef50f/.lockE0d2e3fad-d79b-4b26-b561-6c0f229a91cf2333152108117c01f79a7-22e9-492d-9086-3e36a1fef50f/.lock-03494068E0d2e3fad-d79b-4b26-b561-6c0f229a91cf9c01f79a7-22e9-492d-9086-3e36a1fef50f/.lockE0d2e3fad-d79b-4b26-b561-6c0f229a91cf5c01f79a7-22e9-492d-9086-3e36a1fef50f/.lock-03494068E0d2e3fad-d79b-4b26-b561-6c0f229a91cf5c01f79a7-22e9-492d-9086-3e36a1fef50f/.lockD9b980df6-ae7a-48c8-bd14-73825677ed5a
master
GlusterFS Changelog | version: v1.2 | encoding : 2
master3
GlusterFS Changelog | version: v1.2 | encoding : 2
D9b980df6-ae7a-48c8-bd14-73825677ed5aDf545e611-22cf-4dea-bcf6-bf3a66769b16Da9232e02-f536-4ee4-92d9-96559cdf5dcaEbfc973b9-4ecc-459d-b9dc-b25f43cfc2de23331881081177e64639b-c69f-43f9-afb0-507ce1ee1ebd/record_1638912536507713_52585-in.gsmE310ba9f1-2d5b-4456-9922-4ef834e95ee723331881081177e64639b-c69f-43f9-afb0-507ce1ee1ebd/record_1638912536507713_52585-out.gsm
P.S.: Currently active node is master3 (after the manual "failover")
Shouldn't all passive nodes have the same content of the CHANGELOG file ?
Order per file will be same across the nodes. But the operations can get recorded in different changelog files.
It seems that the problem is related to the changelogs.
Here is example of the past 2 issues (timestamps from first comment):
[2021-11-30 21:04:30.114284] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 92962 [/nodirectwritedata/gluster/gvol0] (time: 1638306270.114149), (vec: 1, len: 4104)
[2021-11-30 21:04:44.129055] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 92963 [/nodirectwritedata/gluster/gvol0] (time: 1638306284.128932), (vec: 1, len: 4104)
[2021-11-30 21:04:59.144783] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 92964 [/nodirectwritedata/gluster/gvol0] (time: 1638306299.144665), (vec: 1, len: 4104)
[2021-11-30 21:05:09.190179] I [MSGID: 132035] [gf-history-changelog.c:837:gf_history_changelog] 0-gfchangelog: Requesting historical changelogs [{start=1638306276}, {end=1638306309}]
[2021-11-30 21:05:09.190320] I [MSGID: 132019] [gf-history-changelog.c:755:gf_changelog_extract_min_max] 0-gfchangelog: changelogs min max [{min=1633926166}, {max=1638306299}, {total_changelogs=195980}]
[2021-11-30 21:05:09.190562] I [MSGID: 132036] [gf-history-changelog.c:955:gf_history_changelog] 0-gfchangelog: FINAL [{from=1638306284}, {to=1638306299}, {changes=2}]
[2021-11-30 21:05:09.190890] D [MSGID: 0] [gf-history-changelog.c:297:gf_history_changelog_scan] 0-gfchangelog: hist_done 1, is_last_scan: 0
[2021-11-30 21:05:10.191050] D [MSGID: 0] [gf-history-changelog.c:297:gf_history_changelog_scan] 0-gfchangelog: hist_done 0, is_last_scan: 1
[2021-11-30 21:05:14.160553] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 92965 [/nodirectwritedata/gluster/gvol0] (time: 1638306314.160395), (vec: 1, len: 4104)
[2021-11-30 21:05:28.175328] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 92966 [/nodirectwritedata/gluster/gvol0] (time: 1638306328.175190), (vec: 1, len: 4104)
[2021-11-30 21:05:42.189783] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 92967 [/nodirectwritedata/gluster/gvol0] (time: 1638306342.189649), (vec: 1, len: 4104)
[2021-11-30 21:05:56.204524] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 92968 [/nodirectwritedata/gluster/gvol0] (time: 1638306356.204377), (vec: 1, len: 4104)
[2021-12-01 19:14:57.071507] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 95545 [/nodirectwritedata/gluster/gvol0] (time: 1638386097.71352), (vec: 1, len: 4104)
[2021-12-01 19:15:11.086155] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 95546 [/nodirectwritedata/gluster/gvol0] (time: 1638386111.86018), (vec: 1, len: 4104)
[2021-12-01 19:15:25.100922] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 95547 [/nodirectwritedata/gluster/gvol0] (time: 1638386125.100767), (vec: 1, len: 4104)
[2021-12-01 19:28:37.165759] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 95548 [/nodirectwritedata/gluster/gvol0] (time: 1638386917.165581), (vec: 1, len: 4104)
[2021-12-01 19:28:51.180591] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 95549 [/nodirectwritedata/gluster/gvol0] (time: 1638386931.180414), (vec: 1, len: 4104)
[2021-12-01 19:28:55.481728] D [MSGID: 0] [gf-changelog-api.c:55:gf_changelog_done] 0-gfchangelog: moving /var/lib/misc/gluster/gsyncd/gvol0_nvfs30_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1638386931 to processed directory
[2021-12-01 19:29:05.194963] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 95550 [/nodirectwritedata/gluster/gvol0] (time: 1638386945.194678), (vec: 1, len: 4104)
[2021-12-01 19:29:05.806899] D [MSGID: 0] [gf-changelog-api.c:55:gf_changelog_done] 0-gfchangelog: moving /var/lib/misc/gluster/gsyncd/gvol0_nvfs30_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1638386945 to processed directory
[2021-12-02 00:00:00.037913] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 95551 [/nodirectwritedata/gluster/gvol0] (time: 1638403200.37668), (vec: 1, len: 4104)
[2021-12-02 00:00:14.051495] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 95552 [/nodirectwritedata/gluster/gvol0] (time: 1638403214.51403), (vec: 1, len: 4104)
[2021-12-02 00:00:29.065960] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 95553 [/nodirectwritedata/gluster/gvol0] (time: 1638403229.65853), (vec: 1, len: 4104)
Yet , again the geo-rep switched to master2 and again the problem:
#Timestamp:
# TZ='UTC' date --date='@1639061941'
Thu Dec 9 14:59:01 UTC 2021
### Note: Timestamps are in local time (PST):
# ls -ltr changes-nodirectwritedata-gluster-gvol0.log*
-rw------- 1 root root 184304 Sep 4 06:24 changes-nodirectwritedata-gluster-gvol0.log.7.gz
-rw------- 1 root root 302160 Sep 5 00:52 changes-nodirectwritedata-gluster-gvol0.log.6.gz
-rw------- 1 root root 344924 Sep 16 06:25 changes-nodirectwritedata-gluster-gvol0.log.5.gz
-rw------- 1 root root 785825 Sep 18 06:25 changes-nodirectwritedata-gluster-gvol0.log.4.gz
-rw------- 1 root root 222834 Oct 10 20:22 changes-nodirectwritedata-gluster-gvol0.log.3.gz
-rw------- 1 root root 1264993 Dec 5 22:31 changes-nodirectwritedata-gluster-gvol0.log.2.gz
-rw------- 1 root root 0 Dec 6 06:25 changes-nodirectwritedata-gluster-gvol0.log
-rw------- 1 root root 962913 Dec 9 11:54 changes-nodirectwritedata-gluster-gvol0.log.1
# tail -n 20 changes-nodirectwritedata-gluster-gvol0.log.1
[2021-12-09 18:17:56.053980] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105672 [/nodirectwritedata/gluster/gvol0] (time: 1639073876.53847), (vec: 1, len: 4104)
[2021-12-09 18:18:10.068719] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105673 [/nodirectwritedata/gluster/gvol0] (time: 1639073890.68569), (vec: 1, len: 4104)
[2021-12-09 18:18:24.083527] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105674 [/nodirectwritedata/gluster/gvol0] (time: 1639073904.83329), (vec: 1, len: 4104)
[2021-12-09 18:24:18.208570] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105675 [/nodirectwritedata/gluster/gvol0] (time: 1639074258.208382), (vec: 1, len: 4104)
[2021-12-09 18:24:32.223213] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105676 [/nodirectwritedata/gluster/gvol0] (time: 1639074272.223109), (vec: 1, len: 4104)
[2021-12-09 18:24:46.238050] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105677 [/nodirectwritedata/gluster/gvol0] (time: 1639074286.237936), (vec: 1, len: 4104)
[2021-12-09 18:25:00.252815] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105678 [/nodirectwritedata/gluster/gvol0] (time: 1639074300.252692), (vec: 1, len: 4104)
[2021-12-09 18:25:14.267504] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105679 [/nodirectwritedata/gluster/gvol0] (time: 1639074314.267397), (vec: 1, len: 4104)
[2021-12-09 18:25:18.774405] D [MSGID: 0] [gf-changelog-api.c:55:gf_changelog_done] 0-gfchangelog: moving /var/lib/misc/gluster/gsyncd/gvol0_nvfs30_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1639074314 to processed directory
[2021-12-09 18:26:24.341382] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105680 [/nodirectwritedata/gluster/gvol0] (time: 1639074384.341183), (vec: 1, len: 4104)
[2021-12-09 18:26:38.356096] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105681 [/nodirectwritedata/gluster/gvol0] (time: 1639074398.355991), (vec: 1, len: 4104)
[2021-12-09 18:26:52.370825] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105682 [/nodirectwritedata/gluster/gvol0] (time: 1639074412.370690), (vec: 1, len: 4104)
[2021-12-09 18:27:06.385413] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105683 [/nodirectwritedata/gluster/gvol0] (time: 1639074426.385269), (vec: 1, len: 4104)
[2021-12-09 18:27:09.476805] D [MSGID: 0] [gf-changelog-api.c:55:gf_changelog_done] 0-gfchangelog: moving /var/lib/misc/gluster/gsyncd/gvol0_nvfs30_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1639074426 to processed directory
[2021-12-09 19:51:37.640918] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105684 [/nodirectwritedata/gluster/gvol0] (time: 1639079497.640742), (vec: 1, len: 4104)
[2021-12-09 19:51:51.655674] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105685 [/nodirectwritedata/gluster/gvol0] (time: 1639079511.655483), (vec: 1, len: 4104)
[2021-12-09 19:52:05.669678] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105686 [/nodirectwritedata/gluster/gvol0] (time: 1639079525.669546), (vec: 1, len: 4104)
[2021-12-09 19:54:25.055451] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105687 [/nodirectwritedata/gluster/gvol0] (time: 1639079665.55216), (vec: 1, len: 4104)
[2021-12-09 19:54:39.069114] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105688 [/nodirectwritedata/gluster/gvol0] (time: 1639079679.68953), (vec: 1, len: 4104)
[2021-12-09 19:54:53.083845] D [MSGID: 0] [gf-changelog-reborp.c:355:gf_changelog_event_handler] 0-gfchangelog: seq: 105689 [/nodirectwritedata/gluster/gvol0] (time: 1639079693.83746), (vec: 1, len: 4104)
root@master2:~# ps aux | grep [g]syncd.py
root 25678 0.0 0.0 229416 16332 ? Ssl Nov01 2:56 /usr/bin/python3 /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/nodirectwritedata/gluster/gvol0 --monitor -c /var/lib/glusterd/geo-replication/gvol0_slave3_gvol0/gsyncd.conf --iprefix=/var :gvol0 --glusterd-uuid=74927c6f-6c07-4814-86ea-c8b2070cf66b slave3::gvol0
root 48348 0.1 0.0 1823456 22500 ? Sl Dec05 6:01 python3 /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py worker gvol0 slave3::gvol0 --feedback-fd 10 --local-path /nodirectwritedata/gluster/gvol0 --local-node master2 --local-node-id 74927c6f-6c07-4814-86ea-c8b2070cf66b --slave-id 190a1403-7135-48cb-8d8e-1b1573a61904 --subvol-num 1 --resource-remote slave2 --resource-remote-id 06a32508-7a10-4ff4-bb36-3c1ba3c248d1
root@master2:~# source ~/venv/bin/activate
(venv) root@master2:~# pystack -v 25678; echo;echo;echo;pystack -v 48348
(venv) root@master2:~# pystack -v 25678; echo;echo;echo;pystack -v 48348
Standard Output:
[New LWP 25679]
[New LWP 25715]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f2b54b407c6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x7f2b44000e70) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
$1 = (void *) 0x1
Standard Error:
205 ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
Dumping Threads....
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 272, in wmon
slave_host, master, suuid, slavenodes)
File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 245, in monitor
ret = nwait(cpid)
File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 117, in nwait
p2, r = waitpid(p, o)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 492, in waitpid
return eintr_wrap(owaitpid, OSError, *args)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 480, in eintr_wrap
return func(*args)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 769, in tailer
[po.stderr for po in errstore], [], [], 1)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 488, in select
return eintr_wrap(oselect.select, oselect.error, *args)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 480, in eintr_wrap
return func(*args)
---------------
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 325, in <module>
main()
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 317, in main
func(args)
File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 60, in subcmd_monitor
return monitor.monitor(local, remote)
File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 360, in monitor
return Monitor().multiplex(*distribute(local, remote))
File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 296, in multiplex
t.join()
File "/usr/lib/python3.6/threading.py", line 1056, in join
self._wait_for_tstate_lock()
File "/usr/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
File "<string>", line 1, in <module>
File "<string>", line 1, in <module>
Standard Output:
[New LWP 48363]
[New LWP 48365]
[New LWP 48424]
[New LWP 48425]
[New LWP 48426]
[New LWP 48427]
[New LWP 48428]
[New LWP 48429]
[New LWP 48430]
[New LWP 48431]
[New LWP 48432]
[New LWP 48433]
[New LWP 48434]
[New LWP 48435]
[New LWP 48436]
[New LWP 48437]
[New LWP 48439]
[New LWP 48440]
[New LWP 48441]
[New LWP 48442]
[New LWP 48443]
[New LWP 48444]
[New LWP 48445]
[New LWP 48470]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fc929c37e1f in __GI___select (nfds=0, readfds=0x0, writefds=0x0, exceptfds=0x0, timeout=0x7fff6b7f8360) at ../sysdeps/unix/sysv/linux/select.c:41
$1 = (void *) 0x1
Standard Error:
41 ../sysdeps/unix/sysv/linux/select.c: No such file or directory.
Dumping Threads....
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 445, in keep_alive
time.sleep(gap)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1988, in syncjob
time.sleep(0.5)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 176, in listen
select((self.inf,), (), ())
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 488, in select
return eintr_wrap(oselect.select, oselect.error, *args)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 480, in eintr_wrap
return func(*args)
---------------
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 393, in twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 773, in tailer
time.sleep(0.5)
---------------
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 325, in <module>
main()
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 317, in main
func(args)
File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 86, in subcmd_worker
local.service_loop(remote)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1316, in service_loop
g2.crawlwrap()
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 607, in crawlwrap
time.sleep(self.sleep_interval)
File "<string>", line 1, in <module>
File "<string>", line 1, in <module>
@aravindavk , @Shwetha-Acharya ,
any idea how to debug the changelog issue ?
@hunter86bg you can check rsync logs to find if there is anything going wrong with respect to rsync with the command:
gluster volume geo-replication
Also please specify the rsync version you are using.
Hi Shwetha, thanks for the hint. I will enable the logs as soon as I'm allowed to.
Rsync is the same version on both master and slave nodes:
rsync version 3.1.2 protocol version 31
Copyright (C) 1996-2015 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
append, ACLs, xattrs, iconv, symtimes, prealloc
rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you
are welcome to redistribute it under certain conditions. See the GNU
General Public Licence for details.
Session (hostnames replaced):
Time of issues (aprox):
Slave Log:
Master log: