backuppc / rsync-bpc

Rsync-bpc is a customized version of rsync that is used as part of BackupPC
GNU General Public License v3.0
28 stars 17 forks source link

in some situations rsync stops and after some time error unexplained error (code 255) at io.c(226) #20

Open bacarrdy opened 3 years ago

bacarrdy commented 3 years ago

Hello,

We have a lot of servers what we are backing up via backuppc and now we have installed new backuppc system on centos 8 and noticed that on only few servers we have issues when backup stops and freezes and then gives error:

rsync_bpc: connection unexpectedly closed (42360 bytes received so far) [receiver] Done: 0 errors, 0 filesExist, 0 sizeExist, 0 sizeExistComp, 0 filesTotal, 0 sizeTotal, 0 filesNew, 0 sizeNew, 0 sizeNewComp, 6735 inode rsync error: error in rsync protocol data stream (code 12) at io.c(226) [receiver=3.1.3.0] [receiver] _exit_cleanup(code=12, file=io.c, line=226): about to call exit(12) rsync_bpc: connection unexpectedly closed (51093 bytes received so far) [generator] DoneGen: 0 errors, 0 filesExist, 0 sizeExist, 0 sizeExistComp, 355 filesTotal, 15231605 sizeTotal, 0 filesNew, 0 sizeNew, 0 sizeNewComp, 6734 inode rsync error: unexplained error (code 255) at io.c(226) [generator=3.1.3.0] [generator] _exit_cleanup(code=12, file=io.c, line=226): about to call exit(255)

So i have started backup from cli with command /usr/share/BackupPC/bin/BackupPC_dump -f -vvv target_host and collected rsync_bpc at what rsync_bpc stucked:

/usr/bin/rsync_bpc -vvv --bpc-top-dir /data/BackupPC/ --bpc-host-name target_host --bpc-share-name /var/lib/mysql --bpc-bkup-num 1 --bpc-bkup-comp 3 --bpc-bkup-prevnum -1 --bpc-bkup-prevcomp -1 --bpc-bkup-inode0 6734 --bpc-log-level 1 --bpc-attrib-new -e /usr/bin/ssh\ -p\ 11111\ -q\ -x\ -l\ root --rsync-path=/usr/bin/rsync --super --recursive --protect-args --numeric-ids --perms --owner --group -D --times --links --hard-links --delete --delete-excluded --partial --log-format=log:\ %o\ %i\ %B\ %8U,%8G\ %9l\ %f%L --stats --checksum --timeout=72000 target_host:/var/lib/mysql/ /

And then started rcync process with this command and rsync process stucked at (Please keep in mind that this is not first directory for what backuppc was runing on same server, but last one and as i noticed backuppc for each directory generate rsync command. Also x@0024waits_global_by_latency.frm,374 is the last file from folder):

recv_generator(sys/x@0024wait_classes_global_by_avg_latency.frm,370) recv_generator(sys/x@0024wait_classes_global_by_latency.frm,371) recv_generator(sys/x@0024waits_by_host_by_latency.frm,372) recv_generator(sys/x@0024waits_by_user_by_latency.frm,373) recv_generator(sys/x@0024waits_global_by_latency.frm,374) generate_files phase=1 rsync_bpc: connection unexpectedly closed (42360 bytes received so far) [receiver] Done: 0 errors, 0 filesExist, 0 sizeExist, 0 sizeExistComp, 0 filesTotal, 0 sizeTotal, 0 filesNew, 0 sizeNew, 0 sizeNewComp, 6735 inode rsync error: error in rsync protocol data stream (code 12) at io.c(226) [receiver=3.1.3.0] [receiver] _exit_cleanup(code=12, file=io.c, line=226): about to call exit(12) rsync_bpc: connection unexpectedly closed (51093 bytes received so far) [generator] DoneGen: 0 errors, 0 filesExist, 0 sizeExist, 0 sizeExistComp, 355 filesTotal, 15231605 sizeTotal, 0 filesNew, 0 sizeNew, 0 sizeNewComp, 6734 inode rsync error: unexplained error (code 255) at io.c(226) [generator=3.1.3.0] [generator] _exit_cleanup(code=12, file=io.c, line=226): about to call exit(255)

Second server what has same issue:

bash-4.4$ /usr/share/BackupPC/bin/BackupPC_dump -f -vvv target2_host Backup type: type = full, needs_full = 1, needs_incr = 1, lastFullTime = 0, opts{f} = 1, opts{i} = , opts{F} = cmdSystemOrEval: about to system /bin/echo -c 1 target2_host cmdSystemOrEval: about to system /bin/echo -c 1 target2_host CheckHostAlive: can't extract round-trip time (not fatal) CheckHostAlive: ran '/bin/echo -c 1 target2_host'; returning 0 XferLOG file /data/BackupPC//pc/target2_host/XferLOG.0.z created 2021-02-17 15:18:18 Backup prep: type = full, case = 1, inPlace = 1, doDuplicate = 0, newBkupNum = 0, newBkupIdx = 0, lastBkupNum = , lastBkupIdx = (FillCycle = 0, noFillCnt = ) bpc_progress_state backup share "/var/lib" Running: /usr/bin/rsync_bpc --bpc-top-dir /data/BackupPC/ --bpc-host-name target2host --bpc-share-name /var/lib --bpc-bkup-num 0 --bpc-bkup-comp 3 --bpc-bkup-prevnum -1 --bpc-bkup-prevcomp -1 --bpc-bkup-inode0 2 --bpc-log-level 1 --bpc-attrib-new -e /usr/bin/ssh\ -p\ 11111\ -q\ -x\ -l\ root --rsync-path=/usr/bin/rsync --super --recursive --protect-args --numeric-ids --perms --owner --group -D --times --links --hard-links --delete --delete-excluded --partial --log-format=log:\ %o\ %i\ %B\ %8U,%8G\ %9l\ %f%L --stats --checksum --timeout=72000 --exclude=/proc --exclude=/tmp --exclude=/var/spool/exim --exclude=/var/spool/pmta --exclude=/var/spool/postfix --exclude=*.log --exclude=*.log.* --exclude=*.pid --exclude=sess* --exclude=*.avi --exclude=*.deb --exclude=*.dmp target2_host:/var/lib/ / full backup started for directory /var/lib started full dump, share=/var/lib Xfer PIDs are now 20910 xferPids 20910 This is the rsync child about to exec /usr/bin/rsync_bpc cmdExecOrEval: about to exec /usr/bin/rsyncbpc --bpc-top-dir /data/BackupPC/ --bpc-host-name target2+host --bpc-share-name /var/lib --bpc-bkup-num 0 --bpc-bkup-comp 3 --bpc-bkup-prevnum -1 --bpc-bkup-prevcomp -1 --bpc-bkup-inode0 2 --bpc-log-level 1 --bpc-attrib-new -e /usr/bin/ssh\ -p\ 11111\ -q\ -x\ -l\ root --rsync-path=/usr/bin/rsync --super --recursive --protect-args --numeric-ids --perms --owner --group -D --times --links --hard-links --delete --delete-excluded --partial --log-format=log:\ %o\ %i\ %B\ %8U,%8G\ %9l\ %f%L --stats --checksum --timeout=72000 --exclude=/proc --exclude=/tmp --exclude=/var/spool/exim --exclude=/var/spool/pmta --exclude=/var/spool/postfix --exclude=*.log --exclude=*.log.* --exclude=*.pid --exclude=sess* --exclude=*.avi --exclude=*.deb --exclude=*.dmp target2_host:/var/lib/ / Xfer PIDs are now 20910,20913 xferPids 20910,20913 xferPids 20910,20913 new recv cd+++++++++ rwx------ 0, 0 4096 . new recv cd+++++++++ rwxr-xr-x 0, 0 4096 1052039 bpc_progress_fileCnt 1 bpc_progress_fileCnt 21 bpc_progress_fileCnt 41 bpc_progress_fileCnt 61 bpc_progress_fileCnt 81 bpc_progress_fileCnt 101 bpc_progress_fileCnt 121 bpc_progress_fileCnt 141 same recv >f..tp..... rw-r--r-- 0, 0 0 1052039/fastboot rsync_bpc: connection unexpectedly closed (38533 bytes received so far) [receiver] Done: 0 errors, 0 filesExist, 0 sizeExist, 0 sizeExistComp, 0 filesTotal, 0 sizeTotal, 0 filesNew, 0 sizeNew, 0 sizeNewComp, 7 inode rsync error: error in rsync protocol data stream (code 12) at io.c(226) [receiver=3.1.3.0] rsync_bpc: connection unexpectedly closed (54016 bytes received so far) [generator] DoneGen: 0 errors, 159 filesExist, 3935 sizeExist, 4941 sizeExistComp, 160 filesTotal, 3935 sizeTotal, 0 filesNew, 0 sizeNew, 0 sizeNewComp, 2146 inode rsync error: unexplained error (code 255) at io.c(226) [generator=3.1.3.0] rsync_bpc exited with fatal status 255 (65280) (rsync error: unexplained error (code 255) at io.c(226) [generator=3.1.3.0]) Xfer PIDs are now xferPids Got fatal error during xfer (rsync error: unexplained error (code 255) at io.c(226) [generator=3.1.3.0]) cmdSystemOrEval: about to system /bin/echo -c 1 target2_host cmdSystemOrEval: about to system /bin/echo -c 1 target2_host CheckHostAlive: can't extract round-trip time (not fatal) CheckHostAlive: ran '/bin/echo -c 1 target2_host'; returning 0 Backup aborted (rsync error: unexplained error (code 255) at io.c(226) [generator=3.1.3.0]) bpc_progress_state fail cleanup BackupFailCleanup: nFilesTotal = 160, type = full, BackupCase = 1, inPlace = 1, lastBkupNum = Keeping non-empty backup #0 (/data/BackupPC//pc/target2_host/0) bpc_progress_state fsck Running BackupPC_refCountUpdate -h target2_host -f on target2_host cmdSystemOrEval: about to system /usr/share/BackupPC/bin/BackupPC_refCountUpdate -h target2_host -f Xfer PIDs are now 20922 xferPids 20922 bpc_progress_state refCnt #0 BackupPC_refCountUpdate: target2_host #0 inodeLast set to 2132 (was 1) bpc_progress_state cntUpdate #0 bpc_progress_state rename #0 bpc_progress_state sumUpdate bpc_progress_fileCnt 0/128 bpc_progress_fileCnt 8/128 bpc_progress_fileCnt 16/128 bpc_progress_fileCnt 24/128 bpc_progress_fileCnt 32/128 bpc_progress_fileCnt 40/128 bpc_progress_fileCnt 48/128 bpc_progress_fileCnt 56/128 bpc_progress_fileCnt 64/128 bpc_progress_fileCnt 72/128 bpc_progress_fileCnt 80/128 bpc_progress_fileCnt 88/128 bpc_progress_fileCnt 96/128 bpc_progress_fileCnt 104/128 bpc_progress_fileCnt 112/128 bpc_progress_fileCnt 120/128 bpc_progress_state rename total BackupPC_refCountUpdate: host target2_host got 0 errors (took 0 secs) Xfer PIDs are now xferPids Finished BackupPC_refCountUpdate (running time: 0 sec) Xfer PIDs are now xferPids dump failed: rsync error: unexplained error (code 255) at io.c(226) [generator=3.1.3.0]

if need to collect any additional data please tell me

bacarrdy commented 3 years ago

[root@backuppc ~]# yum list installed rsync-bpc Installed Packages rsync-bpc.x86_64 3.1.3.0-1.el8 @epel [root@backuppc ~]# yum list installed rsync Installed Packages rsync.x86_64 3.1.3-12.el8 @anaconda [root@backuppc ~]#

craigbarratt commented 3 years ago

Thanks for providing the output from BackupPC_dump -f -v target2_host. It fails quite quickly.

BackupPC_dump only takes a single -v option, so repeating it don't increase the log level. Please re-run manually with two changes (but still keep -v on BackupPC_dump):

By the way, the default $Conf{RsyncSshArgs} has added -o ServerAliveInterval=60 so you might try adding that too. The git default is now:

$Conf{RsyncSshArgs} = ['-e', '$sshPath -l root -o ServerAliveInterval=60'];
bacarrdy commented 3 years ago

Hello,

Sorry for delay. I have deleted all backups of backup target (it starts backups but not finish) and then started again with required by you opinions. As i can see it looks that it tryes to make backup of one of folders, but doesnt start second.

Log attached log.txt

if you will need log from another target then just tell me.

nicolasgoudard commented 3 years ago

HI ! UP. I have the same problem, I put the same parameters (ServerAliveInterval=60 and ServerAliveCountMax=0)

Backup aborted (rsync error: unexplained error (code 255) at io.c(226) [generator=3.1.3.0])

I don't know how to do.

nicolasgoudard commented 3 years ago

UP again . I found the reason . I put a bad character in the .ssh/config file on the backuppc server . But I don't understand why the server needs the file, indeed I put "-o " options in the ssh command in the backuppc configuration , so I don't understand why it would need this file ?

TheNetworkIsDown commented 2 months ago

We're facing the same symptoms here. I am now trying to implement verbose output from https://github.com/backuppc/rsync-bpc/issues/20#issuecomment-781981358 Too bad that each log line does not necessarily include an exact timestamp what happened when, otherwise it would be easier troubleshooting on network side etc as well. BTW we're also facing https://github.com/backuppc/rsync-bpc/issues/28 regularly. All of it happening on a host with tens of millions of files.

TheNetworkIsDown commented 1 month ago

The SSH client (on the backuppc server) is just disappearing as it seems. And then it looks like this:

42de9328cf40862dec68537787c95022 1
1403fad62d5cef9cf2a877be379c31ac 1
5933c24cba1cadc8b80353e47e2665b5 1
DoneGen: 0 errors, 0 filesExist, 0 sizeExist, 0 sizeExistComp, 22482011 filesTotal, 122747042777 sizeTotal, 0 filesNew, 0 sizeNew, 0 sizeNewComp, 120668498 inode
Parsing done: nFilesTotal = 22482011
rsync error: unexplained error (code 255) at io.c(226) [generator=3.1.3.0]
[generator] _exit_cleanup(code=12, file=io.c, line=226): about to call exit(255)
rsync_bpc exited with fatal status 255 (65280) (rsync error: unexplained error (code 255) at io.c(226) [generator=3.1.3.0])
Xfer PIDs are now
Executing DumpPostUserCmd:..

SSH server (on the client) is reporting the following more or less at the same time

2024-08-10T01:21:30.125447+02:00 host sshd[23918]: Read error from remote host 1.2.3.118: Connection timed out
2024-08-10T01:21:30.125710+02:00 host sshd[23918]: debug1: do_cleanup

RsyncClientPath: nice -n 19 ionice -c2 -n7 sudo /usr/bin/rsync RsyncSshArgs: $sshPath -l backuppc -o ServerAliveInterval=60 -v

I'm trying to increase this to "-vvv" then...