framps / raspiBackup

Create and keep multiple backup versions of your running Raspberries
https://raspibackup.linux-tips-and-tricks.de
GNU General Public License v3.0

raspiBackup tar backup operation hangs in ls on Bookworm using cifs #760

Closed madbrain76 closed 3 months ago

madbrain76 commented 5 months ago

I did a new install of Bookworm on my Raspberry Pi 4B. I am backing up onto a ZFS file system on my Ubuntu 22.04 NAS, shared via Samba, and connected via cifs on the Pi side. The file system is mounted in fstab like this:

//higgs.local/Network\040Backups /higgs_backup cifs _netdev,credentials=/etc/creds,uid=madbrain,gid=madbrain,x-systemd.automount 0 0

The problem I see is that the backup operation hangs at the end with this message : --- RBK0033I: Please wait until cleanup has finished.

I have waited over 10 hours, and it didn't change.

I reviewed the process list with pstree and found that raspiBackup had an ls child process. ps -ef showed the command was "ls -la /backup/pi64/pi64-tar-backup-20240525-173558". I tried to run that command in a terminal, and it hung as well. However, I could list older backups just fine in /backup/pi64. Something is wrong with that specific directory. I tried to use strace -p to see where ls was stuck, but strace hung as well. The ls process cannot be killed, even with kill -9. I logged on to the NAS and could list the backup directory just fine there. I am unable to unmount /backup, even with umount -f or umount -l. It just says the source is busy. If I try to reboot the Pi after the backup hangs, the reboot hangs too and never completes. I have to physically unplug the power supply for it to come back up.

This is most likely some kind of bug in the cifs kernel module, but I have so far only seen it with raspiBackup, so I'm filing the issue here to start.

I will be attaching log files shortly - just need to run one more backup.

madbrain76 commented 4 months ago

I thought the kernel upgrade had fixed it at first, but the problem came back a few days later. There is still no fix.

MASTERDR1VE commented 4 months ago

ok, thx... I will go with the "no ls version" next... Unfortunately, another reboot is required: sudo raspiBackup -U -S gives ??? RBK0015E: There is already an instance of raspiBackup up and running ... which I can't kill. Will report after updating.

madbrain76 commented 4 months ago

Even with the no-ls version, you may subsequently get a hang if you try to access the backup directory. The only workaround I found was to unmount the directory, and then mount it again.

framps commented 4 months ago

Does this help with further investigating the issue?

@madbrain76 and I already tried to nail down the root cause of the issue but unfortunately had no success. If we find the root cause and get a fix, you are a candidate to verify the fix :smile:

what would be the most reliable "workaround"?

This depends on your environment. I suggest using rsync. If you use a remote backup space, this means you have to use nfs instead of smb. If you have to use tar, use a locally attached backup device. If you have to use smb, use dd, which is the worst workaround.

MASTERDR1VE commented 4 months ago

thanks... just tested after a reboot, but unfortunately "systemd" logged something, so I got a tar: /var/log/journal/9f2f82bb40ff47c4a89d91186da11d61/system.journal: file changed as we read it ??? RBK0021E: Backupprogram for type tar failed with RC 1.

and then --- RBK0043I: Removing incomplete backup in /mnt/PI_Backup/PI4/PI4-tar-backup-20240619-221615. This may take some time. Please be patient.

which is displayed for ~20 minutes now. Checking the processes, I see the "rm" hanging

root        6888  0.0  0.0  10468  3456 pts/0    S+   22:16   0:00  |           \_ sudo raspiBackup
root        6889  0.0  0.0  10468  1428 pts/1    Ss   22:16   0:00  |               \_ sudo raspiBackup
root        6890  0.0  0.1  11272  7552 pts/1    S+   22:16   0:00  |                   \_ /bin/bash /usr/local/bin/raspiBackup
root        6973  0.0  0.0   5344  1536 pts/1    S+   22:16   0:00  |                       \_ tee -ia /tmp/raspiBackup.log
root        6974  0.0  0.0   5344  1536 pts/1    S+   22:16   0:00  |                       \_ tee -ia /tmp/raspiBackup.log
root       16770  0.1  0.0   6260  1280 pts/1    D+   22:27   0:01  |                       \_ rm -rfd /mnt/PI_Backup/PI4/PI4-tar-backup-20240619-221615
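The D+ in the STAT column of the rm line above marks uninterruptible sleep: the process is blocked inside the kernel and does not act on signals until the wait ends, which would be consistent with kill -9 having no effect on the hung ls reported earlier. A minimal sketch for listing such processes together with the kernel function they are waiting in; the function names filter_d_state and list_d_state are made up for this example and are not part of raspiBackup:

```shell
#!/usr/bin/env bash
# Keep the header row plus every process whose STAT column starts with D
# (uninterruptible sleep). Expects "PID STAT WCHAN COMMAND" rows on stdin.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Print all D-state processes with the kernel wait channel (WCHAN).
# Hypothetical helper for diagnosis only.
list_d_state() {
    ps -eo pid,stat,wchan:32,args | filter_d_state
}
```

Running list_d_state while the backup is stuck should show the hanging ls or rm. The WCHAN value is the kind of detail a kernel bug report would likely want.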

Since I need to use a network backup, I will try switching to rsync+nfs.

madbrain76 commented 4 months ago

Yes, I saw that rm hang as well when the backup fails. In my case it wasn't because of systemd but another file. I mentioned it in an earlier comment. There is no workaround for that one.

framps commented 4 months ago

Checking the processes, I see the "rm" hanging

ls was used by raspiBackup for debugging purposes. Because of this issue I removed that debug statement, but unfortunately rm also hangs later on, when the backup cleanup strategy starts :cry:

github-actions[bot] commented 4 months ago

This issue is considered stale now and will be closed in 1 week if there is no activity any more

framps commented 4 months ago

@madbrain76 Do you still have any idea how to make progress on this issue? Unfortunately, I don't :cry:

madbrain76 commented 4 months ago

Maybe file an issue here: https://github.com/raspberrypi/linux

I'm not too optimistic given lack of response in the forums, and the fact they have 839 open issues.

framps commented 4 months ago

What about you now? Do you have any workaround or are you blocked now?

madbrain76 commented 4 months ago

I do have a workaround when the backup is successful - unmount the share, and mount it again.
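The remount workaround could be scripted to run after each successful backup. This is only a sketch: remount_share is a made-up helper name, /backup is the mount point from the fstab lines in this thread, and the SUDO variable exists only so the commands can be previewed without touching the mount.

```shell
#!/usr/bin/env bash
# Unmount the share and mount it again using the options from /etc/fstab.
# Set SUDO=echo to print the commands instead of running them (dry run).
remount_share() {
    local mountpoint=$1
    ${SUDO:-sudo} umount "$mountpoint" && ${SUDO:-sudo} mount "$mountpoint"
}
```

For example, SUDO=echo remount_share /backup prints the two commands, while remount_share /backup runs them. As described earlier in the thread, this does not help once a process is already stuck on the share, because umount then reports the source as busy.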

For the failed case, there is no workaround. It fails each time I forget to close my browser. I have a Gmail POP3 extension in Firefox that causes some files to change in the middle of the backup. I probably should figure out which files those are and how to exclude them.

framps commented 4 months ago

I do have a workaround when the backup is successful

Try the dynamic mount feature. As far as I understand then there is no manual umount/mount required any more.

I probably should figure out which files those are and how to exclude them.

I had a lengthy discussion about how to detect the file(s) that changed during a backup. Once you know them, use the --exclude option to exclude your POP3 directory where the files are changed.

madbrain76 commented 4 months ago

Thanks. It's an extension that downloads POP3 messages into gmail. I'm not sure if there is a separate folder, or what it might be. It refreshes every 3 minutes.

framps commented 4 months ago

or what it might be.

Just use inotifywait and you will know which files were updated, added or deleted :wink:
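A rough sketch of how that could look, assuming inotifywait from the inotify-tools package is installed. The helper names watch_changes and summarize_changes are made up, and the directory to watch is a guess: the Firefox profile usually lives under ~/.mozilla, but the extension may write somewhere else.

```shell
#!/usr/bin/env bash
# Watch a directory tree and print one "time|event|path" line per change.
# Stops after the given number of seconds (default 600).
watch_changes() {
    local dir=$1 seconds=${2:-600}
    timeout "$seconds" inotifywait -m -r \
        -e modify,create,delete,move \
        --timefmt '%H:%M:%S' --format '%T|%e|%w%f' "$dir"
}

# Count events per path, busiest first. Reads watch_changes output on stdin.
# Paths that contain a "|" character are not handled by this sketch.
summarize_changes() {
    awk -F'|' '{ count[$3]++ } END { for (f in count) print count[f], f }' | sort -rn
}
```

With a 3 minute refresh interval, something like watch_changes "$HOME/.mozilla" 600 | summarize_changes should catch a few refresh cycles and list the files that change most often, which are the candidates to exclude.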

github-actions[bot] commented 4 months ago

This issue is considered stale now and will be closed in 1 week if there is no activity any more

dpmeixner commented 2 months ago

I actually found a workaround, mounting the share differently :

//server10g.local/zfs/Backups /backup cifs _netdev,credentials=/etc/creds,uid=madbrain,gid=madbrain,x-systemd.automount 0 0

//server10g.localdomain/zfs/Backups /backup cifs credentials=/etc/creds,vers=3.0,sec=ntlmssp,uid=madbrain,gid=madbrain,iocharset=utf8,soft,noserverino,cache=none,actimeo=30 0 0

The first way causes the hang. This is the syntax I was using with Bullseye.

The second way doesn't hang. However, the backup takes 7 times as long! 20 minutes vs 3, for about 9GB. If the Pi4 were purely IO-bound, it should take only about 1m16s on Gigabit Ethernet. I also have a 2.5 Gbps USB NIC I'm not using because it's not stable enough. The NAS is 10 Gbps with over 100TB.
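The numbers above can be checked with a quick calculation. This sketch assumes decimal units (1 GB = 1000 MB, 1 Gbps = 10^9 bit/s) and ignores protocol overhead; the function names are made up. Exactly 9 GB at line rate comes to 72 seconds, so the 1m16s figure presumably corresponds to slightly more data (about 9.5 GB) or some overhead.

```shell
#!/usr/bin/env bash
# Ideal transfer time in seconds: transfer_seconds SIZE_GB LINK_GBPS
transfer_seconds() {
    awk -v gb="$1" -v gbps="$2" 'BEGIN { printf "%.0f\n", gb * 8 / gbps }'
}

# Average throughput in MB/s: throughput_mb_s SIZE_GB SECONDS
throughput_mb_s() {
    awk -v gb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / s }'
}

transfer_seconds 9 1      # prints 72
transfer_seconds 9.5 1    # prints 76
throughput_mb_s 9 180     # 3 minute backup:  prints 50.0
throughput_mb_s 9 1200    # 20 minute backup: prints 7.5
```

So the fast mount averages roughly 50 MB/s and the non-hanging mount roughly 7.5 MB/s, both well below the 125 MB/s line rate of Gigabit Ethernet.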

I know this issue is old, but I just came across it and wanted to say thanks for providing the workaround. Adding the "cache=none" option fixed it for me. In my case, the slowdown isn't so dramatic. Tar backups before took ~28 minutes and now take ~33 minutes.
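To try cache=none on its own before editing fstab, the share can be mounted by hand. This is a sketch, not a verified fix: the share name, mount point and credential options are copied from the fstab lines above, mount_backup_share is a made-up helper, and the SUDO variable is only there to allow a dry run.

```shell
#!/usr/bin/env bash
# Mount a cifs share with the thread's base options plus one extra option
# (default: cache=none). Set SUDO=echo to print the command instead.
mount_backup_share() {
    local share=$1 mountpoint=$2 extra=${3:-cache=none}
    local opts="credentials=/etc/creds,uid=madbrain,gid=madbrain,${extra}"
    ${SUDO:-sudo} mount -t cifs "$share" "$mountpoint" -o "$opts"
}
```

For example, mount_backup_share //server10g.local/zfs/Backups /backup mounts with cache=none, and findmnt -no OPTIONS /backup then shows the options the kernel actually applied. Passing a third argument such as cache=none,actimeo=30 allows testing other combinations from the second fstab line one at a time.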

framps commented 2 months ago

I know this issue is old,

Yes, but I appreciate your comment and you shared your fix for the issue :+1: