Closed p1r473 closed 1 year ago
Worth noting: just this week I added the following line to my config:
exclude=/var/www/apache/public/BrownNoise.mp3"
. This is a 1.3GB file. But it is excluded, and I am running a 8gb Pi. You can see in the screenshots I have nearly 6.6gb free. This is the ONLY difference with my 3 raspiBackup configs- this file exists on this Pi only.
I have REMOVED this line from the config, and just tried to raspiBackup again. The results: same issue. Darn, I was hoping that was the issue.
That's bad. I unfortunately get 403 (forbidden) when I try to access your logs.
You're the first one who reports an OOM with raspiBackup. I'm sure it's one of the Linux tools used by raspiBackup which causes the OOM.
@framps sorry, I fixed the log access. I had it set to private but was supposed to be set to unlisted. Oops!
I fixed the log access.
I just downloaded the files. You can now either delete them or protect them again. I just did a quick scan and for me there are sensitive details you most probably don't want to share with the public :wink:
I merged the two logs - syslog and raspiBackup log and that's what I see:
2:27:13 - raspiBackup starts (syslog)
2:27:24 - tar starts to backup the root partition (raspiBackup log)
2:37:37 - First OOM activities (syslog)
2:37:46 - tar terminates unsuccessfull with RC 42 (raspiBackup log)
2:37:49 - raspiBackup finished (raspiBackup log)
2:37:51 - Homeassistent and other services startup again (syslog)
As you can see it took 13 seconds from tar start to OOM start. It may be tar is the root cause ... but not sure.
RC 42 from raspiBackup should not be returned normally. If tar terminates the return code is set to the tar return code. Looks like this did not happen because raspiBackup was killed my the OOM.
Question now is why the OOM happend :flushed:
You have a system with a lot of active services, pihole, apache, chromium, nginx, ... As of now I never had somebody to backup a system with this load and you don't stop any running services. As you wrote there is enough free memory which suddenly vanishes.
There is something which may be the root cause for your issue: You use tar and zip the resulting image. You also wrote you now have a huge file for about 1.3GB and the issue popped up when you excluded the file. Maybe it popped up because you added this file?
Please execute following tests and let me know the outcome: 1) Temporarily remove this huge file 2) Don't zip the backup and just create a tar file 3) Create a dd backup on some spare disk
Regarding high load: I am running Apache, Pihole, and Homeassistant, yes, but I have 6.6gb free of RAM (see above screenshot).
I see. It's nasty. A detailed analysis of the OOM log messages may help to locate the root - but I never analyzed a OOM log. I did some investigations on this ... but it's hard stuff ...
Something is wrong for sure, the system is barely under any load during raspiBackup
Test 1: Temporarily remove this huge file
rm /var/www/apache/public/BrownNoise.mp3/
raspiBackup
Results: SUCCESS
What? Success? I will try again to be safe. But, I excluded this file?? And it is only 1.3gb??
What? Success? I will try again to be safe. But, I excluded this file?? And it is only 1.3gb??
Try again to confirm it's this file which causes the issue. Why? As of now I have no idea ...
Just tried again and it crashed again :( May be was a fluke success
Test 2: Don't zip the backup and just create a tar file Results: PENDING
Test 3: Create a dd backup on some spare disk Results: PENDING
:-( You may now either skip to step 3 to test whether it's no an issue with raspiBackup/tar or continue with step 2.
And please note: You test right now at 4 PM but raspiBackup ran last night at 2 AM ...
Only thing that runs in the morning on the Pi is RaspiBackup every night at 3AM Ill try step 3 and 4 now Thanks!
every night at 3PM
According the logs at 2AM - you may adjust your timzone :wink:
oh the one I gave you the logs were adhoc raspiBackup
runs (not cronjob), as I was trying to figure out the issue
Extract from your syslog:
Feb 28 02:27:13 Harbormaster raspiBackup: Started 0.6.8 (138a72b)
Test 2: Don't zip the backup and just create a tar file Results: CRASH
As you can see it took 13 seconds from tar start to OOM start. It may be tar is the root cause ... but not sure.
It usually lasts for 15-20 min and fails near the end. Though it literally just failed within 1 min this last attempt I just tried.
Im wondering if I should redo the logs as the ones I was doing last night was at 4am so could have errors? The cronjob is supposed to happen at 3, I did a bunch of manual attempts too
WRT timezone, I checked raspiconfig and my city is correctly set, so it was a manual raspibackup log and not the cron job
I will gather a new set of logs
I will donate heavily for your help ;) with a big bonus if you help solve the issue ;) Worst case (I HOPE) is I just keep raspiBackup restoring until I find an old backup that doesnt crash
🚧 Will make logs public tomorrow :) 🚧
New config Changes
[Attempt 1 - tar with compression]
Status: ❌
raspilog
Syslog
[Attempt 2 - tar withOUT compression] Status: ❌ raspilog Syslog
[Attempt 3 - dd with compression] Status: 🟢
[Attempt 4 - RBK0085I] Status: ❌ Syslog
mkdir Harbormaster-tgz-backup-20230228-184200
tar -cpi --one-file-system -z -f "/mnt/Harbormaster/Harbormaster/Harbormaster-tgz-backup-20230228-184200/Harbormaster-tgz-backup-20230228-184200.tgz" --warning=no-xdev --numeric-owner --exclude="/mnt/Harbormaster/*" --exclude="/tmp/raspiBackup.log" --exclude="/tmp/raspiBackup.msg" --exclude='.gvfs' --exclude=/proc/* --exclude=/lost+found/* --exclude=/sys/* --exclude=/dev/* --exclude=/tmp/* --exclude=/boot/* --exclude=/run/* --exclude=/media/* --exclude=/mnt/* --exclude=/var/lib/samba/private/msg.sock/* --exclude=/etc/pihole/* --exclude=/var/log/* --exclude=/var/lib/php/sessions/* --exclude=/var/cache/* --exclude=/var/spool/postfix/dev/log --exclude=/home/pi/.local/share/Trash/*--exclude=/root/.local/share/Trash* --exclude=/.Trash/* --exclude=/var/www/apache/public/BrownNoise.mp3 /
It usually lasts for 15-20 min and fails near the end. Though it literally just failed within 1 min this last attempt I just tried.
Ok. According the first log you provided it took just seconds and that's why I thought it's not raspiBackup. Please make sure the syslog and raspiBackup log are in the same time slot and I can merge them :wink:
Worst case (I HOPE) is I just keep raspiBackup restoring until I find an old backup that doesnt crash
This may only work if there is not any external environment stuff the root cause for this issue. But let's be optimistic. Given the fact raspiBackup succeeds with dd it sounds like there is an issue with tar.
I'm busy today and will be able to check the logs at around 8 pm my time (CET).
If you find time you may also execute a plain tar backup. Search for RBK0085I in the debug log. The debug line which includes the --exclude
statements is the command executed by raspiBackup. For example that's the rsync command executed in my environment:
--- RBK0085I: Backup of type rsync started. Please be patient.
20221204-202706 DBG 2489: --> executeRsync
20221204-202706 DBG 2492: --- rsync --exclude="/backup/*" --exclude="/tmp/raspiBackup.log" --exclude="/tmp/raspiBackup.msg"
--exclude='.gvfs' --exclude=/proc/* --exclude=/lost+found/* --exclude=/sys/* --exclude=/dev/*
--exclude=/boot/* --exclude=/tmp/* --exclude=/run/* --exclude=/media/* --exclude=/@HOSTNAME@-backup.img --exclude=/@HOSTNAME@-backup.tmg --exclude=/@HOSTNAME@-backup.sfdisk --exclude=/@HOSTNAME@-backup.mbr --exclude=/raspiBackup.log --exclude=/raspiBackup.msg --exclude /var/log/journal
--numeric-ids -aHAx --delete --info=NAME0 / "/backup/@HOSTNAME@/@HOSTNAME@-rsync-backup-20221204-202630"
Sure I will make the logs private around 8PM CET
Looks like the issue is with tar, the tar crashed, OOM. I turned on swap file (size 1024) and still crashes
sudo dphys-swapfile swapoff
nano /etc/dphys-swapfile
->CONF_SWAPSIZE=1024
sudo dphys-swapfile setup
sudo dphys-swapfile swapon
I just downloaded the logs. As you wrote even if you start tar alone you get an OOM :cry: tar is a very very old Linux backup tool and I'm surprised it causes the OOM. Maybe it's some change in your environment which causes tar to crash? But nevertheless tar shouldn't create an OOM.
I will check the logs now.
I did do lots of tinkering on the pi this week and I'm not sure what I've changed that caused this 😞 This is exactly why I love and use your tool so heavily. Cuz I can always just rollback the entire image. Hopefully something the logs stands out and if not I will start using a second SD card and trying to roll back until I find an image that doesn't crash on tar
Usually Linux is so stable...
At least it isn't a typo this time
What tool do you use to piece the logs together?
20230228-191150 DBG 2505: --> executeTar
20230228-191150 DBG 2508: --- tar -cpi --one-file-
20230228-191819 DBG 4626: --> cleanup
According the logs it takes 6:29 minutes for a tar backup to fail
20230228-184213 DBG 2505: --> executeTar
20230228-184213 DBG 2508: --- tar -cpi --one-file-system 20230228-185359 DBG 4626: --> cleanup
and 11:46 minutes for the tgz backup to fail. That's interesting because I expected the tar backup to fail later because zip will use more memory.
Size of tar file when tar aborted was
20230228-191821 DBG 4391: -rwxr-xr-x 1 root root 6617088000 Feb 28 19:11 @HOSTNAME@-tar-backup-20230228-191138.tar
and size of tgz when tar aborted was
20230228-185402 DBG 4391: -rwxr-xr-x 1 root root 3553509376 Feb 28 18:42 @HOSTNAME@-tgz-backup-20230228-184200.tgz
Root file system size is
20230228-184205 DBG 5566: /dev/root 117G 15G 97G 13% /
I found some comments about tar and OOM (see here). Not sure whether this is applicable for you. A split of tar backups will require a feature request for raspiBackup. You may try to use these Options when you use the plain tar command to check whether this helps.
15G is nothing for tar and tar should be able to save this amount of data without any issues.
I see following options for you to get rid of the issue:
1) Use rsync instead of tar I see you use cifs which causes you to use tar. Can't you use nfs which allows to use rsync? In addition rsync will create incremental backups in contrast to tar which creates a full backup all the time.
2) Execute some additional tests just with the tar command (without raspiBackup involvement) 2a) Restore on a spare disk instead of the cifs partition. Maybe cifs runs out of memory because of some kind of buffering. 2b) Reduce the number of exclude Options. It's just a guess but you use a lot of exclude statements.
3) I noticed something in the logs. Maybe you know what's causing this: 3a) Check for swap usage
Mar 1 12:24:03 Harbormaster kernel: [ 1114.758132] 0 pages in swap cache
Mar 1 12:24:03 Harbormaster kernel: [ 1114.758135] Free swap = 102396kB
Mar 1 12:24:03 Harbormaster kernel: [ 1114.758137] Total swap = 102396kB
Mar 1 12:24:03 Harbormaster kernel: [ 1114.758140] 1949696 pages RAM
No swap space is used. Did you disable swap usage?
3b) Disable RAM logging
20230228-184204 DBG 6201: /dev/mmcblk0p1 on /boot type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,errors=remount-ro)
20230228-184204 DBG 6201: /dev/mmcblk0p2 on /var/hdd.log type ext4 (rw,noatime)
20230228-184204 DBG 6201: log2ram on /var/log type tmpfs (rw,nosuid,nodev,noexec,noatime,size=5120000k,mode=755)
2
You use RAM for logging. I use a plain RaspbianOS lite on my servers and they don't do any RAM caching. Maybe tar just makes the barrel to overflow?
3) I spend time to understand the OOM system messages and hopefully find something which helps to find the root cause
4) Revert your system from existing backups until raspiBackup with tar will work again
did do lots of tinkering on the pi this week and I'm not sure what I've changed that caused this
I detected in the logs you now use a lot of additional features of raspiBackup. Great you love them. But as we see that's not the root cause of your issue :cry:
What tool do you use to piece the logs together?
Just a plain editor and copy and paste :smile: . Given the fact a plain tar command also fails for you a merge of the two logs is no longer needed. I thought maybe raspiBackup causes in a later step (backup cleanup) an OOM.
I see you use cifs which causes you to use tar. Can't you use nfs which allows to use rsync? In addition rsync will create incremental backups in contrast to tar which creates a full backup all the time.
I would much prefer this actually. Especially as you implemented my feature request here https://github.com/framps/raspiBackup/issues/366
- Execute some additional tests just with the tar command (without raspiBackup involvement) 2a) Restore on a spare disk instead of the cifs partition. Maybe cifs runs out of memory because of some kind of buffering.
Could you elaborate? Right now I am hosting my backups on my Windows PC server as it has a lot of hard drive space. Can you point me in the direction of the command?
2b) Reduce the number of exclude Options. It's just a guess but you use a lot of exclude statements.
I actually just added 4 of those excludes yesterday. My other Pis have them too with no issues.
No swap space is used. Did you disable swap usage?
I just noticed this today too. I enabled the swap and still crashes https://github.com/framps/raspiBackup/issues/620#issuecomment-1450543979
3b) Disable RAM logging
I really like log2ram, it increases the life of my SD cards. Other 2 Pis work with it very well and this one used to too!
- I spend time to understand the OOM system messages and hopefully find something which helps to find the root cause
I am happy to compensate you for your time doing this! But this is up to you as I will start on 4 - reverting. Im going to take out this SD card and I can plug it back in for testing if you find anything.
- Revert your system from existing backups until raspiBackup with tar will work again
I am going to see if I can find a backup that works. Ill restore each backup until I find one that works
The reason I am using CIFS (I think) is because my file share is on Windows. It looks like NFS is for Linux mounts, I think?
Yes. nfs is a Linux feature. I thought you maybe use some NAS and usually they run on Linux and support nfs. Maybe there is support on Windows for nfs but I don't know. I'm not familiar with windows.
I see you use cifs which causes you to use tar. Can't you use nfs which allows to use rsync? In addition rsync will create incremental backups in contrast to tar which creates a full backup all the time.
Based on this, I guess I cant use rsync with CIFS then https://www.linux-tips-and-tricks.de/images/decisiontree_en.dia.jpg
Thats too bad, I love the idea of incremental. I have been looking at synology!
@framps preliminary googling shows perhaps rsync does indeed work to a CIFS mount? https://serverfault.com/questions/877519/rsync-how-to-handle-backup-to-cifs-volume https://bobcares.com/blog/rsync-from-linux-to-windows-share/
Is the issue with hard links? Based on these articles showing rsync to CIFS, perhaps it doesnt require ext3/4 but could work on NTFS?
I would much prefer this actually. Especially as you implemented my feature request here #366
It's not implemented. It's on my backlog list. As of now it's only you who asked for this feature. I don't start to work on any feature where where the quotient #number of users/#effort is nearly zero.
Could you elaborate? Right now I am hosting my backups on my Windows PC server as it has a lot of hard drive space. Can you point me in the direction of the command?
It's just a guess. So it's up to you to decide whether you want to test this: Attach a spare disk, mount this disk and then create the backup on the disk instead of the cifs mounted partition.
I actually just added 4 of those excludes yesterday. My other Pis have them too with no issues.
As I wrote: It's just a guess. I think you can ignore this.
I just noticed this today too. I enabled the swap and still crashes #620 (comment)
Did you verify it's used in the OOM messages? Frankly I think it's the wrong approach to add additional swap space. If a system start to swap there is a memory issue.
I really like log2ram, it increases the life of my SD cards. Other 2 Pis work with it very well and this one used to too!
I see your point. Do you know you can configure an overlay filesystem with raspi-config
? Not sure whether this is an option for you.
I am happy to compensate you for your time doing this! But this is up to you as I will start on 4 - reverting. Im going to take out this SD card and I can plug it back in for testing if you find anything.
In one of my FAQ I wrote I only support any issues related with raspiBackup. As we found out you have an issue which is not related to raspiBackup but tar. OOM is an interesting topic and I will look into the OOM logs. But this will take time :cry: If it's OK with you to revert until you get a system which can be saved with raspiBackup again - that's my preference :wink: Crossing fingers you are successful. According the logs you have about 17 backups to go back ...
No problems at all. Thanks for your help!
Is the issue with hard links? Based on these articles showing rsync to CIFS, perhaps it doesnt require ext3/4 but could work on NTFS?
rsync can be used on a CIFS mounted partition. But this will be a full backup all the time because of missing support for hardlinks. A user of raspiBackup had serious trouble when he restored his system which he saved on an NTFS formatted partition. Root cause was all file attributes were silently changed to root, including any user files :cry: Therefore I added a check in raspiBackup and now any rsync backup to an NTFS partition is rejected with an error message.
In former times of raspiBackup there was code which allowed to backup with rsync even the target filesystem doesn't support hardlinks and a full backup was created. It's no longer supported by raspiBackup. tar is a full backup and rsync is an incremental backup all the time created by raspiBackup.
Thank you, very informative!
Just let me know if you succeeded to revert :wink:
Would be interesting whether you can backup your system with rsync. If you have a spare disk handy just create an ext4 partition on this disk and create a rsync backup :wink:
Thank you very much for your donation but I also would have helped you on this issue anyhow. I try to fix any issue with raspiBackup if possible. Unfortunately it looks like there is some issue with tar :cry:
Just let me know if you succeeded to revert 😉
Succeeded, had to go all the way back to 24th of last month. Thank goodness for rapsiBackup!!!!
Do you remember what you changed/updated? Would be interesting to know what's the root cause of your issue and whether it's a tar issue or something else.
I did a lot of tinkering last 2 weeks 🤓 I played around with a lot of new things and I'm not sure which broke that
According the logs your Raspberry seems to be a production system which should run 7/24.
Whenever I have major updates on one of my production Raspberries I create a raspiBackup snapshot first just to have a backup with some meaningful comment for a longer period just in case I have to revert the update. Then I clone the system with raspiBackup (restore latest backup on another SD card) and update the cloned system running on a spare Raspberry. When all regression tests were successful I shut down the production server, insert the SD card with the system I updated and start the server again.
This reduces the downtime of the server to a minimum and protects to get any issues because of the update :wink:
Hi Framps! You might remember me from #604 I have 3 Pis and they all have the exact same config for raspiBackup However I noticed that one of my healthchecks has been down every day for the last fews day, on my primary Pi. This started ever since I ran a raspi -d restoration on this Pi I think last week. As far as I can tell, this memory crashing happens only during raspiBackup. The other 2 Pis with identical config work great, and this used to work great. I am also using the same but new SD card I unpackaged in #604. Every time I run raspiBackup on this Pi, eventually I run out of memory and the Pi starts crashing and shuTting down programs
I am watching it with
glances
andwatch top
and it will be running fine and smooth as butter then all of a sudden my Pi will just start ravaging and destroying itself.[Logs + config] Syslog from right before to during the terrible crash My config raspiBackup.log
If you can help me fix this, I will be happy to donate, same as last time- this is my primary production Pi.