Closed: kruthers64 closed this issue 2 years ago
From the changelog: #5659 - this might be useful for you when debugging this:
create: verbose files cache logging via --debug-topic=files_cache, #5659
Note: this should be added to docs / faq.
Note: Relocating a repo, having / deleting checkpoint archives, server being 32bit is likely not relevant to this issue.
Took your advice and it revealed something:
Creating archive at "ssh://username@1.2.3.4:2222/backups/borg/repo::repo-2022-04-04T12:59:57"
FILES-CACHE-LOAD: starting...
FILES-CACHE-LOAD: finished, 2300633 entries loaded.
UNKNOWN: no file metadata in cache for: b'/full/path/to/av/first/file'
A av/first/file
FILES-CACHE-UPDATE: put FileCacheEntry(age=0, inode=108530798, size=37, cmtime=1643145529230964115, chunk_ids='[1 entries]') [has mtime] <- b'/full/path/to/av/first/file'
UNKNOWN: no file metadata in cache for: b'/full/path/to/av/second/file'
A av/second/file
FILES-CACHE-UPDATE: put FileCacheEntry(age=0, inode=100271633, size=34742272, cmtime=1442974580000000000, chunk_ids='[17 entries]') [has mtime] <- b'/full/path/to/av/second/file'
etc.
Notice borg said '2300633 entries loaded', so I guess that means about 2.3 million cached file paths? I counted my backup data and it's about 8.5 million files, so most file paths would not be in the cache.
Is that accurate, or am I misinterpreting the 'entries loaded' number?
That number is the total number of entries in the files cache. Did you maybe back up some other data sets, so that your big data set's entries went beyond the cache TTL (which is adjustable, btw)?
Also, stuff is still at the same FULL path as in the previous backups?
No, this is the only borg repo I have, and full paths have not changed at all.
So I thought the cache TTL shouldn't apply in this case because it counts number of accesses, correct? This is a new repo and I've only run borg create 5 or so times total.
One thing I just thought of - I did have an old borg 1.1.4 repo, but I nuked it and started over with 1.2.0. However I never removed the old caches. I just checked and it looks like about 800 MB of cache from the old repo is still left over on the client side. Could this have confused borg? Maybe I'm grasping at straws now...
It still wouldn't make sense because the cache was working perfectly before I relocated the server.
A repo is identified by its ID. The config and cache dirs have the repo ID in their path.
Maybe read the FAQ entries I linked above. They explain all cache stuff quite well.
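To illustrate the point about repo IDs, here is a minimal sketch of how a per-repo cache dir keyed by the repo ID could look (the function name is illustrative, not borg's internal API; only the BORG_CACHE_DIR env var and the dir-per-ID layout match the listings later in this thread):

```python
import os

def cache_path(repo_id_hex: str) -> str:
    # Each repo gets its own cache dir named after its ID, so caches
    # from different repos (e.g. a nuked 1.1.4 repo) cannot collide.
    root = os.environ.get("BORG_CACHE_DIR",
                          os.path.expanduser("~/.cache/borg"))
    return os.path.join(root, repo_id_hex)
```

This is why leftover caches from a deleted repo are dead weight but harmless: a new repo has a new ID and therefore a different cache path.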
Ok so there should be no interference from old caches, good to know. Anyway, after some more testing I've concluded that I just have a corrupted borg cache. Here's what I tried:
FILES-CACHE-LOAD: starting...
FILES-CACHE-LOAD: finished, 163130 entries loaded.
FILES-CACHE-SAVE: starting...
FILES-CACHE-KILL: removed all old entries with age >= TTL [20]
FILES-CACHE-KILL: removed all current entries with newest cmtime 9223372036854775807
FILES-CACHE-SAVE: finished, 163130 remaining entries saved.
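The KILL lines in that log correspond to an age-based sweep over the cache at save time. A much-simplified sketch of the idea (field names follow the log output; this is not borg's actual implementation, and the real newest-cmtime check exists because very recently changed files cannot be trusted to be stable):

```python
TTL = 20  # entries not seen for this many backup runs are dropped

def sweep_files_cache(cache: dict, newest_cmtime: int) -> dict:
    """Return a pruned copy of the files cache, as done before saving."""
    kept = {}
    for path, entry in cache.items():
        if entry["age"] >= TTL:
            continue  # stale: not seen in the last TTL runs
        if entry["cmtime"] >= newest_cmtime:
            continue  # too recent to trust; the file may still change
        kept[path] = entry
    return kept
```

In the log above, the newest cmtime was the maximum 64-bit integer, so the second check removed nothing and all 163130 entries survived.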
So I don't know what I did to wreck the old cache but I'm going to try letting the whole backup run with a clean cache now and see if it continues to work. Might take a few days...
Borg does strong checksums when writing the cache and checks them when loading it, so a corrupted cache file would not get loaded and it would rebuild the cache completely.
Before we do further debugging, you need to make sure though that the stuff described in the docs / faq does not explain what you see.
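The write-then-verify idea can be sketched like this (the file format and names here are invented for illustration; borg's real cache format and integrity mechanism differ):

```python
import hashlib
import struct

def save(path, payload):
    # Append a SHA-256 digest over the payload when writing.
    digest = hashlib.sha256(payload).digest()
    with open(path, "wb") as f:
        f.write(struct.pack(">I", len(payload)))
        f.write(payload)
        f.write(digest)

def load(path):
    # Verify the digest on load; a mismatch means the caller must
    # discard the cache and rebuild it from scratch.
    with open(path, "rb") as f:
        (n,) = struct.unpack(">I", f.read(4))
        payload = f.read(n)
        digest = f.read(32)
    if hashlib.sha256(payload).digest() != digest:
        return None
    return payload
```

The practical consequence is the one stated above: silent corruption of the cache file would not produce wrong cache hits, it would produce a rebuild.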
I was about to close this as "solved" because clearing the cache fixed it, but sure, I'm happy to help debug.
And yes, I did read those FAQ entries many times over trying to understand this problem. I have:
Since starting with a clear cache I can back up the entire 1.5 TB and see only a few 'M' entries for modified files on the 2nd run. So it really seems like something in my old cache was causing a problem. Unfortunately I deleted that cache...
I'm happy to send you real command and log output if there's some way without posting it in public.
OK, guess we can't find the root cause for this any more. Glad it was fixable by clearing the cache!
Sorry to reopen a closed issue but the problem happened again. No need to respond for now, I'm just logging this so I don't forget the details. I'm going to keep watching borg with debugging enabled and hopefully come back later with more info. Anyway here's what happened:
(FYI I'm running borg 1.2.0 on both ends)
I started it again with --debug-topic=files_cache and saw the following:
FILES-CACHE-LOAD: starting...
FILES-CACHE-LOAD: finished, 0 entries loaded.
UNKNOWN: no file metadata in cache for ...
The files file in the cache dir was zero length:
# ll /root/.cache/borg/74c608175d26af3fc5e748d4477894abaa1739e92aef460611f13707a246ec43
total 184416
-rw------- 1 root root 188818054 Apr 10 16:25 chunks
drwx------ 2 root root 4096 Apr 4 21:40 chunks.archive.d
-rw------- 1 root root 611 Apr 10 16:25 config
-rw------- 1 root root 0 Apr 10 16:25 files
-rw------- 1 root root 2 Apr 10 16:25 pre12-meta
-rw------- 1 root root 68 Apr 4 21:29 README
drwx------ 2 root root 4096 Apr 10 16:25 txn.active
For comparison, the same dir later, with the files cache populated again:
total 779068
-rw------- 1 root root 188818054 Apr 11 00:39 chunks
drwx------ 2 root root 4096 Apr 4 21:40 chunks.archive.d
-rw------- 1 root root 611 Apr 11 00:39 config
-rw------- 1 root root 608921901 Apr 11 00:39 files
-rw------- 1 root root 2 Apr 11 00:39 pre12-meta
-rw------- 1 root root 68 Apr 4 21:29 README
No idea what is zeroing out the cache file but I should get debugging info from it next time it happens.
Maybe have your backup script do ls -l on the cache dir before each backup starts?
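A Python variant of that pre-flight check might look like this (the cache-dir layout is assumed from the listings above; the function is hypothetical, not part of borg):

```python
import os

def files_cache_size(cache_dir: str) -> int:
    """Size of the files cache in bytes, or -1 if it is missing.

    Logging this before each run would catch a zero-length files
    cache before borg silently rebuilds it.
    """
    try:
        return os.stat(os.path.join(cache_dir, "files")).st_size
    except FileNotFoundError:
        return -1
```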
Well at this point I'm only running it by hand, and I'm doing watch ls -l on the cache dir in another window...
Just an update - I have not been able to recreate this problem, and I've been trying. The caching has been working exactly as expected for a while.
One thing I tried several times was killing borg create (with CTRL-C) right at the end of the process, when the files cache is being re-written. But no, borg handles it really well and finishes writing the file before acknowledging the signal and exiting.
Anyway, I probably won't update this issue again unless I catch the problem in a log file.
I am facing this issue as well: 5 TB of data was copied from a drive into the repo, and now, when I try to back up the very same fileset (matching metadata and everything) from a remote machine, it starts from zero and tries to add every single file.
I understand this might be by design, but what is causing it? Why does borg not recognize that those files are exactly the same? They were copied with rsync over time.
I certainly don't want to copy 5 TB over the network because of this; the files literally already exist on the target machine. What are my options here?
@ThatCoffeeGuy check the borg FAQ and the docs about the files cache. Usually it is either a changed source path, a deleted files cache, or touched file metadata.
I appreciate your input on this.
@ThomasWaldmann I started with the FAQ but I believe things are getting a little more complicated here. I tried copying the .cache, with no success. Then I deleted it, no success. Is there any supported way of NOT forcing me to copy terabytes of already existing data over the internet again? Touched file or metadata should really not be the case at all.
borg NEVER copies "terabytes of already existing data" to a repository.
If you cause files-cache misses due to the mentioned changes, it will re-read and re-hash the source files on the client (and that is the topic of this ticket, see top post), but it won't write them to the repository, because the hashes of the existing data in the repo are known locally.
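The "known locally" part is chunk-level deduplication. Roughly sketched (fixed-size chunks and a plain dict stand in for borg's real content-defined chunker and chunks cache; names are illustrative):

```python
import hashlib

def store_file(data: bytes, repo_chunks: dict, chunk_size: int = 4) -> int:
    """Chunk the data and return how many chunks were actually uploaded.

    Every chunk's id is looked up first, so a files-cache miss costs
    re-reading and re-hashing, but already-stored chunks never travel
    to the repository again.
    """
    uploaded = 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        cid = hashlib.sha256(chunk).hexdigest()
        if cid not in repo_chunks:  # only unseen chunks are stored
            repo_chunks[cid] = chunk
            uploaded += 1
    return uploaded
```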
But anyway, it is bad to discuss this in a years-old "could not reproduce" ticket, so if you can reproduce this with a current borg version and have checked that it is none of the mentioned things, please file a new issue and give all the required data so I can investigate.
Note: I fixed the issue topic to more precisely describe the issue in the top post.
Have you checked borgbackup docs, FAQ, and open Github issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
bug (I think, or maybe I did too many weird things at once, or hopefully you can point out a mistake)
System information. For client/server mode post info for both machines.
client: Linux clientname 5.4.0-105-generic #119-Ubuntu SMP Mon Mar 7 18:49:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
server: Linux servername 5.10.0-13-686 #1 SMP Debian 5.10.106-1 (2022-03-17) i686 GNU/Linux
Your borg version (borg -V).
borg 1.2.0 (on both client and server)
Operating system (distribution) and version.
client: ubuntu 20.04 server: debian 11 bullseye
Hardware / network configuration, and filesystems used.
filesystems are ext4, client is a 64 bit machine but server is 32 bit! (but this appears to work fine)
How much data is handled by borg?
1.5 TB
Full borg commandline that led to the problem (leave out excludes and passwords)
Describe the problem you're observing.
First let me explain the "relocated repo": I use this borg server for offsite backups, but I started by creating the first archive with the server at home on my local network. Then I moved the server to a remote location, which changed the ssh URL in the BORG_REPO env var, and borg interprets that as a relocated repo. This is why I set BORG_RELOCATED_REPO_ACCESS_IS_OK=yes
I ran borg create more than once while I had the server on the local network and after the first time only a few files showed up as added or modified in the logs, as expected. So everything appeared to work fine when I had the server local.
But now that the server is moved to its remote location borg is attempting to add all files again. Note that it is not claiming the files are modified, the logs show A for add; also note that borg gave the message about relocated repo:
I immediately killed borg w/ CTRL-C because the whole point was to avoid copying everything over the internet.
Also, I'm pretty sure I did this same process several years ago with borg 1.1.4 and had no such problem. Note that I'm using a completely from-scratch, newly created borg 1.2.0 repo here, not an upgrade from 1.1.4.
A note about the --files-cache mtime,size,inode arg: I'm using borg to back up a backup. The source data always has a new ctime, so if I don't remove ctime from the check, borg will think every file is modified. This would show up in the logs as 'M' entries.
Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.
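The effect of choosing which fields go into the files-cache comparison can be sketched as follows (illustrative only; borg's real hit check also considers chunk ids and the age/TTL handling shown earlier in this thread):

```python
def is_unmodified(entry: dict, st: dict,
                  mode=("mtime", "size", "inode")) -> bool:
    # A cache hit requires every field selected by the files-cache
    # mode to match the current stat() result. Leaving ctime out of
    # the mode means a ctime-only change no longer forces a miss.
    return all(entry[field] == st[field] for field in mode)
```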
Yeah. If I run the same command again, borg will try again to add all files from the beginning.
One more complication: I had a checkpoint archive in the borg repo when I started the first "remote" backup. This was because I actually tried a third borg backup locally, but killed it before it finished. (Maybe the checkpoint plus the relocated repo is what caused this problem?)
After the problem I deleted the checkpoint and tried again:
It made no difference, it started trying to add all files again.