borg re-reads / re-hashes everything after relocating repo

kruthers64 commented 2 years ago

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

bug (I think, or maybe I did too many weird things at once, or hopefully you can point out a mistake)

System information. For client/server mode post info for both machines.

client: Linux clientname 5.4.0-105-generic #119-Ubuntu SMP Mon Mar 7 18:49:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

server: Linux servername 5.10.0-13-686 #1 SMP Debian 5.10.106-1 (2022-03-17) i686 GNU/Linux

Your borg version (borg -V).

borg 1.2.0 (on both client and server)

Operating system (distribution) and version.

client: ubuntu 20.04 server: debian 11 bullseye

Hardware / network configuration, and filesystems used.

filesystems are ext4, client is a 64 bit machine but server is 32 bit! (but this appears to work fine)

How much data is handled by borg?

1.5 TB

Full borg commandline that lead to the problem (leave away excludes and passwords)

export BORG_RELOCATED_REPO_ACCESS_IS_OK=yes
export BORG_REPO=ssh://username@1.2.3.4:2222/backups/borg/repo
export BORG_RSH=ssh -i /path/to/key -o StrictHostKeyChecking=no -o BatchMode=yes

borg --remote-path /path/to/remote/borg \
create \
    --verbose \
    --show-rc \
    --files-cache mtime,size,inode \
    --filter AMCE \
    --list \
    --stats \
    --compression lz4 \
    --exclude-caches \
::'{hostname}-{now}' \
av data

Describe the problem you're observing.

First let me explain the "relocated repo": I use this borg server for offsite backups but I started by creating the first archive with this server at home on my local network. Then I moved the server to a remote location, which caused the ssh command to change in the BORG_REPO env var which borg interprets as a relocated repo. This is why I set BORG_RELOCATED_REPO_ACCESS_IS_OK=yes

I ran borg create more than once while I had the server on the local network and after the first time only a few files showed up as added or modified in the logs, as expected. So everything appeared to work fine when I had the server local.

But now that the server is moved to its remote location borg is attempting to add all files again. Note that it is not claiming the files are modified, the logs show A for add; also note that borg gave the message about relocated repo:

Creating archive at "ssh://username@1.2.3.4:2222/backups/borg/repo::repo-2022-04-04T08:25:33"
Warning: The repository at location ssh://username@1.2.3.4:2222/backups/borg/repo was previously located at ssh://username@localname:22/backups/borg/repo
Do you want to continue? [yN] yes (from BORG_RELOCATED_REPO_ACCESS_IS_OK)
A av/first/file
A av/second/file
etc.

I immediately killed borg w/ CTRL-C because the whole point was to avoid copying everything over the internet.

Also, I'm pretty sure I did this same process several years ago with borg 1.1.4 and had no such problem. Note that I'm using a completely from-scratch, newly created borg 1.2.0 repo here, not an upgrade from 1.1.4.

A note about the --files-cache mtime,size,inode arg: I'm using borg to backup a backup. The source data always has a new ctime so if I don't remove ctime from the check borg will think every file is modified. This would show up in the logs as:

M av/first/file
M av/second/file
etc.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Yeah. If I run the same command again, borg will try again to add all files from the beginning.

One more complication: I had a checkpoint archive in the borg repo when I started the first "remote" backup. This was because I actually tried a third borg backup locally, but killed it before it finished. (Maybe the checkpoint plus the relocated repo is what caused this problem?)

After the problem I deleted the checkpoint and tried again:

(on the borg server)
# borg delete repo::repo-2022-04-02T11:06:06.checkpoint

(from the client)
Creating archive at "ssh://username@1.2.3.4:2222/backups/borg/repo::repo-2022-04-04T08:44:10"
Synchronizing chunks cache...
Archives: 2, w/ cached Idx: 0, w/ outdated Idx: 0, w/o cached Idx: 2.
Fetching and building archive index for repo-2022-03-31T13:01:12 ...
Merging into master chunks index ...
Fetching and building archive index for repo-2022-04-02T08:26:20 ...
Merging into master chunks index ...
Done.

It made no difference, it started trying to add all files again.

ThomasWaldmann commented 2 years ago

https://borgbackup.readthedocs.io/en/1.2-maint/faq.html#i-am-seeing-a-added-status-for-an-unchanged-file

https://borgbackup.readthedocs.io/en/1.2-maint/faq.html#it-always-chunks-all-my-files-even-unchanged-ones

ThomasWaldmann commented 2 years ago

From the changelog: #5659 - this might be useful for you debugging this:

create: verbose files cache logging via --debug-topic=files_cache, #5659

Note: this should be added to docs / faq.

ThomasWaldmann commented 2 years ago

Note: Relocating a repo, having / deleting checkpoint archives, server being 32bit is likely not relevant to this issue.

kruthers64 commented 2 years ago

Took your advice and it revealed something:

Creating archive at "ssh://username@1.2.3.4:2222/backups/borg/repo::repo-2022-04-04T12:59:57"
FILES-CACHE-LOAD: starting...
FILES-CACHE-LOAD: finished, 2300633 entries loaded.
UNKNOWN: no file metadata in cache for: b'/full/path/to/av/first/file'
A av/first/file
FILES-CACHE-UPDATE: put FileCacheEntry(age=0, inode=108530798, size=37, cmtime=1643145529230964115, chunk_ids='[1 entries]') [has mtime] <- b'/full/path/to/av/first/file'
UNKNOWN: no file metadata in cache for: b'/full/path/to/av/second/file'
A av/second/file
FILES-CACHE-UPDATE: put FileCacheEntry(age=0, inode=100271633, size=34742272, cmtime=1442974580000000000, chunk_ids='[17 entries]') [has mtime] <- b'/full/path/to/av/second/file'
etc.

If you notice borg said '2300633 entries loaded' so I guess that means about 2.3 million cached file paths? I counted my backup data and it's about 8.5 million files, so that would mean most file paths will not be in the cache.

Is that accurate or am I misinterpreting the 'entries loaded' number?

ThomasWaldmann commented 2 years ago

That number is the total amount of entries in the files cache. Did you maybe back up some other data sets, so that your big data set's entries went beyond the cache TTL (which is adjustable, btw)?

Also, stuff is still at the same FULL path as in the previous backups?

kruthers64 commented 2 years ago

No, this is the only borg repo I have, and full paths have not changed at all.

So I thought the cache TTL shouldn't apply in this case because it counts number of accesses, correct? This is a new repo and I've only run borg create 5 or so times total.

One thing I just thought of - I did have an old borg 1.1.4 repo, but I nuked it and started over with 1.2.0. However I never removed the old caches. I just checked and it looks like about 800 MB of cache from the old repo is still left over on the client side. Could this have confused borg? Maybe I'm grasping at straws now...

It still wouldn't make sense because the cache was working perfectly before I relocated the server.

ThomasWaldmann commented 2 years ago

A repo is identified by its ID. The config and cache dirs have the repo ID in their path.

ThomasWaldmann commented 2 years ago

Maybe read the FAQ entries I linked above. They explain all cache stuff quite well.

kruthers64 commented 2 years ago

Ok so there should be no interference from old caches, good to know. Anyway, after some more testing I've concluded that I just have a corrupted borg cache. Here's what I tried:

I backed up a subdir of the data containing about 5 GB, 163130 files. As expected it had to add every file to the cache.
Backed it up again, same subdir, a few times. Each time several hundred files were not found in the cache. (Note: nothing in this subdir has changed, it's all old files.)
Tried again with a fresh cache by setting BORG_CACHE_DIR to some other location; first pass of course took a while.

Backed up same path several times, this time it all worked as expected, NO files added, everything handled by the cache. This is what I see now:

FILES-CACHE-LOAD: starting...
FILES-CACHE-LOAD: finished, 163130 entries loaded.
FILES-CACHE-SAVE: starting...
FILES-CACHE-KILL: removed all old entries with age >= TTL [20]
FILES-CACHE-KILL: removed all current entries with newest cmtime 9223372036854775807
FILES-CACHE-SAVE: finished, 163130 remaining entries saved.

So I don't know what I did to wreck the old cache but I'm going to try letting the whole backup run with a clean cache now and see if it continues to work. Might take a few days...

ThomasWaldmann commented 2 years ago

Borg does strong checksums when writing the cache and checks them when loading it, so a corrupted cache file would not get loaded and it would rebuild the cache completely.

and 2. sound like a bug, i need the full commands, borg debug output and borg version for both.

Before we do further debugging, you need to make sure though that the stuff described in the docs / faq does not explain what you see.

kruthers64 commented 2 years ago

I was about to close this as "solved" because clearing the cache fixed it, but sure, I'm happy to help debug.

And yes, I did read those FAQ entries many times over trying to understand this problem. I have:

stable mounts
stable inodes
older timestamps
a new repo with only one dataset, so no sensible interaction with BORG_FILES_CACHE_TTL

Since starting with a clear cache I can backup the entire 1.5 TB and see only a few 'M' entries for modified files on the 2nd run. So it really seems like something in my old cache was causing a problem. Unfortunately I deleted that cache...

I'm happy to send you real command and log output if there's some way without posting it in public.

ThomasWaldmann commented 2 years ago

OK, guess we can't find the root cause for this any more. Glad it was fixable by clearing the cache!

kruthers64 commented 2 years ago

Sorry to reopen a closed issue but the problem happened again. No need to respond for now, I'm just logging this so I don't forget the details. I'm going to keep watching borg with debugging enabled and hopefully come back later with more info. Anyway here's what happened:

(FYI I'm running borg 1.2.0 on both ends)

I didn't run borg for several days since my last update, did not touch the source filesystem, did not reboot the machines in question, ie. nothing should have changed
yesterday I ran my borg create script and it started adding all files again, ie. complete cache miss

I killed it with CTRL-C, added flag --debug-topic=files_cache, started it again, saw the following:

FILES-CACHE-LOAD: starting...
FILES-CACHE-LOAD: finished, 0 entries loaded.
UNKNOWN: no file metadata in cache for ...

killed it again and looked at the cache (verified the cache dir and repo id matched) and saw the files file was zero length:

# ll /root/.cache/borg/74c608175d26af3fc5e748d4477894abaa1739e92aef460611f13707a246ec43
total 184416
-rw------- 1 root root 188818054 Apr 10 16:25 chunks
drwx------ 2 root root      4096 Apr  4 21:40 chunks.archive.d
-rw------- 1 root root       611 Apr 10 16:25 config
-rw------- 1 root root         0 Apr 10 16:25 files
-rw------- 1 root root         2 Apr 10 16:25 pre12-meta
-rw------- 1 root root        68 Apr  4 21:29 README
drwx------ 2 root root      4096 Apr 10 16:25 txn.active

I started borg again, let it run all day and re-add all files to the cache, after done cache dir looked like:

total 779068
-rw------- 1 root root 188818054 Apr 11 00:39 chunks
drwx------ 2 root root      4096 Apr  4 21:40 chunks.archive.d
-rw------- 1 root root       611 Apr 11 00:39 config
-rw------- 1 root root 608921901 Apr 11 00:39 files
-rw------- 1 root root         2 Apr 11 00:39 pre12-meta
-rw------- 1 root root        68 Apr  4 21:29 README

today I re-ran borg and it was fine, caching as expected

No idea what is zeroing out the cache file but I should get debugging info from it next time it happens.

jdchristensen commented 2 years ago

Maybe have your backup script do ls -l on the cache dir before each backup starts?

kruthers64 commented 2 years ago

Well at this point I'm only running it by hand, and I'm doing watch ls -l on the cache dir in another window...

kruthers64 commented 2 years ago

Just an update - I have not been able to recreate this problem, and I've been trying. The caching has been working exactly as expected for a while.

One thing I tried several times was killing borg create (with CTRL-C) right at the end of the process when the files cache is being re-written. But no, borg handles it really well and finishes writing the file before acknowledging the signal and exiting.

Anyway, I probably won't update this issue again unless I catch the problem in a log file.

ThatCoffeeGuy commented 3 months ago

I am just facing this issue again, when 5TB of data was copied from drive to repo, then when I try to copy the very same, matching metadata and everything fileset from a remote machine, it starts from zero and tries to add every single file.

I understand this might be by design, but what is causing this? Why does it not understand that those files are exactly the same, they were copied with rsync over time.

I certainly don't want to copy 5TB over the network because of this, they literally already exist on the target machine. What are my options here?

ThomasWaldmann commented 3 months ago

@ThatCoffeeGuy check the borg FAQ and the docs about the files cache. Usually it is either a change src path, a deleted files cache or touched file metadata.

ThatCoffeeGuy commented 3 months ago

@ThatCoffeeGuy check the borg FAQ and the docs about the files cache. Usually it is either a change src path, a deleted files cache or touched file metadata.

I appreciate your input on this.

@ThomasWaldmann I started with the FAQ but I believe things are getting a little more complicated here. I tried copying the .cache, with no success. Then I deleted it, no success. Is there any supported way of NOT forcing me to copy terabytes of already existing data over the internet again? Touched file or metadata should really not be the case at all.

ThomasWaldmann commented 3 months ago

borg NEVER copies "terabytes of already existing data to a repository".

if you cause files-cache misses due to the mentioned changes, it will re-read and re-hash the source files on the client (and this is the topic of this ticket, see top post), but it won't write them to the repository because the hashes of the existing data in the repo are known locally.

but anyway, it is bad to discuss this in a years old "could not reproduce ticket", so if you can reproduce with a current borg version and you checked it is non of the mentioned things, then please file a new issue and give all the required data so I can investigate.

ThomasWaldmann commented 3 months ago

Note: I fixed the issue topic to more precisely describe the issue in the top post.

borgbackup / borg