Interesting. This definitely looks like a bug, but no one else has reported it yet (AFAIK).
I also run borg on macOS and have not seen this myself either.
One potential root cause for why you are seeing this with repo B but not repo A might be the chunker secret.
When you borg init a new repo, key material is generated from random data (AES encryption key, HMAC key, ID key, chunker secret). The chunker secret influences at which places borg cuts a file into chunks.
So, if you have different repos, different chunks are cut from the same data. As the problem is with the return value of chunker.chunkify, that might explain the different behaviour (crashing vs. not crashing). This is just a reasonable guess for now.
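To illustrate the idea (a minimal sketch, not borg's actual buzhash chunker; the secrets, mask and window here are made up), a content-defined chunker whose cut decision is keyed by a per-repo secret will cut identical input at different offsets for different secrets:

# Illustrative only: a keyed rolling-window chunker showing why two repos
# with different chunker secrets cut the same file into different chunks.
import hashlib
import os

def cut_points(data: bytes, secret: bytes, mask: int = 0x3FF, window: int = 16):
    """Return offsets where a chunk boundary is cut: wherever a keyed hash
    of the trailing window matches the mask (illustrative, not buzhash)."""
    points = []
    for i in range(window, len(data)):
        h = hashlib.blake2b(data[i - window:i], key=secret, digest_size=4)
        if int.from_bytes(h.digest(), "big") & mask == 0:
            points.append(i)
    return points

payload = os.urandom(1 << 16)          # same file content for both "repos"
secret_a = b"repo A chunker secret"    # hypothetical per-repo secrets
secret_b = b"repo B chunker secret"

# Different secrets -> different cut points -> different chunks for identical data.
print(cut_points(payload, secret_a)[:5])
print(cut_points(payload, secret_b)[:5])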
From 1.2.0 source:
# The chunker returns a memoryview to its internal buffer,
# thus a copy is needed before resuming the chunker iterator.
# note: this is the items metadata stream chunker, we only will get CH_DATA allocation here (because there are
# no all-zero chunks in a metadata stream), thus chunk.data will always be bytes/memoryview and allocation
# is always CH_DATA and never CH_ALLOC/CH_HOLE.
chunks = list(bytes(chunk.data) for chunk in self.chunker.chunkify(self.buffer))
The exception practically shows that the comment is not (always) true.
The chunker iterator:
def __next__(self):
    data = chunker_process(self.chunker)
    got = len(data)
    # we do not have SEEK_DATA/SEEK_HOLE support in chunker_process C code,
    # but we can just check if data was all-zero (and either came from a hole
    # or from stored zeros - we can not detect that here).
    if zeros.startswith(data):
        data = None
        allocation = CH_ALLOC
    else:
        allocation = CH_DATA
    return Chunk(data, size=got, allocation=allocation)
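As an aside, the all-zero detection above relies on a simple trick: as long as data is no longer than the preallocated zeros buffer, a prefix check against it is an "is all zeros?" test. A tiny illustration (the buffer size here is arbitrary, not borg's actual constant):

# Illustration of the all-zero check used above (sizes are arbitrary here).
zeros = bytes(4096)                     # stand-in for borg's preallocated zeros

print(zeros.startswith(b"\x00" * 100))  # True  -> would become CH_ALLOC, data=None
print(zeros.startswith(b"\x00meta"))    # False -> CH_DATA, real bytes are kept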
I don't know if it helps, but Repo A goes back to 2019 and Repo B was created just this month. I was using the MacPorts version, and had been for years, until I started troubleshooting this. If Time Machine isn't lying to me, Repo B was created with 1.2.0, and just based on its history Repo A was probably created with an older 1.1 version.
So the only way chunk.data can be None is if data was all-zero here.
That means that the chunker cut a chunk from the metadata stream that contained only zeros (something I did not think was possible when writing that code).
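For illustration only (the Chunk namedtuple and constants below are local stand-ins, not borg's actual objects), the crash boils down to the old list comprehension calling bytes() on a chunk whose data is None:

# Minimal reproduction of the failure mode (illustrative, not borg code):
# an all-zero chunk comes back with data=None, and the old list
# comprehension unconditionally calls bytes() on chunk.data.
from collections import namedtuple

Chunk = namedtuple("Chunk", "data size allocation")
CH_DATA, CH_ALLOC = 0, 1                      # stand-in constants

chunks = [
    Chunk(b"item metadata...", 16, CH_DATA),  # normal metadata chunk
    Chunk(None, 4096, CH_ALLOC),              # all-zero chunk -> data is None
]

copies = list(bytes(chunk.data) for chunk in chunks)
# -> TypeError: cannot convert 'NoneType' object to bytes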
See PR #6591 or alternatively try this patch:
diff --git a/src/borg/archive.py b/src/borg/archive.py
index 0f0c8ffb..20d52699 100644
--- a/src/borg/archive.py
+++ b/src/borg/archive.py
@@ -361,10 +361,18 @@ def flush(self, flush=False):
         self.buffer.seek(0)
         # The chunker returns a memoryview to its internal buffer,
         # thus a copy is needed before resuming the chunker iterator.
-        # note: this is the items metadata stream chunker, we only will get CH_DATA allocation here (because there are
-        # no all-zero chunks in a metadata stream), thus chunk.data will always be bytes/memoryview and allocation
-        # is always CH_DATA and never CH_ALLOC/CH_HOLE.
-        chunks = list(bytes(chunk.data) for chunk in self.chunker.chunkify(self.buffer))
+        # the metadata stream may produce all-zero chunks, so deal
+        # with CH_ALLOC (and CH_HOLE, for completeness) here.
+        chunks = []
+        for chunk in self.chunker.chunkify(self.buffer):
+            alloc = chunk.meta['allocation']
+            if alloc == CH_DATA:
+                data = bytes(chunk.data)
+            elif alloc in (CH_ALLOC, CH_HOLE):
+                data = zeros[:chunk.meta['size']]
+            else:
+                raise ValueError("chunk allocation has unsupported value of %r" % alloc)
+            chunks.append(data)
         self.buffer.seek(0)
         self.buffer.truncate(0)
         # Leave the last partial chunk in the buffer unless flush is True
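If it helps for review, here is a stand-alone sketch of the patched loop, with local stand-ins for the CH_* constants, the zeros buffer and the chunk objects (not the real borg internals), showing that an all-zero chunk is rehydrated to real zero bytes instead of crashing:

# Stand-alone sketch of the patched logic (all names are local stand-ins).
CH_DATA, CH_ALLOC, CH_HOLE = "data", "alloc", "hole"
zeros = bytes(4096)

class FakeChunk:
    def __init__(self, data, size, allocation):
        self.data = data
        self.meta = {"size": size, "allocation": allocation}

def copy_chunks(chunk_iter):
    # mirrors the patched loop: copy real data, rehydrate all-zero chunks
    chunks = []
    for chunk in chunk_iter:
        alloc = chunk.meta["allocation"]
        if alloc == CH_DATA:
            data = bytes(chunk.data)
        elif alloc in (CH_ALLOC, CH_HOLE):
            data = zeros[:chunk.meta["size"]]
        else:
            raise ValueError("chunk allocation has unsupported value of %r" % alloc)
        chunks.append(data)
    return chunks

out = copy_chunks([FakeChunk(b"metadata", 8, CH_DATA), FakeChunk(None, 64, CH_ALLOC)])
assert out == [b"metadata", b"\x00" * 64]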
Thanks for finding this. If you could test the patch / the PR code, that would be great!
Testing now. It's been running a little over an hour and is already well past where it was failing before. Based on current progress it should finish up sometime in the next couple of hours.
Ship it!
Creating archive at "/Volumes/crate/borg/repo::mini-bryn-2022-04-13T15:45:40"
------------------------------------------------------------------------------
Repository: /Volumes/crate/borg/repo
Archive name: mini-bryn-2022-04-13T15:45:40
Archive fingerprint: <redacted, not sure if necessary/recommended>
Time (start): Wed, 2022-04-13 15:45:41
Time (end): Wed, 2022-04-13 18:12:21
Duration: 2 hours 26 minutes 40.22 seconds
Number of files: 291694
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              973.73 GB            928.68 GB             44.39 GB
All archives:                4.57 TB              4.14 TB            751.54 GB

                       Unique chunks         Total chunks
Chunk index:                  516951              2600975
------------------------------------------------------------------------------
Fixed in 1.2-maint, so it will be in next release (1.2.1). Also fixed in master branch. Does not apply to 1.1-maint.
Have you checked borgbackup docs, FAQ, and open GitHub issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
BUG
System information. For client/server mode post info for both machines.
Your borg version (borg -V).
borg 1.2.0
Operating system (distribution) and version.
macOS Darwin mini 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:47:26 PDT 2022; root:xnu-8020.101.4~15/RELEASE_ARM64_T8101 arm64
Hardware / network configuration, and filesystems used.
APFS source to different APFS-hosted repo, local backup (no networking)
How much data is handled by borg?
~700 GB
Full borg commandline that lead to the problem (leave away excludes and passwords)
/Users/bryn/Library/Python/3.8/bin/borg -v create --exclude-from /Users/bryn/scripts/borgmatic/includes/excludes-global --exclude-caches --exclude-if-present .nobackup --compression zstd,8 --one-file-system --info --stats --progress /Volumes/crate/borg/repo::mini-bryn-{now} /Users/bryn/.config/rclone /Users/bryn/.ssh /Users/bryn/Documents /Users/bryn/KeePass /Users/bryn/Library/Mobile\ Documents /Users/bryn/Movies /Users/bryn/Music /Users/bryn/Personal /Users/bryn/Pictures /Users/bryn/Source /Users/bryn/scripts /Volumes/crate/bryn /Volumes/wedge/movies /Volumes/wedge/photos
NOTE: This is the pip-installed version. I've also tried the MacPorts version (appears to use Python 3.10 instead) with the same results.
Describe the problem you're observing.
Okay, bear with me, this is weird (I think).
I have two repos. Repo A (/Volumes/crate/borg/mini-bryn) is a backup of just "my" data. Repo B (/Volumes/crate/borg/repo) is a backup of both my data and my wife's data. Repo B is intended to be the "way forward": I'm attempting to consolidate all our backups into a single repo with different prefixes and such, hopefully taking better advantage of deduplication to save some disk space on the locally connected RAID enclosure that hosts the repos. Just to be clear, I'm not attempting to import any "old" backups, just change which repo the backup command is pointed to.
Backups to Repo A work fine. Backups to Repo B work fine.
When I change the command line for the backup that was saving to Repo A so that it points at Repo B instead (i.e., "Backup A" now backs up to Repo B), the backup crashes with a "Local Exception".
If I switch the command line back to use Repo A (again, the ONLY change), it works fine again. A checkpoint is left in Repo B, and the backups that were going to Repo B in the first place continue to work with no problem as well.
In a nutshell, I simply can't add this particular backup to Repo B for some reason. I'm continuing to back up in parallel to both repos (Backup A to Repo A and Backup B to Repo B) with no problems for the time being.
One point of note is that the backups fail during roughly the same part of the job, when it gets to some big video files (a mix of raw *.dv, compressed .mp4, and uncompressed and compressed MPEG), but it does NOT fail on the same video file each time, just at some point around the video folders.
All filesystems check out as clean. All repos run full checks with no problems reported. Please ignore the paths including "borgmatic". The failures were happening using borgmatic but all of my troubleshooting has been just using borg directly with no wrappers/scripts.
Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.
Yes, I can just run the same backup command line and switch the repo from A to B to produce the same error.
Include any warning/errors/backtraces from the system logs
This is from the pip version of borg 1.2.0:
This is from the MacPorts version of borg 1.2.0: