metadata all-zero chunk processing issue

DrBrynzo commented 2 years ago

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

BUG

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

borg 1.2.0

Operating system (distribution) and version.

macOS Darwin mini 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:47:26 PDT 2022; root:xnu-8020.101.4~15/RELEASE_ARM64_T8101 arm64

Hardware / network configuration, and filesystems used.

APFS source to different APFS-hosted repo, local backup (no networking)

How much data is handled by borg?

~700 GB

Full borg commandline that lead to the problem (leave away excludes and passwords)

/Users/bryn/Library/Python/3.8/bin/borg -v create --exclude-from /Users/bryn/scripts/borgmatic/includes/excludes-global --exclude-caches --exclude-if-present .nobackup --compression zstd,8 --one-file-system --info --stats --progress /Volumes/crate/borg/repo::mini-bryn-{now} /Users/bryn/.config/rclone /Users/bryn/.ssh /Users/bryn/Documents /Users/bryn/KeePass /Users/bryn/Library/Mobile\ Documents /Users/bryn/Movies /Users/bryn/Music /Users/bryn/Personal /Users/bryn/Pictures /Users/bryn/Source /Users/bryn/scripts /Volumes/crate/bryn /Volumes/wedge/movies /Volumes/wedge/photos

NOTE: This is the pip-installed version. I've also tried the MacPorts version (appears to use Python 3.10 instead) with the same results.

Describe the problem you're observing.

Okay, bear with me, this is weird (I think).

I have two repos. Repo A (/Volumes/crate/borg/mini-bryn) is a backup just "my" data. Repo B (/Volumes/crate/borg/repo) is a backup of mine and my wife's data. Repo B is intended to be the "way forward" and I'm attempting to consolidate all our backups into a single repo with different prefixes and such but hopefully taking better advantage of deduplication to save some disk space on a local-connected RAID enclosure that hosts the repos. Just to be clear, I'm not attempting to import any "old" backups, just change which repo the backup command is pointed to.

Backups to Repo A work fine. Backups to Repo B work fine.

When I change the command line for the backup that was saving to Repo A to point at Repo B to start "Backup A" backing up to Repo B instead of Repo A the backup crashes with a "Local Exception".

If I switch the command line back to use Repo A (again, the ONLY change) it works fine again. A checkpoint is left in Repo B and the backups that were going to Repo B to start with continue to work with no problem as well.

In a nutshell, I simply can't add this particular backup to Repo B for some reason. I'm continuing to backup in parallel to both repos (Backup A to Repo A and Backup B to Repo B) with no problems for the time being.

One point of note is that the backups are failing during roughly the same part of the job where it gets to some big video files (a mix of raw *.dv, compressed .mp4, uncompressed and compressed MPEG) but it does NOT fail on the same video file each time, just at some point around the video folders.

All filesystems check out as clean. All repos run full checks with no problems reported. Please ignore the paths including "borgmatic". The failures were happening using borgmatic but all of my troubleshooting has been just using borg directly with no wrappers/scripts.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Yes, I can just run the same backup command line and switch the repo from A to B to produce the same error.

Include any warning/errors/backtraces from the system logs

This is from the pip version of borg 1.2.0:

Creating archive at "/Volumes/crate/borg/repo::mini-bryn-2022-04-13T12:42:49"
Local Exception.93 GB C 2.80 GB D 31422 N Volumes/crate/bryn/iDVD/Joybang - Mellon J...2E8611DBBE98000A95BD93FA6D656E75_4000000_Main
Traceback (most recent call last):
  File "/Users/bryn/Library/Python/3.8/lib/python/site-packages/borg/archiver.py", line 5089, in main
    exit_code = archiver.run(args)
  File "/Users/bryn/Library/Python/3.8/lib/python/site-packages/borg/archiver.py", line 5020, in run
    return set_ec(func(args))
  File "/Users/bryn/Library/Python/3.8/lib/python/site-packages/borg/archiver.py", line 183, in wrapper
    return method(self, args, repository=repository, **kwargs)
  File "/Users/bryn/Library/Python/3.8/lib/python/site-packages/borg/archiver.py", line 668, in do_create
    create_inner(archive, cache, fso)
  File "/Users/bryn/Library/Python/3.8/lib/python/site-packages/borg/archiver.py", line 605, in create_inner
    self._rec_walk(path=path, parent_fd=parent_fd, name=name,
  File "/Users/bryn/Library/Python/3.8/lib/python/site-packages/borg/archiver.py", line 813, in _rec_walk
    self._rec_walk(
  File "/Users/bryn/Library/Python/3.8/lib/python/site-packages/borg/archiver.py", line 813, in _rec_walk
    self._rec_walk(
  File "/Users/bryn/Library/Python/3.8/lib/python/site-packages/borg/archiver.py", line 813, in _rec_walk
    self._rec_walk(
  [Previous line repeated 4 more times]
  File "/Users/bryn/Library/Python/3.8/lib/python/site-packages/borg/archiver.py", line 779, in _rec_walk
    status = self._process_any(path=path, parent_fd=parent_fd, name=name, st=st, fso=fso, cache=cache,
  File "/Users/bryn/Library/Python/3.8/lib/python/site-packages/borg/archiver.py", line 681, in _process_any
    return fso.process_file(path=path, parent_fd=parent_fd, name=name, st=st, cache=cache)
  File "/Users/bryn/Library/Python/3.8/lib/python/site-packages/borg/archive.py", line 1428, in process_file
    return status
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/Users/bryn/Library/Python/3.8/lib/python/site-packages/borg/archive.py", line 1290, in create_helper
    self.add_item(item, stats=self.stats)
  File "/Users/bryn/Library/Python/3.8/lib/python/site-packages/borg/archive.py", line 589, in add_item
    self.items_buffer.add(item)
  File "/Users/bryn/Library/Python/3.8/lib/python/site-packages/borg/archive.py", line 354, in add
    self.flush()
  File "/Users/bryn/Library/Python/3.8/lib/python/site-packages/borg/archive.py", line 368, in flush
    chunks = list(bytes(chunk.data) for chunk in self.chunker.chunkify(self.buffer))
  File "/Users/bryn/Library/Python/3.8/lib/python/site-packages/borg/archive.py", line 368, in <genexpr>
    chunks = list(bytes(chunk.data) for chunk in self.chunker.chunkify(self.buffer))
TypeError: cannot convert 'NoneType' object to bytes

Platform: Darwin mini 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:47:26 PDT 2022; root:xnu-8020.101.4~15/RELEASE_ARM64_T8101 arm64
Borg: 1.2.0  Python: CPython 3.8.9 msgpack: 1.0.3 fuse: None [pyfuse3,llfuse]
PID: 55848  CWD: /Users/bryn/scripts/borgmatic
sys.argv: ['/Users/bryn/Library/Python/3.8/bin/borg', '-v', 'create', '--exclude-from', '/Users/bryn/scripts/borgmatic/includes/excludes-global', '--exclude-caches', '--exclude-if-present', '.nobackup', '--compression', 'zstd,8', '--one-file-system', '--info', '--stats', '--progress', '/Volumes/crate/borg/repo::mini-bryn-{now}', '/Users/bryn/.config/rclone', '/Users/bryn/.ssh', '/Users/bryn/Documents', '/Users/bryn/KeePass', '/Users/bryn/Library/Mobile Documents', '/Users/bryn/Movies', '/Users/bryn/Music', '/Users/bryn/Personal', '/Users/bryn/Pictures', '/Users/bryn/Source', '/Users/bryn/scripts', '/Volumes/crate/bryn', '/Volumes/wedge/movies', '/Volumes/wedge/photos']
SSH_ORIGINAL_COMMAND: None

This is from the MacPorts version of borg 1.2.0:

Creating archive at "/Volumes/crate/borg/repo::mini-bryn-2022-04-13T12:53:12"
Local Exception.93 GB C 2.80 GB D 31422 N Volumes/crate/bryn/iDVD/Joybang - Mellon J...2E8611DBBE98000A95BD93FA6D656E75_4000000_Main
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/borg/archiver.py", line 5089, in main
    exit_code = archiver.run(args)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/borg/archiver.py", line 5020, in run
    return set_ec(func(args))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/borg/archiver.py", line 183, in wrapper
    return method(self, args, repository=repository, **kwargs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/borg/archiver.py", line 668, in do_create
    create_inner(archive, cache, fso)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/borg/archiver.py", line 605, in create_inner
    self._rec_walk(path=path, parent_fd=parent_fd, name=name,
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/borg/archiver.py", line 813, in _rec_walk
    self._rec_walk(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/borg/archiver.py", line 813, in _rec_walk
    self._rec_walk(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/borg/archiver.py", line 813, in _rec_walk
    self._rec_walk(
  [Previous line repeated 4 more times]
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/borg/archiver.py", line 779, in _rec_walk
    status = self._process_any(path=path, parent_fd=parent_fd, name=name, st=st, fso=fso, cache=cache,
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/borg/archiver.py", line 681, in _process_any
    return fso.process_file(path=path, parent_fd=parent_fd, name=name, st=st, cache=cache)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/borg/archive.py", line 1365, in process_file
    with self.create_helper(path, st, None) as (item, status, hardlinked, hardlink_master):  # no status yet
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/borg/archive.py", line 1290, in create_helper
    self.add_item(item, stats=self.stats)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/borg/archive.py", line 589, in add_item
    self.items_buffer.add(item)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/borg/archive.py", line 354, in add
    self.flush()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/borg/archive.py", line 368, in flush
    chunks = list(bytes(chunk.data) for chunk in self.chunker.chunkify(self.buffer))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/borg/archive.py", line 368, in <genexpr>
    chunks = list(bytes(chunk.data) for chunk in self.chunker.chunkify(self.buffer))
TypeError: cannot convert 'NoneType' object to bytes

Platform: Darwin mini 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:47:26 PDT 2022; root:xnu-8020.101.4~15/RELEASE_ARM64_T8101 arm64
Borg: 1.2.0  Python: CPython 3.10.4 msgpack: 1.0.3 fuse: None [pyfuse3,llfuse]
PID: 55982  CWD: /Users/bryn/scripts/borgmatic
sys.argv: ['/opt/local/bin/borg', '-v', 'create', '--exclude-from', '/Users/bryn/scripts/borgmatic/includes/excludes-global', '--exclude-caches', '--exclude-if-present', '.nobackup', '--compression', 'zstd,8', '--one-file-system', '--info', '--stats', '--progress', '/Volumes/crate/borg/repo::mini-bryn-{now}', '/Users/bryn/.config/rclone', '/Users/bryn/.ssh', '/Users/bryn/Documents', '/Users/bryn/KeePass', '/Users/bryn/Library/Mobile Documents', '/Users/bryn/Movies', '/Users/bryn/Music', '/Users/bryn/Personal', '/Users/bryn/Pictures', '/Users/bryn/Source', '/Users/bryn/scripts', '/Volumes/crate/bryn', '/Volumes/wedge/movies', '/Volumes/wedge/photos']
SSH_ORIGINAL_COMMAND: None

ThomasWaldmann commented 2 years ago

Interesting. This definitely looks like a bug, but noone else was having this yet (AFAIK).

I also run borg on macOS and did not have that yet either.

ThomasWaldmann commented 2 years ago

One potential root cause why you are seeing this for repo B but not A might be the chunker secret.

If you borg init a new repo, key material is generated from random (AES encryption key, HMAC key, ID key, chunker secret). The chunker secret influences at which places borg cuts a file into chunks.

So, if you have different repos, different chunks are cut. As the problem is with the return value of chunker.chunkify, that might explain the different behaviour crashing vs. not crashing. (this is just a reasonable guess for now)

ThomasWaldmann commented 2 years ago

From 1.2.0 source:

        # The chunker returns a memoryview to its internal buffer,
        # thus a copy is needed before resuming the chunker iterator.
        # note: this is the items metadata stream chunker, we only will get CH_DATA allocation here (because there are,
        #       no all-zero chunks in a metadata stream), thus chunk.data will always be bytes/memoryview and allocation
        #       is always CH_DATA and never CH_ALLOC/CH_HOLE).
        chunks = list(bytes(chunk.data) for chunk in self.chunker.chunkify(self.buffer))

The exception practically shows that the comment is not (always) true.

ThomasWaldmann commented 2 years ago

The chunker iterator:

    def __next__(self):
        data = chunker_process(self.chunker)
        got = len(data)
        # we do not have SEEK_DATA/SEEK_HOLE support in chunker_process C code,
        # but we can just check if data was all-zero (and either came from a hole
        # or from stored zeros - we can not detect that here).
        if zeros.startswith(data):
            data = None
            allocation = CH_ALLOC
        else:
            allocation = CH_DATA
        return Chunk(data, size=got, allocation=allocation)

DrBrynzo commented 2 years ago

I don't know if it helps, but Repo A goes back to 2019 and Repo B was created just this month. I was using the MacPorts version and had been for years until I started troubleshooting this. If Time Machine isn't lying to me, Repo B was created with 1.2.0 and just based on history Repo A is probably an older 1.1 version.

ThomasWaldmann commented 2 years ago

So the only way how chunk.data can be None is that data was all-zero here.

That means that the chunker cut a chunk from the metadata stream that only contained zeros (something that I did not think is possible when writing that code).

ThomasWaldmann commented 2 years ago

See PR #6591 or alternatively try this patch:

diff --git a/src/borg/archive.py b/src/borg/archive.py
index 0f0c8ffb..20d52699 100644
--- a/src/borg/archive.py
+++ b/src/borg/archive.py
@@ -361,10 +361,18 @@ def flush(self, flush=False):
         self.buffer.seek(0)
         # The chunker returns a memoryview to its internal buffer,
         # thus a copy is needed before resuming the chunker iterator.
-        # note: this is the items metadata stream chunker, we only will get CH_DATA allocation here (because there are,
-        #       no all-zero chunks in a metadata stream), thus chunk.data will always be bytes/memoryview and allocation
-        #       is always CH_DATA and never CH_ALLOC/CH_HOLE).
-        chunks = list(bytes(chunk.data) for chunk in self.chunker.chunkify(self.buffer))
+        # the metadata stream may produce all-zero chunks, so deal
+        # with CH_ALLOC (and CH_HOLE, for completeness) here.
+        chunks = []
+        for chunk in self.chunker.chunkify(self.buffer):
+            alloc = chunk.meta['allocation']
+            if alloc == CH_DATA:
+                data = bytes(chunk.data)
+            elif alloc in (CH_ALLOC, CH_HOLE):
+                data = zeros[:chunk.meta['size']]
+            else:
+                raise ValueError("chunk allocation has unsupported value of %r" % alloc)
+            chunks.append(data)
         self.buffer.seek(0)
         self.buffer.truncate(0)
         # Leave the last partial chunk in the buffer unless flush is True

ThomasWaldmann commented 2 years ago

Thanks for finding this. If you could test the patch / the PR code, that would be great!

DrBrynzo commented 2 years ago

Testing now. It's been running a little over an hour and is already well past where it was failing before. Based on current progress it should finish up sometime in the next couple hours.

DrBrynzo commented 2 years ago

Ship it!

Creating archive at "/Volumes/crate/borg/repo::mini-bryn-2022-04-13T15:45:40"
------------------------------------------------------------------------------                                                      
Repository: /Volumes/crate/borg/repo
Archive name: mini-bryn-2022-04-13T15:45:40
Archive fingerprint: <redacted, not sure if necessary/recommended>
Time (start): Wed, 2022-04-13 15:45:41
Time (end):   Wed, 2022-04-13 18:12:21
Duration: 2 hours 26 minutes 40.22 seconds
Number of files: 291694
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              973.73 GB            928.68 GB             44.39 GB
All archives:                4.57 TB              4.14 TB            751.54 GB

                       Unique chunks         Total chunks
Chunk index:                  516951              2600975
------------------------------------------------------------------------------

ThomasWaldmann commented 2 years ago

Fixed in 1.2-maint, so it will be in next release (1.2.1). Also fixed in master branch. Does not apply to 1.1-maint.

borgbackup / borg