borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

borg check crashing #3225

Open ThomasWaldmann opened 7 years ago

ThomasWaldmann commented 7 years ago

From the mailing list:

On 26.10.2017 16:01, Dirk Deimeke wrote:

I already did a "borg check --repair" which led to the following exception:

$ borg check --repair /srv/borg/tigacorrupt
'check --repair' is an experimental feature that might result in data loss.
Type 'YES' if you understand this and want to continue: YES
Adding commit tag to segment 879209
Local Exception.
Traceback (most recent call last):
  File "/usr/lib64/python3.4/site-packages/borg/archiver.py", line 2168, in main
    exit_code = archiver.run(args)
  File "/usr/lib64/python3.4/site-packages/borg/archiver.py", line 2104, in run
    return set_ec(func(args))
  File "/usr/lib64/python3.4/site-packages/borg/archiver.py", line 107, in wrapper
    return method(self, args, repository=repository, **kwargs)
  File "/usr/lib64/python3.4/site-packages/borg/archiver.py", line 185, in do_check
    if not repository.check(repair=args.repair, save_space=args.save_space):
  File "/usr/lib64/python3.4/site-packages/borg/repository.py", line 476, in check
    self.io.write_commit()
  File "/usr/lib64/python3.4/site-packages/borg/repository.py", line 820, in write_commit
    fd = self.get_write_fd(no_new=True)
  File "/usr/lib64/python3.4/site-packages/borg/repository.py", line 680, in get_write_fd
    self._write_fd = open(self.segment_filename(self.segment), 'xb')
FileNotFoundError: [Errno 2] No such file or directory: '/srv/borg/tigacorrupt/data/87/879210'

Platform: Linux len.myown-it.com 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 x86_64
Linux: CentOS Linux 7.4.1708 Core
Borg: 1.0.11  Python: CPython 3.4.5
PID: 5622  CWD: /srv/borg
sys.argv: ['/bin/borg', 'check', '--repair', '/srv/borg/tigacorrupt']
SSH_ORIGINAL_COMMAND: None

The most important line is "FileNotFoundError: [Errno 2] No such file or directory: '/srv/borg/tigacorrupt/data/87/879210'".

As Marian suggested by mail, I created the directory /srv/borg/tigacorrupt/data/87, and the next repair attempt ran through:

borg check --repair /srv/borg/tigacorrupt
'check --repair' is an experimental feature that might result in data loss.
Type 'YES' if you understand this and want to continue: YES
Adding commit tag to segment 879209
Repository manifest not found!
219852 orphaned objects found!
Archive consistency check complete, problems found.
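
The crash and the workaround above are consistent with how Python's exclusive-create mode behaves: `open(path, 'xb')` creates the file but not a missing parent directory, so it raises FileNotFoundError when `data/87` does not exist. A minimal sketch of the idea follows. It is not borg's code: `segment_path` and `open_new_segment` are hypothetical helpers, and the divisor 10000 is only inferred from the path `data/87/879210` in the traceback.

```python
import os


def segment_path(repo_path, segment, segments_per_dir=10000):
    """Hypothetical helper: data/<segment // segments_per_dir>/<segment>.

    The default of 10000 is an assumption inferred from data/87/879210.
    """
    subdir = str(segment // segments_per_dir)
    return os.path.join(repo_path, "data", subdir, str(segment))


def open_new_segment(repo_path, segment, segments_per_dir=10000):
    """Create the parent directory if it is missing, then create the file.

    Mode 'xb' still fails with FileExistsError if the segment file
    already exists, so an existing segment is never overwritten.
    """
    path = segment_path(repo_path, segment, segments_per_dir)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return open(path, "xb")
```

Creating the directory by hand, as done above, has the same effect as the `os.makedirs` call in this sketch.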

ThomasWaldmann commented 7 years ago

Hmm, looks like:

In that case, borg code wants to add a commit tag "at the end" (where indicated by cache transaction id) - in this case basically in the middle of nowhere.

ThomasWaldmann commented 6 years ago

Likely not specific to 1.0.x, so moving it to 1.1.x milestone.

dgasaway commented 5 years ago

I seem to be experiencing a similar problem, but the suggested workaround doesn't work (the folder already exists and contains other files). I know for a fact that there was a problem with the filesystem (fsck moved a file to lost+found); now I'm trying to get the repository back into a usable state.

[root@wolfie 61% backup-parts]# borg check --archives-only /mnt/s3ql/borg
Local Exception
Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/borg/repository.py", line 1283, in get_fd
    return self.fds[segment]
  File "/usr/lib64/python3.6/site-packages/borg/lrucache.py", line 21, in __getitem__
    value = self._cache[key]  # raise KeyError if not found
KeyError: 7835

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/borg/archiver.py", line 4434, in main
    exit_code = archiver.run(args)
  File "/usr/lib64/python3.6/site-packages/borg/archiver.py", line 4366, in run
    return set_ec(func(args))
  File "/usr/lib64/python3.6/site-packages/borg/archiver.py", line 152, in wrapper
    return method(self, args, repository=repository, **kwargs)
  File "/usr/lib64/python3.6/site-packages/borg/archiver.py", line 320, in do_check
    verify_data=args.verify_data, save_space=args.save_space):
  File "/usr/lib64/python3.6/site-packages/borg/archive.py", line 1175, in check
    self.key = self.identify_key(repository)
  File "/usr/lib64/python3.6/site-packages/borg/archive.py", line 1223, in identify_key
    cdata = repository.get(some_chunkid)
  File "/usr/lib64/python3.6/site-packages/borg/repository.py", line 1061, in get
    return self.io.read(segment, offset, id)
  File "/usr/lib64/python3.6/site-packages/borg/repository.py", line 1389, in read
    fd = self.get_fd(segment)
  File "/usr/lib64/python3.6/site-packages/borg/repository.py", line 1285, in get_fd
    fd = open(self.segment_filename(segment), 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/s3ql/borg/data/7/7835'

Platform: Linux wolfie 4.19.1-gentoo #1 SMP Tue Apr 9 21:02:27 PDT 2019 x86_64
Linux: Funtoo Linux - baselayout  2.2.2
Borg: 1.1.7  Python: CPython 3.6.8
PID: 3025  CWD: /root/bin/backup-parts
sys.argv: ['/usr/lib/python-exec/python3.6/borg', 'check', '--archives-only', '/mnt/s3ql/borg']
SSH_ORIGINAL_COMMAND: None

ThomasWaldmann commented 5 years ago

@dgasaway there is a segment file missing in your repo: /mnt/s3ql/borg/data/7/7835.

Also, you are running borg check with --archives-only, but the repo obviously has very low-level issues, so a full check is advised.

Using a more recent borg 1.1.x version would also be a good idea in general.

Also, the filesystem holding the repo should be stable, proven, and working correctly.

dgasaway commented 5 years ago

Yes, I know that segment file is missing - it was in lost+found in an unknown state. The filesystem is pretty reliable; the problem is the network connection it depends upon. Unfortunately, it would take way too long and exceed data caps to do a full repo check. These are my own problems, it's true, but there could be an opportunity for borg to handle these conditions (a lost segment file, or repair of same) more gracefully. Borg doesn't need to download and CRC all segment files to know a file is missing and attempt some kind of repair, really.
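
To illustrate the point about not needing to read segment contents, here is a rough sketch of a presence-only check. It is not an existing borg feature or API: `missing_segments` is a hypothetical function, `referenced_segments` is assumed to be a list of segment numbers obtained from the repo index by some other means, and the divisor 1000 is only inferred from the path `data/7/7835` in the traceback above.

```python
import os


def missing_segments(repo_path, referenced_segments, segments_per_dir=1000):
    """Return the referenced segment numbers that have no file on disk.

    Only file existence is tested; no segment data is read, so nothing
    has to be downloaded or checksummed.
    """
    missing = []
    for segment in sorted(set(referenced_segments)):
        subdir = str(segment // segments_per_dir)
        path = os.path.join(repo_path, "data", subdir, str(segment))
        if not os.path.isfile(path):
            missing.append(segment)
    return missing
```

Such a check can only say that a file is absent. It cannot say whether a file that is present (for example one moved back from lost+found) is intact.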

In any case, I could tell where the files in lost+found belonged, and I moved them back, just to get things rolling again. I'm hoping they were in a good state, but I know I could be facing hidden corruption. A way to tell which archives, and which files in those archives, use those segment files would be helpful, if anyone could help with that. This is getting off-topic, though.

Thanks.

ThomasWaldmann commented 5 years ago

The repo index maps chunkid -> segment, offset, size.

The archive's metadata stream has a list of chunkids for each file with content.

borg check should usually tell you if there is something weird, but the high-level check (archives) is based on the low-level check (repository).
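
The two mappings described above can be combined to answer the question of which files touch a given segment. The following is only a sketch on toy data: it uses plain Python dicts, not borg's actual index or archive classes, and `files_by_segment`, the chunk ids, and the file paths are all made up for illustration.

```python
from collections import defaultdict


def files_by_segment(repo_index, archive_files):
    """Invert the two mappings into: segment -> set of file paths.

    repo_index:    chunk id -> (segment, offset, size)
    archive_files: file path -> list of chunk ids
    """
    result = defaultdict(set)
    for path, chunk_ids in archive_files.items():
        for chunk_id in chunk_ids:
            entry = repo_index.get(chunk_id)
            if entry is None:
                continue  # chunk not present in the index at all
            segment = entry[0]
            result[segment].add(path)
    return result


# Toy data standing in for a real repo index and archive metadata.
repo_index = {
    b"c1": (7835, 8, 1024),
    b"c2": (7835, 1040, 2048),
    b"c3": (7836, 8, 512),
}
archive_files = {
    "home/a.txt": [b"c1", b"c3"],
    "home/b.txt": [b"c2"],
}

affected = files_by_segment(repo_index, archive_files)
print(sorted(affected[7835]))  # ['home/a.txt', 'home/b.txt']
```

With a real repository, the same inversion would have to be repeated per archive, since each archive has its own metadata stream.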