borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

borg check errors after upgrade to 1.2.0 #6687

Closed jdchristensen closed 10 months ago

jdchristensen commented 2 years ago

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes.

Is this a BUG / ISSUE report or a QUESTION?

Possible bug.

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

1.2.0 on all clients and servers. Previous version was 1.1.17 (not 1.1.7, as I wrote on the mailing list).

Operating system (distribution) and version.

Debian buster and Ubuntu 20.04 on servers. Debian buster, Ubuntu 20.04 and Ubuntu 21.10 on clients.

Hardware / network configuration, and filesystems used.

Multiple local and remote clients accessing each repository. Repositories are on ext4, on RAID 1 mdadm devices, with spinning disks underlying them. The Debian server also uses lvm.

How much data is handled by borg?

The repos are all around 100GB in size, with up to 400 archives each. The repositories have been in use for many years.

Full borg commandline that led to the problem (leave out excludes and passwords)

borg check /path/to/repo [more details below]

Describe the problem you're observing.

borg check shows errors on three different repositories on two different machines. See below for details.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Yes, borg check shows the same errors when run again.

Include any warning/errors/backtraces from the system logs

I upgraded from borg 1.1.17 to 1.2.0 on several different systems on about April 9. On May 9, my monthly "borg check" runs gave errors on three repositories on two systems. Note that I use the setup where several clients do their backups into the same repositories. I don't have any non-shared repositories for comparison.

At the time of the upgrade from 1.1.17 to 1.2.0, I ran borg compact --cleanup-commits ... followed by borg check ... on all repos. There were no errors then. After that, I run borg compact without --cleanup-commits followed by borg check once per month. The errors occurred at the one month mark.
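The monthly routine described above could be wrapped roughly as follows. This is a minimal sketch, not the reporter's actual script: the function name `monthly_maintenance` and the `BORG` override variable are hypothetical, and only the `borg compact` and `borg check` subcommands mentioned in this report are used.

```shell
# Hypothetical helper: run the monthly maintenance on one repository.
# BORG may be set to another command (for example "echo") for a dry run.
monthly_maintenance() {
    repo=$1
    borg_cmd=${BORG:-borg}
    # Compact first to free space, then verify repository consistency.
    # The check only runs if compaction succeeded.
    "$borg_cmd" compact "$repo" && "$borg_cmd" check "$repo"
}
```

Called as `monthly_maintenance /path/to/repo`, it returns the exit status of the last command that ran, so a failing check can be picked up by cron or a wrapper script.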

System 1 runs Ubuntu 20.04. Two of the three repos on this machine now have errors:

# borg check /Backups/borg/home.borg
Index object count mismatch.
committed index: 1166413 objects
rebuilt index:   1166414 objects
ID: 8a158ba7fdfae9b1373063a5bb5ea8ea6698c93ed7feff89ca6ff0a3c8842ebd
rebuilt index: (18596, 199132336) committed index: <not found>
Finished full repository check, errors found.

# ls -l /Backups/borg/home.borg/data/37
total 1453100
-rw------- 2 bu bu 201259844 Dec  8  2020 18596
-rw------- 2 bu bu 185611530 Dec 12  2020 18651
-rw------- 2 bu bu 125106377 Dec 25  2020 18858
-rw------- 2 bu bu 524318301 Dec 26  2020 18874
-rw------- 2 bu bu 193813842 Dec 30  2020 18940
-rw------- 2 bu bu 116657254 Dec 30  2020 18945
-rw------- 2 bu bu 141181725 Dec 31  2020 18953

# borg check /Backups/borg/system.borg
Index object count mismatch.
committed index: 2324200 objects
rebuilt index:   2324202 objects
ID: 1e20354918f4fdeb9cc0d677c28dffe1a383dd1b0db11ebcbc5ffb809d3c2b8a
rebuilt index: (24666, 60168) committed index: <not found>
ID: d9c516b5bf53f661a1a9d2ada08c8db7c33a331713f23e058cd6969982728157
rebuilt index: (3516, 138963001) committed index: <not found>
Finished full repository check, errors found.

# ls -l /Backups/borg/system.borg/data/49
total 3316136
-rw------- 2 bu bu 500587725 Oct  5  2021 24555
-rw------- 2 bu bu 168824081 Oct  8  2021 24603
-rw------- 2 bu bu 116475028 Oct  9  2021 24619
-rw------- 2 bu bu 107446533 Oct 11  2021 24634
-rw------- 2 bu bu 252958665 Oct 12  2021 24666
-rw------- 2 bu bu 124871243 Oct 19  2021 24777
-rw------- 2 bu bu 277627834 Oct 19  2021 24793
-rw------- 2 bu bu 231932763 Oct 21  2021 24835
-rw------- 2 bu bu 114031902 Oct 22  2021 24847
-rw------- 2 bu bu 127020577 Oct 26  2021 24899
-rw------- 2 bu bu 220293895 Oct 26  2021 24907
-rw------- 2 bu bu 113238393 Oct 27  2021 24933
-rw------- 2 bu bu 525154704 Oct 27  2021 24941
-rw------- 2 bu bu 291472023 Oct 27  2021 24943
-rw------- 2 bu bu 223721033 Oct 30  2021 24987

# ls -l /Backups/borg/system.borg/data/7
total 1200244
-rw------- 2 bu bu 524615544 Feb  4  2018 3516
-rw------- 2 bu bu 145502511 Feb  5  2018 3529
-rw------- 2 bu bu 266037549 Feb 21  2018 3740
-rw------- 2 bu bu 292869056 Mar 14  2018 3951

System 2 runs Debian buster. One of the three repos on this machine now has errors:

# borg check /Backups/borg/system.borg
Index object count mismatch.
committed index: 2052187 objects
rebuilt index:   2052188 objects
ID: 6b734ed388e7e086af7107847c6b6d3d34a29c20e7e539ded71b32606cb857bd
rebuilt index: (946, 15871355) committed index: <not found>
Finished full repository check, errors found.

# ls -l /Backups/borg/system.borg/data/1
total 205308
-rw------- 1 bu bu 210234581 Jun 20  2017 946
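The `(segment, offset)` pairs in the check output can be matched to the directory listings with a little arithmetic. The sketch below assumes borg's on-disk layout of `data/<segment / segments_per_dir>/<segment>` (integer division). The value 500 is inferred from the listings above (18596 in `data/37`, 24666 in `data/49`, 3516 in `data/7`, 946 in `data/1`), not read from the repository config, and both function names are hypothetical.

```shell
# segment_path REPO SEGMENT SEGMENTS_PER_DIR
# Prints the path of the segment file that should hold a reported object.
segment_path() {
    echo "$1/data/$(( $2 / $3 ))/$2"
}

# offset_in_file OFFSET FILE_SIZE
# Succeeds when the reported offset lies inside a file of the given size.
offset_in_file() {
    [ "$1" -lt "$2" ]
}
```

For example, `segment_path /Backups/borg/home.borg 18596 500` gives the file listed above, and `offset_in_file 199132336 201259844` succeeds, so the reported object points into an existing segment file rather than past its end.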

I have used borg on these systems for years, and no hardware has changed recently. System 1 has the repos on a RAID 1 mdadm device with two SATA spinning disks. System 2 also has the repos on RAID 1 mdadm devices with two SATA disks, with lvm as a middle layer. In both cases, smartctl shows no issues for any of the drives, and memtester also shows no errors.

Since the errors have happened on different machines within a month of upgrading to 1.2.0, I am concerned that this is a borg issue rather than a hardware issue. It is also suspicious to me that the error is the same in all cases, with a committed index not found. Hardware errors tend to produce garbage.
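To confirm that every report has the same shape, the difference between the two object counts can be compared with the number of IDs listed. The following is a rough sketch that assumes the line formats shown in the output above; the function name `summarize_check` is hypothetical.

```shell
# summarize_check: reads `borg check` output on stdin and prints
# "<rebuilt count - committed count> <number of ID lines>".
summarize_check() {
    awk '
        /^committed index:.*objects/ { committed = $3 }
        /^rebuilt index:.*objects/   { rebuilt = $3 }
        /^ID: /                      { ids++ }
        END { print rebuilt - committed, ids + 0 }
    '
}
```

Since borg writes its log messages to stderr, a live run would look like `borg check /path/to/repo 2>&1 | summarize_check`. For the reports above, both numbers agree in each case: the rebuilt index contains exactly the listed objects in addition to the committed ones.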

I have not run repair yet. Is there anything I should do before running repair to try to figure out the issue?

Update: there is a bounty for finding/fixing this bug: https://app.bountysource.com/issues/108445140-borg-check-errors-after-upgrade-to-1-2-0

ThomasWaldmann commented 10 months ago

Did anyone besides @horazont do practical testing with the current 1.2-maint code?

Guess it would be good to get more practical testing before 1.2.7 release, especially considering this issue here.

ThomasWaldmann commented 10 months ago

OK, guess I'll just close this as fixed (see https://github.com/borgbackup/borg/issues/6687#issuecomment-1785650521 ) until proven otherwise.

If somebody is seeing something that looks like this issue, please first do a borg check --repair on the repo to rebuild the compaction info. Then try to reproduce.

The above-mentioned fixes will be in borg 1.2.7 soon (the master branch is also fixed).
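The recommended order (fixed borg version first, then one repair run) could be scripted along these lines. This is a sketch under stated assumptions: `borg -V` is assumed to print `borg <version>` with purely numeric dotted components, and the names `version_at_least`, `repair_if_fixed`, and the `BORG` override variable are hypothetical.

```shell
# version_at_least HAVE WANT
# Succeeds when dotted version HAVE is >= WANT (numeric components only).
version_at_least() {
    have=$1
    want=$2
    while [ -n "$have" ] || [ -n "$want" ]; do
        h=${have%%.*}; w=${want%%.*}
        h=${h:-0}; w=${w:-0}
        [ "$h" -gt "$w" ] && return 0
        [ "$h" -lt "$w" ] && return 1
        # Drop the component that was just compared.
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $want in *.*) want=${want#*.} ;; *) want= ;; esac
    done
    return 0
}

# repair_if_fixed REPO
# Runs the repair only when the installed borg reports 1.2.7 or newer.
repair_if_fixed() {
    installed=$("${BORG:-borg}" -V | awk '{print $2}')
    if version_at_least "$installed" 1.2.7; then
        "${BORG:-borg}" check --repair "$1"
    else
        echo "borg $installed predates 1.2.7; upgrade first" >&2
        return 1
    fi
}
```

The version gate matters because, per the comment above, repairing with an unfixed version would rebuild the compaction info but leave the repository open to the same problem for newly written chunks.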

jdchristensen commented 10 months ago

I think the Changelog should mention that users who have had problems with borg check involving orphaned chunks should run borg check --repair on the repo after upgrading to 1.2.7. I'm going to do that today, and am confident that my monthly checks will stop reporting orphaned chunks. Thanks for figuring this out!

Oh, I also just realized that the Changelog seems to mention an incorrect issue/PR:

check/compact: fix spurious reappearance of orphan chunks since borg 1.2, #6687 - this consists of 2 fixes:
for existing chunks: check --repair: recreate shadow index, #6687
for newly created chunks: update shadow index when doing a double-put, #5661

Should the "5661" be "7896"? And maybe the second "6687" should be "7897", to point to the PR instead of this issue?

ThomasWaldmann commented 10 months ago

@jdchristensen thanks for the feedback! some links for easier checking of these issues:

Updated the change log: https://github.com/borgbackup/borg/pull/7961

Usually I give the issue number rather than the PR number, but here I have now given both. There was quite some back and forth here...

palbr commented 10 months ago

Thanks a lot. The effort you put in BorgBackup is amazing!

How I was affected by this issue

I have been using BorgBackup 1.1.15 for many years and love its functionality and reliability! I am using Debian Linux and my Borg repository is on an external HDD, which is attached via USB during the time of backup creation.

On 3rd November 2023, I decided to migrate to a newer version of BorgBackup. I followed the [Notes](https://borgbackup.readthedocs.io/en/1.2-maint/changes.html#important-notes) (excellent documentation, BTW!):

1. Upgrade to BorgBackup 1.1.18
2. `borg create`
3. `borg prune`
4. `borg check --verify-data`:
   ```
   Completed repository check, no problems found.
   Finished cryptographic data integrity verification, verified 528541 chunks with 0 integrity errors.
   Archive consistency check complete, no problems found.
   terminating with success status, rc 0
   ```
5. Upgrade to BorgBackup 1.2.6
6. address [1.2.5 TAM issue](https://borgbackup.readthedocs.io/en/1.2-maint/changes.html#pre-1-2-5-archives-spoofing-vulnerability-cve-2023-36811): All my 54 archives are already `tam:verified` :grinning:
7. `borg compact --cleanup-commits` (freed up 3 GB of disc space - thank you :+1: )
8. `borg check --verify-data`:
   ```
   Finished full repository check, no problems found.
   Finished cryptographic data integrity verification, verified 528541 chunks with 0 integrity errors.
   Archive consistency check complete, no problems found.
   terminating with success status, rc 0
   ```
9. run `borg info` to build the local pre12-meta cache

Did some `borg create` runs and tested the results. Everything looked good. I happily created 65 further archives during the next days. (`borg create`)

On 30th November 2023 it was time for my monthly `borg prune` and since I am using BorgBackup 1.2.6 now, I also started a `borg compact` run. Both commands worked without problems. Afterwards I did a `borg check --verify-data`, which gave me:

```
Starting repository check
finished segment check at segment 11882
Starting repository index check
Index object count mismatch.
committed index: 528960 objects
rebuilt index: 528965 objects
ID: 4bca288173b1b81cf3f0844369ee5d83ad6763e0b769c0b04fedf6b79be799cd rebuilt index: (10693, 162087387) committed index:
ID: b54c28821a17ac98e280a1027499771a9e48fde5610d6bd1dad88d6a3283d5f0 rebuilt index: (10693, 278089494) committed index:
ID: e94079e3335847d23038b941351c37f0848768ac6aaf166e12d5179b7a61c50b rebuilt index: (8448, 616114) committed index:
ID: 4e75c8f9d27cd882d0c29385b73d7a6d9ad744c60ec67c8ba86f74c28f50ade1 rebuilt index: (11074, 187051999) committed index:
ID: 457ecb4e89cb279468a56b5e8ffb6cb43a4909991b070c5ee0f0e054d84d7a9e rebuilt index: (6563, 1055550) committed index:
Finished full repository check, errors found.
terminating with warning status, rc 1
```

This was the first time I ever encountered problems via `borg check`. Research on the internet led me to this ticket #6687. I am sorry, that I couldn't contribute to solve this issue. But I am very grateful, that you could fix it by now. I am looking forward to test BorgBackup 1.2.7 in the next days.

Do you think this issue and its fix (install borg version >= 1.2.7 and run borg check --repair once) should be added to Upgrade Notes - borg 1.1.x to 1.2.x? That would help other users migrating from 1.1.x to 1.2.x avoid running into this issue.

ThomasWaldmann commented 10 months ago

@palbr Guess if someone upgrades from 1.1.x to 1.2.7+, they might never be affected, so there is nothing to repair in that case.

But I already added it to the 1.2.7 changelog entry (after release though), see above.

Also, guess running borg check --repair is quite the natural thing to do when encountering any issues.