borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

`borg check` hangs after errors #8230

Closed sshaikh closed 1 month ago

sshaikh commented 1 month ago

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

Bug

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

borg 1.2.8

Operating system (distribution) and version.

Debian 12

Hardware / network configuration, and filesystems used.

Remote repo (local over ssh). ext4 on both

How much data is handled by borg?

~200GB

Full borg commandline that led to the problem (leave out excludes and passwords)

sudo borg --info --progress check --repository-only repo

Describe the problem you're observing.

The check reports errors like the following:

Remote: Data integrity error: Segment entry checksum mismatch [segment 81, offset 24001333]

but continues until it has checked 100% of segments. It then prints lines of the following:

Remote: ID: blahblah rebuilt index: <not found>      committed index: (277, 389662029)

After that, it seems to hang (no progress).
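
To see which segments the missing index entries point at, the index-mismatch lines can be summarized with a short script. This is a rough sketch: the regular expression is modeled only on the single sample line quoted above, the helper names are hypothetical, and reading the tuple as `(segment, offset)` is an assumption.

```python
import re
from collections import Counter

# Pattern modeled on the one sample line in this report (assumption);
# other borg versions may format these lines differently.
LINE_RE = re.compile(
    r"ID:\s*(?P<id>\S+)\s+"
    r"rebuilt index:\s*(?P<rebuilt><not found>|\(\d+,\s*\d+\))\s+"
    r"committed index:\s*(?P<committed><not found>|\(\d+,\s*\d+\))"
)


def parse_entry(text):
    """Return the tuple as two ints, or None for '<not found>'."""
    if text == "<not found>":
        return None
    first, second = text.strip("()").split(",")
    return int(first), int(second)


def missing_by_segment(lines):
    """Count IDs absent from the rebuilt index, keyed by the first
    number of the committed entry (assumed to be the segment)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not an index-mismatch line
        rebuilt = parse_entry(match.group("rebuilt"))
        committed = parse_entry(match.group("committed"))
        if rebuilt is None and committed is not None:
            counts[committed[0]] += 1
    return counts


sample = [
    "Remote: ID: blahblah rebuilt index: <not found>      "
    "committed index: (277, 389662029)",
]
print(missing_by_segment(sample))
```

Feeding the saved check output through `missing_by_segment` gives a per-segment count that can be compared against the segments named in the checksum errors.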

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Yes, each repo check command results in this.

Include any warning/errors/backtraces from the system logs

None apart from the output above.

--repair has not been tried, in case the developers wish to have a look first.

ThomasWaldmann commented 1 month ago

Guess you have multiple issues here:

  1. you have data corruption in that repo, that is why the crc32 check is failing ("Segment entry checksum mismatch") - try to find the root cause of this. That some stuff in the rebuilt index is missing is then expected, it likely is the stuff that was in the segment entries with the failing crc32 check (because borg will ignore corrupted data).

  2. IIRC, there should not be a way for borg itself to "hang" for a long time. But you may have a network or ssh connection issue causing this. So try to find out if borg is really hanging (doing nothing) or whether it is still doing something with disk I/O or cpu load.
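
As an illustration of the first point, here is a minimal sketch of how a stored crc32 exposes a corrupted entry. The entry layout (a 4-byte checksum followed by the payload) and the function names are made up for the example and are not borg's actual segment format.

```python
import struct
import zlib


def pack_entry(payload: bytes) -> bytes:
    """Prefix the payload with its CRC32 (hypothetical layout, not borg's)."""
    return struct.pack("<I", zlib.crc32(payload)) + payload


def verify_entry(entry: bytes) -> bool:
    """Recompute the CRC32 of the payload and compare it to the stored one."""
    (stored,) = struct.unpack("<I", entry[:4])
    return zlib.crc32(entry[4:]) == stored


good = pack_entry(b"chunk data")

# Simulate on-disk corruption by flipping a single bit in the payload.
damaged = bytearray(good)
damaged[-1] ^= 0x01

print(verify_entry(good))
print(verify_entry(bytes(damaged)))
```

An entry that fails this kind of check cannot be trusted, which is why whatever it contained goes missing from a rebuilt index.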

sshaikh commented 1 month ago

  1. I don't believe the corruption is systemic; I have recently been playing with my infra, which has led to an unstable environment, and this check has only started failing relatively recently. I have another (local) repo that is intact, so I'm happy to just resolve this error (e.g. with a repair) rather than spend time investigating its cause. Unless you'd find that useful, of course.
  2. top and ps, both locally and on the remote, seem to indicate borg is not doing much. Is there anything else I can do to see whether it's actually dead or not? I could try it again with debug verbosity.

ThomasWaldmann commented 1 month ago

You can try debug log level, but if the network/ssh connection is stuck, I guess you won't see anything more.
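
Besides the debug log, one way to tell an idle process from a busy one on Linux is to sample its I/O and CPU counters twice and compare them. The sketch below assumes a Linux `/proc` filesystem; the function names are hypothetical, and reading `/proc/<pid>/io` for a process started with sudo typically requires root.

```python
import time


def parse_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of int counters."""
    counters = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            counters[key.strip()] = int(value)
    return counters


def parse_cpu_ticks(stat_text):
    """Return utime + stime (clock ticks) from /proc/<pid>/stat contents."""
    # The command name is in parentheses and may contain spaces,
    # so split after the last ')' before counting fields.
    fields = stat_text.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def snapshot(pid):
    """Read (rchar, wchar, cpu_ticks) for one process."""
    with open(f"/proc/{pid}/io") as f:
        io = parse_io(f.read())
    with open(f"/proc/{pid}/stat") as f:
        ticks = parse_cpu_ticks(f.read())
    return io["rchar"], io["wchar"], ticks


def looks_idle(pid, interval=10.0):
    """True if no bytes were read or written and no CPU time was used
    during the interval."""
    before = snapshot(pid)
    time.sleep(interval)
    return snapshot(pid) == before
```

Calling `looks_idle(pid)` on the client-side borg process and on the `borg serve` process on the remote would show whether either side is still reading, writing, or using CPU. If both stay idle across several intervals, a stuck connection or a blocked wait is more likely than slow progress.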

sshaikh commented 1 month ago

Running with debug, the last output I see is:

RemoteRepository: 183 B bytes sent, 18.19 MB bytes received, 3 messages sent

Again, there's no activity and as far as I can tell the network is up. I'll let this run overnight but I'm not expecting any change.

I'll try a repair next.

sshaikh commented 1 month ago

Just to update:

Repairing both fixed the repo and ended successfully.

Checking the archives found issues and ended successfully.

I'm not convinced it was a system issue, but happy to close this anyway.