borg check --verify, data integrity error

plmuon commented 7 years ago

Hello,

I had a data integrity error at the end of a very long (12TB) borg check --repair:

Data integrity error: Segment entry checksum mismatch [segment 12555, offset 1752715]
Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/borg/archiver.py", line 2052, in main
    exit_code = archiver.run(args)
  File "/usr/lib/python3.5/site-packages/borg/archiver.py", line 1997, in run
    return func(args)
  File "/usr/lib/python3.5/site-packages/borg/archiver.py", line 90, in wrapper
    return method(self, args, repository=repository, **kwargs)
  File "/usr/lib/python3.5/site-packages/borg/archiver.py", line 168, in do_check
    if not repository.check(repair=args.repair, save_space=args.save_space):
  File "/usr/lib/python3.5/site-packages/borg/repository.py", line 480, in check
    self.compact_segments(save_space=save_space)
  File "/usr/lib/python3.5/site-packages/borg/repository.py", line 300, in compact_segments
    for tag, key, offset, data in self.io.iter_objects(segment, include_data=True):
  File "/usr/lib/python3.5/site-packages/borg/repository.py", line 705, in iter_objects
    (TAG_PUT, TAG_DELETE, TAG_COMMIT))
  File "/usr/lib/python3.5/site-packages/borg/repository.py", line 781, in _read
    segment, offset))
borg.helpers.IntegrityError: Segment entry checksum mismatch [segment 12555, offset 1752715]

Platform: Linux d4f04c9c37e9 3.12.6 #1 SMP Wed Dec 14 01:35:05 CST 2016 x86_64
Linux:
Borg: 1.0.9  Python: CPython 3.5.2
PID: 16  CWD: /data
sys.argv: ['/usr/bin/borg', 'check', '-v', '--repair', 'borg-sol-data']
SSH_ORIGINAL_COMMAND: None

I ran the check first on the box itself (local filesystem on a NAS, borg in a Docker container), then on another server that has the borg backup mounted via NFS (2 days each).

I fear I have to re-create the archive. Is no fix possible?

enkore commented 7 years ago

Try to re-run it; the trace indicates that some data was first read OK but a later read of the same data was corrupted. More importantly, check your logs (dmesg, smartctl) for issues with the disks: either it's a coincidence or it indicates bigger trouble with a disk.

(TN: I deleted your earlier identical comment + my reply in #469)
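A minimal sketch of the "read it again" idea, independent of borg: read the same byte range of a segment file several times and compare SHA-256 digests. If the digests differ, the problem sits below borg (disk, controller, RAM, network or NFS). The helper names `digest_range` and `read_is_stable` are hypothetical and not part of borg. On Linux, repeated reads are usually served from the page cache, so for a meaningful test drop caches between rounds (as root: `echo 3 > /proc/sys/vm/drop_caches`) or run it from a different client.

```python
import hashlib
import os


def digest_range(path, offset, length, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of up to `length` bytes of `path` starting at `offset`."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:  # file is shorter than the requested range
                break
            h.update(chunk)
            remaining -= len(chunk)
    return h.hexdigest()


def read_is_stable(path, offset=0, length=None, rounds=3):
    """Hypothetical helper: read the same range `rounds` times, True if all digests agree.

    Note: without dropping the OS page cache between rounds, later rounds may
    never touch the disk, so a True result is only meaningful with cold caches.
    """
    if length is None:
        length = os.path.getsize(path) - offset
    digests = {digest_range(path, offset, length) for _ in range(rounds)}
    return len(digests) == 1
```

For example, `read_is_stable(path, rounds=5)`, where `path` is a placeholder for the suspect segment file under the repository's `data/` directory.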

plmuon commented 7 years ago

Thanks. I noticed the issue was closed, so I created this issue afterwards. I had already run it twice, once locally and once remotely.

The data itself is on a QNAP NAS. Its dmesg has no relevant entries; smartctl isn't available as it is on Linux, but the NAS reports the disk status as OK. It's an mdraid RAID 6 that is scrubbed regularly, so I don't think there is anything wrong with the disks.

There may have been connection issues while creating some of the archives (for a while I had NFS mounted through SSH), which might have caused corruption. But borg check apparently cannot deal with it anymore.

ThomasWaldmann commented 7 years ago

Hmm, could we just remove compact_segments() from borg check --repair? It looks more like an optimization that is not strictly needed for checking. See #2294.

About this specific issue: it indeed looks like the segment was successfully read first (in the check part) and only failed the second time (in the compact part).
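To make the "Segment entry checksum mismatch" message concrete, here is a sketch of how such a per-entry check works. It assumes the borg 1.0 segment entry layout as I understand it: a little-endian header of CRC32 (uint32), total entry size including the header (uint32) and a tag byte, followed by the payload (for PUT: a 32-byte key plus the data), with the CRC32 computed over everything after the CRC field. The tag values and the helpers `pack_entry`/`check_entry` are assumptions for illustration, not borg's API. A check that passes on the first read and fails on the second, as here, means the bytes returned by the second read differed.

```python
import struct
import zlib

# Assumed borg 1.0 entry header: crc32, size (including header), tag.
HEADER = struct.Struct('<IIB')
TAG_PUT, TAG_DELETE, TAG_COMMIT = 0, 1, 2  # assumed tag values


def pack_entry(tag, payload=b''):
    """Build one entry; the CRC32 covers (size, tag, payload), i.e. all bytes after the crc field."""
    size = HEADER.size + len(payload)
    body = struct.pack('<IB', size, tag) + payload
    return struct.pack('<I', zlib.crc32(body) & 0xffffffff) + body


def check_entry(buf, offset=0):
    """Verify the entry at `offset`; return (tag, payload, next_offset) or raise ValueError."""
    crc, size, tag = HEADER.unpack_from(buf, offset)
    if size < HEADER.size or offset + size > len(buf):
        raise ValueError('invalid entry size at offset %d' % offset)
    body = bytes(buf[offset + 4:offset + size])  # everything after the crc field
    if zlib.crc32(body) & 0xffffffff != crc:
        raise ValueError('Segment entry checksum mismatch [offset %d]' % offset)
    return tag, body[5:], offset + size  # skip size (4 bytes) and tag (1 byte)


if __name__ == '__main__':
    key = bytes(32)  # dummy 32-byte key
    entry = pack_entry(TAG_PUT, key + b'hello')
    print(check_entry(entry)[0], check_entry(entry)[2])  # tag and next offset
    damaged = bytearray(entry)
    damaged[-1] ^= 0xFF  # flip one data byte, as a bad read would
    try:
        check_entry(damaged)
    except ValueError as exc:
        print(exc)
```

Flipping any byte after the CRC field makes `check_entry` raise, which is the same class of failure borg reported for segment 12555 at offset 1752715.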

ThomasWaldmann commented 7 years ago

@plmuon any news on this?

I tend to close this, as it does not look like a borg issue but rather a hardware (or other "below borg") issue.

plmuon commented 7 years ago

No news. I had to recreate the archive; since then I've had no issues.

ThomasWaldmann commented 7 years ago

ok, thanks for the feedback.

borgbackup / borg

borg check --verify, data integrity error #2137