borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

Corrupted segment reference count - corrupted index or hints #8535

Open ginkel opened 1 week ago

ginkel commented 1 week ago

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

BUG

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

borg 1.4.0

Operating system (distribution) and version.

Ubuntu 24.04

Hardware / network configuration, and filesystems used.

How much data is handled by borg?

~ 3 TB

Full borg command line that led to the problem (leave out excludes and passwords)

borg compact --info ssh://<user>@<user>.your-storagebox.de:23/./borg/antares

Describe the problem you're observing.

Hi there,

I suspect this started when a nightly borg backup was interrupted by scheduled hardware maintenance at our ISP. Long story short, borg compact now fails with the error "Corrupted segment reference count - corrupted index or hints" (see full log below).

I have already tried running borg check --repair as well as deleting the last few archives to no avail.

This is the same machine and remote as in #6140, but the error seems to be different.

Any ideas?

Thanks, Thilo

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Include any warning/errors/backtraces from the system logs

Nov 10 04:38:00 antares.<domain> systemd[1]: Starting borgmatic.service - borgmatic backup...
Nov 10 04:39:00 antares.<domain> borgmatic[783516]: INFO /etc/borgmatic/config.yaml: Running command for pre-everything hook
Nov 10 04:39:00 antares.<domain> borgmatic[783516]: WARNING Creating btrfs snapshot at /borg-backup
Nov 10 04:39:00 antares.<domain> borgmatic[783516]: WARNING Create a readonly snapshot of '/' in '//borg-backup'
Nov 10 04:39:00 antares.<domain> borgmatic[783516]: INFO hetzner: Creating archive
Nov 10 04:39:07 antares.<domain> borgmatic[783516]: INFO Creating archive at "ssh://<user>@<user>.your-storagebox.de:23/./borg/antares::antares.<domain>-2024-11-10.04:39"
Nov 10 05:24:45 antares.<domain> borgmatic[783516]: INFO hetzner: Pruning archives
Nov 10 05:26:44 antares.<domain> borgmatic[783516]: INFO hetzner: Compacting segments
Nov 10 05:26:47 antares.<domain> borgmatic[783516]: INFO Remote: segment 20794 not found, but listed in compaction data
Nov 10 05:26:47 antares.<domain> borgmatic[783516]: INFO Remote: segment 20795 not found, but listed in compaction data
Nov 10 05:26:47 antares.<domain> borgmatic[783516]: INFO Remote: segment 20796 not found, but listed in compaction data
Nov 10 05:26:47 antares.<domain> borgmatic[783516]: INFO Remote: segment 20797 not found, but listed in compaction data
Nov 10 05:26:47 antares.<domain> borgmatic[783516]: INFO Remote: segment 20798 not found, but listed in compaction data
Nov 10 05:26:47 antares.<domain> borgmatic[783516]: INFO Remote: segment 20799 not found, but listed in compaction data
Nov 10 05:26:47 antares.<domain> borgmatic[783516]: INFO Remote: segment 20800 not found, but listed in compaction data
Nov 10 05:26:47 antares.<domain> borgmatic[783516]: INFO Remote: segment 20801 not found, but listed in compaction data
Nov 10 05:26:47 antares.<domain> borgmatic[783516]: INFO Remote: segment 20802 not found, but listed in compaction data
Nov 10 05:26:47 antares.<domain> borgmatic[783516]: INFO Remote: segment 20803 not found, but listed in compaction data
Nov 10 05:26:47 antares.<domain> borgmatic[783516]: INFO Remote: segment 20804 not found, but listed in compaction data
Nov 10 05:26:47 antares.<domain> borgmatic[783516]: INFO Remote: segment 20805 not found, but listed in compaction data
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO Traceback (most recent call last):
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO   File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/remote.py", line 240, in serve
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO     res = f(**args)
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO           ^^^^^^^^^
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO   File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/repository.py", line 505, in commit
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO     self.compact_segments(threshold)
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO   File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/repository.py", line 880, in compact_segments
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO     assert segments[segment] == 0, 'Corrupted segment reference count - corrupted index or hints'
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO            ^^^^^^^^^^^^^^^^^^^^^^
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO AssertionError: Corrupted segment reference count - corrupted index or hints
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO Platform: Linux antares.<domain> 6.8.0-48-generic #48-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 14:04:52 UTC 2024 x86_64
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO Linux: Unknown Linux
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO Borg: 1.4.0  Python: CPython 3.12.3 msgpack: 1.0.3 fuse: pyfuse3 3.3.0 [pyfuse3,llfuse]
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO PID: 816512  CWD: /
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO sys.argv: ['/usr/bin/borg', 'compact', '--info', 'ssh://<user>@<user>.your-storagebox.de:23/./borg/antares']
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO SSH_ORIGINAL_COMMAND: None

ThomasWaldmann commented 1 week ago

So that traceback also happens AFTER borg check --repair?

BTW:

Nov 10 05:26:47 antares.<domain> borgmatic[783516]: INFO Remote: segment 20797 not found, but listed in compaction data

That is harmless.

Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO   File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/repository.py", line 880, in compact_segments
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO     assert segments[segment] == 0, 'Corrupted segment reference count - corrupted index or hints'
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO            ^^^^^^^^^^^^^^^^^^^^^^
Nov 10 05:26:51 antares.<domain> borgmatic[783516]: INFO AssertionError: Corrupted segment reference count - corrupted index or hints

But, obviously, that shouldn't happen.

borg check --repair should be able to rebuild hints as well as the index (takes a while for a big repo though).
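
For anyone sifting through a long compaction log, here is a minimal Python sketch that separates the harmless "segment ... not found" notices from the actual assertion failure. The two patterns are copied from the log lines quoted in this issue; the function name `triage` is made up for illustration and is not part of borg or borgmatic.

```python
import re

# Patterns taken from the log lines quoted in this issue.
MISSING_SEGMENT = re.compile(r"segment (\d+) not found, but listed in compaction data")
ASSERTION = re.compile(r"AssertionError: (.+)")


def triage(log_text):
    """Return (missing_segment_numbers, assertion_messages) found in log_text."""
    missing = []
    errors = []
    for line in log_text.splitlines():
        m = MISSING_SEGMENT.search(line)
        if m:
            # Harmless notice: remember the segment number and move on.
            missing.append(int(m.group(1)))
            continue
        m = ASSERTION.search(line)
        if m:
            # The real failure: keep the message text.
            errors.append(m.group(1).strip())
    return missing, errors
```

Feeding it the journal output above would list the missing segment numbers in the first result and the "Corrupted segment reference count" message in the second.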

ginkel commented 1 week ago

Yes, this happens after check --repair:

# borgmatic check --progress --repair --force --repository hetzner

This is a potentially dangerous function.
check --repair might lead to data loss (for kinds of corruption it is not
capable of dealing with). BE VERY CAREFUL!

Type 'YES' if you understand this and want to continue: YES
Checking archives 9.5%

[...]

# borgmatic compact --repository hetzner --progress               
hetzner: Error running actions for repository
Command 'borg compact --progress ssh://<user>@<user>.your-storagebox.de:23/./borg/antares' returned non-zero exit status 2.
/etc/borgmatic/config.yaml: An error occurred

summary:
/etc/borgmatic/config.yaml: An error occurred
hetzner: Error running actions for repository
...
Remote: segment 20797 not found, but listed in compaction data
Remote: Compacting segments  55%
Remote: segment 20798 not found, but listed in compaction data
Remote: segment 20799 not found, but listed in compaction data
Remote: segment 20800 not found, but listed in compaction data
Remote: segment 20801 not found, but listed in compaction data
Remote: segment 20802 not found, but listed in compaction data
Remote: segment 20803 not found, but listed in compaction data
Remote: segment 20804 not found, but listed in compaction data
Remote: segment 20805 not found, but listed in compaction data
Remote: Compacting segments  56%
Traceback (most recent call last):
  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/remote.py", line 240, in serve
    res = f(**args)
          ^^^^^^^^^
  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/repository.py", line 505, in commit
    self.compact_segments(threshold)
  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/repository.py", line 880, in compact_segments
    assert segments[segment] == 0, 'Corrupted segment reference count - corrupted index or hints'
           ^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Corrupted segment reference count - corrupted index or hints
Platform: Linux antares.<domain> 6.8.0-48-generic #48-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 14:04:52 UTC 2024 x86_64
Linux: Unknown Linux
Borg: 1.4.0  Python: CPython 3.12.3 msgpack: 1.0.3 fuse: pyfuse3 3.3.0 [pyfuse3,llfuse]
PID: 3423298  CWD: /root
sys.argv: ['/usr/bin/borg', 'compact', '--progress', 'ssh://<user>@<user>.your-storagebox.de:23/./borg/antares']
SSH_ORIGINAL_COMMAND: None
Command 'borg compact --progress ssh://<user>@<user>.your-storagebox.de:23/./borg/antares' returned non-zero exit status 2.

Need some help? https://torsion.org/borgmatic/#issues

ginkel commented 1 week ago

@ThomasWaldmann Can you think of any workaround/mitigation or would I be better off creating a new repo?

ThomasWaldmann commented 1 week ago

Can you make a backup or snapshot of the repo and try just removing the hints file, running borg check --repair and then compact?
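
The "remove the hints file" step could be sketched in Python as below. It moves the file aside instead of deleting it. This assumes direct filesystem access to the repository directory (which may not be available on a hosted storage box), that no borg process is using the repo at the time, and that the hints files are named `hints.<number>` as in the paths shown in this thread. The function name `move_hints_aside` and both directory arguments are hypothetical.

```python
import re
import shutil
from pathlib import Path

# File naming as seen in this thread, e.g. hints.21839
HINTS_NAME = re.compile(r"^hints\.(\d+)$")


def move_hints_aside(repo_dir, aside_dir):
    """Move every hints.<N> file from repo_dir into aside_dir; return moved names."""
    repo = Path(repo_dir)
    aside = Path(aside_dir)
    aside.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(repo.iterdir()):
        if entry.is_file() and HINTS_NAME.match(entry.name):
            # Move rather than delete, so the file can be restored if needed.
            shutil.move(str(entry), str(aside / entry.name))
            moved.append(entry.name)
    return moved
```

After that, borg check --repair and borg compact would be run as usual. Index files and segment data are left untouched by this sketch.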

ginkel commented 1 week ago

Done. That seems to have worked, although the first check after deleting the hints was somewhat unhappy:

# borgmatic check --progress --repair --force --repository hetzner
Remote: Repository hints file missing or corrupted, trying to recover: [Errno 2] No such file or directory: '/home/borg/antares/hints.21839'                                         

Remote: Repository index missing or corrupted, trying to recover from: [Errno 2] No such file or directory: '/home/borg/antares/index.21839'                                         
Remote: Checking repository transaction due to previous error: [Errno 2] No such file or directory: '/home/borg/antares/index.21839'                                                 
Remote: Repository index missing or corrupted, trying to recover from: [Errno 2] No such file or directory: '/home/borg/antares/index.21839'                                         
Traceback (most recent call last):                                                                                                                                                   

  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/repository.py", line 578, in prepare_txn                                                         
    with IntegrityCheckedFile(hints_path, write=False, integrity_data=integrity_data) as fd:                                                                                         
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                

  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/crypto/file_integrity.py", line 129, in __init__                                                 
    self.file_fd = override_fd or open(path, mode)                                                                                                                                   
                                  ^^^^^^^^^^^^^^^^                                                                                                                                   

FileNotFoundError: [Errno 2] No such file or directory: '/home/borg/antares/hints.21839'                                                                                             

During handling of the above exception, another exception occurred:                                                                                                                  

Traceback (most recent call last):                                                                                                                                                   

  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/repository.py", line 528, in open_index
    with IntegrityCheckedFile(index_path, write=False, integrity_data=integrity_data) as fd:                                                                                         
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                

  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/crypto/file_integrity.py", line 129, in __init__
    self.file_fd = override_fd or open(path, mode)                                        
                                  ^^^^^^^^^^^^^^^^                                        

FileNotFoundError: [Errno 2] No such file or directory: '/home/borg/antares/index.21839'                                                                                             

During handling of the above exception, another exception occurred:                                                                                                                  

Traceback (most recent call last):                                                        

  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/repository.py", line 561, in prepare_txn
    self.index = self.open_index(transaction_id, auto_recover=False)                                                                                                                 
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                 

  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/repository.py", line 532, in open_index
    os.unlink(index_path)                    

FileNotFoundError: [Errno 2] No such file or directory: '/home/borg/antares/index.21839'                                                                                             

During handling of the above exception, another exception occurred:                                                                                                                  

Traceback (most recent call last):                                                        

  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/repository.py", line 528, in open_index
    with IntegrityCheckedFile(index_path, write=False, integrity_data=integrity_data) as fd:                                                                                         
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                

  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/crypto/file_integrity.py", line 129, in __init__
    self.file_fd = override_fd or open(path, mode)                                        
                                  ^^^^^^^^^^^^^^^^                                        

FileNotFoundError: [Errno 2] No such file or directory: '/home/borg/antares/index.21839'                                                                                             

During handling of the above exception, another exception occurred:                                                                                                                  

Traceback (most recent call last):                                                        

  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/remote.py", line 240, in serve                                                                   
    res = f(**args)                          
          ^^^^^^^^^                          

  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/repository.py", line 1207, in get
    self.index = self.open_index(self.get_transaction_id())                                                                                                                          
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                           

  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/repository.py", line 415, in get_transaction_id
    self.check_transaction()                                                              

  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/repository.py", line 412, in check_transaction
    self.replay_segments(replay_from, segments_transaction_id)                                                                                                                       

  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/repository.py", line 894, in replay_segments
    self.prepare_txn(index_transaction_id, do_cleanup=False)                                                                                                                         

  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/repository.py", line 587, in prepare_txn
    self.prepare_txn(transaction_id)                                                      

  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/repository.py", line 565, in prepare_txn
    self.index = self.open_index(transaction_id, auto_recover=False)                                                                                                                 
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                 

  File "/.3LcwXLX0gNhr7CrW/python-envs/borg-1.2.8/lib/python3.11/site-packages/borg/repository.py", line 532, in open_index
    os.unlink(index_path)                    

FileNotFoundError: [Errno 2] No such file or directory: '/home/borg/antares/index.21839'              

Platform: Linux antares.<domain> 6.8.0-48-generic #48-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 14:04:52 UTC 2024 x86_64
Linux: Unknown Linux                         
Borg: 1.4.0  Python: CPython 3.12.3 msgpack: 1.0.3 fuse: pyfuse3 3.3.0 [pyfuse3,llfuse]                                                                                              
PID: 3439628  CWD: /root                     
sys.argv: ['/usr/bin/borg', 'info', '--json', 'ssh://<user>@<user>.your-storagebox.de:23/./borg/antares']                                                                          
SSH_ORIGINAL_COMMAND: None                   

hetzner: Error running actions for repository                                             
Command '('borg', 'info', '--json', 'ssh://<user>@<user>.your-storagebox.de:23/./borg/antares')' returned non-zero exit status 2.
/etc/borgmatic/config.yaml: An error occurred                                             

summary:                                     
/etc/borgmatic/config.yaml: An error occurred                                             
hetzner: Error running actions for repository                                             
Command '('borg', 'info', '--json', 'ssh://<user>@<user>.your-storagebox.de:23/./borg/antares')' returned non-zero exit status 2.

Need some help? https://torsion.org/borgmatic/#issues                                     

# borgmatic check --progress --repair --force --repository hetzner                                                                                              
This is a potentially dangerous function.                                                 
check --repair might lead to data loss (for kinds of corruption it is not                                                                                                            
capable of dealing with). BE VERY CAREFUL!                                                

Type 'YES' if you understand this and want to continue: YES                                                                                                                          

# borgmatic compact --repository hetzner --progress 

Thanks for your help!

I still have the snapshot of the broken repo. If you'd like me to perform any forensics on that data, please let me know.

ThomasWaldmann commented 1 week ago

Hmm, wondering why it complains about the index? You said you only deleted the hints?

ginkel commented 1 week ago

Correct. There were a hints file and an index file with the same suffix nnnnn, and I deleted only the hints file. The index that the check complained about had a different, higher number mmmmm (mmmmm > nnnnn), which matched the new hints file. In the end, both files with the matching suffix mmmmm were present again.
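
In case it helps with forensics on the snapshot, here is a rough Python sketch for checking which index and hints suffixes are present and whether any of them lack a partner. It assumes the files sit directly in the repository directory and are named `index.<number>` and `hints.<number>`, as in the error messages above. The function names `suffixes` and `unpaired` are invented for this example.

```python
import re
from pathlib import Path

# Matches names like index.21839 or hints.21839
NAME = re.compile(r"^(index|hints)\.(\d+)$")


def suffixes(repo_dir):
    """Map 'index' and 'hints' to the set of numeric suffixes present in repo_dir."""
    found = {"index": set(), "hints": set()}
    for entry in Path(repo_dir).iterdir():
        m = NAME.match(entry.name)
        if m:
            found[m.group(1)].add(int(m.group(2)))
    return found


def unpaired(repo_dir):
    """Return suffixes that have an index without hints, or hints without an index."""
    found = suffixes(repo_dir)
    return {
        "index_only": sorted(found["index"] - found["hints"]),
        "hints_only": sorted(found["hints"] - found["index"]),
    }
```

On a repository where both files share the same suffix, `unpaired` returns two empty lists. Right after removing only the hints file, the old suffix would show up under `index_only`.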