borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

Getting "Data integrity error: Invalid segment entry size 0" on fresh repos #8233

Closed. Mace68 closed this issue 4 weeks ago.

Mace68 commented 1 month ago

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

ISSUE, maybe BUG?

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

borg 1.2.8

Operating system (distribution) and version.

KDE Neon 22.04 (Ubuntu-based)

Hardware / network configuration, and filesystems used.

PC wired to Synology NAS (ext4 FS, Synology Raid 5) through Netgear switch, using SMB protocol. Main system drive is ext4, attached mounted external drive is NTFS.

How much data is handled by borg?

~4 TB

Full borg command line that led to the problem (leave out excludes and passwords)

Archive script 1 (backup of main system drive):

#!/bin/sh

export BORG_REPO=/media/backup/Mace/BorgBackup

export BORG_PASSPHRASE='REDACTED'

NOW=$(date +"%FT%T")
HOSTNAME=$(hostname)
archive_name="${HOSTNAME}_${NOW}"

# some helpers and error handling:
info() { printf "\n%s %s\n\n" "$( date )" "$*" >&2; }
trap 'echo $( date ) Backup interrupted >&2; exit 2' INT TERM

echo "\n\n\n==============================================================================\n${NOW}\nBacking up /\nArchive: /media/backup/Mace/BorgBackup::${archive_name}\n=============================================================================="

borg create                         \
    --chunker-params fixed,4194304  \
    --filter AME                    \
    --stats                         \
    --show-rc                       \
    --compression auto,zstd,12      \
    --exclude-caches                \
    --one-file-system               \
    --exclude '/home/*/.cache/*'    \
    --exclude '/home/user/.firestorm_x64/cache/*' \
    --exclude '/home/user/.thunderbird/1shzhcpl.default-release/ImapMail/*' \
    --exclude '/home/user/Data/Firestorm/FSCache_Sound/*' \
    --exclude '/opt/ramdisk/*'      \
    --exclude '/opt/ramdisk.bak/*'  \
    --exclude '/var/cache/*'        \
    --exclude '/var/tmp/*'          \
    --exclude '/dev/*'              \
    --exclude '/proc/*'             \
    --exclude '/sys/*'              \
    --exclude '/tmp/*'              \
    --exclude '/run/*'              \
                                    \
    ::${archive_name} /             \
                                    \

backup_exit=$?

echo "\n----- Pruning repository -----"

borg prune                          \
    --list                          \
    --glob-archives '{hostname}_*'  \
    --show-rc                       \
    --keep-daily    7               \
    --keep-weekly   4               \
    --keep-monthly  6               \

prune_exit=$?

# use highest exit code as global exit code
global_exit=$(( backup_exit > prune_exit ? backup_exit : prune_exit ))

if [ ${global_exit} -eq 0 ]; then
    echo "Backup and Prune finished successfully"
elif [ ${global_exit} -eq 1 ]; then
    echo "Backup and/or Prune finished with warnings"
else
    echo "Backup and/or Prune finished with errors"
fi

exit ${global_exit}

Archive script 2 (backup of fstab mounted external drive):

#!/bin/sh

export BORG_REPO=/media/backup/Mace/BorgBackup

export BORG_PASSPHRASE='REDACTED'

NOW=$(date +"%FT%T")
HOSTNAME=$(hostname)
archive_name="${HOSTNAME}-FreeAgent_${NOW}"

# some helpers and error handling:
info() { printf "\n%s %s\n\n" "$( date )" "$*" >&2; }
trap 'echo $( date ) Backup interrupted >&2; exit 2' INT TERM

echo "\n\n\n==============================================================================\n${NOW}\nBacking up /media/FreeAgent\nArchive: /media/backup/Mace/BorgBackup::${archive_name}\n=============================================================================="

borg create                                  \
    --chunker-params fixed,4194304           \
    --filter AME                             \
    --stats                                  \
    --show-rc                                \
    --compression auto,zstd,12               \
    --exclude-caches                         \
    --one-file-system                        \
    --exclude '/System Volume Information/*' \
    --exclude '/\$RECYCLE.BIN/*'             \
    --exclude '/Cache/*'                     \
                                             \
    ::${archive_name} /media/FreeAgent       \
                                             \

backup_exit=$?

echo "\n----- Pruning repository -----"

borg prune                                  \
    --list                                  \
    --glob-archives '{hostname}-FreeAgent_*' \
    --show-rc                               \
    --keep-daily    7                       \
    --keep-weekly   4                       \
    --keep-monthly  6                       \

prune_exit=$?

# use highest exit code as global exit code
global_exit=$(( backup_exit > prune_exit ? backup_exit : prune_exit ))

if [ ${global_exit} -eq 0 ]; then
    echo "Backup and Prune finished successfully"
elif [ ${global_exit} -eq 1 ]; then
    echo "Backup and/or Prune finished with warnings"
else
    echo "Backup and/or Prune finished with errors"
fi

exit ${global_exit}

Describe the problem you're observing.

After starting a new repo twice with borg 1.2.8, I get the following error from both 'borg check --verbose' and 'borg check --verbose --verify-data', but not from 'borg check --verbose --archives-only', when verifying the integrity of the initial archives in the repo (the segment and offset differ each time):

Data integrity error: Invalid segment entry size 0 - too small [segment 17, offset 274793694]
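For context, the three check variants differ in scope, which is consistent with only the segment-level (repository) checks tripping the error. A hedged sketch, reusing the repo path from the scripts above and paraphrasing the borg 1.2 docs; it is guarded so it is a no-op where borg or the repo is absent:

```shell
# Repo path as used in the scripts above; scope notes paraphrase the
# borg 1.2 documentation.
export BORG_REPO=/media/backup/Mace/BorgBackup

if command -v borg >/dev/null 2>&1 && [ -d "$BORG_REPO" ]; then
    borg check --verbose                  # repository (segment) check + archive metadata check
    borg check --verbose --verify-data    # as above, plus decrypt/verify every data chunk
    borg check --verbose --archives-only  # archive metadata only; skips the segment checks,
                                          # which is why this variant does not hit the error
fi
```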

The only warning I observed during the backups (both times) was

/var/log/syslog: file changed while we backed it up

during execution of Archive script 1 above.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

It seems to happen every time I start a fresh repo.

Include any warning/errors/backtraces from the system logs

My apologies, but I will need more specific guidance on where to find the requested information.

infectormp commented 1 month ago

When you say “fresh”, do you mean “empty” or have you already made one or more backups to the repo?

infectormp commented 1 month ago

Have you checked the remote filesystem for corruption/consistency?

Mace68 commented 1 month ago

Empty, freshly initialized with

borg init --encryption repokey /media/backup/Mace/BorgBackup

after removing the BorgBackup directory, so no existing files were present. I have not explicitly checked the Synology's filesystem yet, but the current NAS status is "healthy". It does scrub monthly.

infectormp commented 1 month ago

Empty, freshly initialized with

I suspect that's the problem. Could you please run the check on a repo that already contains some data?

Mace68 commented 1 month ago

Sorry, I slightly misread your question. I didn't run the check until both scripts had successfully finished their first backup runs.

infectormp commented 1 month ago

borg is very sensitive to filesystem/disk health; have you checked SMART or the kernel logs for any disk errors?
Also, try to reproduce the problem using a local filesystem to rule out possible network/SMB issues.
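A minimal triage sketch for the disk-health part of that advice. The grep patterns and the commented-out smartctl invocations are illustrative assumptions; smartctl requires smartmontools and usually root:

```shell
# Sketch: scan the kernel ring buffer for common disk-error patterns.
# The patterns are illustrative, not exhaustive.
check_kernel_log() {
    dmesg 2>/dev/null | grep -icE 'i/o error|ata[0-9]+\.[0-9]+: error|nvme.*err' || true
}

hits=$(check_kernel_log)
echo "kernel log disk-error lines: ${hits:-0}"

# SMART health needs smartmontools and usually root, e.g.:
# sudo smartctl -H /dev/sda        # overall health verdict
# sudo smartctl -A /dev/nvme0n1    # detailed attributes
```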

Mace68 commented 1 month ago

Thanks for the replies.

Apparently Synology DSM 6 only runs a filesystem check on startup, and running one manually/on demand is tedious and a bit risky. I just looked, and its uptime is 384 days, so it probably hasn't checked its filesystem in over a year.

I'll create a test repo on my second NVMe drive, back up my main system drive to it, and run borg check on that to make sure borg is working properly on my system. I'll reboot the NAS in the meantime, and if the local borg test succeeds I'll remove the archives from the NAS repo and redo the backups there, one at a time, running a borg check after each one.
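That test plan can be sketched as a throwaway script; the paths, sample data, and passphrase below are illustrative assumptions, not values from this thread:

```shell
# Sketch: local borg sanity test on a temporary repo, to rule out
# network/SMB issues. Requires borg; skips cleanly when not installed.
if command -v borg >/dev/null 2>&1; then
    export BORG_PASSPHRASE='throwaway-test-passphrase'
    repo="$(mktemp -d)/repo"      # stand-in for a repo on the second NVMe drive
    src="$(mktemp -d)"            # stand-in for the data being backed up
    echo "sample data" > "$src/file.txt"

    borg init --encryption repokey "$repo" &&
        borg create --stats "$repo::test_$(date +%FT%T)" "$src" &&
        borg check --verbose --verify-data "$repo" &&
        result="passed" || result="failed"
else
    result="skipped (borg not installed)"
fi
echo "local borg check: $result"
```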

I'll reply back if I need additional support.

ThomasWaldmann commented 1 month ago

This is an unusual effect and could be due to corruption at the (network) filesystem level.

Try to reproduce locally.

Mace68 commented 4 weeks ago

It seems it was indeed filesystem corruption on the NAS. The local test succeeded without any errors, and restarting the NAS (after over a year of uptime) so it could presumably run fsck seems to have resolved the issue.

Thanks again for the replies, and big thanks for Borg!