borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

borg crashes when segment file corrupted, proposed correction #8249

Closed by jbrepogmailcom 3 months ago

jbrepogmailcom commented 3 months ago

Hello. Some of my segment files got corrupted: their reported length differed from their actual length. Borg did not catch that error and crashed with:

File "/usr/lib/python3/dist-packages/borg/repository.py", line 1512, in _read
    data = fd.read(length)
OSError: [Errno 5] Input/output error

I was not able to identify which file was causing this, but ChatGPT recommended a change to the _read routine that logs which file causes the error just before the crash. This helped me identify the bad segments and delete them. The modified "def _read" is here:

def _read(self, fd, fmt, header, segment, offset, acceptable_tags, read_data=True):
    # some code shared by read() and iter_objects()
    try:
        hdr_tuple = fmt.unpack(header)
    except struct.error as err:
        raise IntegrityError('Invalid segment entry header [segment {}, offset {}]: {}'.format(
            segment, offset, err)) from None
    if fmt is self.put_header_fmt:
        crc, size, tag, key = hdr_tuple
    elif fmt is self.header_fmt:
        crc, size, tag = hdr_tuple
        key = None
    else:
        raise TypeError("_read called with unsupported format")
    if size > MAX_OBJECT_SIZE:
        raise IntegrityError('Invalid segment entry size {} - too big [segment {}, offset {}]'.format(
            size, segment, offset))
    if size < fmt.size:
        raise IntegrityError('Invalid segment entry size {} - too small [segment {}, offset {}]'.format(
            size, segment, offset))
    length = size - fmt.size
    if read_data:
        try:
            data = fd.read(length)
        except OSError:
            # Log which segment file failed before re-raising, so the bad file can be identified.
            logger.error('Error reading data from file: %s, segment: %s, offset: %s, length: %s', fd.name, segment, offset, length)
            raise
        if len(data) != length:
            raise IntegrityError('Segment entry data short read [segment {}, offset {}]: expected {}, got {} bytes'.format(
                segment, offset, length, len(data)))
        if crc32(data, crc32(memoryview(header)[4:])) & 0xffffffff != crc:
            raise IntegrityError('Segment entry checksum mismatch [segment {}, offset {}]'.format(
                segment, offset))
        if key is None and tag in (TAG_PUT, TAG_DELETE):
            key, data = data[:32], data[32:]
    else:
        if key is None and tag in (TAG_PUT, TAG_DELETE):
            key = fd.read(32)
            length -= 32
            if len(key) != 32:
                raise IntegrityError('Segment entry key short read [segment {}, offset {}]: expected {}, got {} bytes'.format(
                    segment, offset, 32, len(key)))
        oldpos = fd.tell()
        seeked = fd.seek(length, os.SEEK_CUR) - oldpos
        data = None
        if seeked != length:
            raise IntegrityError('Segment entry data short seek [segment {}, offset {}]: expected {}, got {} bytes'.format(
                segment, offset, length, seeked))
    if tag not in acceptable_tags:
        raise IntegrityError('Invalid segment entry header, did not get acceptable tag [segment {}, offset {}]'.format(
            segment, offset))
    return size, tag, key, data

ThomasWaldmann commented 3 months ago

I didn't verify that code (I could, but only if you show a DIFF), but an IOError means that the error is at a level below borg.

So the real fix is to find and fix the root cause. Just deleting the corrupt files might not be enough.
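
For reference, a diff like the one requested above could be generated roughly as follows. This is only a sketch using Python's standard difflib module; the function name make_unified_diff and the default file name are illustrative assumptions, not part of borg.

```python
import difflib


def make_unified_diff(original_text, modified_text, filename="repository.py"):
    """Return a unified diff between two versions of a file's text.

    The function name and the a/ and b/ path prefixes are illustrative only.
    """
    original_lines = original_text.splitlines(keepends=True)
    modified_lines = modified_text.splitlines(keepends=True)
    diff_lines = difflib.unified_diff(
        original_lines,
        modified_lines,
        fromfile="a/" + filename,
        tofile="b/" + filename,
    )
    return "".join(diff_lines)
```

Passing the text of the unmodified and the modified repository.py to this function yields output that can be pasted into an issue for review.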

jbrepogmailcom commented 3 months ago

Thanks for the comment. The error happened during a backup from client to server over ssh, because the closing of the file by borg and the network disconnection were not in sync. So it was a low-level error, but one caused by the borg backup being cut off. There was no way to find the defective segments other than that modification in repository.py. For that reason, it may be useful.

You may not want to add it to the official code, since this is just a debugging aid, but it may be useful to others if they encounter a similar error.
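
The same idea can probably be applied without patching borg at all. Below is a minimal sketch that reads every file under a directory to the end and reports the ones that raise OSError. The function name find_unreadable_files is hypothetical, and pointing it at the repository's data directory is an assumption about where the segment files live. It only detects read errors reported by the operating system; it does not verify checksums or replace borg's own consistency checking.

```python
import os


def find_unreadable_files(root, chunk_size=1024 * 1024):
    """Return (path, error) pairs for files under root that cannot be read.

    Hypothetical helper: it reads each file to EOF and records any OSError,
    which is how an I/O error from the storage layer shows up in Python.
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fd:
                    # Read in chunks so large segment files do not fill memory.
                    while fd.read(chunk_size):
                        pass
            except OSError as err:
                bad.append((path, err))
    return bad


if __name__ == "__main__":
    import sys

    # Usage: python find_unreadable.py <directory to scan>
    for path, err in find_unreadable_files(sys.argv[1]):
        print("unreadable: {} ({})".format(path, err))
```

Run against a copy or a read-only mount if possible, since repeatedly reading from failing storage can make things worse.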