borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/
Other
11.23k stars 743 forks source link

Borg silently skips disk I/O errors and includes truncated files into archive #748

Closed seambeat closed 8 years ago

seambeat commented 8 years ago

When creating a new archive with files read from a disk with some bad sectors, Borg truncates files that are not fully readable and includes them into the archive. There is no error message or a warning, the return code is 0.

Seems like Borg stops reading files on an I/O error and continues as if they were fully and correctly read. The result is an "successfully" created archive containing demaged files.

The same behavior can be reproduced using a fake block device instead of a disk with device mapper's error target on it.

enkore commented 8 years ago

I think the culprit is to be found in _chunker.c. There are two code paths, one works on native FDs, one uses the Python file-like interface[1].

    // chunker_fill
    // if we have a os-level file descriptor, use os-level API
    n = read(c->fh, c->data + c->position + c->remaining, n);
    if(n > 0) {
        c->remaining += n;
        c->bytes_read += n;
    }
    else
    if(n == 0) {
        c->eof = 1;
    }
    else {
        // some error happened
        return 0;
    }

I never seen what read() returns on bad sectors, but I suppose EIO. So chunker_fill returns 0 on error, chunker_process, which uses chunker_fill, checks that case and returns NULL (for a PyObject return value -- I suppose this is converted to None). So this NULL/None should bubble up to borg.archive.Archive.process_file, line 555:

        with os.fdopen(fh, 'rb') as fd:
            chunks = []
            for chunk in self.chunker.chunkify(fd, fh):
                chunks.append(cache.add_chunk(self.key.id_hash(chunk), chunk, self.stats))
                if self.show_progress:
                    self.stats.show_progress(item=item, dt=0.2)
        cache.memorize_file(path_hash, st, [c[0] for c in chunks])
        status = status or 'M'  # regular file, modified (if not 'A' already)

This should explode if chunk where None. So I guess something doesn't quite work out here. Going back a bit, what does happen with the NULL PyObject*?

[1]

    // no os-level file descriptor, use Python file object API
    data = PyObject_CallMethod(c->fd, "read", "i", n);
    if(!data) {
        return 0;
    }

If read raises an exception it should bubble up the stack just like any other exception.

RonnyPfannschmidt commented 8 years ago

i suspect to make a reliable test on that one a fuse filesystem that simulates errors is needed

enkore commented 8 years ago

Probably also the best way to debug it, if the cause is not obvious / as suspected above.

Here's a list for I/O fault injection: http://unix.stackexchange.com/a/144200/32100

Using a fault-simulating Python file-like object wouldn't cover both paths in _chunker.c

seambeat commented 8 years ago

I used this approach to simulate a disk with bad sectors.

enkore commented 8 years ago

This should explode if chunk where None. So I guess something doesn't quite work out here. Going back a bit, what does happen with the NULL PyObject*?

Seems like "nothing, really". No error, nothing happens. A for chunk in …chunkify for loop just exits cleanly.

When testing this with a fault-injecting Python file-like object, the error handling works, because the OSError from the read will propagate normally within the Python interpreter.

So in conclusion I'd say that in _chunker.c at line 154:

    else {
        // some error happened
        return 0;
    }

We really want this:

    else {
        // some error happened
        PyErr_SetFromErrno(PyExc_OSError);
        return 0;
    }

Which will have the normal OSError exception propagate as usual. I didn't have time to actually test this with a fault-injecting FS/blockdev.

So if someone already got that up and running, please try if the attached commit fixes the issue for good: I just edited chunker_process to simulate the response that would be caused by chunker_fill.

seambeat commented 8 years ago

I have just tested your commit https://github.com/enkore/borg/commit/fbc4c0cee027781974656cb377bbf24138434ec2 ... looks good.

For the record:

I created a block device, put a file system on it and generated a few files.

$ ls -l mnt
total 488784
-rw-r--r-- 1 seambeat seambeat 104857600 Mar 14 22:44 file-1
-rw-r--r-- 1 seambeat seambeat 104857600 Mar 14 22:44 file-2
-rw-r--r-- 1 seambeat seambeat 104857600 Mar 14 22:44 file-3
-rw-r--r-- 1 seambeat seambeat 104857600 Mar 14 22:44 file-4
-rw-r--r-- 1 seambeat seambeat  81068032 Mar 14 22:45 file-5
drwx------ 2 seambeat seambeat     16384 Mar 14 22:44 lost+found

Then I laid an error injecting block device over it and tried to read the files (just generated md5 checksums). The file-4 is demaged.

$ md5sum -b mnt/file-*
d444bf940528e7ddba55ba2bf4c6f862 *mnt/file-1
7c47c121efc42d87ff6e276f174cefb0 *mnt/file-2
040da7f12a26227913c6e8dcefbd4095 *mnt/file-3
md5sum: mnt/file-4: Input/output error
8d5933613405c96de2f3e8cbd258157d *mnt/file-5

Next, with Borg 1.0.0 I had recently installed from prebuilt binaries, I initialized a new repository and put the files in a new archive.

$ borg --version
borg 1.0.0

$ borg init -v --show-rc --encryption none repo
Initializing repository at "repo"
Encryption NOT enabled.
Use the "--encryption=repokey|keyfile" to enable encryption.
Synchronizing chunks cache...
Archives: 0, w/ cached Idx: 0, w/ outdated Idx: 0, w/o cached Idx: 0.
Done.
terminating with success status, rc 0

$ borg create -v --show-rc --list --stats repo::archive-1 mnt
reading files cache
processing files
A mnt/file-1
A mnt/file-2
A mnt/file-3
A mnt/file-4
A mnt/file-5
d mnt/lost+found
d mnt
------------------------------------------------------------------------------
Archive name: archive-1
Archive fingerprint: 12e930cd76c0b843288c4195cf23bc4fdde5fb9183c5fee770322f0897fdb150
Time (start): Tue, 2016-03-15 01:05:24
Time (end):   Tue, 2016-03-15 01:05:41
Duration: 17.52 seconds
Number of files: 5
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              469.60 MB            469.60 MB            469.60 MB
All archives:              469.60 MB            469.60 MB            469.60 MB

                       Unique chunks         Total chunks
Chunk index:                     196                  196
------------------------------------------------------------------------------
terminating with success status, rc 0

$ borg list -v --show-rc repo::archive-1
drwxr-xr-x seambeat seambeat        0 Mon, 2016-03-14 23:23:07 mnt
-rw-r--r-- seambeat seambeat 104857600 Mon, 2016-03-14 22:44:32 mnt/file-1
-rw-r--r-- seambeat seambeat 104857600 Mon, 2016-03-14 22:44:41 mnt/file-2
-rw-r--r-- seambeat seambeat 104857600 Mon, 2016-03-14 22:44:50 mnt/file-3
-rw-r--r-- seambeat seambeat 73952974 Mon, 2016-03-14 22:44:58 mnt/file-4
-rw-r--r-- seambeat seambeat 81068032 Mon, 2016-03-14 22:45:06 mnt/file-5
drwx------ seambeat seambeat        0 Mon, 2016-03-14 22:44:23 mnt/lost+found
terminating with success status, rc 0

Borg put the files into the archive without any complain. However, the file-4 is truncated (compare it with the directory listing above).

I repeated the whole procedure with Borg from your GitHub repository with the commit https://github.com/enkore/borg/commit/fbc4c0cee027781974656cb377bbf24138434ec2 applied.

$ python --version
Python 3.5.1

$ borg --version
borg 1.0.1.dev13+ngfbc4c0c

$ borg init -v --show-rc --encryption none repo
Initializing repository at "repo"
Encryption NOT enabled.
Use the "--encryption=repokey|keyfile" to enable encryption.
Synchronizing chunks cache...
Archives: 0, w/ cached Idx: 0, w/ outdated Idx: 0, w/o cached Idx: 0.
Done.
terminating with success status, rc 0

$ borg create -v --show-rc --list --stats repo::archive-1 mnt
reading files cache
processing files
A mnt/file-1
A mnt/file-2
A mnt/file-3
mnt/file-4: [Errno 5] Input/output error
E mnt/file-4
A mnt/file-5
d mnt/lost+found
d mnt
------------------------------------------------------------------------------
Archive name: archive-1
Archive fingerprint: fc5ef04ab34f27a6acdf943c59b26c204a0ccb9649a26bdb584541777c25fe6f
Time (start): Tue, 2016-03-15 01:23:12
Time (end):   Tue, 2016-03-15 01:23:29
Duration: 17.42 seconds
Number of files: 4
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              469.60 MB            469.60 MB            469.60 MB
All archives:              469.60 MB            469.60 MB            469.60 MB

                       Unique chunks         Total chunks
Chunk index:                     196                  196
------------------------------------------------------------------------------
terminating with warning status, rc 1

$ borg list -v --show-rc repo::archive-1
drwxr-xr-x seambeat seambeat        0 Mon, 2016-03-14 23:23:07 mnt
-rw-r--r-- seambeat seambeat 104857600 Mon, 2016-03-14 22:44:32 mnt/file-1
-rw-r--r-- seambeat seambeat 104857600 Mon, 2016-03-14 22:44:41 mnt/file-2
-rw-r--r-- seambeat seambeat 104857600 Mon, 2016-03-14 22:44:50 mnt/file-3
-rw-r--r-- seambeat seambeat 81068032 Mon, 2016-03-14 22:45:06 mnt/file-5
drwx------ seambeat seambeat        0 Mon, 2016-03-14 22:44:23 mnt/lost+found
terminating with success status, rc 0

This time Borg made it right. It noticed the I/O error, discarded the file-4 and returned correct return code 1.

enkore commented 8 years ago

Thank you for confirming this. I opened a PR for the fix.

ThomasWaldmann commented 8 years ago

@seambeat / @enkore thanks for finding, fixing and testing this!

ThomasWaldmann commented 8 years ago

https://github.com/borgbackup/borg/pull/749

ThomasWaldmann commented 8 years ago

now also fixed in 1.0-maint and 0.xx-maint.