Closed seambeat closed 8 years ago
I think the culprit is to be found in _chunker.c. There are two code paths, one works on native FDs, one uses the Python file-like interface[1].
// chunker_fill
// if we have a os-level file descriptor, use os-level API
n = read(c->fh, c->data + c->position + c->remaining, n);
if(n > 0) {
c->remaining += n;
c->bytes_read += n;
}
else
if(n == 0) {
c->eof = 1;
}
else {
// some error happened
return 0;
}
I never seen what read() returns on bad sectors, but I suppose EIO. So chunker_fill returns 0 on error, chunker_process, which uses chunker_fill, checks that case and returns NULL (for a PyObject return value -- I suppose this is converted to None). So this NULL/None should bubble up to borg.archive.Archive.process_file, line 555:
with os.fdopen(fh, 'rb') as fd:
chunks = []
for chunk in self.chunker.chunkify(fd, fh):
chunks.append(cache.add_chunk(self.key.id_hash(chunk), chunk, self.stats))
if self.show_progress:
self.stats.show_progress(item=item, dt=0.2)
cache.memorize_file(path_hash, st, [c[0] for c in chunks])
status = status or 'M' # regular file, modified (if not 'A' already)
This should explode if chunk where None. So I guess something doesn't quite work out here. Going back a bit, what does happen with the NULL PyObject*?
[1]
// no os-level file descriptor, use Python file object API
data = PyObject_CallMethod(c->fd, "read", "i", n);
if(!data) {
return 0;
}
If read raises an exception it should bubble up the stack just like any other exception.
i suspect to make a reliable test on that one a fuse filesystem that simulates errors is needed
Probably also the best way to debug it, if the cause is not obvious / as suspected above.
Here's a list for I/O fault injection: http://unix.stackexchange.com/a/144200/32100
Using a fault-simulating Python file-like object wouldn't cover both paths in _chunker.c
This should explode if chunk where None. So I guess something doesn't quite work out here. Going back a bit, what does happen with the NULL PyObject*?
Seems like "nothing, really". No error, nothing happens. A for chunk in …chunkify
for loop just exits cleanly.
When testing this with a fault-injecting Python file-like object, the error handling works, because the OSError from the read will propagate normally within the Python interpreter.
So in conclusion I'd say that in _chunker.c at line 154:
else {
// some error happened
return 0;
}
We really want this:
else {
// some error happened
PyErr_SetFromErrno(PyExc_OSError);
return 0;
}
Which will have the normal OSError exception propagate as usual. I didn't have time to actually test this with a fault-injecting FS/blockdev.
So if someone already got that up and running, please try if the attached commit fixes the issue for good: I just edited chunker_process to simulate the response that would be caused by chunker_fill.
I have just tested your commit https://github.com/enkore/borg/commit/fbc4c0cee027781974656cb377bbf24138434ec2 ... looks good.
For the record:
I created a block device, put a file system on it and generated a few files.
$ ls -l mnt
total 488784
-rw-r--r-- 1 seambeat seambeat 104857600 Mar 14 22:44 file-1
-rw-r--r-- 1 seambeat seambeat 104857600 Mar 14 22:44 file-2
-rw-r--r-- 1 seambeat seambeat 104857600 Mar 14 22:44 file-3
-rw-r--r-- 1 seambeat seambeat 104857600 Mar 14 22:44 file-4
-rw-r--r-- 1 seambeat seambeat 81068032 Mar 14 22:45 file-5
drwx------ 2 seambeat seambeat 16384 Mar 14 22:44 lost+found
Then I laid an error injecting block device over it and tried to read the files (just generated md5 checksums). The file-4
is demaged.
$ md5sum -b mnt/file-*
d444bf940528e7ddba55ba2bf4c6f862 *mnt/file-1
7c47c121efc42d87ff6e276f174cefb0 *mnt/file-2
040da7f12a26227913c6e8dcefbd4095 *mnt/file-3
md5sum: mnt/file-4: Input/output error
8d5933613405c96de2f3e8cbd258157d *mnt/file-5
Next, with Borg 1.0.0 I had recently installed from prebuilt binaries, I initialized a new repository and put the files in a new archive.
$ borg --version
borg 1.0.0
$ borg init -v --show-rc --encryption none repo
Initializing repository at "repo"
Encryption NOT enabled.
Use the "--encryption=repokey|keyfile" to enable encryption.
Synchronizing chunks cache...
Archives: 0, w/ cached Idx: 0, w/ outdated Idx: 0, w/o cached Idx: 0.
Done.
terminating with success status, rc 0
$ borg create -v --show-rc --list --stats repo::archive-1 mnt
reading files cache
processing files
A mnt/file-1
A mnt/file-2
A mnt/file-3
A mnt/file-4
A mnt/file-5
d mnt/lost+found
d mnt
------------------------------------------------------------------------------
Archive name: archive-1
Archive fingerprint: 12e930cd76c0b843288c4195cf23bc4fdde5fb9183c5fee770322f0897fdb150
Time (start): Tue, 2016-03-15 01:05:24
Time (end): Tue, 2016-03-15 01:05:41
Duration: 17.52 seconds
Number of files: 5
------------------------------------------------------------------------------
Original size Compressed size Deduplicated size
This archive: 469.60 MB 469.60 MB 469.60 MB
All archives: 469.60 MB 469.60 MB 469.60 MB
Unique chunks Total chunks
Chunk index: 196 196
------------------------------------------------------------------------------
terminating with success status, rc 0
$ borg list -v --show-rc repo::archive-1
drwxr-xr-x seambeat seambeat 0 Mon, 2016-03-14 23:23:07 mnt
-rw-r--r-- seambeat seambeat 104857600 Mon, 2016-03-14 22:44:32 mnt/file-1
-rw-r--r-- seambeat seambeat 104857600 Mon, 2016-03-14 22:44:41 mnt/file-2
-rw-r--r-- seambeat seambeat 104857600 Mon, 2016-03-14 22:44:50 mnt/file-3
-rw-r--r-- seambeat seambeat 73952974 Mon, 2016-03-14 22:44:58 mnt/file-4
-rw-r--r-- seambeat seambeat 81068032 Mon, 2016-03-14 22:45:06 mnt/file-5
drwx------ seambeat seambeat 0 Mon, 2016-03-14 22:44:23 mnt/lost+found
terminating with success status, rc 0
Borg put the files into the archive without any complain. However, the file-4
is truncated (compare it with the directory listing above).
I repeated the whole procedure with Borg from your GitHub repository with the commit https://github.com/enkore/borg/commit/fbc4c0cee027781974656cb377bbf24138434ec2 applied.
$ python --version
Python 3.5.1
$ borg --version
borg 1.0.1.dev13+ngfbc4c0c
$ borg init -v --show-rc --encryption none repo
Initializing repository at "repo"
Encryption NOT enabled.
Use the "--encryption=repokey|keyfile" to enable encryption.
Synchronizing chunks cache...
Archives: 0, w/ cached Idx: 0, w/ outdated Idx: 0, w/o cached Idx: 0.
Done.
terminating with success status, rc 0
$ borg create -v --show-rc --list --stats repo::archive-1 mnt
reading files cache
processing files
A mnt/file-1
A mnt/file-2
A mnt/file-3
mnt/file-4: [Errno 5] Input/output error
E mnt/file-4
A mnt/file-5
d mnt/lost+found
d mnt
------------------------------------------------------------------------------
Archive name: archive-1
Archive fingerprint: fc5ef04ab34f27a6acdf943c59b26c204a0ccb9649a26bdb584541777c25fe6f
Time (start): Tue, 2016-03-15 01:23:12
Time (end): Tue, 2016-03-15 01:23:29
Duration: 17.42 seconds
Number of files: 4
------------------------------------------------------------------------------
Original size Compressed size Deduplicated size
This archive: 469.60 MB 469.60 MB 469.60 MB
All archives: 469.60 MB 469.60 MB 469.60 MB
Unique chunks Total chunks
Chunk index: 196 196
------------------------------------------------------------------------------
terminating with warning status, rc 1
$ borg list -v --show-rc repo::archive-1
drwxr-xr-x seambeat seambeat 0 Mon, 2016-03-14 23:23:07 mnt
-rw-r--r-- seambeat seambeat 104857600 Mon, 2016-03-14 22:44:32 mnt/file-1
-rw-r--r-- seambeat seambeat 104857600 Mon, 2016-03-14 22:44:41 mnt/file-2
-rw-r--r-- seambeat seambeat 104857600 Mon, 2016-03-14 22:44:50 mnt/file-3
-rw-r--r-- seambeat seambeat 81068032 Mon, 2016-03-14 22:45:06 mnt/file-5
drwx------ seambeat seambeat 0 Mon, 2016-03-14 22:44:23 mnt/lost+found
terminating with success status, rc 0
This time Borg made it right. It noticed the I/O error, discarded the file-4
and returned correct return code 1
.
Thank you for confirming this. I opened a PR for the fix.
@seambeat / @enkore thanks for finding, fixing and testing this!
https://github.com/borgbackup/borg/pull/749
now also fixed in 1.0-maint and 0.xx-maint.
When creating a new archive with files read from a disk with some bad sectors, Borg truncates files that are not fully readable and includes them into the archive. There is no error message or a warning, the return code is 0.
Seems like Borg stops reading files on an I/O error and continues as if they were fully and correctly read. The result is an "successfully" created archive containing demaged files.
The same behavior can be reproduced using a fake block device instead of a disk with device mapper's error target on it.