This is a good request.
It is currently difficult for 2 reasons:
1) The file order is deliberately random
2) The encryption key is deliberately random.
Fixing (1) could perhaps be done by extending the listing of file contents to also
report which volume each file was found in, and then attempting to use that order
instead of the random one.
Fixing (2) is a bit harder, as it would require that the individual file key
can be recovered in some way without compromising it. It could perhaps be as
simple as storing the file keys in the manifest or a similar complementary
file, which is then encrypted. The current code does not easily allow
getting/setting this key, however. Since the encryption uses chaining, a single
byte change causes a cascading random effect through the rest of the file. That
is at least the case for AES; I'm not sure whether GPG uses a file key, or
whether such a key is even accessible.
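As a very rough sketch of how the file keys could be kept recoverable (the names and the use of AES-GCM/PBKDF2 below are my assumptions, not how the current code works), the per-file keys could live in a small manifest that is sealed with the backup passphrase:

```python
# Sketch: keep per-file keys random, but record them in a manifest that is
# itself encrypted with a key derived from the backup passphrase.
# All names here are hypothetical, not existing Duplicati code.
import json, os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def _manifest_key(passphrase: bytes, salt: bytes) -> bytes:
    # Derive a 256-bit key for the manifest from the backup passphrase.
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=200_000)
    return kdf.derive(passphrase)

def seal_key_manifest(file_keys: dict, passphrase: bytes) -> bytes:
    # file_keys maps file path -> its random per-file key (bytes).
    salt, nonce = os.urandom(16), os.urandom(12)
    payload = json.dumps({path: key.hex() for path, key in file_keys.items()}).encode()
    sealed = AESGCM(_manifest_key(passphrase, salt)).encrypt(nonce, payload, None)
    return salt + nonce + sealed          # stored next to the backup volumes

def open_key_manifest(blob: bytes, passphrase: bytes) -> dict:
    salt, nonce, sealed = blob[:16], blob[16:28], blob[28:]
    payload = AESGCM(_manifest_key(passphrase, salt)).decrypt(nonce, sealed, None)
    return {path: bytes.fromhex(key) for path, key in json.loads(payload).items()}
```

The next full backup could then open the manifest and reuse the same per-file key, so an unchanged file would produce the same ciphertext.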
There is also a problem with the compression, which may produce large changes in
the output when encountering small changes in the input. I'm also not sure how
this could be handled if a file is resized and potentially spans multiple volumes.
The only solution I can think of is to mimic a filesystem inside the archives,
so there is complete control over what blocks go where, allowing parts of
growing files to be appended and leaving space for files that shrink. But this
does not work well with compression as that tends to leave blocks of differing
sizes, and performing the compression on the resulting file would potentially
cause large changes.
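To make that "filesystem inside the archive" idea slightly more concrete, a block allocation table could look roughly like this (fixed block size, holes reused before new blocks are appended; purely illustrative, not existing code):

```python
# Sketch of a block allocation table: fixed-size blocks, each file owns a list
# of block slots, and slots freed by shrinking files are reused before new
# blocks are appended at the end. Purely illustrative.
BLOCK_SIZE = 64 * 1024

class BlockMap:
    def __init__(self):
        self.next_block = 0          # next never-used block index
        self.free_blocks = []        # holes left by shrunken files
        self.files = {}              # path -> list of block indices

    def _take_block(self) -> int:
        if self.free_blocks:
            return self.free_blocks.pop()
        self.next_block += 1
        return self.next_block - 1

    def update_file(self, path: str, size: int) -> list:
        needed = (size + BLOCK_SIZE - 1) // BLOCK_SIZE
        blocks = self.files.get(path, [])
        while len(blocks) > needed:          # file shrank: release tail blocks
            self.free_blocks.append(blocks.pop())
        while len(blocks) < needed:          # file grew: reuse holes or append
            blocks.append(self._take_block())
        self.files[path] = blocks
        return blocks
```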
I see the point with this request, but I have no clear view on how to fix this
without designing a new system with this particular issue as the focus point.
I'll leave the request as accepted, but I will not actively work on it for the
time being.
Original comment by kenneth....@gmail.com on 20 Sep 2010 at 1:28
I do not know the inner workings of Duplicati, so I will take the liberty of
thinking aloud here…
If the encryption key is random, then I would assume the key is stored somewhere
in the backup to allow a restore (most likely encrypted using the key entered
when the backup was defined). Since the files are recoverable using the key stored
in the Duplicati config, I would not see any harm in the encryption key being
stored in the config as well (??) and reused during the next full backup.
For the full backups, couldn't a compromise be to stray a bit from the
principle of fixed-size volumes? AES has a behaviour (i.e. CBC) that literally
breaks the idea of rsync; could the individual file be broken down into (user-)
defined blocks (for instance 2K, 8K or 64K in size, or …) before compression /
encryption? Each volume could then contain a (user-)defined number of blocks.
Each new file would mean the start of a new volume. Volume names could include
a sequence number for the file and the number of the first block in the volume. If
a file is altered between two full backups, only the affected block(s)
and thus the corresponding volume(s) are changed. That should also simplify
shrink / growth scenarios. Incremental backups could apply the existing
creation methodology (and, come to think of it, even the existing naming standard).
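As a rough sketch of what I mean (block size, blocks per volume and the name pattern below are only examples, not a proposal for the actual format):

```python
# Sketch: cut a file into fixed-size blocks and group them into volumes whose
# names carry the file sequence number and the first block number they contain.
BLOCK_SIZE = 8 * 1024          # e.g. 2K, 8K, 64K ...
BLOCKS_PER_VOLUME = 128

def split_into_blocks(data: bytes):
    for offset in range(0, len(data), BLOCK_SIZE):
        yield data[offset:offset + BLOCK_SIZE]

def volumes_for_file(file_seq: int, data: bytes):
    blocks = list(split_into_blocks(data))
    for first in range(0, len(blocks), BLOCKS_PER_VOLUME):
        name = f"vol-{file_seq:06d}-{first:08d}.dat"
        yield name, blocks[first:first + BLOCKS_PER_VOLUME]
```

A change inside one block would then only touch the volume(s) that hold that block.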
Original comment by Henrik68@gmail.com on 20 Sep 2010 at 7:01
Yes, good point: the file keys could be stored twice, which would not affect
security as they are encrypted using the backup password.
The CBC method is the chaining method I mentioned. Encrypting individual blocks
can be done, but it must be done carefully to avoid introducing an ECB
(Electronic Code Book)-like setup.
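One careful way to do it would be to give every block its own IV, derived from a per-file nonce and the block index, so identical plaintext blocks do not produce identical ciphertext. A sketch (using the Python cryptography package purely for illustration; none of the names are existing code):

```python
# Sketch: encrypt each block independently so rsync can match unchanged blocks,
# but derive a unique IV per block so identical plaintext blocks do not yield
# identical ciphertext (the ECB problem). All names are hypothetical.
import hashlib, hmac
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def block_iv(file_nonce: bytes, block_index: int) -> bytes:
    # Deterministic per-block IV: same backup, same block -> same ciphertext.
    return hmac.new(file_nonce, block_index.to_bytes(8, "big"), hashlib.sha256).digest()[:16]

def encrypt_block(file_key: bytes, file_nonce: bytes, block_index: int, block: bytes) -> bytes:
    if len(block) % 16:                  # AES-CBC needs 16-byte multiples;
        block += b"\0" * (16 - len(block) % 16)   # real code would record the true length
    enc = Cipher(algorithms.AES(file_key), modes.CBC(block_iv(file_nonce, block_index))).encryptor()
    return enc.update(block) + enc.finalize()
```

Reusing the same file key and nonce on the next backup is exactly what makes unchanged blocks byte-identical, which is the goal here, though it also reveals to an observer which blocks changed between two backups.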
So the plan would be:
1) Define a file system, so that file contents can be written as a continuous
stream, e.g. write file 1, n1 bytes, write file 2, n2 bytes, possibly aligned
with a block size. Trade-off between size and adaptation to minor changes.
2) When making a new full backup, use the same layout, potentially leaving
holes for shrunken files, and creating new blocks (at the end) for files that
have grown (can also re-use holes).
3) Apply custom encryption on block-by-block basis, re-using the keys from a
previous backup. Not necessarily the same block size as in (1).
4) Compression?
Compression must be performed before encryption as encrypted data is not
compressible. If one were to apply compression to the entire stream, it may
compress differently and thus cause unwanted cascading changes. If one were to
apply compression to fixed blocks, the result would be blocks of differing
sizes. That leaves the option of compressing each file individually before it
is applied to step (1) above. Perhaps a special compression algorithm could be
used to reduce the amount of change within a compressed file.
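For illustration, the per-file approach would simply compress each file on its own and only then cut it into the fixed blocks of step (1) (zlib below is just a stand-in):

```python
# Sketch: compress each file individually, then cut the compressed stream into
# the fixed-size blocks used by the layout in step (1). zlib is only a stand-in.
import zlib

def prepare_file(data: bytes, block_size: int = 64 * 1024):
    compressed = zlib.compress(data, 6)
    return [compressed[i:i + block_size] for i in range(0, len(compressed), block_size)]
```

The drawback is the one described above: a small change early in a file can still shift everything after it in the compressed stream.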
It may require a large amount of temporary space, as a file can become
fragmented across many volumes and thus require that all volumes are downloaded
before a restore can be completed. There is also overhead in storing the
archive layout information, and files with many holes are space-inefficient.
The worst part is the compression, which can potentially break everything. I
don't know any good compression algorithms that can handle this. The gzip
program has the --rsyncable option, but I think that the benefits of this
particular method will be destroyed by encryption, as the compressed output is
not synchronized with the block structure used for the encryption.
The only thing I can think of that would solve this is to make the compression
and encryption interleaved, so the compression output blocks are encrypted
independently. If the compression can somehow synchronize the output, like with
the --rsyncable option, it may be possible to achieve this. In this setup, the
file format can actually be something similar to just appending the encrypted
file blocks one after another. Since the blocks are encrypted independently, the
output should contain matching sequences, which rsync will detect.
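An interleaved scheme along those lines could cut the plaintext at content-defined boundaries (a crude stand-in for what --rsyncable does), then compress and encrypt each chunk on its own, roughly like this (all of it is a sketch, not existing code):

```python
# Sketch: content-defined chunk boundaries (rolling sum over a small window),
# then compress and encrypt every chunk independently so an insertion only
# disturbs the chunks around it and rsync can match the untouched ciphertext.
import hashlib, hmac, zlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

WINDOW, MASK = 4096, (1 << 13) - 1        # example values, average chunk ~8 KiB

def chunks(data: bytes):
    start, rolling = 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]   # keep the sum over the last WINDOW bytes
        if (rolling & MASK) == MASK or i == len(data) - 1:
            yield data[start:i + 1]       # boundary depends only on local content
            start = i + 1

def pack_chunk(chunk: bytes, key: bytes) -> bytes:
    packed = zlib.compress(chunk)
    # Nonce derived from the content: an unchanged chunk encrypts to the exact
    # same bytes on the next backup (a convergent-encryption-style trick).
    nonce = hmac.new(key, packed, hashlib.sha256).digest()[:12]
    return nonce + AESGCM(key).encrypt(nonce, packed, None)
```

The deterministic nonce keeps identical chunks identical across backups, which is the whole point, but it also lets an observer see which chunks two backups share.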
Since rsync is file-based, a volume cannot overflow into another volume. If the
volume size limit is not applied, this will not be a problem, but it may generate
disproportionate volume sizes. This could be handled by using overflow volumes
that hold the overflowed data, but that may cause a large increase in the number
of files instead.
It sounds more and more like a new file format :).
Perhaps something like this already exists?
Original comment by kenneth....@gmail.com on 21 Sep 2010 at 11:22
I agree. Rsync protocol support would be highly appreciated.
Original comment by michael....@gmail.com on 13 Aug 2012 at 8:55
Original issue reported on code.google.com by Henrik68@gmail.com on 18 Sep 2010 at 11:56