Open GoogleCodeExporter opened 9 years ago
xdelta -d -s sourcefile delta | tee outputfile | md5sum
Original comment by nicolas....@gmail.com
on 26 Dec 2007 at 8:07
A couple of points:
1. I am running xdelta as a process launched from a C# Windows service, and I think piping won't work there (I could be wrong).
2. Running the above command would mean that the resulting file gets read back after it has been written (again, I could be wrong), which would be extra I/O.
My suggestion is that the MD5 checksum is computed while the result file is being created, so the data is read only once (that is, the data being streamed into the outputfile). This reduces disk I/O significantly.
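A minimal sketch of the single-pass idea, written in Python rather than inside xdelta itself. The function name `write_with_md5` and its arguments are hypothetical and only illustrate hashing each chunk on its way to disk:

```python
import hashlib


def write_with_md5(chunks, out_path):
    """Write an iterable of byte chunks to out_path and return the MD5 hex digest."""
    md5 = hashlib.md5()
    with open(out_path, "wb") as out:
        for chunk in chunks:
            md5.update(chunk)  # hash the bytes while they are still in memory
            out.write(chunk)   # then write the same bytes to disk
    return md5.hexdigest()
```

The output file is never read back; the digest is ready as soon as the last chunk has been written.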
I will give the above a go to see what happens.
Thanks
Original comment by a...@intralan.co.uk
on 27 Dec 2007 at 8:12
It's not an unreasonable request, although it would be better if there were a way for xdelta to automatically verify the MD5. The problem is that there is currently no standardized method to embed the MD5 sum at the end of the file encoding.
xdelta3 does verify the adler32 checksum of each window. If you know the length matches and all of the windows' adler32 checksums match, you can be reasonably sure the file contents are correct. Is this sufficient?
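To make the "length plus per-window checksums" check concrete, here is a rough Python sketch. It does not read the VCDIFF format or reproduce xdelta3's internals; the fixed `WINDOW_SIZE` and both function names are assumptions made for illustration:

```python
import zlib

WINDOW_SIZE = 4  # tiny on purpose; an illustrative assumption, not xdelta3's window size


def window_checksums(data, window_size=WINDOW_SIZE):
    """Return one independent Adler-32 value per fixed-size window of data."""
    return [
        zlib.adler32(data[i:i + window_size])
        for i in range(0, len(data), window_size)
    ]


def verify(data, expected_length, expected_checksums, window_size=WINDOW_SIZE):
    """Accept data only if its length and every window checksum match."""
    if len(data) != expected_length:
        return False
    return window_checksums(data, window_size) == expected_checksums
```

A change in any single window alters that window's checksum, and appended or truncated data is caught by the length test.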
Original comment by josh.mac...@gmail.com
on 27 Dec 2007 at 7:52
My suggestion comes from the fact that if you are creating the resulting outputfile, there is no disk I/O overhead in generating an MD5 checksum while the file is being streamed to disk. I am guessing that most people who use this type of patching verify that the output is indeed perfect. I must say I have not yet found a single failure, but this does not mean it won't happen.
Personally, I log the original checksum and the checksum that the file should have once patched; this double-checks that the process has worked perfectly. I will look at moving over to adler32 for the checksums, as it seems to perform better than MD5.
Maybe the other way to do this is to store the outputfile's checksum in the vcdiff file, so that automatic checking could happen.
Original comment by a...@intralan.co.uk
on 28 Dec 2007 at 6:50
The "tee" solution does not involve extra disk I/O. That said, I agree with you in principle.
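For a caller that cannot use a shell pipeline, the same effect as "tee" can be had by reading the child's stdout directly. The following is a sketch in Python (not C#), and `run_and_hash` is a hypothetical helper name; it copies the child's output to a file and hashes it in the same pass:

```python
import hashlib
import subprocess


def run_and_hash(argv, out_path, chunk_size=64 * 1024):
    """Run argv, copy its stdout to out_path, and return (exit code, MD5 hex digest)."""
    md5 = hashlib.md5()
    with subprocess.Popen(argv, stdout=subprocess.PIPE) as proc, \
            open(out_path, "wb") as out:
        # Read until the child closes its stdout (read() then returns b"").
        for chunk in iter(lambda: proc.stdout.read(chunk_size), b""):
            md5.update(chunk)
            out.write(chunk)
    # Leaving the "with" block waits for the child, so returncode is set here.
    return proc.returncode, md5.hexdigest()
```

With the command from the top of this thread, the call would look like `run_and_hash(["xdelta", "-d", "-s", "sourcefile", "delta"], "outputfile")`. The bytes are hashed as they arrive from the pipe, so the output file is written once and never re-read.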
The problem with your other suggestion, to store the outputfile's checksum in the vcdiff file, is that vcdiff doesn't support such an annotation. In fact, I had to petition the vcdiff designer to add adler32 support--md5 is considered very expensive.
For the encoder to add the MD5 checksum, it needs to be added at the end of the
vcdiff encoding. I will pass this idea around. (I think application-specific
per-window metadata is generally useful.)
As for the decoder outputting the MD5 checksum, it's reasonable, but I don't think I can justify it unless the encoder is also storing the checksum at the end of the encoding. I'll think about this support, but I want to remain part of the VCDIFF standard and something needs to be added for this to work.
For now, I recommend the "tee" solution.
Original comment by josh.mac...@gmail.com
on 28 Dec 2007 at 7:11
I will look into the "tee" solution, thanks
Original comment by a...@intralan.co.uk
on 28 Dec 2007 at 7:14
Did some testing with Adler32; it is faster than MD5 and the checksum is smaller. Results on a 909 MB file:
Adler32 took 11.8274956733328 seconds, checksum = 503813208
MD5 took 13.1282727273997 seconds, checksum = c469bb38bfd6937f1a868511b2d63ee4
So approx. 10% speed improvement and a 66% saving in the size of the checksum.
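A comparison like the one above could be reproduced with something along these lines. This is only a sketch in Python, with a hypothetical `time_checksums` helper; it is not the code used for the figures quoted, and timings will vary with hardware, file caching, and the checksum implementations:

```python
import hashlib
import time
import zlib


def time_checksums(path, chunk_size=1024 * 1024):
    """Return {name: (seconds, checksum)} for Adler-32 and MD5 over one file."""
    results = {}

    start = time.perf_counter()
    adler = 1  # Adler-32's defined starting value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            adler = zlib.adler32(chunk, adler)
    results["adler32"] = (time.perf_counter() - start, adler)

    start = time.perf_counter()
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    results["md5"] = (time.perf_counter() - start, md5.hexdigest())

    return results
```

Each algorithm gets its own pass over the file, so the second pass may benefit from the operating system's file cache. For reference, an Adler-32 value is 32 bits and an MD5 digest is 128 bits.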
Is it possible to gather the adler32 checksums during xdelta's processing, on either the encode or the decode, and output them to the console? From what I read, if you feed the adler32 result of the previous window into the next checksum calculation, you can produce a checksum for the whole file. Again, no extra disk I/O.
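The chaining described here can be sketched with Python's `zlib.adler32`, which accepts a starting value as its second argument. The helper name `running_adler32` is hypothetical, and this shows only the arithmetic, not anything xdelta exposes:

```python
import zlib


def running_adler32(windows):
    """Fold an iterable of byte windows into one whole-stream Adler-32 value."""
    value = 1  # Adler-32's defined starting value
    for window in windows:
        # Seed each window's calculation with the result so far.
        value = zlib.adler32(window, value)
    return value
```

Seeding each call with the previous result gives the same value as one call over the concatenated data. This is a different calculation from checksumming each window independently, where every window starts again from 1.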
Original comment by a...@intralan.co.uk
on 28 Dec 2007 at 10:49
Adler32 is a "weaker" checksum, the same one used by zlib.
However, xdelta is computing it for each window, not for the entire file.
To compute the entire-file checksum would double the cost, and at that point I think it would be preferable to use MD5. As I mentioned, I would like to share the idea with others interested in VCDIFF development to see if we can find a solution, because I'd like to recover the xdelta-1.x feature of encoding the MD5.
Original comment by josh.mac...@gmail.com
on 28 Dec 2007 at 4:38
Original issue reported on code.google.com by
a...@intralan.co.uk
on 22 Dec 2007 at 5:56