Open Daniel15 opened 8 years ago
Yes, this is definitely something that I plan to work on. I found @wanderview's work after finishing sw-delta's: https://blog.wanderview.com/blog/2015/10/13/patching-resources-in-service-workers/
He built https://github.com/wanderview/ubsdiff, based on bsdiff. I'm not fluent in binary, but it looks pretty good.
The RFC 3229 standard also advertise vcdiff and diffe.
Here you can find a sw-delta-node branch, where I tried to implement your idea with a 2 bytes bitfield for [operation + offset + number of characters to remove/add].
Here is a small benchmark where I compare gzipped size with 1) the current format, 2) the binary format I just wrote, 3) the ubsdiff format
Files | current | binary | ubsdiff |
---|---|---|---|
jQuery-v2.1.2-min → jQuery-v2.1.3-min | 46 | 38 | 313 |
jQuery-v2.1.1-min → jQuery-v2.1.2-min | 1797 | 1689 | 1925 |
angular-v1.4.4 → angular-v1.4.5 | 3575 | 3656 | 4032 |
angular-v1.4.4-min → angular-v1.4.5-min | 5183 | 4880 | 5821 |
bootstrap-v3.3.4 → bootstrap-v3.3.5 | 1750 | 1851 | 2271 |
bootstrap-v3.3.4-min → bootstrap-v3.3.5-min | 1659 | 1753 | 2264 |
TOTAL | 14010 | 13867 | 16626 |
The binary format is not always smaller than the text format. In total, it's 1.02% smaller than the text format on this little benchmark.
We can safely eliminate ubsdiff, but I'm not sure if we should absolutely switch to the binary format, as the results is not as good as expected.
Actually I suspect the gzip compression to work pretty well with the current format, and to optimize the +/-/separator chars in a way that's nearly as good as a binary format. Gzip loves recurrent patterns.
Could you take a look at my implementation and tell me if you spot some possible improvements?
Actually I suspect the gzip compression to work pretty well with the current format
Ahh, good catch! I didn't consider the fact that gzip would compress the diff pretty well.
You could have both algorithms - binary and text - and use the one that results in the smallest size. That seems like it'd provide the best of both worlds.
You might achieve better file sizes by using a binary format. Instead of using numbers and
+
/-
for the metadata, have a bitfield for the combination of operation + offset + number of characters to remove - two bytes should be sufficient to encode all of that.If you also encode the number of characters being added in the same way you're encoding the number of characters to remove, you'd no longer need a separator between each segment, reducing the file size even more.