Closed RubenKelevra closed 2 years ago
Interesting concept, but I'm not sure it'll be implemented until it's more widely used. The more I read about it, the more it seems that delta support was killed off because no official repos used it. Thanks for posting, looking forward to reading others' views. https://www.reddit.com/r/archlinux/comments/b7zkg5/why_delta_support_removed_from_pacman/
We have packages like mesa-*-git that update every hour, others that don't do so much, but their diff is quite big.
If we kept, say, a month's worth of delta packages, we would need at least four times the amount of space we're using now. For our main node that wouldn't be a problem, but most of our mirrors are very low-tier VPSes running on small SSDs.
@Technetium1 it was killed because there were security issues with the implementation. A forked database file (which isn't signed) could install an arbitrary package via a delta patch and run commands as root.
My idea includes an additional signature for the unpacked archive, so there are effectively two signature files. After the patch is applied, the second signature confirms that the patched result is valid.
This allows patches to be created on demand, or automatically on a repository server, without bothering the package maintainers with their creation.
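To make the "second signature" step a bit more concrete, here is a minimal sketch of the acceptance check. It is not pacman code. As an assumption for illustration, a trusted SHA-256 digest stands in for the second signature over the unpacked archive, and the function names are made up.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def accept_patched_archive(patched_tar: bytes, trusted_digest: str) -> bool:
    """Accept the patch result only if it matches the trusted digest.

    ASSUMPTION: trusted_digest is a stand-in for the proposed second
    signature over the unpacked archive. A real implementation would
    verify a detached signature here instead of comparing digests.
    """
    return hmac.compare_digest(sha256_hex(patched_tar), trusted_digest)
```

The point is only the ordering: the patched archive is checked after patching and before anything is installed, so a manipulated delta produces an archive that fails the check.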
Zstd patches are also EXTREMELY fast compared to the previous approach, so even on older computers it's viable to use them if you don't have at least 100 MBit/s download speed.
You can test the efficiency yourself. To create a patch, run:
zstd --patch-from=package-old_version.pkg.tar package-new_version.pkg.tar -o package-old_version_to_new_version.pkg.tar.zst_delta
Then compare its size to that of package-new_version.pkg.tar.zst.
To apply the patch, decompress package-old_version.pkg.tar.zst on a different machine and fetch package-old_version_to_new_version.pkg.tar.zst_delta.
Then run:
zstd -d --patch-from=package-old_version.pkg.tar package-old_version_to_new_version.pkg.tar.zst_delta -o package-new_version.pkg.tar
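If you want to script the create/apply round trip, something like the following could work. It is a rough sketch that assumes the zstd binary is on PATH. The helper names are made up for illustration, and the output file is passed with -o in both directions.

```python
import subprocess


def create_cmd(old_tar: str, new_tar: str, delta: str) -> list:
    """Build the zstd call that writes `delta`, using old_tar as reference."""
    return ["zstd", f"--patch-from={old_tar}", new_tar, "-o", delta]


def apply_cmd(old_tar: str, delta: str, new_tar: str) -> list:
    """Build the zstd call that rebuilds new_tar from old_tar plus delta."""
    return ["zstd", "-d", f"--patch-from={old_tar}", delta, "-o", new_tar]


def run(cmd: list) -> None:
    """Run a zstd command; raises CalledProcessError if zstd fails."""
    subprocess.run(cmd, check=True)


# Example usage (hypothetical file names, needs zstd installed):
# run(create_cmd("old.pkg.tar", "new.pkg.tar", "old_to_new.pkg.tar.zst_delta"))
# run(apply_cmd("old.pkg.tar", "old_to_new.pkg.tar.zst_delta", "rebuilt.pkg.tar"))
```

Comparing a checksum of rebuilt.pkg.tar against new.pkg.tar afterwards is an easy way to confirm the round trip was lossless.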
@PedroHLC wrote:
We have packages like mesa-*-git that update every hour, others that don't do so much, but their diff is quite big.
I've tested this:
mesa-tkg-git-22.2.0_devel.153443.a7f44b62694-1-x86_64.pkg.tar to mesa-tkg-git-22.2.0_devel.153445.d2ab0ed31e1-1-x86_64.pkg.tar would be 3.8M, and
mesa-tkg-git-22.2.0_devel.153445.d2ab0ed31e1-1-x86_64.pkg.tar to mesa-tkg-git-22.2.0_devel.153609.2b28668d1da-1-x86_64.pkg.tar would be 22M.
So that's a saving of 92.8% and 58.5%, respectively.
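For anyone who wants to redo the arithmetic: the saving is just one minus the ratio of delta size to full download size. The full package sizes aren't stated here, so the sketch below works backwards from the quoted delta sizes and percentages. The resulting "full size" is derived, not measured, and the function names are only illustrative.

```python
def saving_percent(full_size: float, delta_size: float) -> float:
    """Percentage of download saved by fetching the delta instead."""
    return (1 - delta_size / full_size) * 100


def implied_full_size(delta_size: float, saving_pct: float) -> float:
    """Full download size implied by a delta size and a quoted saving."""
    return delta_size / (1 - saving_pct / 100)


# Working backwards from the figures quoted above (sizes in M):
print(round(implied_full_size(3.8, 92.8), 1))  # 52.8
print(round(implied_full_size(22, 58.5), 1))   # 53.0
```

Both cases point to a full download of roughly 53M, which is consistent with the two percentages being measured against the same package.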
And yes, that's maybe not something every mirror wants to store. But on the other hand, that's fully optional – you can just create two tiers.
I, for example, am happy to store 2-3 days' worth of deltas on my mirrors if this allows users to install updates more often or faster.
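A retention window like that could be enforced on a mirror with a small cleanup job. A minimal sketch, assuming deltas sit in one directory, use the .zst_delta suffix from the commands earlier in the thread, and that file modification time is a good enough proxy for age:

```python
import time
from pathlib import Path


def expired_deltas(delta_dir, max_age_days, now=None):
    """Return delta files in delta_dir older than max_age_days (by mtime).

    ASSUMPTION: a flat directory and the *.zst_delta naming are
    illustrative choices, not an existing mirror layout.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        p for p in Path(delta_dir).glob("*.zst_delta")
        if p.stat().st_mtime < cutoff
    )


# Example usage (hypothetical path): delete everything older than 3 days.
# for path in expired_deltas("/srv/mirror/deltas", 3):
#     path.unlink()
```

A mirror that doesn't want to carry deltas at all would simply not sync that directory, which is the two-tier idea.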
And yes, that's maybe not something every mirror wants to store. But on the other hand, that's fully optional – you can just create two tiers.
Good idea!
Zstd patches are also EXTREMELY fast compared to the previous approach
I didn't know zstd had this feature aboard.
I had a talk with @jonathonf on Telegram a while ago about zsync2-based delta downloads for packages. Zsync2 has the advantage of being able to do delta updates between any two files without a delta file, but the disadvantage is that it has to scan the original file before it can start a delta download. This would definitely be the lower-effort approach for us, but I doubt the client speed numbers would be the same. It also uses multiple HTTP range requests instead of applying a single delta file, so download speed suffers as well.
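To illustrate where the up-front scan and the multiple range requests come from, here is a deliberately simplified sketch. It is not zsync2's actual algorithm (zsync uses rolling checksums to find matches at arbitrary offsets). This version only matches fixed, aligned blocks, and all names are made up.

```python
import hashlib


def block_hashes(data: bytes, block_size: int) -> list:
    """Hash the data in fixed-size, aligned blocks."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def ranges_to_fetch(local: bytes, remote_hashes: list, block_size: int) -> list:
    """Return coalesced (start, end) byte ranges, end exclusive, for the
    remote blocks that have no matching block in the local file."""
    have = set(block_hashes(local, block_size))  # the scan of the old file
    ranges = []
    for idx, digest in enumerate(remote_hashes):
        if digest in have:
            continue  # block can be copied from the local file
        start, end = idx * block_size, (idx + 1) * block_size
        if ranges and ranges[-1][1] == start:
            ranges[-1] = (ranges[-1][0], end)  # extend the previous range
        else:
            ranges.append((start, end))  # a separate range request
    return ranges
```

Every entry in the returned list corresponds to one HTTP range request, so changes scattered across a package translate into many small requests, whereas a precomputed delta is a single download. The last range can extend past the end of the remote file if its final block is short, so a real client would clamp it.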
I doubt this is on anyone's priority list to implement, so let me close this. In case someone is interested, feel free to open a PR.
No idea how to do this, but I would be interested in seeing this implemented.
Hey guys,
I did some testing on how efficient delta updates (if implemented) would be in pacman. And since you have some of the biggest packages around, I thought you might be interested in taking a look at the findings and taking part in the discussion:
https://lists.archlinux.org/pipermail/pacman-dev/2022-May/025568.html
The numbers are super promising, with an average saving of 40% on "source code heavy" packages and sometimes above 99% for data-heavy packages, with patch application times below 1 second on modern computers.
Best regards,
Ruben