cespare opened 1 year ago
CC @golang/release
Those are pretty compelling numbers. At least on my machine, with tar 1.34, tar -xf works just as well on .tar.zst, so I don't see any downsides to doing this other than some UI clutter on go.dev/dl.
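To illustrate the point about tar -xf: GNU tar sniffs the compression format from the file itself on extraction, so users' commands wouldn't change at all. A small sketch using gzip (the same auto-detection applies to .tar.zst on tar builds with zstd support; file names here are made up for the demo):

```shell
# Build a tiny tarball, compress it, and let tar auto-detect the
# format on extraction -- no -z / --zstd flag needed.
mkdir -p demo && echo "hello" > demo/file.txt
tar -cf demo.tar demo
gzip -f demo.tar              # produces demo.tar.gz
rm -rf demo
tar -xf demo.tar.gz           # tar detects the gzip header itself
cat demo/file.txt             # -> hello
```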
The implementation, though, is not so trivial. Creating release archives is now the responsibility of https://cs.opensource.google/go/go/+/master:src/cmd/distpack/pack.go, and we want them to be completely deterministic, which means using a compression algorithm that we can hold constant for the lifetime of a Go release. (See the associated blog post.) We'd need to pull a zstd implementation into the distribution, either as a standard library package (unlikely), an internal package we own (time-consuming to write, unless someone wants to contribute it), or vendor something that looks solid (seems fine?).
Overall I'm in favor of this, it seems like a moderate amount of effort and pretty much a pure win for users.
We'd need to pull a zstd implementation into the distribution, either as a standard library package (unlikely), an internal package we own
Alternatively, rather than freezing it at the Go package layer, you could rely on os/exec, and freeze it at the binary level of which zstd (or zopfli for #62445) binary you use.
@heschi Just a note that there is a package that we could vendor if we go that route: github.com/klauspost/compress/zstd.
FWIW github.com/klauspost/compress/zstd compresses it to 43873902 bytes (43.87MB) in ~8.3s with the best compression setting.
But to be fair, it does have a bigger window size. Without the same window size it is 49.83MB - but there isn't much reason to use the small window; if you are that resource constrained, just use gz.
As Heschi notes, the relevant code needs to live or be vendored into the Go tree so that we can reproduce the archives bit-for-bit even far into the future. We could do that, but it increases the cost. Shelling out to a separate tool that isn't versioned in the Go repo is not an option. We'd also have to update gorebuild to verify zstd as well.
In the long term we may end up with zstd vendored anyway, or perhaps even added to the standard library. I'm OK with vendoring it for use in cmd/dist.
That said, it will require work on the release team's part, and we may not have bandwidth for reviewing and deploying such a change in the near future. But in the abstract it sounds reasonable to me.
If someone's interested in moving this forward, I think the steps are to vendor a zstd implementation, add support to cmd/distpack, and update our release automation to also publish the new files. If someone does the first two pieces I think the release team can find the time to do the latter.
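To sketch what the cmd/distpack piece might look like (names are hypothetical, and gzip stands in where the vendored zstd encoder would go): the tar stream is built once and fed through per-extension compressors.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// One compressor per output extension. Only gzip is wired up here;
// a vendored zstd encoder would be a second entry for ".tar.zst".
var compressors = map[string]func(io.Writer) io.WriteCloser{
	".tar.gz": func(w io.Writer) io.WriteCloser { return gzip.NewWriter(w) },
	// ".tar.zst": func(w io.Writer) io.WriteCloser { return zstdWriter(w) },
}

// writeArchive writes a single-file tar through the named compressor
// and returns the compressed archive bytes.
func writeArchive(ext, name string, data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := compressors[ext](&buf)
	tw := tar.NewWriter(zw)
	hdr := &tar.Header{Name: name, Mode: 0o644, Size: int64(len(data))}
	if err := tw.WriteHeader(hdr); err != nil {
		return nil, err
	}
	if _, err := tw.Write(data); err != nil {
		return nil, err
	}
	tw.Close()
	zw.Close()
	return buf.Bytes(), nil
}

func main() {
	out, err := writeArchive(".tar.gz", "go/VERSION", []byte("go1.21.0\n"))
	fmt.Println(err == nil, len(out) > 0)
}
```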
There are two other kinds of artifacts not covered by this proposal: Windows distribution archives and toolchain module files, both .zip files. Wikipedia says that zip standardized zstd support a few years ago, so it's theoretically possible to make this change to both.
For Windows, it would be interesting to survey implementations and see how usable a more advanced compression would be.
For the toolchain module files, we'd need to teach the Go command to understand them, and (per discussion with Russ) probably start publishing a second series of archives, v0.0.2 rather than v0.0.1. Since toolchain upgrades will increasingly be done via the Go command, these are arguably the most important to optimize. But perhaps we should start by getting experience with the release archives.
Wikipedia says that zip standardized zstd support a few years ago, so it's theoretically possible to make this change to both.
Yeah; no. Using the Windows 11 built-in extraction tools with zstd in a ZIP file just gives an Error 0x80004005: Unspecified error. 90% of users will use that for extraction.
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group
Are there any objections to adding this?
Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group
Add .tar.zst archives anywhere we generate .tar.gz archives in cmd/distpack. We would not add zstd-enabled zip files because Windows zip readers can't handle them.
In the longer term, this could be a step toward zstd-compressed modules, but that would require changing many more moving parts and is not in scope for this specific proposal.
No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group
In the longer term, this could be a step toward zstd-compressed modules, but that would require changing many more moving parts and is not in scope for this specific proposal.
Out of curiosity, would the thinking there be to keep the module archives as ZIP, but swap the compression algorithm to zstd, or to switch to something else entirely like .tar.zst?
The latter is more standard in terms of zstd compression, and will give a better compression ratio since all files are compressed together, but we would lose the ability to seek through files without decompressing. I suspect that's not a problem, given that GOPROXY serves go.mod files separately, and GOMODCACHE already extracts entire module archives for use in cmd/go.
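That seekability trade-off can be seen with the stdlib: archive/zip can open one member directly via the central directory, while a .tar.zst stream must be decompressed sequentially to reach a given file. A minimal illustration of the zip side:

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// buildZip packs the given files into an in-memory zip archive.
func buildZip(files map[string]string) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, body := range files {
		w, _ := zw.Create(name)
		io.WriteString(w, body)
	}
	zw.Close()
	return buf.Bytes()
}

// readMember opens a single member via the zip central directory,
// without decompressing any other file -- the random access that a
// sequential .tar.zst stream cannot offer.
func readMember(zipData []byte, name string) (string, error) {
	zr, err := zip.NewReader(bytes.NewReader(zipData), int64(len(zipData)))
	if err != nil {
		return "", err
	}
	for _, f := range zr.File {
		if f.Name == name {
			rc, err := f.Open()
			if err != nil {
				return "", err
			}
			defer rc.Close()
			data, err := io.ReadAll(rc)
			return string(data), err
		}
	}
	return "", fmt.Errorf("%s not found", name)
}

func main() {
	z := buildZip(map[string]string{"go.mod": "module m\n", "main.go": "package main\n"})
	mod, _ := readMember(z, "go.mod")
	fmt.Print(mod) // module m
}
```

As the comment notes, losing this matters little in practice if the proxy protocol and module cache already avoid partial reads.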
This is inspired by #62445, where @dsnet proposes using zopfli to create ~6% smaller .gz downloads for Go release downloads.
As he writes in that issue:
This proposal is to help usher in that future by offering zstd downloads in addition to gzip.
Here's a very quick'n'dirty comparison of compression performance on the same go1.21.0.linux-amd64.tar.gz archive Joe looked at:

Also, decompressing the .zst archives takes about 4x less CPU time than decompressing the .gz archives on my machine.
If we offered .gz and .zst, people who care at all about size and speed can just use .zst and get a much bigger benefit than if we had zopfli-encoded .gzs.
[^1]: This is an estimate based on the fact that the file size falls between gzip -5 and gzip -6. I think that the actual release process uses compress/gzip, which is quite a bit slower.