JuliaLang / Pkg.jl

Pkg - Package manager for the Julia programming language
https://pkgdocs.julialang.org
Other
610 stars 251 forks source link

replace 7z with gzip #3824

Open StefanKarpinski opened 4 months ago

StefanKarpinski commented 4 months ago

It seems from reports on discourse that 7z is banned for security reasons at many organizations. We used to use a lot of its functionality, but now we only use it in ways that gzip can replace. We should do that. The changes to Pkg are actually trivial, the most annoying part is removing 7z as a Base dependency and adding gzip.

stevengj commented 4 months ago

Or something more modern like zstd?

giordano commented 4 months ago

But we want to be able to unpack more than gzipped tarballs, no? https://github.com/JuliaLang/Pkg.jl/blob/48eea8dbd7b651cdc932b909c1b718bb9c3f94f4/src/PlatformEngines.jl#L485

StefanKarpinski commented 4 months ago

Or something more modern like zstd?

Adding support for zstd would be fine, but we can't drop support for gzip in the client because there are private package servers in the wild that will continue serving gzip tarballs for the foreseeable future and we also can't drop gzip as a format on servers because they have to keep serving to older Julia versions that expect gzip. So adding zstd would force servers to generate and serve both gzip and zstd. I'm also not convinced that zstd is such a huge win for our use case. I haven't seen massively better compression ratios. The compression time is better, but since we'll have to do gzip and zstd on the server, that doesn't actually save us any compute. Decompression time is also better, but I don't think decompression time is a meaningful bottleneck. My inclination is to add support for negotiating the compression format so that at some point in the future we can migrate over.

But we want to be able to unpack more than gzipped tarballs, no?

I'm not sure how big an issue this is in practice (i.e. how much do we actually use/rely on this), but what we can do is ship gzip with Julia itself and then dynamically install JLLs for the other compression formats as required. That allows us to bootstrap from gzip to install and handle other compression formats as needed.

mnemnion commented 4 months ago

Bundling libarchive would cover gzip, and leave plenty of flexibility for switching to another, or decoding existing data in other formats.

StefanKarpinski commented 4 months ago

I'm a bit hesitant to swap out one swiss-army-knife dependency that handles multiple compression formats but isn't the standard tool that's guaranteed to be patched and maintained and allowed by sysadmins for another one with the exact same description. Of course, libarchive looks much better maintained than 7z, but you get my point. It also seems likely that gzip is going to be faster and more efficient than anything else.

stevengj commented 4 months ago

Adding support for zstd would be fine, but we can't drop support for gzip in the client

The zstd web page says that its library/command-line tool supports .gz files? Or does it do that by linking another library?

Looks like zstd include a minigzip.c implementation, but maybe it requires zlib?

But it seems safer/simpler to stick with gzip alone if there's not a compelling reason to support other formats.

mnemnion commented 4 months ago

Of course, libarchive looks much better maintained than 7z, but you get my point.

Worth noting that libarchive has been integrated into Windows 11, which has implications both for continuity of maintenance and the likelihood that sysadmins will green-light it. I can't speak to gzip's speed and efficiency vs. libarchive, but it's clearly both a more narrowly-tailored tool and some of the most battle-tested code in existence.

StefanKarpinski commented 4 months ago

Ok, seems like we've got a few solid options here—gzip, zstd (with built-in gzip), or libarchive.

nsajko commented 3 months ago

A pure Julia solution would allow decreasing the amount of external dependencies of the Julia install. The memory-safety would also look better security-wise, given no @inbounds or other unchecked operations.

StefanKarpinski commented 3 months ago

If it's implemented in Julia we have to build it into the Julia sysimg, which we would rather not do. It's also safer to do decompression in an external transient process. Ideally, we would put the decompression process in a jail where it can't open files and all it can do is read from stdin and write to stdout. Putting it in the main Julia process is the opposite of what we want from a security perspective.

nhz2 commented 1 month ago

There is a new version of 7z that supports unpacking zstd https://github.com/ip7z/7zip/releases/tag/24.05

giordano commented 1 month ago

Note that we use https://github.com/p7zip-project/p7zip, that's a different project.

nhz2 commented 1 month ago

From what I can tell we are already using 7zip 23.01 for Windows instead of p7zip. According to https://github.com/ip7z/7zip/blob/24.05/DOC/readme.txt compiling 7zip for Unix is possible now.