bazelbuild / rules_pkg

Bazel rules for creating packages of many types (zip, tar, deb, rpm, ...)
Apache License 2.0

pkg_zip is very very slow with rules_python #795

Open peakschris opened 7 months ago

peakschris commented 7 months ago

There are significant performance issues with pkg_zip in a Bazel environment (on Windows). We found that when Bazel was packaging many zips simultaneously, each one could take 45s instead of the expected 2s. We discovered that this is because Bazel's hermetic Python toolchain (rules_python) uncompresses many files to prepare for every single Python invocation. More discussion here: https://bazelbuild.slack.com/archives/CA306CEV6/p1701253691249489

There appear to be two workarounds:

I've done a hacky port of the build_zip tool to Go using GitHub Copilot, and it resolves our issue. I'm sharing it here in the hope that it might be adopted by rules_pkg as an optional alternate implementation for those facing the same issue.

https://github.com/peakschris/build_zip_go

aiuto commented 7 months ago

There are two independent issues here:

  1. hermetic vs. local python.
  2. Go vs. python.

For 1, the answer is simple. Don't use hermetic python; a locally installed interpreter always performs better. IIUC, local is the default anyway according to the rules_python docs. So, you'll want to switch to that.

For 2, requiring multiple languages for the core implementation is sort of a complexity non-starter for me. I also don't want users of rules_pkg to have to pull in toolchains which might not be present. Everyone has python. So the answer there is to find a way for a user to specify, via a module extension, an alternate version of build_zip. People who don't mind bringing in the golang toolchain could depend on your version and inject the dependency on your alternate. It might require toolchain-ising build_zip, with the python one as a fallback toolchain.

alexeagle commented 7 months ago

Note that https://github.com/libarchive/libarchive produces zip outputs. https://github.com/aspect-build/bazel-lib/blob/main/docs/tar.md includes a Bazel toolchain to fetch a pre-built binary, so you get neither the performance cost of running zip in an interpreted language nor the need to build the tool from source.

peakschris commented 6 months ago

Alex, interesting, thank you. Does tar.md integrate with libarchive, or is that a future idea? I also can't see whether there is a way to combine multiple manifests, the way pkg_filegroup combines pkg_files. We pull together disparate parts of our build tree into zips, and the pkg_files/pkg_filegroup API actually works quite nicely for this.