PeterMylemans opened this issue 4 years ago
I think this is a fantastic idea. If there's a reliable and well-maintained external facility, there's no need to maintain a custom solution. Completely removing the custom Python is something I've been hoping for for a very long time.
from what I can tell it is not packaged in rules_pkg, but instead lives as a separate workspace in the same git repo.
I'm not really familiar with the Bazel ecosystem. What does that mean? Is it like "experimental", "alpha", or "use at your own risk"? It could be troublesome if it's not well maintained.
cc @dlorenc
I have the same question about maintainability. Currently it is not part of the published package and should be consumed as a git source repository dependency with a prefix to include only the deb_packages subdirectory.
Honestly, I would have expected to have a rules_deb_repository repo with its own release cycle, tagging and packaging. But maybe I'm just biased towards a polyrepo approach.
In any case I'm not a Bazel community expert either.
@petermylemans can we ask them to publish the rule and support it?
The current content of rules_pkg is confusing, and I would like to fix that. Most repos under bazelbuild (https://github.com/bazelbuild) named rules_X contain rules for creating X. The few that are for consuming rather than producing have entirely different names or some modifier (e.g. rules_jvm_external).
So, I would like to see rules_pkg just contain rules to create packages, not consume them. Along that line, anything for building docker images should be in rules_docker because people do work on that. Also, I don't have the knowledge to review anything involving docker, either producing or consuming. Someone else must own it.
So, where do we put improved consumption rules?
Basically, I am fine with any solution, but I can't do any of the work, for the reasons above.
Sounds to me that the deb repository rule should live in rules_docker.
They already have rules to deal with installing packages into containers using package manager runtimes such as Apt.
However, these seem to "execute" the package manager in a running container using a docker runtime. So the result is not always idempotent, depending on the installation process. Functionally the resulting image will be the same, but digests might differ. In my experience, it is a lot easier to manage transitive dependencies this way, but it also pulls in more dependencies than is needed for constructing an application runtime.
> Sounds to me that the deb repository rule should live in rules_docker
cc @dmarting (once upon a time, they were built in/for rules_docker 😉 )
Isn't it that deb_packages has nothing to do with Docker? It just downloads deb files from a package mirror and makes them available for use, which users outside the container context can also leverage, right? I may be missing something, but I am not sure why it should live in rules_docker.
We could go in two phases:
- Put the rule in distroless
- Migrate the rule to rules_docker once the improvements are hammered out.
You mean basically forking the code into distroless (then waiting for it to be upstreamed, and migrating once it's officially supported)? If so, I am not sure it's a sustainable solution for us (distroless).
Yeah, in practice it takes a very long time to unfork these kinds of things and rely on the new upstream. Bazel not solving the diamond dependency problem exacerbates this.
I realize now I did not directly address @chanseokoh's question: "I'm not really familiar with the Bazel ecosystem. What does it mean? It is like "experimental", "alpha", or "use at your own risk"? I think it can be troublesome if it's not well maintained."
github.com/bazelbuild/rules_pkg/deb_packages exists, but no one is maintaining it in any way. The only work being done in rules_pkg is on the low-level packaging side (making tarballs, RPMs, and debs). IMO, deb_packages is neither experimental, alpha, nor use-at-your-own-risk. It is classic abandonware.
Also agree. I've been trying my best to propose a move away from Bazel, not just because of this, but because of the near impossibility of air-gapped/dark-site dependency inclusion and the injection of vulnerabilities from its poor tracking of dependencies.
The more you build in Go on Bazel, for example, the more vulnerabilities it propagates into the actual container entrypoint. It only exists in my environment for this and Envoy, and it is the sole reason for even having OpenJDK and its slew of dependencies. So much for a microservices minimalist mentality.
A deb packaging / rootfs creation / output process would allow anyone to use any package manager or process of their choice. For example, I could use Dockerfiles to create the base image and use buildah to package it as an OCI tar.
@chanseokoh You are right, deb_repository handling is not specific to docker / containers. I was only suggesting it to live there, because it is the main use case today. But I guess it can be extended to any kind of software that requires consuming deb packages.
Maybe it is better to spin up a new repo, e.g. rules_deb_repository, under the same license (Apache 2.0). I'm OK to do this, maintain it, and publish it e.g. under github.com/petermylemans/rules_deb_packages, and to add you as a collaborator to avoid the "bus factor".
Maybe it would be better under an organization: in any case the repo can always be transferred should it become a requirement.
Pinging @jayconrod here, in case he has insights.
I've done most of the small changes (the current version of rules_pkg/deb_packages has issues with the current version of Bazel) and updates required here: https://github.com/petermylemans/rules_deb_packages. That could replace distroless' "python module" with an improved module that makes use of Bazel's builtin support for downloading (and caching) remote archives.
You can have a look at the example and/or README.md
But what still bothers me is that Bazel promotes including all dependencies as source. People seem to solve this by providing a dependencies macro (deps.bzl / repositories.bzl / ...) for consuming projects.
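For readers less familiar with the pattern, a minimal sketch of such a dependencies macro follows. All names, URLs, and hashes here are illustrative placeholders, not the actual rules_deb_packages API:

```python
# deps.bzl -- hypothetical dependencies macro a consuming project would call
# from its WORKSPACE after fetching this rule set.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

def rules_deb_packages_dependencies():
    """Declare the transitive repos this rule set needs (illustrative names)."""
    # Guard against redefining a repo the consumer already declared, which
    # lets consumers override versions to dodge diamond-dependency clashes.
    if "some_transitive_dep" not in native.existing_rules():
        http_archive(
            name = "some_transitive_dep",
            urls = ["https://example.com/some_transitive_dep-1.0.tar.gz"],
            sha256 = "<pinned sha256>",
        )
```

The `existing_rules()` guard is what makes the macro composable: the consuming project stays in control of the final version of each shared dependency.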
@mattmoor I've hijacked some of the approach used in rules_docker. But I still have the issue that e.g. github.com/bazelbuild/buildtools made a recent change that is incompatible with some older versions of gazelle (and rules_docker by extension). This results in a delicate balance of versions...
For tooling this seems a bit strange to me. I would expect to be able to just include prebuilt binaries for the tools used in consuming applications (like distroless), so they don't need to bother with managing dependencies for a simple tool used to keep versions up to date. But then I get a chicken-and-egg problem on the rules_deb_packages side, as its dependencies macro would need to provide http_archive repo rules for its own supporting binaries.
Anybody got any experience dealing with this, or am I proposing crazy ideas here?
I wonder how feasible it might be to use something like goreleaser to build binaries and attach them as release artifacts, and then have a downstream step construct a .bzl file that folks could pull down for that release in WORKSPACE 🤔
Generally it is possible (and in the case of WORKSPACE tooling required) to pull down prebuilt binary tooling, but you want to make sure you build it for all the downstream platforms that it runs on (I recently hit this with linux/arm64).
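That per-platform concern could be handled with something like the following hypothetical helper; the tool name, release URL scheme, and hashes are all placeholders for illustration:

```python
# tools.bzl -- hypothetical helper that registers prebuilt update-tool
# binaries for each platform the release was built for.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")

# One entry per (os, arch) target published on the release page.
_PLATFORM_SHAS = {
    ("linux", "amd64"): "<sha256 for linux/amd64>",
    ("linux", "arm64"): "<sha256 for linux/arm64>",
    ("darwin", "amd64"): "<sha256 for darwin/amd64>",
}

def update_tool_binaries(version):
    """Declare an http_file repo per prebuilt binary (illustrative URLs)."""
    for (os, arch), sha in _PLATFORM_SHAS.items():
        http_file(
            name = "update_tool_%s_%s" % (os, arch),
            urls = ["https://example.com/releases/%s/update_tool_%s_%s" % (version, os, arch)],
            sha256 = sha,
            executable = True,
        )
```

A release pipeline (goreleaser, GitHub Actions, or a shell script) would regenerate the hash table for each tagged release, which is what makes the "downstream .bzl construction" step mentioned above practical.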
Seems like it'd be a fun pattern now that github actions exist, they didn't the last time I was deep in Bazel land.
I used a regular shell script for now, but it implements the same idea. Another point off the list. :smile:
Next up: I'll go for a draft PR to see if the approach is what we want or not.
@mattmoor @chanseokoh can you have a quick look at the draft PR to see if it's on track? The WORKSPACE file seems to grow somewhat, but I guess that is normal due to the amount of debian packages and architectures being processed.
The "magic" of selecting the right package has been moved to the update deb packages process, so basically the urls and sha256 are stored in the WORKSPACE file.
Deb downloads are a LOT faster and more stable. I suspect that switching to the main Debian CDN has something to do with that (as it is no longer proxying through snapshots).
One of the nice, presumably intended, consequences of distroless using snapshots.debian.org, is that one can recreate a given distroless image even when the main Debian release archives have moved on.
Will the changes proposed in #614 remove that?
In the month of October, distroless builds were down for about 10 days. Meanwhile, debootstrap worked flawlessly every single day. This is where the arguments about Debian builds just don't hold water. This Bazel process is frankly the most unstable and non-reproducible build process I have seen in years, not to mention a nightmare for air-gapped environments.
I still don't understand why this is necessary. This should move to a much simpler minbase debootstrap, package removal, and injection of cacerts/group/passwd/nsswitch/os-release files. The process for building debian-base and debian-iptables is stable. That actually works. If it ain't broke....
It's VERY troubling how the base images have moved to an unstable process with a bloated package manager that increases the security threat profile tremendously and requires internet connectivity to build. This doesn't bode well for the future longevity of Kubernetes if this is a dependency.
> Deb downloads are a LOT faster and more stable. I suspect that switching to the main debian CDN has something to do with that (as it is no longer proxying through snapshots).

> One of the nice, presumably intended, consequences of distroless using snapshots.debian.org, is that one can recreate a given distroless image even when the main Debian release archives have moved on.
> Will the changes proposed in #614 remove that?
Deb packages remain available, if only from the pool at http://archive.debian.org/ instead of deb.debian.org. That is why both are included as mirrors for the pool URLs. The snapshot basically handles the caching of coherent "release package files", but that is mostly useful when using a package manager such as apt or the apt simulator in debootstrap. The apt simulator in distroless followed the same practice at build time. Now this resolution of packages is done at "update tool" time, so the build can work with a fixed set of explicitly versioned dependencies (instead of a dynamic, but stable one).
In case of distroless using the bazel build tool, it might as well download (and cache) the pinned versions directly from the debian mirror pool and validate the sha256 sums for correctness. This is pretty similar to how a http_file repository works and that matches better with how bazel is designed to work as well. This would also fit in nicely with how Bazel deals with airgapped builds today.
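The pin-and-verify step that an http_file-style download performs can be sketched in plain Python. This is a simplified model of the idea, not Bazel's actual implementation:

```python
import hashlib

def verify_sha256(data: bytes, expected: str) -> bool:
    """Return True if the payload's SHA-256 digest matches the pinned hex string."""
    return hashlib.sha256(data).hexdigest() == expected

# An in-memory payload standing in for a downloaded .deb file.
payload = b"example .deb contents"
pin = hashlib.sha256(payload).hexdigest()

print(verify_sha256(payload, pin))            # matching pin -> accept into cache
print(verify_sha256(b"tampered bytes", pin))  # mismatch -> reject the download
```

Because the build only accepts bytes matching the pin, any mirror (or a local cache populated ahead of time for air-gapped builds) is interchangeable.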
While I can understand @smijolovic's frustration with the current system, leaving Bazel behind is akin to rewriting most of what is in the distroless project. That is a lot to ask from a project that is mostly community driven. My honest opinion (as someone who is not responsible for this repo): since distroless' inception, new tools have come to light that look promising, but they are at the same time still finding their place in the ecosystem. I'm usually the first to rewrite things and move forward (and I see a lot of merit in tools like buildah or cloud native buildpacks), but in this case I would proceed with some caution: lots of promises of "the new silver bullet in container building", but time will tell.
That is why I would fix the Python-based packager now (which broke Bazel's builtin repository handling to some degree) and keep an eye out for the future of alternative build tools.
Thanks for the response Peter. If I'm understanding your explanation correctly, I don't think it's quite working as you expect.
If I clone your repo and run `bazel build //base:static_nonroot_amd64_debian10` I get the following (trimmed) error:

```
ERROR: no such package '@packages_amd64_debian9//debs': java.io.IOException: Error downloading [http://deb.debian.org/debian/pool/updates/main/o/openjdk-8/openjdk-8-jdk-headless_8u265-b01-0+deb9u1_amd64.deb, http://deb.debian.org/debian-security/pool/updates/main/o/openjdk-8/openjdk-8-jdk-headless_8u265-b01-0+deb9u1_amd64.deb, http://archive.debian.org/debian/pool/updates/main/o/openjdk-8/openjdk-8-jdk-headless_8u265-b01-0+deb9u1_amd64.deb, http://archive.debian.org/debian-security/pool/updates/main/o/openjdk-8/openjdk-8-jdk-headless_8u265-b01-0+deb9u1_amd64.deb] to /home/joshuagl/.cache/bazel/_bazel_joshuagl/6b17bc35729cb244efa1a69ee18e7f1c/external/packages_amd64_debian9/debs/1a576428b61c9671cab4072f6d7b1b70027307be882ea0e4ed23ed0c6683e3d2.deb: GET returned 404 Not Found
INFO: Elapsed time: 2.397s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
    currently loading: base
```
I believe this means `openjdk-8-jdk-headless_8u265-b01-0+deb9u1_amd64.deb` is not available from archive.debian.org or deb.debian.org?
If I first run `bazel run update_deb_packages` the build is able to proceed.
In the diff of `packages_amd64_debian9.bzl` I see `pool/updates/main/o/openjdk-8/openjdk-8-jre-headless_8u265-b01-0+deb9u1_amd64.deb` has been replaced with `pool/updates/main/o/openjdk-8/openjdk-8-jdk-headless_8u272-b10-0+deb9u1_amd64.deb`.
Mmm I'll have to look into this later this week.
Thanks for the heads-up: I'll come back on this.
Is there a way with `bazel build //package_manager:dpkg_parser.par` to specify only a certain image to load? This fails most often when loading all of the packaging for all architectures and Debian versions.
FWIW my understanding of the Debian package archives is as follows.
Thus, per my understanding at least, without using snapshot.debian.org it will only ever be possible to fetch the most recent version of a package for a release.
I tried to find some document(s) that described the above but didn't have much luck. There is some description of the Debian repository format here, but it does not describe all of the above: https://wiki.debian.org/DebianRepository/Format
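Given that behaviour, snapshot.debian.org is the usual escape hatch, since it serves the archive as it existed at a given timestamp. Assuming its `archive/debian/<timestamp>/` URL layout, a tiny helper shows how a pinned URL could be derived (the timestamp and package path below are illustrative):

```python
def snapshot_pool_url(timestamp: str, pool_path: str) -> str:
    """Build a snapshot.debian.org URL pinning the archive state at `timestamp`
    (format YYYYMMDDTHHMMSSZ), so superseded .debs remain fetchable."""
    return "https://snapshot.debian.org/archive/debian/%s/%s" % (timestamp, pool_path)

# Illustrative package path; any pool path the snapshot captured would work.
url = snapshot_pool_url(
    "20201101T000000Z",
    "pool/main/e/example/example_1.0-1_amd64.deb",
)
print(url)
# -> https://snapshot.debian.org/archive/debian/20201101T000000Z/pool/main/e/example/example_1.0-1_amd64.deb
```

Listing such snapshot URLs first in a repository rule's mirror list, with the live pool as a fallback, would keep old pins reproducible without depending on snapshot availability for current packages.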
@joshuagl I can confirm that you are correct. :+1: Good catch and thanks for the investigation! The FTP behaviour was a bit surprising to me, to be honest.
I'll rework the PR towards using the snapshot repos as a mirror instead of regular deb.debian.org.
Work and life caught up with me in the last months, but I did have a look at the snapshot option.
After going back and forth a bit, using snapshots always ends up quite close to what we already have.
Maybe it's best to revisit this after the multi-arch changes, as they are bound to conflict anyway.
😯 They actually fixed a bug on snapshot.debian.org: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=960304
Hopefully this gives more stability.
Also relates to #153 and the clean-up already done by @mattmoor (thanks btw!). During further investigation I've stumbled on some work in the rules_pkg GitHub repository [1] that looks like an improved version of the deb management currently in the distroless workspace.
Advantages:
Things to improve:
@mattmoor @chanseokoh @aiuto feel free to review this idea. I'm happy to start a PR in either repository if we agree this is a good way forward.
References: [1] https://github.com/bazelbuild/rules_pkg/blob/main/deb_packages/WORKSPACE [2] https://github.com/bazelbuild/rules_pkg/tree/main/deb_packages/tools/update_deb_packages