GoogleContainerTools / distroless

🥑 Language focused docker images, minus the operating system.
Apache License 2.0

Building distroless in air-gapped environments #577

Open smijolovic opened 4 years ago

smijolovic commented 4 years ago

One of the challenges over the last two weeks has been the availability of the remote servers needed to build all debian10 images from source. For example, the IncompleteRead error persisted for over three days before builds started working again today. This seems to be an inherent and widely known issue with using bazel and its dependencies (java is a big one, and go_sdk is running 1.14.4 with three CVEs, just to name a few). Not being able to build anything from source for three days is a considerable capability gap in agile development.

Bazel has documented a manual way to pull and archive distribution directories: https://docs.bazel.build/versions/master/guide.html#running-bazel-in-an-airgapped-environment

Is it possible to document the pulling of all dependencies so that they can be archived (including dpkg_parser.par)?

It would be ideal to use a debian image builder like moby/debootstrap to build these distroless containers, as it has only two dependencies, vs 30 to bring in openjdk-devel...which isn't even necessary to build them. Pulling the required .deb files from debian is straightforward using debootstrap but the process of turning that into a rootfs with just the distroless packages is more cryptic (Failure trying to run: chroot "/tmp/distroless/rootfs" /bin/true), so it's safe to assume some other method is being used to create the rootfs image.

smijolovic commented 4 years ago

As of this morning, same problem persists:

ERROR: An error occurred during the fetch of repository 'package_bundle':
dpkg_parser command failed: Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/rpmbuild/nimbus8/distroless-build-source/distroless-1.0.0/distroless/bazel-bin/package_manager/dpkg_parser.par/__main__.py", line 217, in <module>
  File "/home/rpmbuild/nimbus8/distroless-build-source/distroless-1.0.0/distroless/bazel-bin/package_manager/dpkg_parser.par/__main__.py", line 83, in main
  File "/home/rpmbuild/nimbus8/distroless-build-source/distroless-1.0.0/distroless/bazel-bin/package_manager/dpkg_parser.par/__main__.py", line 109, in download_dpkg
  File "/home/rpmbuild/nimbus8/distroless-build-source/distroless-1.0.0/distroless/bazel-bin/package_manager/dpkg_parser.par/__main__.py", line 127, in download_and_save
  File "/usr/lib64/python3.6/http/client.py", line 472, in read
    s = self._safe_read(self.length)
  File "/usr/lib64/python3.6/http/client.py", line 624, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(19079168 bytes read, 172819698 more expected)
(/home/rpmbuild/nimbus8/distroless-build-source/distroless-1.0.0/distroless/bazel-bin/package_manager/dpkg_parser.par --package-files /home/rpmbuild/.cache/bazel/_bazel_rpmbuild/b7d35c6c7c651f4f3e6802aeff641f4b/external/debian_stretch_security/file/Packages.json,/home/rpmbuild/.cache/bazel/_bazel_rpmbuild/b7d35c6c7c651f4f3e6802aeff641f4b/external/debian_stretch_updates/file/Packages.json,/home/rpmbuild/.cache/bazel/_bazel_rpmbuild/b7d35c6c7c651f4f3e6802aeff641f4b/external/debian_stretch_backports/file/Packages.json,/home/rpmbuild/.cache/bazel/_bazel_rpmbuild/b7d35c6c7c651f4f3e6802aeff641f4b/external/debian_stretch/file/Packages.json --packages libc6,base-files,ca-certificates,openssl,libssl1.0.2,libssl1.1,libbz2-1.0,libdb5.3,libffi6,libncursesw5,liblzma5,libexpat1,libreadline7,libtinfo5,libsqlite3-0,mime-support,netbase,readline-common,tzdata,libgcc1,libgomp1,libstdc++6,zlib1g,libjpeg62-turbo,libpng16-16,liblcms2-2,libfreetype6,fonts-dejavu-core,fontconfig-config,libfontconfig1,libuuid1,openjdk-8-jre-headless,openjdk-8-jdk-headless,openjdk-11-jre-headless,openjdk-11-jdk-headless,libc-bin,libpython2.7-minimal,python2.7-minimal,libpython2.7-stdlib,dash,libc-bin,libmpdec2,libpython3.5-minimal,libpython3.5-stdlib,python3.5-minimal,libcurl3,libgssapi-krb5-2,libicu57,liblttng-ust0,libssl1.0.2,libunwind8,libuuid1,zlib1g,curl,libcomerr2,libidn2-0,libk5crypto3,libkrb5-3,libldap-2.4-2,libldap-common,libsasl2-2,libnghttp2-14,libpsl5,librtmp1,libssh2-1,libkeyutils1,libkrb5support0,libunistring0,libgnutls30,libgmp10,libhogweed4,libidn11,libnettle6,libp11-kit0,libffi6,libtasn1-6,libsasl2-modules-db,libgcrypt20,libgpg-error0,libacl1,libattr1,libselinux1,libpcre3,libbz2-1.0,liblzma5 --workspace-name package_bundle --versionsfile /home/rpmbuild/nimbus8/distroless-build-source/distroless-1.0.0/distroless/package_bundle.versions)

Why is this so unstable?

chanseokoh commented 4 years ago

I feel the Debian package repository has become a bit unstable for the last one or two months, but I don't think it's unstable to the extent you are experiencing. A lot of people are building Distroless from source, several external users filed PRs in the past few days, and Travis builds have run fine. I think your environment in particular has a stability issue connecting to the Debian repo. Maybe get some help from the Debian folks?

smijolovic commented 4 years ago

I don't believe so. FYI - it just started to work a few minutes ago. There are no stability issues anywhere else. I pull from over 70 source repositories with zero issues on any other project.

Is there a list of the distribution directories so that they can be archived and only pulled when there are changes to the source tree?

chanseokoh commented 4 years ago

I think you misunderstand something, as you keep saying "distribution directories". There's no issue with Bazel dependencies, and I think all such Distroless Bazel dependencies should have already been cached on your machine. The error is also unrelated to changes to the source tree or Bazel dependencies.

Rather, the failure happens when the custom Python program in Distroless (called dpkg_parser.py) downloads Debian packages from the Debian snapshot repository. It's all custom, and it's totally unrelated to Bazel. This Python application downloads a lot of .deb files, some as large as https://snapshot.debian.org/archive/debian/20200903T084506Z/pool/main/o/openjdk-11/openjdk-11-jre-headless_11.0.6%2B10-1~bpo9%2B1_amd64.deb.

One thing you can try is a different Debian mirror. You can update WORKSPACE to set a different URL.

https://github.com/GoogleContainerTools/distroless/blob/a4bdf6db4623452c9e3020b58bc5446f9e4b930f/WORKSPACE#L47

(There are multiple instances of https://snapshot.debian.org/archive in the file, so search for the URL and update all of them.)
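The search-and-replace suggested above could be scripted roughly as follows. This is only a minimal sketch: the function name `rewrite_snapshot_urls` and the mirror URL in the usage note are hypothetical, and it assumes the mirror serves the same path layout as snapshot.debian.org, which is not confirmed anywhere in this thread.

```python
from typing import Tuple

# Base URL that the comment above says appears several times in WORKSPACE.
SNAPSHOT_BASE = "https://snapshot.debian.org/archive"


def rewrite_snapshot_urls(workspace_text: str, mirror_base: str) -> Tuple[str, int]:
    """Swap every snapshot.debian.org archive prefix for mirror_base.

    Returns the rewritten text and the number of occurrences replaced.
    (Hypothetical helper, not part of the Distroless repository.)
    """
    # Drop a trailing slash so the rest of each URL path joins cleanly.
    mirror_base = mirror_base.rstrip("/")
    count = workspace_text.count(SNAPSHOT_BASE)
    return workspace_text.replace(SNAPSHOT_BASE, mirror_base), count
```

A possible use would be to read WORKSPACE, call `rewrite_snapshot_urls(text, "http://mirror.internal.example/archive")` with your own mirror address, check that the returned count matches the number of occurrences you expect, and write the result back.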

Maybe the best option is to set up your own Debian package mirror (at least for the packages used by Distroless).

smijolovic commented 4 years ago

The link I pointed to shows the ability to archive all the bazel dependencies: https://docs.bazel.build/versions/master/guide.html#distribution-files-directories

That way all dependencies can be archived. Looking to do the same with the debian packages.

What's interesting is that moby/debootstrap pulls in the same Debian mirrors to do the same thing, and never has this situation. You can also tar up the .deb files to address air-gapped/network issues.
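As an illustration of the "tar up the .deb files" idea, here is a minimal sketch that bundles a directory of already-downloaded packages together with SHA-256 checksums, so the bundle can be checked again on the air-gapped side. The function names and the flat directory layout are assumptions made for this example; nothing here reflects how Distroless or debootstrap actually store packages.

```python
import hashlib
import os
import tarfile


def archive_debs(deb_dir, archive_path):
    """Pack every .deb in deb_dir into a gzip tarball; return {name: sha256}."""
    manifest = {}
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(deb_dir)):
            if not name.endswith(".deb"):
                continue  # ignore anything that is not a package file
            path = os.path.join(deb_dir, name)
            with open(path, "rb") as handle:
                manifest[name] = hashlib.sha256(handle.read()).hexdigest()
            tar.add(path, arcname=name)
    return manifest


def verify_archive(archive_path, manifest):
    """Return sorted names that are missing, unexpected, or fail their checksum."""
    bad = set(manifest)  # every expected file is "bad" until proven good
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            data = tar.extractfile(member).read()
            if manifest.get(member.name) == hashlib.sha256(data).hexdigest():
                bad.discard(member.name)
            else:
                bad.add(member.name)
    return sorted(bad)
```

The manifest would travel alongside the tarball; an empty list from `verify_archive` means every expected package arrived intact.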

chanseokoh commented 4 years ago

> The link I pointed to shows the ability to archive all the bazel dependencies: https://docs.bazel.build/versions/master/guide.html#distribution-files-directories
>
> That way all dependencies can be archived. Looking to do the same with the debian packages.

Maybe you're right. I'm not so familiar with Bazel. The repo welcomes contributions to make this happen. (As I said, this repo is primarily maintained and driven by users like you.) Would be a super cool feature.

chanseokoh commented 4 years ago

Another idea would be just to retry if there's a failure. That would be similar in spirit to a recent external contribution that made dpkg_parser.py resume downloading when Debian abruptly shuts down the connection.
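To make the retry-and-resume idea concrete, here is a minimal sketch using only the Python standard library. It is not the code from dpkg_parser.py or from the contribution mentioned above; the function name, the `opener` parameter (there so the logic can be exercised without a network), and the attempt limit are all assumptions. It catches the same `http.client.IncompleteRead` seen in the traceback earlier in this thread, keeps the bytes that did arrive, and asks for the rest with an HTTP `Range` header.

```python
import http.client
import os
import urllib.request


def download_with_resume(url, dest, opener=urllib.request.urlopen, max_attempts=5):
    """Download url to dest, resuming after IncompleteRead (hypothetical helper)."""
    # Start from an empty file so a stale partial download is never reused.
    open(dest, "wb").close()
    for _ in range(max_attempts):
        offset = os.path.getsize(dest)
        request = urllib.request.Request(url)
        if offset:
            # Ask only for the bytes that are still missing.
            request.add_header("Range", "bytes=%d-" % offset)
        with opener(request) as response:
            # 206 means the server honoured the Range header.
            resumed = bool(offset) and getattr(response, "status", None) == 206
            try:
                # Reads the whole body into memory, which keeps the sketch short.
                data = response.read()
                complete = True
            except http.client.IncompleteRead as err:
                data = err.partial  # bytes received before the connection dropped
                complete = False
        # Append when resuming; otherwise overwrite, since the body starts at byte 0.
        with open(dest, "ab" if resumed else "wb") as out:
            out.write(data)
        if complete:
            return dest
    raise OSError("gave up on %s after %d attempts" % (url, max_attempts))
```

Other failure modes, such as a reset connection or an HTTP error status, are deliberately left unhandled here and would propagate to the caller.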

chanseokoh commented 3 years ago

Related: #602 proposes an idea to drop the custom python download program and use an external tool that hopefully increases robustness in downloading .deb files.

smijolovic commented 3 years ago

The proposal looks to increase the stability and reliability of the deb file extraction, but the specific request here is a process or procedure to build distroless images in an air-gapped environment without external connectivity.

For example - to build the static-distroless image for kubernetes 1.19+:

bazel build //package_manager:dpkg_parser.par --distdir=../../bazel/distdir
bazel build //base:static_root_amd64_debian10 --distdir=../../bazel/distdir
bazel run //base:static_root_amd64_debian10

This produces the distroless base, but it requires external connectivity for the distroless dependencies. The need is a "bazel fetch" process to obtain all of the image deb files and additional requirements, so that the build can run in an air-gapped development environment that doesn't trust precompiled or externally provided compilers and binaries.

jonjohnsonjr commented 3 years ago

This doc seems relevant for air-gapped environments.

chanseokoh commented 3 years ago

> I feel the Debian package repository has become a bit unstable for the last one or two months

FTR, https://github.com/GoogleContainerTools/distroless/issues/602#issuecomment-892669480