GoogleContainerTools / distroless

🥑 Language focused docker images, minus the operating system.
Apache License 2.0

Including other debian packages without polluting the base image? #863

Open officialpatterson opened 3 years ago

officialpatterson commented 3 years ago

Hi all,

I'm looking into using these images in my own development workflow, but I'm finding it hard to see how to add specific shared libraries that a single app image needs without polluting the base image.

As a more concrete example, one of the apps depends on libparquet-dev, so I want to include it in that app's image. However, I don't want to include it in the base image, as I feel the base image should be as general as possible (that's the correct way of thinking, right?).

Any help appreciated!

AP

evanj commented 3 years ago

You can hack this using a multi-stage Docker build. It requires manually figuring out what Debian packages to include though. Here is an example I have that installs graphviz and its dependencies:

https://github.com/evanj/pprofweb/blob/aa0a1a2e87be02f1527f334620c4d45f6fb6ffd4/Dockerfile#L3-L7

... It is possible I should add something like this to the examples in distroless, because I've seen this asked a few times.

616b2f commented 2 years ago

@evanj Does this also include the files in /var/lib/dpkg/status or /var/lib/dpkg/status.d/, so that vulnerability scanners can find which packages are installed in the resulting image?

evanj commented 2 years ago

The approach in the Dockerfile linked above extracts the contents of the listed .deb packages into the container image using dpkg --extract. A quick local test on one package suggests that it does not include a status file. A bit of shell scripting could probably fix that. For example, the following extracts the control file, which I believe is what gets appended to create /var/lib/dpkg/status:

dpkg --ctrl-tarfile (path to deb) | tar xvf - ./control

616b2f commented 2 years ago

@evanj Thank you, I didn't know that was even possible. I will definitely try it out and report back.

616b2f commented 2 years ago

@evanj This worked like a charm, thank you again!

For others who may be looking at this issue, this is how I did it:

RUN cd /tmp && \
          apt-get update && \
          apt-get download \
                  # .NET Core dependencies
                  libc6 \
                  libgcc1 \
                  libgssapi-krb5-2 \
                  libicu63 \
                  libssl1.1 \
                  libstdc++6 \
                  zlib1g \
                  && \
          mkdir -p /dpkg/var/lib/dpkg/status.d/ && \
          for deb in *.deb; do \
                  # Read the package name from the .deb's control fields
                  package_name=$(dpkg-deb -f "${deb}" Package); \
                  echo "Process: ${package_name}"; \
                  # Record the control file under status.d/ so scanners can see the package
                  dpkg --ctrl-tarfile "${deb}" | tar -Oxf - ./control > "/dpkg/var/lib/dpkg/status.d/${package_name}"; \
                  # Unpack the package's files into the /dpkg staging root
                  dpkg --extract "${deb}" /dpkg || exit 10; \
          done

Sineaggi commented 2 years ago

Would it make sense to include @616b2f's example in the repo itself, at least until distroless provides some extensible way of shipping additional libraries?

Currently the only example that includes a build stage is this Dockerfile, which, although it requires a compiler to build some code, doesn't need to retain any of the installed libraries in the final distroless image.

This would also help other languages that require some system dependency (even Java has a few libraries that should be available for JNI).

616b2f commented 2 years ago

@Sineaggi That would make sense, in my opinion. You could also copy a bigger example that may solve another issue as well (that there is no longer any dotnet distroless image here). This is what I currently use to build a dotnet distroless image:

https://github.com/616b2f/distroless-dotnet/blob/main/Dockerfile

CodeMonkeyLeet commented 1 year ago

A small note for anyone building on the base images (e.g. base-debian11) instead of cc or core: the control file for the libssl1.1 package is stored as status.d/libssl1 rather than under the correct package name, libssl1.1. The subtle gotcha is that if you deploy a newer libssl1.1 manually, vulnerability scanners will still see the older entry, and any vulnerabilities in it, unless that file is removed in favor of the manually deployed version.
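One way to spot entries like this is to compare each status.d file's Package field with its file name. The sketch below assumes you have copied the base image's /var/lib/dpkg/status.d into a build stage, where a shell is available to inspect it; the function name find_mismatched_entries is hypothetical and not part of distroless.

```shell
#!/bin/sh
# Sketch: list status.d entries whose "Package:" field matches a package
# name while the file itself is named differently (e.g. status.d/libssl1
# describing libssl1.1).
# Usage: find_mismatched_entries <status.d directory> <package name>
find_mismatched_entries() {
    dir=$1
    pkg=$2
    for entry in "$dir"/*; do
        [ -f "$entry" ] || continue
        # Read the value of the "Package:" field from the control stanza.
        name=$(awk -F': ' '$1 == "Package" { print $2; exit }' "$entry")
        if [ "$name" = "$pkg" ] && [ "$(basename "$entry")" != "$pkg" ]; then
            basename "$entry"
        fi
    done
}
```

For the case described above, find_mismatched_entries on such a copy with libssl1.1 as the package name should print libssl1. Because the final distroless image has no shell to run rm in, one option, offered here as an assumption rather than a tested recipe, is to write the new control file under the old file name in the staging directory, so that the later COPY overwrites the stale entry instead of sitting next to it.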

LucaMaurelliC commented 1 year ago

From what I understand, the problem here is installing a .deb package into the distroless image. I have a few questions, @evanj @616b2f:

  1. Does dpkg --extract perform an "installation"? It seems to just extract the contents: no dependencies are checked and no pre/post install or removal scripts are run. Is that right?
  2. What about using dpkg --install --recursive /tmp to install all the packages in the directory where the .deb files were downloaded?
  3. Is there a way to target the installation at a specific directory? I'm thinking of something like Python environments, so that you could just copy that environment into another Docker image with COPY --from. The dpkg docs mention a --root flag that might be exploited here (this is beyond my expertise, though). I think the idea is to "let the system think that the target directory is the root directory, i.e. '/', and run the dpkg utility inside it". Later we could just do something like COPY --from=deb_extractor /dpkg / to transfer the installation into the distroless image.
tuananh commented 1 year ago


How do you figure out the dependency tree?

Right now, I start with a clean image, run apt list --installed before and after installing, and diff the two lists, but this process is cumbersome.
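That before/after comparison can at least be scripted. A minimal sketch, assuming the two snapshots were saved to files with apt list --installed (the function names are made up for illustration): it reduces each line to the package name before the "/" and prints the names that appear only in the second snapshot.

```shell
#!/bin/sh
# Sketch: compare two `apt list --installed` snapshots, whose lines look
# like "bash/stable,now 5.1-2 amd64 [installed]".
package_names() {
    # Drop the "Listing..." header and keep only the name before the "/".
    grep -v '^Listing' "$1" | cut -d/ -f1 | sort -u
}

# Print packages present in the "after" snapshot ($2) but not in "before" ($1).
new_packages() {
    before=$(mktemp)
    after=$(mktemp)
    package_names "$1" > "$before"
    package_names "$2" > "$after"
    # comm -13 keeps only lines unique to the second (sorted) file.
    comm -13 "$before" "$after"
    rm -f "$before" "$after"
}
```

Typical use would be apt list --installed > before.txt, then the apt-get install step, then apt list --installed > after.txt, and finally new_packages before.txt after.txt (file names hypothetical). It still requires a throwaway image to install into.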

Sineaggi commented 1 year ago

@tuananh I wrote up a small example here, but the tl;dr is that you first need to generate the list of dependencies:

$ apt-cache depends libpq5
libpq5
  Depends: libc6
  Depends: libgssapi-krb5-2
  Depends: libldap-2.4-2
  Depends: libssl1.1

Then pass those to apt-get download.
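Turning that output into something apt-get download accepts is mostly text filtering. A minimal sketch, assuming the output format shown above; depends_to_names is a hypothetical helper name. It keeps Depends and PreDepends entries (including "|" alternatives, so every side of an OR dependency ends up in the list) and skips virtual packages, which apt-cache prints in angle brackets and which cannot be downloaded directly.

```shell
#!/bin/sh
# Sketch: turn `apt-cache depends` output into a plain, de-duplicated list
# of package names, one per line.
depends_to_names() {
    # Keep "Depends:"/"PreDepends:" lines (optionally prefixed with "|"),
    # print the name after the colon, and drop <virtual> packages.
    awk '$1 ~ /^[|]?(Pre)?Depends:$/ && $2 !~ /^</ { print $2 }' | sort -u
}

# Example (requires apt, so not run here):
#   apt-cache depends libpq5 | depends_to_names | xargs apt-get download
```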

It's a bit cumbersome because you have to subtract from the apt-cache depends output any dependency that the base image already contains.

In the future, I imagine distroless itself could be consumed as a dependency by other Bazel builds, or parts of it could be exposed as a CLI for building images from dependency lists.
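The subtraction itself is a set difference. A minimal sketch, assuming a file with one wanted package name per line and a copy of the base image's status.d directory (both paths are placeholders, and missing_from_base is a hypothetical name). It reads each entry's Package field rather than trusting file names because, as pointed out earlier in this thread, an entry's file name does not always match its package name.

```shell
#!/bin/sh
# Sketch: print the wanted packages that the base image does not already list.
#   $1: file with one wanted package name per line
#   $2: directory holding the base image's status.d entries
missing_from_base() {
    base=$(mktemp)
    wanted=$(mktemp)
    # Collect the "Package:" field of every base entry (awk reads each file
    # separately, so entries never run into each other).
    awk -F': ' '$1 == "Package" { print $2 }' "$2"/* | sort -u > "$base"
    sort -u "$1" > "$wanted"
    # comm -23 keeps lines that appear only in the wanted list.
    comm -23 "$wanted" "$base"
    rm -f "$base" "$wanted"
}
```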

tuananh commented 1 year ago

@Sineaggi I think we need to use the --recurse flag as well.

https://gist.github.com/tuananh/1e8e0f921410a830a7cd1161ff8bb189

Usage: ./aptdeps.sh bash krb5-user etc.

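#!/bin/bash
set -eu

declare -a all_deps=()
for pkg_name in "$@"
do
    # Read one dependency name per line into an array, so that names from
    # different packages are never concatenated into a single string.
    mapfile -t deps < <(apt-cache depends -i --recurse "$pkg_name" | awk -F 'Depends: ' 'NF>1{ sub(/ .*/,"",$NF); print $NF }' | sort -u)
    all_deps+=("${deps[@]}")
done

printf '%s\n' "${all_deps[@]}" | sort -u
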
piranna commented 3 months ago

@tuananh, apt-cache depends accepts several package names, so there's no need for a loop.

piranna commented 3 months ago

Any idea how to resolve the virtual packages instead of showing them?
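One possible heuristic, sketched under stated assumptions: it assumes apt-cache depends prints the packages that provide a virtual dependency (shown in angle brackets) on indented lines beneath it, and simply takes the first of those. This is not how apt itself chooses a provider, the four-space indentation is an assumption about the output format worth checking against your apt version, and resolve_virtual is a hypothetical name. A virtual package with no providers listed is dropped.

```shell
#!/bin/sh
# Sketch: extract dependency names from `apt-cache depends` output, replacing
# each virtual package (a name in angle brackets) with the first provider
# listed on the indented lines beneath it. Choosing the first provider is an
# arbitrary heuristic, not apt's own resolution logic.
resolve_virtual() {
    awk '
        # Provider line (assumed: four-space indent, a single field).
        /^    [^ ]/ && NF == 1 {
            if (pending) { print $1; pending = 0 }
            next
        }
        # Any other line ends the provider list of a virtual dependency.
        { pending = 0 }
        # Dependency line, optionally marked "|" as an OR alternative.
        $1 ~ /^[|]?(Pre)?Depends:$/ {
            if ($2 ~ /^</) { pending = 1 } else { print $2 }
        }
    ' | sort -u
}
```

It could stand in for the awk filter in the script above, e.g. apt-cache depends -i --recurse "$@" | resolve_virtual, which also follows @piranna's point about passing all package names to a single call.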