GoogleContainerTools / distroless

🥑 Language focused docker images, minus the operating system.
Apache License 2.0

Keep lists of package-installed files inside a built image #741

Open pombredanne opened 3 years ago

pombredanne commented 3 years ago

Since distroless images are primarily built with Bazel, I filed this issue https://github.com/bazelbuild/rules_docker/issues/1876, which I am reposting here... but I reckon this may need to be tracked here instead:

🚀 feature request

Relevant Rules

When a package is installed, only its metadata is kept; the list of installed files is lost and not saved with the package metadata.

I have a concern with what happens here: https://github.com/bazelbuild/rules_docker/blob/d18033b7eb3429a55dc4a579b5c19af57ab25e5f/container/build_tar.py#L224

Description

In a distroless container image, the installed .deb packages are not saved with their file lists and md5sums files, which on a regular Debian install would live in /var/lib/dpkg/info. As a result, it is not possible to relate an installed package in a distroless image or layer to the set of files that were installed with this package.
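
To make the lost information concrete: a .deb is an ar archive containing debian-binary, control.tar.* and data.tar.*, and the member names of data.tar.* are essentially the paths dpkg unpacks and records. The sketch below is a minimal illustration, not the rules_docker implementation, that recovers that list with the Python standard library only. It assumes a plain ar layout without a GNU long-name table, and a data.tar compressed with gzip, bzip2 or xz (the standard tarfile module does not read zstd, which some distributions use). The function names are hypothetical.

```python
import io
import tarfile

AR_MAGIC = b"!<arch>\n"
AR_HEADER_SIZE = 60


def read_ar_members(data: bytes) -> dict[str, bytes]:
    """Parse a simple ar archive (such as a .deb) into {member name: payload}."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members: dict[str, bytes] = {}
    offset = len(AR_MAGIC)
    while offset + AR_HEADER_SIZE <= len(data):
        header = data[offset:offset + AR_HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        # Name is 16 bytes, space padded; GNU ar may append a trailing "/".
        name = header[0:16].decode("ascii").rstrip(" ").rstrip("/")
        # Size is a 10-byte decimal field at offset 48.
        size = int(header[48:58].decode("ascii").strip())
        start = offset + AR_HEADER_SIZE
        members[name] = data[start:start + size]
        # Payloads are padded to an even byte boundary.
        offset = start + size + (size % 2)
    return members


def installed_paths(deb_bytes: bytes) -> list[str]:
    """Return the member names of the .deb's data.tar.*, i.e. what would be unpacked."""
    members = read_ar_members(deb_bytes)
    data_name = next((n for n in members if n.startswith("data.tar")), None)
    if data_name is None:
        raise ValueError("no data.tar.* member found")
    # "r:*" lets tarfile detect gzip/bzip2/xz; zstd is not handled here.
    with tarfile.open(fileobj=io.BytesIO(members[data_name]), mode="r:*") as tar:
        return tar.getnames()
```

This is the per-package information that is currently dropped when the package is added to the image.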

This data can be important for software composition analysis and its security and license compliance tracking applications.

Describe the solution you'd like

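Each installed package should come with a listing of its installed files, possibly stored in a per-package file in the status.d/ directory. This is standard on Debian, where such listings live in /var/lib/dpkg/info/<package name>.list (with checksums in /var/lib/dpkg/info/<package name>.md5sums).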
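
For reference, a dpkg .list file is plain text with one absolute path per line, the root directory written as "/." and directories listed without a trailing slash. A minimal sketch of producing that shape, assuming the input is the member names of a package's data.tar (for instance from tarfile.TarFile.getnames()); to_dpkg_list is a hypothetical helper, not part of rules_docker or dpkg:

```python
def to_dpkg_list(member_names: list[str]) -> str:
    """Render data.tar member names roughly the way dpkg writes <package>.list files."""
    lines: list[str] = []
    seen: set[str] = set()
    for name in member_names:
        # data.tar entries are usually relative ("./usr/bin/x"); .list entries are absolute.
        path = name[2:] if name.startswith("./") else name
        path = path.strip("/")
        line = "/." if path in ("", ".") else "/" + path
        if line not in seen:  # keep the first occurrence, preserve archive order
            seen.add(line)
            lines.append(line)
    return "".join(line + "\n" for line in lines)
```

For example, to_dpkg_list([".", "./usr", "./usr/bin/hello"]) yields the three lines "/.", "/usr" and "/usr/bin/hello".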

This would make distroless images more readily introspectable and observable. Otherwise, there is no intrinsic way to relate a package (in status.d) to the set of its installed files.
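
On the consuming side, a software composition analysis tool could then relate packages and files directly. A minimal sketch, assuming an image already unpacked to a directory and the hypothetical status.d/<package>.list sibling files proposed in this thread (current images are not guaranteed to have them); package_files and file_owners are illustrative names, and file_owners answers the same kind of question as dpkg -S on a regular Debian system:

```python
from pathlib import Path


def package_files(rootfs: Path) -> dict[str, list[str]]:
    """Map each package that has a status.d/<package> entry to the paths in its .list file."""
    status_d = rootfs / "var" / "lib" / "dpkg" / "status.d"
    result: dict[str, list[str]] = {}
    for list_file in sorted(status_d.glob("*.list")):
        package = list_file.name[: -len(".list")]
        if not (status_d / package).is_file():
            continue  # skip a .list file with no matching status entry
        text = list_file.read_text(encoding="utf-8")
        result[package] = [line for line in text.splitlines() if line]
    return result


def file_owners(rootfs: Path) -> dict[str, set[str]]:
    """Invert package_files(): map each path to the set of packages that list it."""
    owners: dict[str, set[str]] = {}
    for package, paths in package_files(rootfs).items():
        for path in paths:
            owners.setdefault(path, set()).add(package)
    return owners
```

Shared directories such as /usr are listed by several packages, which is why file_owners maps each path to a set.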

@tejal29 you committed this originally with @dlorenc ... any insight to share there?

Describe alternatives you've considered

I cannot think of an in-container alternative for keeping track of each package-installed file. Tracking this outside the image would mean maintaining some external database, which does not seem practical.

pombredanne commented 3 years ago

Gentle ping :)

dlorenc commented 3 years ago

We have the list of installed packages in /var/lib/dpkg/status; you're requesting a list of installed files, mapped back to the packages?

cc @loosebazooka

pombredanne commented 3 years ago

@dlorenc

you're requesting a list of installed files, mapped back to the packages?

yes

loosebazooka commented 3 years ago

Yeah, I think this needs to be solved in rules_docker. Can you point to the Debian docs for this? That would be helpful.

pombredanne commented 3 years ago

@loosebazooka See

Since you have already departed from the standard dpkg Debian layout with the status.d/ layout, feel free to use whatever you like. IMHO the simplest would be something like /var/lib/dpkg/info/package_name.list (the list of files and directories installed by the package), stored side by side with the status file. For example, given /var/lib/dpkg/status.d/tzdata, which contains the package status for tzdata, /var/lib/dpkg/status.d/tzdata.list would list the installed paths for tzdata, one path per line. It would also be nice to document this, of course (including the actual use of status.d/ and the corresponding copyright files that are already there).
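
The side-by-side layout described in this comment could look roughly like this when writing an image layer with Python's tarfile module. This is a hedged sketch, not the rules_docker patch: the function names are hypothetical, and the "./var/lib/dpkg/status.d/" prefix is an assumption about how entries are named inside the layer tar. The .list text would hold the package's installed paths, one per line.

```python
import io
import tarfile


def add_text_file(tar: tarfile.TarFile, path: str, text: str) -> None:
    """Add an in-memory UTF-8 text file to an open tar archive."""
    data = text.encode("utf-8")
    info = tarfile.TarInfo(path)
    info.size = len(data)
    info.mode = 0o644
    tar.addfile(info, io.BytesIO(data))


def add_package_metadata(tar: tarfile.TarFile, package: str,
                         control_text: str, list_text: str) -> None:
    """Write status.d/<package> and, next to it, status.d/<package>.list."""
    base = "./var/lib/dpkg/status.d/" + package  # assumed in-layer path prefix
    add_text_file(tar, base, control_text)
    add_text_file(tar, base + ".list", list_text)
```

Keeping the .list next to the status file, rather than in a separate info/ directory, mirrors the status.d/ layout the images already use.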

pombredanne commented 3 years ago

For reference I also entered https://github.com/bazelbuild/rules_docker/issues/1876 way back when (some would say this is a double post... but I was not sure where to post what ;) )

loosebazooka commented 3 years ago

Yeah, I'm not exactly sure about the history of this change, so I'll have to do some reading, but thanks for the link.

pombredanne commented 3 years ago

@loosebazooka

I'm not exactly sure about the history of this change.

I am not sure what you mean by this... but if you mean when the status.d files were introduced and what was there before, that looks simple from what I can see.

A single commit introduced keeping some metadata (https://github.com/bazelbuild/rules_docker/commit/f5432b813e0a11491cf2bf83ff1a923706b36420); it essentially takes the package's control file and dumps it under status.d/.

Before that, no metadata was kept: https://github.com/bazelbuild/rules_docker/blob/3caf72f166f8b6b0e529442477a74871ad4d35e9/container/build_tar.py#L181

I can provide a patch in rules_docker that would have either of these effects in https://github.com/bazelbuild/rules_docker/blob/e5368f9c425854ddb5af31624f0a6b99a0d3f1fb/container/build_tar.py#L224

Do you want such a patch?

pombredanne commented 2 years ago

@loosebazooka gentle ping... do you want a patch here or at https://github.com/bazelbuild/rules_docker/issues/1876?

loosebazooka commented 2 years ago

Oh sorry, yeah I mean I don't know why this form of metadata was chosen. Anyway, it seems like the correct place to inject the metadata is in rules_docker. Please provide a patch there.

fedemengo commented 2 years ago

@pombredanne gentle ping, any news on the patch?

pombredanne commented 2 years ago

@pombredanne gentle ping, any news on the patch?

I have not attacked this yet. Do you want to chip in and help?

fedemengo commented 2 years ago

Let's continue the discussion over at bazelbuild/rules_docker#1876

pombredanne commented 2 years ago

@loosebazooka FYI I pushed a fix in https://github.com/bazelbuild/rules_docker/pull/2065 and your review is mucho welcomed there

fedemengo commented 1 year ago

Since the fix has been provided by @pombredanne and released in rules_docker v0.25.0, what's left to see the change reflected in new images?

Is it enough to bump rules_docker here?

loosebazooka commented 1 year ago

@thesayyn are these covered in the new rules_oci?

thesayyn commented 1 year ago

@thesayyn are these covered in the new rules_oci?

Yes, it is.

NOTE: some packages don't have an md5sums file; in that case, it is absent.

fedemengo commented 1 year ago

@thesayyn are these covered in the new rules_oci?

Yes, it is.

Does this mean we should already see it reflected in new images?

NOTE: some packages don't have an md5sums file; in that case, it is absent.

at least for the packages that have md5sums files
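
To illustrate thesayyn's note: in a .deb, the md5sums file, when present, is a member of control.tar.*, with one "<md5 digest>  <path>" line per file and paths relative to the root. A minimal sketch, assuming you already have the control.tar bytes (for example extracted from the .deb's ar container); read_md5sums is a hypothetical helper, not rules_oci's code:

```python
import io
import tarfile
from typing import Optional


def read_md5sums(control_tar: bytes) -> Optional[dict[str, str]]:
    """Return {relative path: md5 hex digest} from control.tar's md5sums, or None if absent."""
    text = None
    with tarfile.open(fileobj=io.BytesIO(control_tar), mode="r:*") as tar:
        for member in tar.getmembers():
            name = member.name[2:] if member.name.startswith("./") else member.name
            if member.isfile() and name == "md5sums":
                text = tar.extractfile(member).read().decode("utf-8")
                break
    if text is None:
        return None  # some packages ship no md5sums file at all
    sums: dict[str, str] = {}
    for line in text.splitlines():
        # md5sum format: "<digest>  <path>", with paths relative to "/".
        digest, sep, path = line.partition("  ")
        if sep:
            sums[path] = digest
    return sums
```

Returning None rather than an empty dict keeps "no md5sums file" distinguishable from "empty md5sums file".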

loosebazooka commented 1 year ago

@fedemengo not yet. We're in the middle of a larger transition to rules_oci, and when that is complete, you will begin to see this metadata.

fedemengo commented 1 year ago

awesome, thanks for the update

pombredanne commented 1 year ago

@loosebazooka you wrote:

We're in the middle of a larger transition to rules_oci, and when that is complete, you will begin to see this metadata.

Hey! Is the transition done?

fedemengo commented 7 months ago

Looks like after https://github.com/GoogleContainerTools/distroless/pull/1367 the new images contain the expected metadata.