anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

Expose file metadata for image contents #477

Open wagoodman opened 3 years ago

wagoodman commented 3 years ago

Today the package catalogers expose some file information taken from the cataloging source rather than from the file on disk (e.g. indirect file metadata from the RPM DB, not metadata read directly from the file's location in the image archive). It would be interesting to expose direct (not indirect) file metadata as artifacts, at least in the context of the SPDX SBOM format.

This involves looking at the existing file cataloger and understanding if it should be invoked conditionally based on the user output format option, or directly by the presenter object (not ideal), or something else.

zhill commented 2 years ago

It would be helpful to have the "expected" and "observed" metadata (uid, gid, mode, checksums) for the files so that a user can determine whether the pkgdb entry matches the actual content. I'm not sure how much of that is necessary for SPDX in particular, but it would have value beyond that IMO.
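To make the "expected" vs "observed" comparison concrete, here is a minimal sketch in Python. It is not syft code: `FileMeta` and `mismatches` are hypothetical names, and the four fields are just the ones listed above (uid, gid, mode, and a checksum, assumed here to be SHA-256).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical record of the metadata fields discussed above."""
    uid: int
    gid: int
    mode: int
    sha256: str


def mismatches(expected: FileMeta, observed: FileMeta) -> dict:
    """Return {field: (expected, observed)} for every field that differs."""
    return {
        f.name: (getattr(expected, f.name), getattr(observed, f.name))
        for f in fields(FileMeta)
        if getattr(expected, f.name) != getattr(observed, f.name)
    }


# "expected" would come from the package database entry,
# "observed" from the file as it actually sits in the image.
expected = FileMeta(uid=0, gid=0, mode=0o755, sha256="aaa")
observed = FileMeta(uid=0, gid=0, mode=0o777, sha256="aaa")
drift = mismatches(expected, observed)
```

An empty result would mean the pkgdb entry and the file on disk agree on every compared field; anything else names exactly which fields drifted.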

wagoodman commented 9 months ago

This has effectively been implemented and turned on by default in https://github.com/anchore/syft/pull/1383. Specifically, in the files section of the SBOM we now catalog, by default, file metadata and digests for all files that are claimed to be owned by a package. The user can additionally change which files are reported by setting file.metadata.selection to all or none. There aren't any specific claims about whether metadata from a package matches what was actually observed; however, the document now surfaces enough information to discern this.

wagoodman commented 9 months ago

I got a little ahead of myself in claiming victory here. Though the above comment is true, what is missing is tying this back to what SPDX can express in terms of FilesAnalyzed: https://github.com/anchore/syft/blob/ac34808b9c55bb274b1205f9b5d9cf495239577d/syft/format/common/spdxhelpers/to_format_model.go#L476-L508

To really run this to ground we would need to find the elements from the files section of the syft core SBOM model that correspond to the files claimed to be owned by the package and report out any checksums we may have.
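A rough sketch of that lookup, in Python rather than syft's Go, and against a deliberately simplified stand-in for the SBOM model: the dictionary layout (`relationships`, `files`, `digests`), the `contains` relationship type, and the function name `owned_file_digests` are all assumptions for illustration, not syft's actual types.

```python
def owned_file_digests(sbom: dict, package_id: str) -> dict:
    """Map path -> digests for files the given package claims to own.

    Ownership is read from (assumed) "contains" relationships whose parent
    is the package; files without any digest are left out.
    """
    owned_ids = {
        rel["child"]
        for rel in sbom["relationships"]
        if rel["parent"] == package_id and rel["type"] == "contains"
    }
    return {
        f["path"]: f["digests"]
        for f in sbom["files"]
        if f["id"] in owned_ids and f["digests"]
    }


# Tiny hand-written document in the assumed layout.
sbom = {
    "relationships": [
        {"parent": "pkg-1", "child": "file-a", "type": "contains"},
        {"parent": "pkg-1", "child": "file-b", "type": "contains"},
        {"parent": "pkg-2", "child": "file-c", "type": "contains"},
    ],
    "files": [
        {"id": "file-a", "path": "/usr/bin/tool", "digests": {"sha256": "aaa"}},
        {"id": "file-b", "path": "/etc/tool.conf", "digests": {}},
        {"id": "file-c", "path": "/usr/lib/other.so", "digests": {"sha256": "ccc"}},
    ],
}

digests = owned_file_digests(sbom, "pkg-1")
# One plausible rule (an assumption): only report the package's files as
# analyzed when at least one owned file has a checksum to report.
files_analyzed = bool(digests)
```

Whether an empty result should translate to FilesAnalyzed being false, or whether partial coverage is acceptable, is a design decision this sketch does not settle.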