GoogleContainerTools / rules_distroless

feat: support flat repos #97

Open jjmaestro opened 2 months ago

jjmaestro commented 2 months ago

[!NOTE]
Stacked on top of #96

Fixes issues #56 and #66

Follow-up and credit to @alexconrey (PR #55), @ericlchen1 (PR #64) and @benmccown (PR #67) for their work on similar PRs, which I've reviewed and drawn some inspiration from to create "one 💍 PR to merge them all" 😅

Problem

Debian has two types of repos: "canonical" and "flat". Each has a different sources.list syntax:

"canonical": (see https://wiki.debian.org/DebianRepository/Format#Overview)

deb uri distribution [component1] [component2] [...]

"flat": (see https://wiki.debian.org/DebianRepository/Format#Flat_Repository_Format)

deb uri directory/

Per the spec,

A flat repository does not use the dists hierarchy of directories, and instead places meta index and indices directly into the archive root (or some part below it)

The URL logic in _fetch_package_index() assumes the dists hierarchy, so it builds the wrong URL for flat repos and always fails to fetch their Packages index.
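To make the difference concrete, here is a minimal Python sketch of where the Packages index lives in each layout, following the Debian repository format linked above. The function names are hypothetical and are not the ones used in rules_distroless (which is written in Starlark), and the `.xz` suffix is an assumption, since a repo may instead publish a `.gz` or uncompressed index.

```python
def canonical_index_url(base_url: str, dist: str, component: str, arch: str) -> str:
    """Index URL for a repo that uses the dists/ hierarchy (hypothetical helper)."""
    base = base_url.rstrip("/")
    # Assumption: the xz-compressed index is published.
    return f"{base}/dists/{dist}/{component}/binary-{arch}/Packages.xz"


def flat_index_url(base_url: str, directory: str) -> str:
    """Index URL for a flat repo: the index sits directly in `directory`."""
    base = base_url.rstrip("/")
    directory = directory.strip("/")
    prefix = f"{base}/{directory}" if directory else base
    # No dists/, component, or binary-<arch> path segments are involved.
    return f"{prefix}/Packages.xz"


if __name__ == "__main__":
    print(canonical_index_url(
        "https://snapshot-cloudflare.debian.org/archive/debian/20240210T223313Z",
        "bullseye", "main", "amd64",
    ))
    print(flat_index_url("https://cloud.r-project.org/bin/linux/debian", "bullseye-cran40/"))
```

Applying the canonical construction to a flat repo yields a path under `dists/` that the repo never serves, which is why the fetch fails.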

Solution

Use the Debian sources.list convention in the sources section of the manifest to declare both canonical and flat repos. If the channel is a single directory that ends in '/', the source is a flat repo; if it has a (dist, component, ...) structure, it is a canonical repo. _fetch_package_index() and the other internal logic use this distinction to handle each source correctly.

For example:

version: 1

sources:
  # canonical repo
  - channel: bullseye main contrib
    url: https://snapshot-cloudflare.debian.org/archive/debian/20240210T223313Z
  # flat repos, note the trailing '/' and the lack of distribution or components
  - channel: bullseye-cran40/
    url: https://cloud.r-project.org/bin/linux/debian
  - channel: ubuntu2404/x86_64/
    url: https://developer.download.nvidia.com/compute/cuda/repos

archs:
  - amd64

packages:
  - bash
  - r-mathlib
  - nvidia-container-toolkit-base

The example mixes Ubuntu and Debian repos only for the sake of illustration. Setting that aside, this manifest shows that you can combine canonical and flat repos in one manifest, including flat repos that serve multiple architectures and flat repos that serve only one.

You will still have the same problems as before with packages that only exist for one architecture and/or repos that only support one architecture. In those cases, simply separate the repos and packages into their own manifests.
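The channel convention described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the PR's Starlark code: `Channel` and `parse_channel` are hypothetical names, and rejecting a single token without a trailing '/' is an assumption based on the sources.list rule that a non-path suite needs at least one component.

```python
from typing import NamedTuple, Optional, Tuple


class Channel(NamedTuple):
    """Parsed form of a manifest `channel` value (hypothetical type)."""
    flat: bool
    directory: Optional[str]      # set only for flat repos
    dist: Optional[str]           # set only for canonical repos
    components: Tuple[str, ...]   # empty for flat repos


def parse_channel(channel: str) -> Channel:
    parts = channel.split()
    if not parts:
        raise ValueError("channel must not be empty")
    # Flat repo: exactly one token and it ends with '/'.
    if len(parts) == 1 and parts[0].endswith("/"):
        return Channel(True, parts[0], None, ())
    # Canonical repo: a distribution followed by one or more components.
    if len(parts) < 2:
        raise ValueError(f"canonical channel needs a component: {channel!r}")
    return Channel(False, None, parts[0], tuple(parts[1:]))


if __name__ == "__main__":
    for value in ("bullseye main contrib", "bullseye-cran40/", "ubuntu2404/x86_64/"):
        print(value, "->", parse_channel(value))
```

With the manifest above, the first channel parses as canonical with components `main` and `contrib`, and the other two parse as flat.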


[!NOTE] This PR also fixes an issue with NVIDIA CUDA flat repos that don't follow the Debian repo spec and have invalid 'Filename' paths.

The Debian repo spec for 'Filename' says:

The mandatory Filename field shall list the path of the package archive relative to the base directory of the repository. The path should be in canonical form, that is, without any components denoting the current or parent directory ("." or ".."). It also should not make use of any protocol-specific components, such as URL-encoded parameters.

However, some repos do not honor this. In those cases we work around the problem by assuming 'Filename' is relative to the sources.list directory/, so we join the two and normalize the resulting 'Filename' path.

Note that, so far, only the NVIDIA CUDA repos have needed this workaround, so this heuristic may break for other repos that don't conform to the Debian repo spec.
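As a rough illustration of the workaround, the sketch below joins the flat repo directory with a non-conforming 'Filename' and normalizes the result. It is written in Python, not the Starlark used by the PR. `resolve_filename` is a hypothetical name, the package file names are made up, and the detection rule (treat a path as non-conforming when it contains "." or ".." components) is an assumption rather than the PR's actual check.

```python
import posixpath


def resolve_filename(directory: str, filename: str) -> str:
    """Return a path relative to the repo base URL (hypothetical helper)."""
    parts = filename.split("/")
    # Assumption: a path with no "." or ".." components is already in
    # canonical form and relative to the repo base, as the spec requires.
    if "." not in parts and ".." not in parts:
        return filename
    # Otherwise treat it as relative to the sources.list directory/,
    # join the two, and collapse the "." and ".." components.
    return posixpath.normpath(posixpath.join(directory, filename))


if __name__ == "__main__":
    # Made-up file names, for illustration only.
    print(resolve_filename("ubuntu2404/x86_64/", "./example_1.0_amd64.deb"))
    print(resolve_filename("bullseye-cran40/", "bullseye-cran40/example_1.0_amd64.deb"))
```

The first call rewrites the path to sit under `ubuntu2404/x86_64/`, while the second leaves an already conforming path untouched.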

jjmaestro commented 2 months ago

See previous discussions in #86