bazelbuild / rules_pkg

Bazel rules for creating packages of many types (zip, tar, deb, rpm, ...)
Apache License 2.0
212 stars 166 forks source link

Support duplicate file paths in tar archives #850

Closed eejayes closed 1 month ago

eejayes commented 2 months ago

Duplicate path entries are made possible within tar archives, as per feature request described in #849.

RELNOTES: Duplicate path entry support within tar archives

google-cla[bot] commented 2 months ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

eejayes commented 2 months ago

I will await design feedback before updating tests.

aiuto commented 2 months ago

Let's work out behavior in the issue before continuing with this.

aiuto commented 2 months ago

I think the discussion in #849 came up with a good proposal. Whenever you're ready.

eejayes commented 2 months ago

Hello. Yes, I would agree. I have started to work on it.

On Wed, Apr 24, 2024 at 9:57 PM aiuto @.***> wrote:

I think the discussion in #849 https://github.com/bazelbuild/rules_pkg/issues/849 came up with a good proposal. Whenever you're ready.

— Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_pkg/pull/850#issuecomment-2075735262, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABRWVZTPR3JDSZA6HHRZ5GTY7AFELAVCNFSM6AAAAABGCIY3GKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZVG4ZTKMRWGI . You are receiving this because you were assigned.Message ID: @.***>

eejayes commented 2 months ago

The commit above provides pkg_tar with support for duplicate file paths. Tests added show that:

In the pre-existing code, deps are added after srcs. Thus for paths appearing both in deps and srcs, it is the one in deps which will appear later in the archive list and will thus prevail when an archive is unpacked. This conflicts with common command line tar tools where:

  1. users may specify which archive is subject to modification during an archive concatenation operation
  2. the archive is implicitly the modification subject in an operation combining a file and archive

It would be straight-forward to adapt the code to the second case. The first case would however require a mechanism to specify which archive (i.e. dep) is the modification subject, i.e. a sequence of steps of an archive's construction. The proposed iteration does not solve these. Instead it requires that multiple sequential pkg_tar operations be declared when override behavior for path conflicts between any pair of deps or srcs is desired.

The purpose of duplicate support is to allow evolution of artifacts with implicit override behavior consistent with typical tar utilities. Since files can never be artifacts and instead serve to specify incremental change, srcs conflicts which could exist within the scope of a single evolutionary step as defined by a pkg_tar instance should be solved prior to it's declaration. Thus adding duplicates to srcs should be an error.

Outside of the discussion in issue #849, there were some unforeseen interactions with create_parents. Since duplicates are not allowed in the current state of the code, a duplicate of a pre-existing directory will not be created if create_parents is specified. However with the support of allow_duplicates_from_deps, an additional directory with the arbitrary default permissions could conceivably be created. Default permissions represent a guess made in the absence of better information. Since pre-existing directory permissions represent information that is either equally good if originating from a create_parents operation, or better in the case that a user has declared them, a duplicate should not result from a create_parents operation even if allow_duplicates_from_deps is set.

Future improvement considerations are:

  1. Reversal of file and tar addition ordering to address (2) above
  2. Sequencing mechanism for tar construction (1) above
  3. Adding duplicates file paths to srcs is an error.
aiuto commented 2 months ago

I can look at this later today.