bazelbuild / rules_pkg

Bazel rules for creating packages of many types (zip, tar, deb, rpm, ...)
Apache License 2.0
216 stars 171 forks source link

No option to preserve symlinks in pkg_tar #115

Open colatkinson opened 4 years ago

colatkinson commented 4 years ago

The current behavior of pkg_tar is that if I have a file and a symlink to that file, and both are given as sources, the symlink is "expanded" and two copies of the original file are added to the repo.

For some context, my use case is that I'm trying to package an external toolchain's libc into a tar for use in Docker containers (cross-compiling is... fun). As you can imagine, this is massively ballooning the size of my images.

While I could certainly attempt to do something like filtering all the links out and then generating a symlinks dict to pass in, I was wondering if there would be interest in adding a flag to allow this behavior in the main rule. I think it could be useful in a variety of contexts, especially in generating library debs for use by others.

The behavior as I imagine it would be along the lines of passing a preserve_symlinks flag, which would then include symlinks if they only point to files in the archive. This validation should be simple enough, and would remove risks of non-hermeticity. Alternatively, symlinks pointing elsewhere could follow the current behavior, while hermetic symlinks could be preserved.

Please let me know if there are any concerns I'm missing, or if there's an obvious solution to my problem that I missed somewhere in the docs.

aiuto commented 4 years ago

Clearly a missing feature. I would like to see this in pkgfilegroup and reused across all rules.

pudelkoM commented 4 years ago

In case someone needs this and is fine with a workaround until this is properly fixed here, we wrote a Starlark wrapper that invokes pkg_tar with an appropriate flagfile to create the symlinks:

https://github.com/stratum/stratum/blob/3e8f0cdf1abdbdf5458b802058c8413084884420/bazel/rules/package_rule.bzl#L12-L47

aiuto commented 3 years ago

This should be doable now with pkg_mklink, but it won't do the auto-magic expand. I think that feature is not that useful. In practice, what you usually want in your package is something like

Bazel doesn't really deal with symlinks as a differentiable first-class file type that well, so it would be excessive engineering here to do that unless that was added.

bazaglia commented 2 years ago

@aiuto I'm using aspect-build/rules_js, a new high performance alternative/better approach to build_bazel_rules_nodejs and it uses pnpm that heavily depends on symlinks. Unfortunately in Node.js it's not about building just a binary as you mentioned, there is a node_modules folder structure where the symlinks need to be preserved otherwise the modules resolution won't work. I think that feature is pretty useful.

aiuto commented 2 years ago

Can you provide a detailed writeup (preferablye in an external doc) about what rules_js does, and then what you need to accomplish. I won't have the time to do this research myself so I have to rely on support from the community to gather requirements for inputs we need to accept and final package layouts we want to achieve. Once we get a good set of use cases, I'm sure I can come up with an implementation plan.

alexeagle commented 2 years ago

https://pnpm.io/symlinked-node-modules-structure is the detailed writeup about the semantics required. The tar file needs to contain a node_modules directory with symlinks into a CAS "virtual store" as described there.

So we just need exactly what the OP requests: preserve the symlinks in the tar file so that they survive the trip into a docker image and then into the container runtime.

alexeagle commented 2 years ago

@aiuto this is now a significant blocker for rules_nodejs users ready to migrate to rules_js. Was that reply sufficient for you to accept a PR to add the preserve_symlinks option?

kurt-google commented 2 years ago

Just to chime in, I would also appreciate this working in pkg_tar. The use case is similar, I want to preserve file structures for/from toolchains. In my particular case the toolchain produces many aliases of large libraries as symlinks (think .so .so.1 .so.1.0.0 etc). And when there are 10 aliases of a 100MB binary it is quite impactful.

aiuto commented 2 years ago

I'm ooo until July 7. Will pick up thead then.

On Thu, Jun 23, 2022, 12:20 AM Kurt K @.***> wrote:

Just to chime in, I would also appreciate this working in pkg_tar. The use case is similar, I want to preserve file structures for/from toolchains. In my particular case the toolchain produces many aliases of large libraries as symlinks (think .so .so.1 .so.1.0.0 etc). And when there are 10 aliases of a 100MB binary it is quite impactful.

— Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_pkg/issues/115#issuecomment-1163754137, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXHHHEOJODAIRYHJW4RQQTVQONUBANCNFSM4JLM6PRA . You are receiving this because you were mentioned.Message ID: @.***>

iamricard commented 2 years ago

👋🏼 @aiuto wondering if you had a chance to pick up the thread. we're looking into moving to rules_js, and this is the one thing (that we know) is blocking us. thanks!

aiuto commented 2 years ago

OK. I am in a mood to deal with this now.

I presume we can have dangling, arbitrary symlinks, so that is not the problem. My hunch is that you can make the structure you want with pkg_files, but then it goes bad (where bad means messing up relative symlinks) in one of several possible places.

While we are at it..... the spec uses <store> a lot. Is there a need for something like parameterized output path rewrite?

aiuto commented 2 years ago

Is #603 the right kind of test case?

aiuto commented 2 years ago

I need help understanding the requirement here. Please look at #603. It

It still comes out right

tar tvf bazel-bin/tests/tar/relative_links_re_tar.tar
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/bar@1.0.0/
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/bar@1.0.0/node_modules/
lr-xr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/bar@1.0.0/node_modules/bar -> STORE/bar
lr-xr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/bar@1.0.0/node_modules/qar -> ../../qar@2.0.0/node_modules/qar
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/foo@1.0.0/
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/foo@1.0.0/node_modules/
lr-xr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/foo@1.0.0/node_modules/bar -> ../../bar@1.0.0/node_modules/bar
lr-xr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/foo@1.0.0/node_modules/foo -> STORE/foo
lr-xr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/foo@1.0.0/node_modules/qar -> ../../qar@2.0.0/node_modules/qar
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/qar@2.0.0/
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/qar@2.0.0/node_modules/
lr-xr-xr-x 0/0               0 1999-12-31 19:00 node_modules/.pnpm/qar@2.0.0/node_modules/qar -> STORE/qar
lr-xr-xr-x 0/0               0 1999-12-31 19:00 node_modules/foo -> .pnpm/foo@1.0.0/node_modules/foo

What I think I am missing is how you build your input sources.

aiuto commented 2 years ago

Or..... maybe I am missing the obvious. Is the requirement:

Real check in files, or the result of a tree artifact.

pkg_tar(srcs=glob("[**]*), ...)

And that does not do the right thing for the two links?

... and the answer is yes. The current version of #603 flattens the symlinks to the content they point to, so I have a valid test case now. The question is how I want the API to look. What I really do not want is to implement in pkg_tar and then realize I needed it in pkg_files anyway.

nacl commented 2 years ago

I was wondering something similar; it really depends on what rules_js is doing. The need to package everything up as a tar is also unclear to me.

In any case, it may not be possible (or convenient) to emit all the symlink providers.

thesayyn commented 2 years ago

@aiuto findings are true. the underlying problem is that write_manifest constructs misleading manifests and build_tar blindly tries to build up the tar and fails.

image a rule like this

def incorrect_impl(ctx):
  actual_file = ctx.actions.declare_file("thefile.txt")
  symlink = ctx.actions.declare_file("this_is_a_symlink.txt")
  ctx.actions.symlink(symlink, target_file = actual_file)
  return DefaultInfo(files = depset([actual_file, symlink]))

pkg_tar will add two manifest entries;

but I don't think the problem is supposed to be fixed here in rules_pkg since there is no way pkg_tar could tell whether a file is a symlink or not.

the principled fix would be that bazel exposes two new fields to FileApi named is_symlink and symlink_target which points to another File

thesayyn commented 2 years ago

also, pkg_mklink is not helpful as there is a huge symlink forest for node_modules.

aiuto commented 2 years ago

the principled fix would be that bazel exposes two new fields to FileApi named is_symlink and symlink_target which points to another File

Yes. That really should happen. But that would be Bazel 7.x at this point, so I do want to come up with things I can do in the interim.

thesayyn commented 2 years ago

I have already started on a fix that checks if any entry#[2] within the manifest is a symlink, if so then does a realpath on it then strips out execroot/wksp (to make it bazel-out/cfg/path/to) portion from it then does a lookup within the manifest to find the corresponding file. if found then rewrites the entry to be a symlink that points to the target file with its path in final tar.

I'd like to put up a PR for this to fix it in rules_pkg.

thesayyn commented 2 years ago
  • we can probably detect that input files are links or not, but that might not be compatible with remote build.

https://github.com/bazelbuild/bazel/pull/15866 is working towards fixing symlinks with RBE

aiuto commented 2 years ago

@thesayyn Can you take a look at #603 to confirm that //tests/tar:relative_links_tar reproduces the problem here.

If you were to patch it and do:

blaze build //tests/tar:relative_links_tar
tar tvf  bazel-bin/tests/tar/relative_links_tar.tar

you would see

$ tar tvf  bazel-bin/tests/tar/relative_links_tar.tar
-r-xr-xr-x 0/0            3993 1999-12-31 19:00 BUILD
drwxr-xr-x 0/0               0 1999-12-31 19:00 node_modules/
...
-r-xr-xr-x 0/0            3993 1999-12-31 19:00 outer_BUILD

This is correct for some people, and wrong for others. Since the native file is a symlink, they may have wanted outer_BUILD to simply be ../BUILD.

You say you are working on a "fix". I don't think a single fix is what we need. There has to be a way to control the behavior. And, I think it has to be in pkg_files rather than pkg_tar and pkg_zip individually. In some places we want to preserve links, but we might mix that with places where we follow them. Do you have a WIP PR?

aiuto commented 2 years ago

More thinking about this.

aiuto commented 2 years ago

@thesayyn @alexeagle I don't want to wait for https://github.com/bazelbuild/bazel/pull/15866 and bazel 6.x to make progress on this.

If you think the test cases in #603 cover enough of the space here, I will merge that PR as a baseline to work against and then continue a PR for this using those as a test case.

iamricard commented 2 years ago

sorry for not replying here after the ping! i honestly don't know enough to answer some of the questions, but maybe @alexeagle might?

alexeagle commented 2 years ago

@thesayyn is the most up-to-date. My impression since we landed https://github.com/aspect-build/rules_js/tree/main/e2e/js_image is that this issue is lower-severity.

aiuto commented 2 years ago

OK. I can still do some fixing on this now while I am trying to prep a 0.0.8 release. If #603 has good test cases, I can use that as a basis for a PR that does something useful.

joshwiden commented 2 years ago

Hi all! I wanted to chime in to say that this issue also affects my team and we'd love to see a fix. I just don't want to see the momentum get lost after the recent comment saying that this issue may be lower-severity!

Our usage essentially boils down to:

pkg_files(
    name = "install_files",
    srcs = glob(
        include = [
            # some things with symlinks, such as:
            #   lib.so -> lib.so.6
            #   lib.so.6 -> lib.so.6.2.3
            #   lib.so.6.2.3
        ],
    ),
)

pkg_tar(
    name = "config",
    srcs = [
        ":install_files",
    ],
)

Thanks!

aiuto commented 2 years ago

The issue is still important to me, but I do need help in the form I have asked for. I need people to tell me if the test cases I have created in #603 (?) really do capture the use case. If I have that right, making the change is an easy part. I'm not going to add features without a comprehensive set of tests that capture real requirements.

On Wed, Aug 10, 2022 at 12:40 AM Josh Widen @.***> wrote:

Hi all! I wanted to chime in to say that this issue also affects my team and we'd love to see a fix. I just don't want to see the momentum get lost after the recent comment saying that this issue may be lower-severity!

Our usage essentially boils down to:

pkg_files( name = "install_files", srcs = glob( include = [

some things with symlinks, such as:

        #   lib.so -> lib.so.6
        #   lib.so.6 -> lib.so.6.2.3
        #   lib.so.6.2.3
    ],
),

)

pkg_tar( name = "config", srcs = [ ":install_files", ], )

Thanks!

— Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_pkg/issues/115#issuecomment-1209962938, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXHHHFTVV42K2TOKOL3SKDVYLM5TANCNFSM4JLM6PRA . You are receiving this because you were mentioned.Message ID: @.***>

thesayyn commented 2 years ago

I need people to tell me if the test cases I have created in #603 (?)

Yes. Though this is only one part of the symlinks issue with pkg_tar. I have created https://github.com/bazelbuild/rules_pkg/pull/609 as well to demonstrate the issue with the node_module tree that rules_js lays out.

aiuto commented 1 year ago

OK. I'm unexpectedly tied up this week and can't do deep thinking but I can get people to review the old PRs so we can improve the baseline test cases.

tcm-marcel commented 1 year ago

We hit the same problem with the.so lib symlinks. I quickly tried to patch rules_pkg to preserve symlinks, but it turned out to be harder than expected to differentiate manually created and sandbox-related symlinks. I will give it a second try at one point and let you know once I have a fix.

thisgithappens commented 4 months ago

Is there any update on this @aiuto? Would be nice if these symlinks were preserved. Even passing a pkg_mklink as a src to a pkg_tar or pkg_files doesn't preserve the link.

aiuto commented 4 months ago

I've not taken a look at this since 2022.

thisgithappens commented 4 months ago

What might be involved in getting a fix in for the test cases in #603? Seems like others confirmed those test cases were sufficient. I think this is still a valid feature/outcome of a package.

peakschris commented 3 months ago

This is also proving to be a blocker for us. We need to create docker containers that match our existing ones (from a different build system), which use symlinks. The pkg_tar used in our container builds flattens the links. We'd like links preserved.

The #603 test case matches our need

alexeagle commented 3 months ago

@thisgithappens @peakschris You might want to try https://github.com/aspect-build/bazel-lib/blob/main/docs/tar.md instead. It now has feature parity with strip_prefix and some other attributes of pkg_tar.

peakschris commented 3 months ago

@alexeagle that's an interesting idea, thanks for sharing. I think this would solve the issue for this particular archive, at the cost of bifurcating our archive rule usage.

I have thought several times about using aspect tar for all our archive needs. It looks like it has moved on since last time I looked! It would have the advantage of a more terse style than rules_pkg. However, I think we can't currently use aspect tar for all our archive needs, because:

I'll continue to mull it over,

Thanks, Chris