bazel-contrib / bazel-lib

Common useful functions for writing BUILD files and Starlark macros/rules
Apache License 2.0
131 stars 87 forks source link

[Bug]: `mtree_spec` rule does not encode filenames with special characters #794

Open pcj opened 6 months ago

pcj commented 6 months ago

What happened?

The mtree_spec rule generates an invalid mtree entries for filenames with special characters. For example:

server.runfiles/pypi/_lock/setuptools@69.1.1/site-packages/setuptools/command/launcher manifest.xml uid=0 gid=0 time=1672560000 mode=0755 type=file content=bazel-out/darwin_arm64-fastbuild-ST-5b66dcbb8b68/bin/external/pypi/_lock/setuptools@69.1.1/site-packages/setuptools/command/launcher manifest.xml

Causes:

bsdtar: Error reading archive bazel-out/darwin_arm64-fastbuild/bin/server/server_image.packages_tar_manifest.spec: Can't open bazel-out/darwin_arm64-fastbuild-ST-5b66dcbb8b68/bin/external/pypi/_lock/setuptools@69.1.1/site-packages/setuptools/command/launcher

Filenames should be encoded according to vis (see man vis)

In this case it should be encoded as:

server.runfiles/pypi/_lock/setuptools@69.1.1/site-packages/setuptools/command/launcher\040manifest.xml uid=0 gid=0 time=1672560000 mode=0755 type=file content=bazel-out/darwin_arm64-fastbuild-ST-5b66dcbb8b68/bin/external/pypi/_lock/setuptools@69.1.1/site-packages/setuptools/command/launcher\040manifest.xml

References:

Version

http_archive(
  name = "aspect_bazel_lib",
  generator_name = "aspect_bazel_lib",
  generator_function = "fetch_dependencies",
  urls = ["https://github.com/aspect-build/bazel-lib/archive/40948111707ff0f491d14ea4bd016b78498d14cf.tar.gz"],
  sha256 = "b69d39dd191bcd6f4ac6d62b0124ea460caf530bb18ea678c9778fe4046c21f9",
  strip_prefix = "bazel-lib-40948111707ff0f491d14ea4bd016b78498d14cf",
)

How to reproduce

No response

Any other information?

No response

pcj commented 6 months ago

Hmm I started a PR on this and am realizing that the use of bsdtar and mtree with bazel+starlark is complicated.

The mtree specification does not allow unicode characters for filenames, and I don't see a way in java starlark to get the codepoint of a string character (the ord function is missing). Starlark therefore is not the right tool for producing an vis-encoded mtree text file (you can't create the necessary escape sequence for a unicode-encoded file like 😍.txt).

The mtree_spec rule could use a dedicated tool (like go-mtree, or bsdtar itself) but then you have a runtime dependency on rules_go, or add that to your build and release workflows.

In any case, that's a much bigger PR. I think I will have to move away from using this tar rule. Unless I'm missing something, it can't be used for anything other than toy examples without a bit of a redesign.

alexeagle commented 6 months ago

We already had to add ord for a base64 implementation in starlark https://github.com/aspect-build/bazel-lib/blob/0fc838839c41677b18245152ad50759233c53794/lib/private/strings.bzl#L521

So I think that path of fixing the mtree we write following the spec is viable

alexeagle commented 6 months ago

add that to your build and release workflows

If we did go that route - we already have release automation to build four other Go utilities and expose them as pre-built binary toolchains for users, so that's totally viable as well

pcj commented 6 months ago

We already had to add ord for a base64 implementation in starlark

Hmm I looked at it and this implementation only works for ascii, would not work for 😍. Need to be able to do something similar to:

echo 😍 | vis -o
\237\230\215

I don't think this is possible in starlark... you can't iterate the individual bytes of a unicode string.

pcj commented 6 months ago

Interesting, I didn't know git uses something like this too?

$ find find e2e/smoke/testdata
e2e/smoke/testdata/file empty with spaces.txt
e2e/smoke/testdata/file empty with 😍.txt
e2e/smoke/testdata/file-empty.txt

Prepping to commit these files:

$ git status

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    .vscode/
    e2e/smoke/.vscode/
    "e2e/smoke/testdata/file empty with \360\237\230\215.txt"
pcj commented 6 months ago

@alexeagle I pushed #795 if you want to continue working on it, or feel free to close the PR and start a new one, I don't need the commit attribution.

betaboon commented 4 months ago

hello. Until we found a good solution in #795 to vendor vis for a proper solution I'm using this patch against bazel-lib to address the whitespace case:

diff --git a/lib/private/tar.bzl b/lib/private/tar.bzl
index 0bc8a2e..db064c1 100644
--- a/lib/private/tar.bzl
+++ b/lib/private/tar.bzl
@@ -161,16 +161,17 @@ def _tar_impl(ctx):
     return DefaultInfo(files = depset([out]), runfiles = ctx.runfiles([out]))

 def _mtree_line(file, type, content = None, uid = "0", gid = "0", time = "1672560000", mode = "0755"):
     spec = [
-        file,
+        file.replace(" ", "\\040"),
         "uid=" + uid,
         "gid=" + gid,
         "time=" + time,
         "mode=" + mode,
         "type=" + type,
     ]
     if content:
+        content = content.replace(" ", "\\040")
         spec.append("content=" + content)
     return " ".join(spec)

 # This function exactly same as the one from "@aspect_bazel_lib//lib:paths.bzl"

via:

    http_archive(
        name = "aspect_bazel_lib",
        sha256 = "a8a92645e7298bbf538aa880131c6adb4cf6239bbd27230f077a00414d58e4ce",
        strip_prefix = "bazel-lib-2.7.2",
        url = "https://github.com/aspect-build/bazel-lib/releases/download/v2.7.2/bazel-lib-v2.7.2.tar.gz",
        patches = [
            # workaround for https://github.com/aspect-build/bazel-lib/issues/794
            "//:aspect_bazel_lib@2.7.2-mtree_spec.patch",
        ],
        patch_args = ["-p1"],
    )