Open pcj opened 6 months ago
Hmm I started a PR on this and am realizing that the use of bsdtar
and mtree
with bazel+starlark is complicated.
The mtree
specification does not allow unicode characters for filenames, and I don't see a way in java starlark to get the codepoint of a string character (the ord
function is missing). Starlark therefore is not the right tool for producing an vis-encoded mtree
text file (you can't create the necessary escape sequence for a unicode-encoded file like 😍.txt
).
The mtree_spec
rule could use a dedicated tool (like go-mtree
, or bsdtar itself) but then you have a runtime dependency on rules_go, or add that to your build and release workflows.
In any case, that's a much bigger PR. I think I will have to move away from using this tar
rule. Unless I'm missing something, it can't be used for anything other than toy examples without a bit of a redesign.
We already had to add ord
for a base64 implementation in starlark
https://github.com/aspect-build/bazel-lib/blob/0fc838839c41677b18245152ad50759233c53794/lib/private/strings.bzl#L521
So I think that path of fixing the mtree we write following the spec is viable
add that to your build and release workflows
If we did go that route - we already have release automation to build four other Go utilities and expose them as pre-built binary toolchains for users, so that's totally viable as well
We already had to add
ord
for a base64 implementation in starlark
Hmm I looked at it and this implementation only works for ascii, would not work for 😍
. Need to be able to do something similar to:
echo 😍 | vis -o
\237\230\215
I don't think this is possible in starlark... you can't iterate the individual bytes of a unicode string.
Interesting, I didn't know git uses something like this too?
$ find find e2e/smoke/testdata
e2e/smoke/testdata/file empty with spaces.txt
e2e/smoke/testdata/file empty with 😍.txt
e2e/smoke/testdata/file-empty.txt
Prepping to commit these files:
$ git status
Untracked files:
(use "git add <file>..." to include in what will be committed)
.vscode/
e2e/smoke/.vscode/
"e2e/smoke/testdata/file empty with \360\237\230\215.txt"
@alexeagle I pushed #795 if you want to continue working on it, or feel free to close the PR and start a new one, I don't need the commit attribution.
hello.
Until we found a good solution in #795 to vendor vis
for a proper solution I'm using this patch against bazel-lib
to address the whitespace case:
diff --git a/lib/private/tar.bzl b/lib/private/tar.bzl
index 0bc8a2e..db064c1 100644
--- a/lib/private/tar.bzl
+++ b/lib/private/tar.bzl
@@ -161,16 +161,17 @@ def _tar_impl(ctx):
return DefaultInfo(files = depset([out]), runfiles = ctx.runfiles([out]))
def _mtree_line(file, type, content = None, uid = "0", gid = "0", time = "1672560000", mode = "0755"):
spec = [
- file,
+ file.replace(" ", "\\040"),
"uid=" + uid,
"gid=" + gid,
"time=" + time,
"mode=" + mode,
"type=" + type,
]
if content:
+ content = content.replace(" ", "\\040")
spec.append("content=" + content)
return " ".join(spec)
# This function exactly same as the one from "@aspect_bazel_lib//lib:paths.bzl"
via:
http_archive(
name = "aspect_bazel_lib",
sha256 = "a8a92645e7298bbf538aa880131c6adb4cf6239bbd27230f077a00414d58e4ce",
strip_prefix = "bazel-lib-2.7.2",
url = "https://github.com/aspect-build/bazel-lib/releases/download/v2.7.2/bazel-lib-v2.7.2.tar.gz",
patches = [
# workaround for https://github.com/aspect-build/bazel-lib/issues/794
"//:aspect_bazel_lib@2.7.2-mtree_spec.patch",
],
patch_args = ["-p1"],
)
What happened?
The
mtree_spec
rule generates an invalid mtree entries for filenames with special characters. For example:Causes:
Filenames should be encoded according to
vis
(seeman vis
)In this case it should be encoded as:
References:
Version
How to reproduce
No response
Any other information?
No response