bazelbuild / rules_pkg

Bazel rules for creating packages of many types (zip, tar, deb, rpm, ...)
Apache License 2.0

Can we make UTF-8 handling/tests better on macOS & Windows? #491

Open aiuto opened 2 years ago

aiuto commented 2 years ago

Currently we disable //tests/mappings:utf8_manifest_test on Windows. It is enabled for macOS, but fails for me when building on a case-sensitive file system. The problem on macOS is related to the handling of some characters in paths. macOS uses an alternate (decomposed) encoding for those characters, so our expected paths differ from what we get. You can see the difference in our normalized encoding of "sübdir/2-λ". On macOS, instead of getting a single glyph, you get an encoding of 'u' followed by a "put an umlaut on the previous character" mark.

$ cat -v tests/mappings/utf8_manifest.golden
[
[0,"1-a","tests/testdata/utf8/1-a","0644",null,null],
[0,"2-λ","tests/testdata/utf8/2-λ","0644",null,null],
[0,"3-�M-^V","tests/testdata/utf8/3-�M-^V","0644",null,null],
[0,"BUILD","tests/testdata/utf8/BUILD","0644",null,null],
[0,"sübdir/2-λ","tests/testdata/utf8/sübdir/2-λ","0644",null,null],
[0,"sübdir/hello","tests/testdata/utf8/sübdir/hello","0644",null,null]
]
$ cat -v bazel-bin/tests/mappings/utf8.manifest
[
[0,"1-a","tests/testdata/utf8/1-a","0644",null,null],
[0,"2-λ","tests/testdata/utf8/2-λ","0644",null,null],
[0,"3-�M-^V","tests/testdata/utf8/3-�M-^V","0644",null,null],
[0,"BUILD","tests/testdata/utf8/BUILD","0644",null,null],
[0,"su�M-^Hbdir/2-λ","tests/testdata/utf8/su�M-^Hbdir/2-λ","0644",null,null],
[0,"su�M-^Hbdir/hello","tests/testdata/utf8/su�M-^Hbdir/hello","0644",null,null]
]

This may need to be fixed in Bazel rather than here, I am just gathering data.
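
To make the mismatch above concrete, here is a minimal Python sketch comparing the two spellings of "sübdir". It assumes the golden file holds the precomposed form (a single U+00FC) and the macOS output holds 'u' followed by the combining diaeresis U+0308, which is what the manifest diff suggests. The variable names are illustrative only.

```python
import unicodedata

# Precomposed spelling: 'ü' is the single code point U+00FC.
composed = "s\u00fcbdir"

# Decomposed spelling: 'u' followed by the combining diaeresis U+0308.
decomposed = unicodedata.normalize("NFD", composed)

# The two strings render the same but are not equal as code point sequences.
print(composed == decomposed)

# Their UTF-8 bytes differ, which is why a byte-wise manifest comparison fails.
print(composed.encode("utf-8"))
print(decomposed.encode("utf-8"))

# Normalizing both sides to the same form makes them compare equal again.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

Comparing the two byte strings shows where the extra bytes in the macOS manifest come from.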

aiuto commented 1 year ago

Perhaps, in the writers, we should take the destination path and run it through unicodedata.normalize('NFD', ) to get the more common representation to write into the tar or zip file. It's a pity we can't do that in Starlark.
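
A rough sketch of what that could look like in a Python writer. The helper name `normalize_dest_path` and its `form` parameter are hypothetical, not existing rules_pkg code. Note that NFC is the form that yields the single precomposed glyph seen in the golden file, while NFD yields the decomposed spelling seen on macOS, so the sketch leaves the form selectable and defaults to NFC as an assumption.

```python
import unicodedata


def normalize_dest_path(path: str, form: str = "NFC") -> str:
    """Return `path` in the requested Unicode normalization form.

    Hypothetical helper: the choice of form (NFC vs. NFD) is an assumption
    and would need to match whatever the golden files use.
    """
    return unicodedata.normalize(form, path)


# Path as it might come back from a macOS file system (decomposed 'ü').
macos_path = "su\u0308bdir/2-\u03bb"
# Path as written in the golden manifest (precomposed 'ü').
golden_path = "s\u00fcbdir/2-\u03bb"

# After normalization both spellings map to the same destination path.
print(normalize_dest_path(macos_path) == normalize_dest_path(golden_path))
```

Applying the same normalization on both the writer side and the test's expected values would keep the comparison independent of the host file system.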