Closed ahippler closed 5 years ago
No, not that I'm aware of. IIUC, Buildfarm ships with an empty docker image by default, and our internal implementation is also bring-your-own-docker image. No guarantees on what's in there, and I don't see why a C++-only project (for example) would add a JRE to their remote execution docker images.
Do you know what is in that empty docker image? Is Bash part of it on Linux/macOS? Is PowerShell in it on Windows?
The default buildfarm image is based on gcr.io/distroless/java, which does include a JRE. Notably it doesn't include /bin/bash, so it can't execute genrules. We (Stripe) use a different base image that includes Bash and Coreutils, and I expect all users of fully-remote builds will have similar custom images.
Right. I expect that requiring bash is fine for Linux and MacOS images. I would prefer not to require any higher-level language (like Java or Python, or even C/C++) given that there will be projects who don't want that as an additional dependency for their images. I expect that PowerShell is fine for Windows as the minimal requirement. The downside is that we end up having to maintain two implementations.
@jmillikin-stripe do you currently have Java in your remote build images? If so, why?
We have Java in our remote build images because https://github.com/bazelbuild/bazel-buildfarm is implemented in Java, and needs the JRE to run its _deploy.jar binaries.
See the bazel buildfarm worker.container container image target -- anyone using this standard upstream image as a basis will have Java available.
I thought about this more. Bazel requires at least Windows 7 / Server 2008 R2 as the host platform, so we could require that as a minimum execution platform too.
Compiled C++ binaries are portable across Windows versions (at least within the same architecture), though linking against MSVCRTxx.DLL requires the right DLLs to be installed on the execution machine.
AFAIK Windows Server versions (at least the Core ones) don't come with these DLLs preinstalled, but there's another option -- Win 7 / Server 2008 R2 both have .NET 3.5 preinstalled.
I don't know what the compatibility matrix looks like for .NET applications built for one version of the framework running against another version, but it seems that a C# binary built for .NET 4.0 or 4.5 can happily run on Win Server 2008 R2 (which should only have .NET 3.5). At the same time, a binary linked against .NET 3.5 didn't run on my Win Server 2016 VM without wanting to install the right framework first.
Either way, .NET seems like an option we should explore further: it doesn't require any extra DLLs, we can use any .NET language (including Java, just compile it as J#), precompile test-setup in it, and bundle it with the Bazel binary.
WDYT?
(including Java, just compile it as J#)
I'm old and living under an old rock. Apparently J# is no longer a thing. (https://en.wikipedia.org/wiki/J_Sharp)
Sorry for the late reply, @laszlocsomor. A JRE is not always available on remote machines. For rules that need one, we recommend they invoke it via toolchain rules, not via PATH or a hardcoded path (https://docs.bazel.build/versions/master/remote-execution-rules.html#invoking-build-tools-through-toolchain-rules).
My 2c on the wider conversation:
for SingleJar and iJar in remote execution, we have to compile them on the remote execution environment. This effectively means that, currently, to use any rules that require SingleJar or iJar, your remote container needs to have a C compiler installed. So unfortunately, a remote execution container to build e.g., only Java code, needs a C compiler.
For the record, for remote execution we strongly discourage using pre-compiled binaries (https://docs.bazel.build/versions/master/remote-execution-rules.html#managing-platform-dependent-binaries), even for languages whose binaries you would typically expect to be compatible. E.g., we have found that certain Linux binary versions of the jar tool can only be used on certain versions of Linux.
All remote execution containers are expected to have bash
@nlopezgi : thanks for the info!
FYI, meanwhile I finished the draft of my design doc on Bash-less (and Perl-less) test execution on Windows, see https://github.com/bazelbuild/proposals/pull/16. In that doc I argue for using a precompiled Windows binary, because Bazel only runs on x86_64 Windows and AFAIK remote execution also only runs x86_64 Windows, so binaries are portable.
Are you suggesting I could safely choose C++ as the implementation language for Linux too, and ship the source code with Bazel, and a C++ compiler is always available on the remote execution machine to build it? If so, is that going to stay that way for the foreseeable future (say, 1 year)?
I think the decision to use a pre-compiled binary incurs some technical debt that someone might end up having to pay down the line (e.g., whenever we decide to allow something other than x86_64 Windows), but I understand there might not be other feasible solutions.
wrt using C++, we do have a restriction of currently needing C++ compiler for any builds that require use of java rules. I'm not sure we want to extend that to also be a requirement for all test rules. I do think that some language tools will always be needed to compile so called 'embedded tools' (which is a non-issue for local execution). These language tools thus will necessarily be required in remote execution containers, and a C++ compiler is a good choice as we offer a very stable and up to date one for use with remote execution containers (for Linux only, though).
Thanks.
Re: technical debt: I hear you, but your point sounds rather speculative, so I'd not worry about non-x64 Windows platforms for the foreseeable future (say 1 year).
Re: compiler: since you don't want to require it for tests, which I also agree with, I think C++ is not the right language for the test wrapper on Linux. (It still seems like the right choice on Windows.)
But then I don't know what's the right choice for Linux, if only Bash is promised (without Perl). Maybe a clever sed program could help?
@nlopezgi , another question: do you know how Bazel cancels a remotely running test action in case the user presses Ctrl+C? Does Bazel dispatch this to the remote service or does it just close the connection?
re: compiler: I did not state that I don't want to require it for tests, just that I'm not sure it's the right choice, and I don't want to be the one to make that call w/o at least conferring with some other folks (I'll get back to you once I've confirmed). imo, if some tool is to be required for all tests, I'd rather it be the C++ compiler (instead of perl or python), but I'm not sure what the trade-offs (effort/maintenance) between C++ and a clever sed program would be.
re: canceling a test: not sure, you'd want to ask @ola-rozenfeld about api details
I was too optimistic with the "clever sed program". The task is to UTF-8-decode an octet-stream, test decoded characters if they fall into any of some disjoint ranges and replace them with "?", then UTF-8-encode the result again.
I'm not aware of an efficient way to do this without encoding and decoding.
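For reference, the decode/test/re-encode task described above can be sketched in a few lines of Python. This is only an illustration of the task, not a proposal for test-setup.sh: the function names are made up, and the allowed ranges are assumed to be the XML 1.0 character ranges (9, A, D, 20-D7FF, E000-FFFD, 10000-10FFFF).

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp is in the assumed XML 1.0 character ranges."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def sanitize_by_decoding(data: bytes) -> bytes:
    """Decode UTF-8, replace disallowed characters with '?', re-encode."""
    # "surrogateescape" maps every undecodable byte to a lone surrogate
    # (U+DC80..U+DCFF). Those fall outside the allowed ranges, so each
    # invalid byte ends up as a single "?".
    text = data.decode("utf-8", errors="surrogateescape")
    cleaned = "".join(ch if is_xml_char(ord(ch)) else "?" for ch in text)
    return cleaned.encode("utf-8")


if __name__ == "__main__":
    # NUL and the stray 0xFF byte are replaced; the valid "ä" is kept.
    print(sanitize_by_decoding(b"ok\x00\xff\xc3\xa4\n"))
```

Note that this needs a full language runtime on the execution machine, which is exactly the dependency the thread is trying to avoid.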
You don't need to decode and re-encode - you can simply test on the utf-8 representation using sed's regexp support.
How? Does sed support matching UTF-8 strings?
sed supports matching binary, and you know the utf-8 encoding, so you can check for specific utf-8 ranges, like so:
echo "ä ö ü" | LANG=C sed -e "s/[\xc0-\xdf][\x00-\xff]/?/g"
Here, I'm replacing all two-byte utf-8 sequences with a single '?' character. See https://en.wikipedia.org/wiki/UTF-8 for the multi-byte ranges.
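The same byte-level idea can be tried outside sed. A minimal Python sketch, assuming a bytes regex stands in for sed under LANG=C (the name replace_two_byte is made up for illustration):

```python
import re

# Lead byte 0xC0-0xDF followed by any byte, mirroring the sed expression above.
TWO_BYTE = re.compile(rb"[\xc0-\xdf][\x00-\xff]")


def replace_two_byte(data: bytes) -> bytes:
    """Replace every two-byte UTF-8 sequence with a single '?'."""
    return TWO_BYTE.sub(b"?", data)


if __name__ == "__main__":
    print(replace_two_byte("ä ö ü\n".encode("utf-8")))
```

The pattern works on raw bytes, so no decoding step is involved, which is the point being made about sed.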
After a lot of trial and error, I've come up with a sed script that - I think - does what we want:
cat test.log | LANG=C sed -E \
-e 's/.*/& /g' \
-e 's/(([\x9\xa\xd\x20-\x7f]|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xec][\x80-\xbf][\x80-\xbf]|[\xed][\x80-\x9f][\x80-\xbf]|[\xee-\xef][\x80-\xbf][\x80-\xbf]|[\xf0][\x80-\x8f][\x80-\xbf][\x80-\xbf])*)./\1?/g' \
-e 's/(.*)\?/\1/g'
First, add a single white space character (' ') to the end of each line. Second, replace all (possibly empty) sequences of legal characters followed by a character with the sequence of legal characters followed by a question mark character ('?'). Third, remove the trailing question mark character ('?') from each line.
Hats off, that's quite impressive.
I think you made a couple mistakes:
[\xe0-\xed][\x80-\x9f][\x80-\xbf], whereas yours is split up into [\xe0-\xec][\x80-\xbf][\x80-\xbf] and [\xed][\x80-\x9f][\x80-\xbf]
[\xf0-\xf4][\x80-\x8f][\x80-\xbf][\x80-\xbf], but yours is [\xf0][\x80-\x8f][\x80-\xbf][\x80-\xbf]
Could you compare your results with mine?
Sorry, I got confused, gimme a minute to correct this.
you need to cover the entire 2-octet UTF-8 domain (U+80 .. U+7FF)
This is wrong. I meant to say: you need to cover the U+80..U+7FF (two UTF-8 octets) and U+800..U+D7FF (three UTF-8 octets) ranges. The 2-octet domain is covered correctly ([\xc0-\xdf][\x80-\xbf]), but I think the 3-octet-matching regex is wrong in your solution.
Ok, how about this:
[\x9\xa\xd\x20-\x7f] <--- (9,A,D,20-7F)
[\xc0-\xdf][\x80-\xbf] <--- (0080-07FF)
[\xe0-\xec][\x80-\xbf][\x80-\xbf] <--- (0800-CFFF)
[\xed][\x80-\x9f][\x80-\xbf] <--- (D000-D7FF)
[\xf0-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf] <--- (010000-10FFFF)
Wait, there's still one range missing. Gah!
Ah I see where I was wrong, you are right to match 0800-CFFF and match D000-D7FF separately.
Another try:
[\x9\xa\xd\x20-\x7f] <--- (9,A,D,20-7F)
[\xc0-\xdf][\x80-\xbf] <--- (0080-07FF)
[\xe0-\xec][\x80-\xbf][\x80-\xbf] <--- (0800-CFFF)
[\xed][\x80-\x9f][\x80-\xbf] <--- (D000-D7FF)
[\xee][\x80-\xbf][\x80-\xbf] <--- (E000-EFFF)
[\xef][\x80-\xbe][\x80-\xbf] <--- (F000-FFBF)
[\xef][\xbf][\x80-\xbd] <--- (FFC0-FFFD)
[\xf0-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf] <--- (010000-10FFFF)
Ok, I wrote a small Java program to double-check the pattern, and it returned the expected ranges:
9-a,d-d,20-d7ff,e000-fffd,10000-10ffff
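A check along these lines can be reproduced in Python. This is a sketch, not the Java program mentioned above: it assumes the eight byte patterns from the "Another try" list, encodes every code point to UTF-8 (surrogates are forced through with "surrogatepass" so they can be tested too), and collects the code points whose encoding matches in full.

```python
import re

# The alternatives from the list above, written with two-digit \x escapes
# because Python's re requires exactly two hex digits.
LEGAL = re.compile(
    rb"(?:[\x09\x0a\x0d\x20-\x7f]"
    rb"|[\xc0-\xdf][\x80-\xbf]"
    rb"|[\xe0-\xec][\x80-\xbf][\x80-\xbf]"
    rb"|\xed[\x80-\x9f][\x80-\xbf]"
    rb"|\xee[\x80-\xbf][\x80-\xbf]"
    rb"|\xef[\x80-\xbe][\x80-\xbf]"
    rb"|\xef\xbf[\x80-\xbd]"
    rb"|[\xf0-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf])"
)


def accepted_ranges() -> str:
    """Return the accepted code point ranges as 'lo-hi,lo-hi,...' in hex."""
    ranges = []
    start = None
    # One step past U+10FFFF so that a range still open at the end is closed.
    for cp in range(0x110000 + 1):
        ok = (
            cp < 0x110000
            and LEGAL.fullmatch(chr(cp).encode("utf-8", "surrogatepass")) is not None
        )
        if ok and start is None:
            start = cp
        elif not ok and start is not None:
            ranges.append(f"{start:x}-{cp - 1:x}")
            start = None
    return ",".join(ranges)


if __name__ == "__main__":
    print(accepted_ranges())  # 9-a,d-d,20-d7ff,e000-fffd,10000-10ffff
```

This only tests well-formed encodings of real code points. It says nothing about byte sequences that are not valid UTF-8 at all, such as overlong forms, some of which the looser lead-byte ranges ([\xc0-\xdf], [\xf0-\xf7]) would also accept.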
I'm glad we are free now.
Ok, I have a patch which conflicts with my other changes to test-setup.sh. Both are a bit risky, and we need to pick one to be merged first.
If you have changes lined up, merge those first. This bug has been open for a long time, it's OK to wait another day.
Patch is here: https://bazel-review.googlesource.com/c/bazel/+/68711
There are still a couple possible issues with this that we need to look into.
My primary concern is what should happen if the default charset of the current machine is NOT UTF-8. The previous Perl solution was broken as well: it did a charset conversion from the default charset to UTF-8 on input, but also from UTF-8 to the default charset on output, which can actually re-introduce illegal characters (we might want to file a bug for that, or note it on the existing bug), breaking the resulting XML.
The new code intentionally does not perform any charset conversion.
Ideally, we'd do a charset conversion from the default charset to UTF-8 before running the sed script. Note that I override the LOCALE before running sed, which probably sets the default charset to ISO-8859-1 (this needs to be double-checked!).
(I'm on vacation for the rest of August, and won't be able to work on this until I'm back.)
Thanks!
The previous Perl solution was broken as well: it did a charset conversion from the default charset to UTF-8 on input, but also from UTF-8 to the default charset on output, which can actually re-introduce illegal characters (we might want to file a bug for that, or note it on the existing bug), breaking the resulting XML.
Please elaborate. How and where exactly are the conversions done?
(I'm on vacation for the rest of August, and won't be able to work on this until I'm back.)
Do you intend to finish this task yourself or to hand it off to someone else (and if so, to whom)?
If it's still open when I'm back, I'll do it. If someone comes in and finishes my changes, I'm happy, too.
Perl does implicit charset conversion on every read and write from a file or stdin/stdout. I believe it converts from the platform default charset to UTF-8 internally. That's my reading of the docs, anyway.
I have a pending patch.
Would be awesome to see that line go. It’s the last remaining use of perl in bazel from what I can see (I already removed the other use in https://github.com/bazelbuild/bazel/pull/5999).
Go @ulfjack! :cake: :running_man:
Description of the problem / feature request:
cc_test uses an inline perl script for failed tests. https://github.com/bazelbuild/bazel/blob/eb067ea88749a5635cc8ee8954cde2b767f1eb61/tools/test/test-setup.sh#L153
Feature requests: what underlying problem are you trying to solve with this feature?
The usage of perl is not documented. Windows does not have Perl installed by default.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
What operating system are you running Bazel on?
Windows 10
What's the output of bazel info release?
0.10.1
The perl script replaces invalid XML characters and invalid sequences in CDATA. To get rid of perl, bash or python could be used.
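To illustrate the CDATA part: the one sequence that cannot appear inside a CDATA section is "]]>". A common way to handle it, shown here as a Python sketch, is to split the section at that point. This is an assumption about a reasonable approach, not a transcription of what test-setup.sh does, and the name to_cdata is made up.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting it wherever ']]>' occurs."""
    # "]]>" would end the section early, so close the section after "]]"
    # and reopen a new one before ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


if __name__ == "__main__":
    wrapped = to_cdata("a]]>b")
    print(wrapped)
    # Round trip through an XML parser to confirm the text survives.
    print(ET.fromstring("<r>" + wrapped + "</r>").text)
```

Invalid characters would still have to be replaced separately, since CDATA does not make them legal.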