Open endobson opened 4 years ago
@jmmv
Could you provide some example with specific kinds of inputs you are talking about? After a quick look, I cannot understand the problem you are describing. Thanks.
I linked the code that is for running the workers, https://github.com/endobson/minimal-racket/commit/3ad8f443f517b1ece4a1e7bbd1b8ee2b37b7919f. The worker executable reads some of the inputs before it starts accepting requests. These are the libraries it uses in its implementation, as the language runtime I'm using doesn't produce a single native executable. If these change then the worker should produce different results, but that will only happen if the worker process gets restarted, which it doesn't (as that is the point of persistent workers).
This means that using an existing worker versus using bazel clean
to shut down the workers produce different results.
Maybe here is a better question: Some time when rerunning an action with a persistent worker the worker will need to be restarted due to the worker's code changing. Which inputs correspond to that versus just running the action on the existing persistent worker process?
Ah, I see what you mean now. So you are talking about the dependencies of the worker program itself. How do these change during the build though? I'm surprised that's possible.
They don't change during regular operation. But when first writing the worker program and debugging issues with it, such changes were very common.
Ah, I see!
But... it's pretty much impossible for Bazel to know what the dependencies of an arbitrary binary are, and having a way to manually specify them seems pretty error prone. We could monitor for changes to the binary itself (and we probably should if we don't already), but that doesn't help if dynamic libraries change for example.
Do you have a specific proposal other than the simplistic suggestion above?
I want documentation telling me how this is supposed to work, and fairly big warnings on how this can cause inconsistent builds, mechanisms to detect it/avoid it.
For example would having all the dependencies as part of the runfiles of the target used as 'executable' have solved this? Thats doable for me, and seems like a fairly easy to explain rule.
cc @larsrc-google
Documentation for persistent workers is now available at https://docs.bazel.build/versions/master/persistent-workers.html, though it doesn't address this particular problem.
I don't think adding the dependencies as runfiles would help at the moment. The tools are by and large treated as black boxes - they could be executables created by a third party that doesn't want to give you their source. They are not expected to be Bazel targets. However, if the WorkerKey changes, new workers are created. So if you can somehow update the arguments or environment on recompile, that would get picked up automatically.
Alternatively, when working on the workers themselves, use the --worker_quit_after_build flag.
Reading through this again, it seems to be a bigger issue than just workers. If I understand correctly, your action runs a binary that loads some dynamic library, but this library is not declared as part of the runfiles (or some of its further dependencies aren't declared). But that would mean that your build, regardless of use of workers, is not properly hermetic, since a change that Bazel is not informed of can change what the result is.
All of my tools are intended to be hermetic and I use bazel to generate them. The underlying racket vm/runtime is an external repository in my WORKSPACE and I'm using toolchains to find them for the rules (https://github.com/endobson/minimal-racket/blob/master/WORKSPACE).
What I think you are referring to as the dynamic library is the bazel-tools/racket-compiler
library that I use here: https://github.com/endobson/minimal-racket/blob/3ad8f443f517b1ece4a1e7bbd1b8ee2b37b7919f/racket.bzl#L93
That library is intended to be declared as a dependency of the action here: https://github.com/endobson/minimal-racket/blob/3ad8f443f517b1ece4a1e7bbd1b8ee2b37b7919f/racket.bzl#L93 through the toolchain https://github.com/endobson/minimal-racket/blob/3c95378d2c5379225fd8a233988773d97c9c4fc4/BUILD#L29.
IIRC when originally filing this issue I observed that changes to (https://github.com/endobson/minimal-racket/blob/3ad8f443f517b1ece4a1e7bbd1b8ee2b37b7919f/build_rules/racket-compiler.rkt) would make bazel rebuild my resulting code if I wasn't using workers but would not do the same if I was.
After chatting, we found that the four lines from https://github.com/endobson/minimal-racket/blob/3ad8f443f517b1ece4a1e7bbd1b8ee2b37b7919f/racket.bzl#L108 should go into tools
instead of inputs
. @endobson will try this out.
Haven't yet got to seeing if tools versus inputs will solve my particular problem, but I'm fairly confident we correctly identified the issue. But I spent some more time thinking about it and I think the original point that the distinction between deps and tools is not sufficient for workers solely given their definitions. But it might be made sufficient with documentation on how workers treats them.
I'm going to say that the environment/input that the worker is executed in/on is split into two parts, the components that are to be reused/cached across requests (e.g. worker startup flag, interpreted code) and those that will be different between requests (e.g. filename to compile, file contents of file to compiles). These may take the form of flags or files in the filesystem. It is important for correctness that the worker to be restarted every time one of its reusable inputs changes/for it not to treat the inputs that bazel thinks are changeable as reusable. Thus the specification of what bazel thinks is which is very important for a tool author to know. And I cannot find that in the public documentation:
The internal implementation seems to use only the tools in computing the hash: https://github.com/bazelbuild/bazel/blob/f6e1074b09ebefba185c0531e9cea26b9596c8a9/src/main/java/com/google/devtools/build/lib/worker/WorkerFilesHash.java#L60
This seems like a very reasonable decision because tools
correspond to those built for the execution platform where as inputs
correspond to things built for the target platform. But the difference between those fields is the platform for which they are built, and I think that if persistent workers are going to use tools for as part of the worker key then documenting that is the right solution. That doesn't let a worker mark target platform dependencies as ones where it wants to be rebuilt, but I cannot find a good use case for that and it can be handled by the worker restarting itself internally when the hash changes on a new WorkRequest.
Description of the problem / feature request:
With a custom worker, each action can reuse an existing worker process or start a new one. It seems that the current logic doesn't allow specifying what inputs are needed by the worker before it is able to accept requests and what is needed only for handling individual requests. Thus while if code that is used by the worker at startup is changed it means that the old worker processes cannot be safely reused, bazel currently assumes that worker processes can be reused.
Inputs vs tools is not a sufficient distinction here as that is currently for host vs target configuration and a rule may not need the worker restarted if a tool is changed.
This leads to bugs that are able to be fixed by a bazel clean, (which seems to shutdown old workers).
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
I don't have one. You can try taking, https://github.com/endobson/minimal-racket/commit/3ad8f443f517b1ece4a1e7bbd1b8ee2b37b7919f and mucking with the tools to make it fail to compile and then revert the changes. You should still see issues after reverting the changes, but before running bazel clean.
What operating system are you running Bazel on?
Mac OS
What's the output of
bazel info release
?release 2.0.0
What's the output of
git remote get-url origin ; git rev-parse master ; git rev-parse HEAD
?This question really needs to be reworded as only applying if bazel is a git version, and that you want this run in the development version of the bazel directory.
Have you found anything relevant by searching the web?
Yes that the documentation for this feature (workers) is non-existent and has been so for years, and fixing this has been a P1 for over 6 months. https://github.com/bazelbuild/bazel/issues/7997