Open DavidANeil opened 2 years ago
What's the likelihood of this being implemented in the nearish future?
The Go compiler would like to be able to rely on it to address build scalability issues.
I spent a few hours poking around trying to get it to work, but my unfamiliarity with Java and the Bazel architecture made my experiment unsuccessful. I'd still love to see this feature. I think it wouldn't be all too difficult to implement, and it could vastly improve build times and cache hit rates in some common cases.
Actually, thinking about this some more, I think it makes sense to have an orthogonal feature for early pruning of the inputs list, if at all.
Suppose dependencies X->Y->Z, where a priori the build system can't tell whether X depends on outputs from Z.
If, when compiling Y, we're able to determine that Z will never be needed by X, that's useful information: X's cache key doesn't need to include Z, and Z doesn't even need to be available to X's compile action (e.g., to reduce network traffic in the case of remote execution).
But separately there's the possibility that users of Y may need Z, yet X is just a particular target that doesn't. It's still useful in this case to know that changes to Z are irrelevant to X, to improve incremental rebuild times.
So they're complementary, not competing, features.
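To make the X->Y->Z argument concrete, here is a minimal Python sketch of how dropping Z from X's cache key makes X insensitive to changes in Z. It is only an illustrative model, not Bazel's actual action-cache logic: the `cache_key` helper, the file names, and the digest scheme are all assumptions made for this example.

```python
import hashlib

def cache_key(command, inputs, pruned=frozenset()):
    """Hypothetical cache key: digest of the command plus every non-pruned input."""
    h = hashlib.sha256(command.encode())
    for name in sorted(inputs):
        if name in pruned:
            continue  # pruned inputs contribute nothing to the key
        h.update(name.encode())
        h.update(b"\0")
        h.update(inputs[name])
        h.update(b"\0")
    return h.hexdigest()

# X's declared inputs before and after an edit that touches only Z.
before = {"Y.a": b"y-v1", "Z.a": b"z-v1"}
after = {"Y.a": b"y-v1", "Z.a": b"z-v2"}

# Without pruning, the change to Z changes X's key, so X would be rebuilt.
unpruned_changed = cache_key("compile X", before) != cache_key("compile X", after)

# With Z pruned, X's key is identical before and after the edit.
pruned_same = (
    cache_key("compile X", before, pruned={"Z.a"})
    == cache_key("compile X", after, pruned={"Z.a"})
)
```

The same pruned set could also drive which files are shipped to a remote executor, which is the network-traffic point made above.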
cc @lberki, could use some guidance on whether these ideas are viable / should be triaged to P3
My line in the sand is that if we implement this feature, it should be possible to implement C++ include scanning behind it because I don't want to support two independent mechanisms for input discovery indefinitely.
This has a number of implications:
- [...] `unused_inputs_list` to a separate action, because it would mean that every C++ compilation action would come with a separate input discovery action, which would very probably be an unacceptable amount of memory overhead (I'd be happy to be proven wrong on this one, though)
- [...] `InputMetadataProvider` yet; it's a wart, yes, but it's proven pretty tough to fix without incurring a performance hit.
Status Quo
`unused_inputs_list` allows inputs to be trimmed after an action executes, so that the same action will not be re-executed if only the listed unused inputs change. In some cases it is possible to determine this list of unused inputs before running the action; this is known as "input discovery" or "input pruning". While many rulesets could take advantage of this, it is not exposed to Starlark rules. The builtin C++ rules do take advantage of it, and there is discussion about allowing a C++-specific version of input pruning for the Starlark version of the rules (see #13871).
Description of the Feature Request
As described by @bjacklyn in https://github.com/bazelbuild/bazel/pull/13871#issuecomment-948051404, and discussed in the Q&A of the BazelCon21 stream: it seems that it should be possible for `unused_inputs_list` to be extended in a meaningful way to allow all Starlark rules access to input discovery. If the `File` referenced in an `unused_inputs_list` attribute is also listed in the `outputs` of that action, then the current behavior is maintained: the `inputs` are trimmed after the action executes. If the `unused_inputs_list` `File` is listed under `inputs`, then the `inputs` are trimmed before the action is scheduled for execution, including not being part of the lookup key for the action cache. If the action does execute, then the listed unused inputs will not be included in the sandbox. If the `unused_inputs_list` is listed under neither `inputs` nor `outputs`, then the build should fail.
Example Usage
In this example, both actions use `unused_inputs_list`. The action that produces it uses it to trim its inputs after execution. The `Compile` action uses it to trim its inputs before execution, as a form of input discovery.
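The proposed rule for the two actions can be modeled in a few lines of Python. This is a sketch of the semantics described in the feature request, not Bazel code: `trim_mode`, `sandbox_inputs`, the file names, and the idea that the list file itself stays among the staged inputs are all assumptions made for illustration.

```python
def trim_mode(unused_inputs_list, inputs, outputs):
    """Decide when trimming happens, following the three cases in the proposal."""
    if unused_inputs_list in outputs:
        return "after_execution"   # current behavior: trim once the action has run
    if unused_inputs_list in inputs:
        return "before_execution"  # proposed behavior: input discovery
    raise ValueError("unused_inputs_list must be listed under inputs or outputs")

def sandbox_inputs(inputs, unused_paths):
    """Inputs that would still be staged once the listed unused paths are dropped."""
    unused = set(unused_paths)
    return [path for path in inputs if path not in unused]

sources = ["a.src", "b.src", "c.src"]

# Hypothetical producing action: it writes unused.txt, so it is trimmed afterwards.
producer_mode = trim_mode("unused.txt", inputs=sources, outputs=["unused.txt"])

# Compile action: it reads unused.txt, so it is trimmed before scheduling.
compile_inputs = sources + ["unused.txt"]
compile_mode = trim_mode("unused.txt", inputs=compile_inputs, outputs=["a.o"])

# Assume unused.txt names c.src as unused; c.src is then left out of the sandbox.
staged = sandbox_inputs(compile_inputs, unused_paths=["c.src"])
```

Checking `outputs` first mirrors the order of the cases in the description; an action that lists the file in neither place raises, which stands in for the build failure.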