Open fmeum opened 3 years ago
There's a long-standing request to make select()
properly inspectable in Starlark. And earlier this year, around August, some folks tried to write solutions, which haven't yet landed in production code.
If that more general approach works I think that's the right way to support this. My position is to try to support those commits getting into the main branch.
There's a long-standing request to make
select()
properly inspectable in Starlark. And earlier this year, around August, some folks tried to write solutions, which haven't yet landed in production code.If that more general approach works I think that's the right way to support this. My position is to try to support those commits getting into the main branch.
Fully agree that this is a superior approach. Is there any way I can support these efforts?
https://github.com/bazelbuild/bazel/issues/8419#issuecomment-889388608 has the last Github context, I believe. https://bazel-review.googlesource.com/c/bazel/+/174150 has some supporting code.
There was some private discussion between me, @c-parsons , and https://github.com/wcn3 particularly that I could paste here.
This looks promising, thanks. I would be interested in the results of your conversation in case you can share.
Regarding the PR: Adding .items()
also to SelectorValue
would be needed to support pure selects (without the workaround of appending a fake element, then calling .items()
, then dropping the element). But perhaps a refactoring that drops SelectorValue
entirely is better.
Can you show an example where just a SelectorList
doesn't work? I'm asking for clarification because internally Bazel converts SelectorValue
s to SelectorList
s at a certain point in evaluation, so I'm not sure where SelectorValue
references are still hanging around.
And can you clarify how this fits into the original .append
pattern? I just want to triply check that this .items()
approach is really generic enough.
Re: my discussion, it essentially came down to me supporting pushing through the .items()
PR. With a few caveats:
Should we support different types in the .items()
list to account for //conditions:default
?
x = ["a"] + select({"b": "c"}) + ["b"] y = ["a", {"b": "c"}, "b"] # This list has two types: dicts and strings print(x.items() == y) # true
Or keep it all consistent?
x = ["a"] + select({"b": "c"}) + ["b"] y = [{"//conditions:default":"a"}, {"b": "c"}, {"//conditions:default":"b"}] # one type: dict print(x.items() == y) # true
Would we want Starlark-defined helper routines to support https://github.com/bazelbuild/bazel/issues/8419#issuecomment-889388608? The idea there is we could define a Starlark function iselect
that returns not an actual select
, but it list of entries as shown in the example under 'You might create':
static_deps = ["//foo:bar"] + [{
"//axisone:condition": ["//baz:qux"],
"//conditions:default": [],
}] + [{
"//axistwo:conditiontwo": ["//my:othertarget"],
"//conditions:default": [],
}]
Then late-construct the actual select()
when macros are finally done inspecting its contents. This can be done with further helper functions.
This part probably sounds vague. The basic idea, I believe, is that select.items()
is a very low-level function that decomposes a select
into its constituent dictionary. These Starlark helper routines could provide a more robust interface for constructing those dictionaries, re-composing them back into select()
, and querying them in useful ways that might be more convenient than all Starlark code directly manipulating select.items()
from scratch.
That said, select.items()
by itself feels like a strict improvement independently of helper function concerns. The only immediate concern I can think of is whether its result is single-typed or not as described above.
The reason I'd like to support anyone willing to implement this is this has been a known pain for a lot of people for a long time. The precedent for improving this is clear, and crosses our general caution threshold for complicating the core Starlark API.
But we do want to remain convinced the solution is generic enough to serve most practical use cases.
I would say that there are two separate goals we could strive to realize:
select()
as possible (of course without silently altering their behavior).select()
.The PR at https://bazel-review.googlesource.com/c/bazel/+/174150, in one form or another, would realize 2. But as always, a new .items()
function in Bazel 6, hidden behind an experimental flag, will mean that a long time will pass until rulesets can adopt this functionality. The existing body of macros will not benefit from this until they are rewritten in terms of the new API.
So yes, I believe the solution proposed in your comments to be generic enough, but it wouldn't yield any short-term improvements for users of existing rulesets. Unfortunately, I haven't found a good way yet to quantify how many macros would benefit from .append()
on select()
, so I can't say how much of a positive change this could be. I just ran into its lack twice with rulesets I maintain myself.
Edit: I tried it out and found .items()
to work on a single select. False alarm, I simply wasn't aware of the conversion.
I don't personally see the need to make .items()
experimental.
Would you still like to help push this forward? I think the best approach is simply putting out some more visibility to the community before submitting that PR or something like it. Some combination of a straightforward proposal with a few good before/after examples, a post to bazel-discuss@, and some kind of example code showing how to adapt common patterns the new way.
The other pattern I keep thinking about is a library macro that takes some parameter (deps
) that may or may not be a select()
. What's the most elegant way to say "if this is a select()
, then call .items()
on it, else just get its direct value"? I think this is pretty trivial with a support macro that converts non-select
values into single-entry dicts. But the pattern remains in my head since a lot of breakages we've seen has been when a macro expects a parameter to be a select()
or not and gets the opposite.
I can help with all of this but I don't want to lead.
I will see whether I can write a short proposal that includes a prototype of such a helper macro. I'll ask when I get stuck, thanks for the offer to help
I decided to wait until https://github.com/bazelbuild/bazel/issues/14259 has been resolved as it might have some implications on the proposal.
I have been thinking more about .items()
. With #14259 resolved, the new reality is that select()
keys can be instances of Label
, which makes them a bit harder to work with generically. So before we commit to a potentially hard-to-use design, I would like to get more clarity on the actual use cases that matter in reality.
During my work on rulesets, I have encountered the following situations in which I had trouble interacting with a select()
expression:
+
and would benefit from an .append()
function, but .items()
wouldn't help.java_binary
targets with their corresponding _deploy.jar
targets). This use case would be more cleanly serviced by a map
functionality over the values in a select expression. In particular, it does not require access to the select
keys.In general, I'm somewhat worried about composability if macros get access to not only the select
values, but also the keys. I like that Starlark rules cannot enumerate providers on target and would, for a similar reason, prefer not to give macros access to information they don't, and possibly shouldn't, need.
@gregestren Do you have further example use cases for the hypothetical .items()
function on select
s. For example, are there potential use cases in google3 that motivated the concept and that you could talk about? If we don't find more examples, we should probably start a thread on bazel-discuss
.
I think uses are more nuanced than this, although I can't highlight specific cases. For example: filtering certain dependencies based on some unknown criteria. These criteria may or may not be based on the keys. But I worry if we commit to something like a map
then we're committing ourself to yet some other API if/when that proves not to be powerful enough.
There also seems to be precedent for macros to want to build up the dictionaries that they ultimately turn into a select
just before passing them to proper rules. This could even be a sequence of macros cooperating with each other to slowly build a dictionary. This doesn't automatically imply needs beyond 1. and 2. but I could see that logic becoming arbitrary specialized.
I think a bazel-discuss
thread is worthwhile however we navigate all this.
I came back to this issue since I need the ability to inspect select
s in a macro.
select
has seen quite a number of changes since the last activity on this issue. It now accepts Label
instances as keys and supports concatenation via the |
dict union operator.
As a result, the previously thought out .items()
API would no longer be unambiguous: It isn't necessarily possible to distinguish a dict
value from a dict representing a select
. We would probably need to resolve this by having .items()
return a list of pairs, with e.g. (false, <value>)
representing an unconditional value and (true, <select>)
representing a select
dict. Furthermore, since select
s are no longer just concatenated with +
, we may need to return a lambda or enum representing the operation to be used for concatenation.
All of this makes me wary of exposing the internal representation of select
in this way. Over time, the complexity of select
expressions may converge towards that of arbitrary Starlark expressions and it would be difficult to come up with a return type of .items()
that accounts for future changes in this area.
Instead, with growing support for lambdas in Starlark, an alternative and perhaps more maintainable approach could be a .map_items()
function that accepts a function with parameters (key and value). For a select dict, it would be called for every item in that dictionary. For a regular value, it would be called with key = None
and that value. While this doesn't allow for changes to the structure of the select
(e.g., you can't directly remove cases from a select dict, only change their value to be empty), it is free of implementation details of select
expressions and naturally expands with the scope of select
.
@gregestren What are your thoughts on this? I would like to get some early feedback before I start a mailing list thread.
Let me process this and also follow up with @brandjon ...
What about:
/**
* A Starlark method for mapping a function over selects (or over any
* configurable attribute).
*
* <p>The mapping function is applied on every value that appears directly, in a list or in a
* select. Returned is an object of the same shape as the input.
*
* <p>Use this function to transform an attribute in a macro. This ensure the same code works also
* when the selects are present.
*/
@StarlarkMethod(
name = "map_values",
documented = false,
parameters = {
@Param(
name = "function",
allowedTypes = {@ParamType(type = StarlarkCallable.class)}),
@Param(
name = "attribute",
allowedTypes = {
@ParamType(type = SelectorList.class),
@ParamType(type = Iterable.class)
}),
},
I'd be very supportive of adding something like this to Bazel's APIs.
cc @hvadehra @brandjon
Examples:
def map(val):
if val == "//xx":
return "//yy"
return val
map_values(map, ["//xx","//zz"]) == ["//xx", "//yy"]
map_values(map, select({"//c1": "//xx", "//conditions:default": "//zz"}))
== select({"//c1": "//yy", "//conditions:default": "//zz"})
map_values(not, select({"//c1": True, "//conditions:default": False}))
== select({"//c1": False, "//conditions:default": True})
Can it be extended to work with dict
as well? That's where I've seen map_values
used before, and would make sense in addition to list
support.
I would prefer this function to preserve the structure of the values of the select dict rather than magically traversing them. Select values can be lists or even dicts themselves, which would make this mode's precise operation hard to predict.
In other words, map_values
should just be function(attribute)
if type(attribute) != "select"
and otherwise traverse the elements
of the SelectorList
. We might call it map_attr
instead.
I implemented this function in Starlark (which is a tour de force) and found it to be quite useful.
Recapping, as I understand:
items()
approach exposes the structure of what's inside a select()
items()
may have to return an ever-more-complicated data structure to properly communicate the original structureappend()
partially support this. I thought items()
would too but @fmeum you suggested earlier it wouldn't?)+ select(...)
?But it's less powerful. In particular, adding or removing elements is awkward / impossible. As is basically any other restructuring of the input. (append() partially support this. I thought items() would to but @fmeum you suggested earlier it wouldn't?)
@gregestren Yes, my main concern is that I can't think of an API for items()
that would allow it to distinguish between a dict arising as an unconditional attribute value and one that is the representation of a select
. More concretely, what would you want items()
to return for these three attribute value:
{"//:foo": "bar"}
select({"//:foo": "bar"})
select({"//:condition": {"//:foo": "bar"}})
I imagine upcoming symbolic macros is going to up-prioritize this.
We seem to have a reasonable sense of the basic approaches and their limits. I'd think we want to be as clear as we can on what use cases are actually important (I recognize @fmeum and others made that point upthread), then build the solution around them.
Can we poll users in some thread and ping all the related issues on this subject and collect feedback?
Many Starlark macros use
.append()
on attributes likedeps
to inject additional dependencies. Buildifier also emits a lint warning when a single element is added to a list via+=
rather than.append()
.This can be problematic for users of the macros as it silently makes the attribute non-configurable:
select()
only supports concatenation, not.append()
.Changing that by implenting
.append()
onSelectorValue
andSelectorList
would thus make many existing macros more useful as well as resolve the tension between the linter and configurability. If this is deemed useful, I would be willing to send a PR.