bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0
23.21k stars 4.07k forks source link

Implement `.append()` on `select()` #14157

Open fmeum opened 3 years ago

fmeum commented 3 years ago

Many Starlark macros use .append() on attributes like deps to inject additional dependencies. Buildifier also emits a lint warning when a single element is added to a list via += rather than .append().

This can be problematic for users of the macros as it silently makes the attribute non-configurable: select() only supports concatenation, not .append().

Changing that by implenting .append() on SelectorValue and SelectorList would thus make many existing macros more useful as well as resolve the tension between the linter and configurability. If this is deemed useful, I would be willing to send a PR.

gregestren commented 2 years ago

There's a long-standing request to make select() properly inspectable in Starlark. And earlier this year, around August, some folks tried to write solutions, which haven't yet landed in production code.

If that more general approach works I think that's the right way to support this. My position is to try to support those commits getting into the main branch.

fmeum commented 2 years ago

There's a long-standing request to make select() properly inspectable in Starlark. And earlier this year, around August, some folks tried to write solutions, which haven't yet landed in production code.

If that more general approach works I think that's the right way to support this. My position is to try to support those commits getting into the main branch.

Fully agree that this is a superior approach. Is there any way I can support these efforts?

gregestren commented 2 years ago

https://github.com/bazelbuild/bazel/issues/8419#issuecomment-889388608 has the last Github context, I believe. https://bazel-review.googlesource.com/c/bazel/+/174150 has some supporting code.

There was some private discussion between me, @c-parsons , and https://github.com/wcn3 particularly that I could paste here.

fmeum commented 2 years ago

This looks promising, thanks. I would be interested in the results of your conversation in case you can share.

Regarding the PR: Adding .items() also to SelectorValue would be needed to support pure selects (without the workaround of appending a fake element, then calling .items(), then dropping the element). But perhaps a refactoring that drops SelectorValue entirely is better.

gregestren commented 2 years ago

Can you show an example where just a SelectorList doesn't work? I'm asking for clarification because internally Bazel converts SelectorValues to SelectorLists at a certain point in evaluation, so I'm not sure where SelectorValue references are still hanging around.

And can you clarify how this fits into the original .append pattern? I just want to triply check that this .items() approach is really generic enough.

Re: my discussion, it essentially came down to me supporting pushing through the .items() PR. With a few caveats:

That said, select.items() by itself feels like a strict improvement independently of helper function concerns. The only immediate concern I can think of is whether its result is single-typed or not as described above.

gregestren commented 2 years ago

The reason I'd like to support anyone willing to implement this is this has been a known pain for a lot of people for a long time. The precedent for improving this is clear, and crosses our general caution threshold for complicating the core Starlark API.

But we do want to remain convinced the solution is generic enough to serve most practical use cases.

fmeum commented 2 years ago

I would say that there are two separate goals we could strive to realize:

  1. Make existing macros work as well with select() as possible (of course without silently altering their behavior).
  2. Make future macros more powerful and less contrived by offering better APIs around select().

The PR at https://bazel-review.googlesource.com/c/bazel/+/174150, in one form or another, would realize 2. But as always, a new .items() function in Bazel 6, hidden behind an experimental flag, will mean that a long time will pass until rulesets can adopt this functionality. The existing body of macros will not benefit from this until they are rewritten in terms of the new API.

So yes, I believe the solution proposed in your comments to be generic enough, but it wouldn't yield any short-term improvements for users of existing rulesets. Unfortunately, I haven't found a good way yet to quantify how many macros would benefit from .append() on select(), so I can't say how much of a positive change this could be. I just ran into its lack twice with rulesets I maintain myself.

Edit: I tried it out and found .items() to work on a single select. False alarm, I simply wasn't aware of the conversion.

gregestren commented 2 years ago

I don't personally see the need to make .items() experimental.

Would you still like to help push this forward? I think the best approach is simply putting out some more visibility to the community before submitting that PR or something like it. Some combination of a straightforward proposal with a few good before/after examples, a post to bazel-discuss@, and some kind of example code showing how to adapt common patterns the new way.

The other pattern I keep thinking about is a library macro that takes some parameter (deps) that may or may not be a select(). What's the most elegant way to say "if this is a select(), then call .items() on it, else just get its direct value"? I think this is pretty trivial with a support macro that converts non-select values into single-entry dicts. But the pattern remains in my head since a lot of breakages we've seen has been when a macro expects a parameter to be a select() or not and gets the opposite.

I can help with all of this but I don't want to lead.

fmeum commented 2 years ago

I will see whether I can write a short proposal that includes a prototype of such a helper macro. I'll ask when I get stuck, thanks for the offer to help

fmeum commented 2 years ago

I decided to wait until https://github.com/bazelbuild/bazel/issues/14259 has been resolved as it might have some implications on the proposal.

fmeum commented 2 years ago

I have been thinking more about .items(). With #14259 resolved, the new reality is that select() keys can be instances of Label, which makes them a bit harder to work with generically. So before we commit to a potentially hard-to-use design, I would like to get more clarity on the actual use cases that matter in reality.

During my work on rulesets, I have encountered the following situations in which I had trouble interacting with a select() expression:

  1. I had to append one or more items to a configurable list. This is possible today with + and would benefit from an .append() function, but .items() wouldn't help.
  2. I wanted to do a mechanical modification of a list of labels (e.g., replace a bunch of labels pointing to java_binary targets with their corresponding _deploy.jar targets). This use case would be more cleanly serviced by a map functionality over the values in a select expression. In particular, it does not require access to the select keys.

In general, I'm somewhat worried about composability if macros get access to not only the select values, but also the keys. I like that Starlark rules cannot enumerate providers on target and would, for a similar reason, prefer not to give macros access to information they don't, and possibly shouldn't, need.

@gregestren Do you have further example use cases for the hypothetical .items() function on selects. For example, are there potential use cases in google3 that motivated the concept and that you could talk about? If we don't find more examples, we should probably start a thread on bazel-discuss.

gregestren commented 2 years ago

I think uses are more nuanced than this, although I can't highlight specific cases. For example: filtering certain dependencies based on some unknown criteria. These criteria may or may not be based on the keys. But I worry if we commit to something like a map then we're committing ourself to yet some other API if/when that proves not to be powerful enough.

There also seems to be precedent for macros to want to build up the dictionaries that they ultimately turn into a select just before passing them to proper rules. This could even be a sequence of macros cooperating with each other to slowly build a dictionary. This doesn't automatically imply needs beyond 1. and 2. but I could see that logic becoming arbitrary specialized.

I think a bazel-discuss thread is worthwhile however we navigate all this.

fmeum commented 1 year ago

I came back to this issue since I need the ability to inspect selects in a macro.

select has seen quite a number of changes since the last activity on this issue. It now accepts Label instances as keys and supports concatenation via the | dict union operator.

As a result, the previously thought out .items() API would no longer be unambiguous: It isn't necessarily possible to distinguish a dict value from a dict representing a select. We would probably need to resolve this by having .items() return a list of pairs, with e.g. (false, <value>) representing an unconditional value and (true, <select>) representing a select dict. Furthermore, since selects are no longer just concatenated with +, we may need to return a lambda or enum representing the operation to be used for concatenation.

All of this makes me wary of exposing the internal representation of select in this way. Over time, the complexity of select expressions may converge towards that of arbitrary Starlark expressions and it would be difficult to come up with a return type of .items() that accounts for future changes in this area.

Instead, with growing support for lambdas in Starlark, an alternative and perhaps more maintainable approach could be a .map_items() function that accepts a function with parameters (key and value). For a select dict, it would be called for every item in that dictionary. For a regular value, it would be called with key = None and that value. While this doesn't allow for changes to the structure of the select (e.g., you can't directly remove cases from a select dict, only change their value to be empty), it is free of implementation details of select expressions and naturally expands with the scope of select.

@gregestren What are your thoughts on this? I would like to get some early feedback before I start a mailing list thread.

gregestren commented 1 year ago

Let me process this and also follow up with @brandjon ...

comius commented 1 year ago

What about:

/**
 * A Starlark method for mapping a function over selects (or over any
 * configurable attribute).
 *
 * <p>The mapping function is applied on every value that appears directly, in a list or in a
 * select. Returned is an object of the same shape as the input.
 *
 * <p>Use this function to transform an attribute in a macro. This ensure the same code works also
 * when the selects are present.
 */
  @StarlarkMethod(
      name = "map_values",
      documented = false,
      parameters = {
        @Param(
            name = "function",
            allowedTypes = {@ParamType(type = StarlarkCallable.class)}),
        @Param(
            name = "attribute",
            allowedTypes = {
              @ParamType(type = SelectorList.class),
              @ParamType(type = Iterable.class)
            }),
      },

I'd be very supportive of adding something like this to Bazel's APIs.

comius commented 1 year ago

cc @hvadehra @brandjon

comius commented 1 year ago

Examples:

def map(val):
  if val == "//xx":
   return "//yy"
  return val

map_values(map, ["//xx","//zz"]) == ["//xx", "//yy"]
map_values(map, select({"//c1": "//xx", "//conditions:default": "//zz"}))
    == select({"//c1": "//yy", "//conditions:default": "//zz"})

map_values(not, select({"//c1": True, "//conditions:default": False}))
    == select({"//c1": False, "//conditions:default": True})
brentleyjones commented 1 year ago

Can it be extended to work with dict as well? That's where I've seen map_values used before, and would make sense in addition to list support.

fmeum commented 1 year ago

I would prefer this function to preserve the structure of the values of the select dict rather than magically traversing them. Select values can be lists or even dicts themselves, which would make this mode's precise operation hard to predict.

In other words, map_values should just be function(attribute) if type(attribute) != "select" and otherwise traverse the elements of the SelectorList. We might call it map_attr instead.

I implemented this function in Starlark (which is a tour de force) and found it to be quite useful.

gregestren commented 11 months ago

Recapping, as I understand:

fmeum commented 11 months ago

But it's less powerful. In particular, adding or removing elements is awkward / impossible. As is basically any other restructuring of the input. (append() partially support this. I thought items() would to but @fmeum you suggested earlier it wouldn't?)

@gregestren Yes, my main concern is that I can't think of an API for items() that would allow it to distinguish between a dict arising as an unconditional attribute value and one that is the representation of a select. More concretely, what would you want items() to return for these three attribute value:

{"//:foo": "bar"}
select({"//:foo": "bar"})
select({"//:condition": {"//:foo": "bar"}})
gregestren commented 11 months ago

I imagine upcoming symbolic macros is going to up-prioritize this.

We seem to have a reasonable sense of the basic approaches and their limits. I'd think we want to be as clear as we can on what use cases are actually important (I recognize @fmeum and others made that point upthread), then build the solution around them.

Can we poll users in some thread and ping all the related issues on this subject and collect feedback?