Open dieortin opened 1 year ago
The standard recommendation thus far has been to use a dict with None as the values. I've used that in practice and while it's not quite as nice, it's perfectly workable.
@dieortin @ndmitchell It's already half-done in Go: https://github.com/google/starlark-go/issues/492
@suganoi that's nice! I have also just seen https://github.com/bazelbuild/starlark/issues/20, but it doesn't seem to include the existence of set
.
IMO the set
API selected for the Go version makes sense and should be added to the general spec. @laurentlb what do you think?
For the record, the example above can be written:
names = ["Thomas", "David", "Michelle", "David"] # contains duplicate names
unique_names = {n: None for n in names}.keys()
In many cases, people want sets just to remove duplicates; this workaround may be sufficient for that use-case.
In Bazel BUILD files, there's a tradition of using lists in most places. For example, deps
and srcs
are lists, even when the order is not relevant and duplicates are forbidden. In Bazel rules, sets should usually be avoided, and depsets should be used most of the time. So there were concerns that adding a native type set
could cause more problems to Bazel. When a set is really needed in Bazel, a set implementation is provided in skylib (https://github.com/bazelbuild/bazel-skylib/blob/main/lib/new_sets.bzl).
As mentioned previously, Starlark-Go has a implementation for sets (https://github.com/google/starlark-go/blob/master/doc/spec.md#set - https://github.com/google/starlark-go/blob/90ade8b19d09805d1b91a9687198869add6dfaa1/starlark/library.go#L990). It can be used using a flag.
I'm not opposed to a set data type, although if we do add it, I think we ought to also support the literal {foo, bar, ...}
syntax. I regularly field questions from new macro and rule authors asking for where is Starlark's set data type :) In Bazel, it would entail a bit of work to chase down the native methods that currently require a starlark list as argument but which could also take a set as argument where it makes sense. And of course we'd need clear documentation for when to use sets vs. depsets.
I think we ought to also support the literal
{foo, bar, ...}
syntax
This is not as easy decision: people constantly make mistakes by writing {}
assuming it is an empty set (and in many cases it behaves like empty set, e.g. in in
operator or iterator, making it harder to debug). And writing set([foo, bar, ...])
is OK-ish, and has no perf difference if optimized properly.
I suggest to have separate discussions/decisions for set()
builtin, and {...}
syntax for sets.
Given the discussions ... is there any status on the set data type feature?
@tomaspipes - given the positive reaction from the community here and elsewhere, I want to take a few days this summer to add a set
data type to the Java implementation, and have it enabled in Bazel 8. The Go implementation already has set
behind a flag, we may want to flip the flag to enabled.
A
set
data structure, similar to the one in Python, would be a useful addition. For example, it is sometimes required to ensure no duplicate values exist in alist
, and aset
is the most ergonomic solution.Other common use cases are:
Thoughts? Should I make an attempt at writing a proposal?
Use case example
As an example, I have run into this need while adding automatic dependency exploration in my fork of kklochkov/rules_qt, a ruleset for building Qt applications with Bazel. When generating a rule instantiation, the
deps
attribute cannot contain duplicate values. Because my feature analyzes Qt shared libraries in the system, and they can be duplicated with different names (for example,libQt6Quick.so
andlibQt6Quick.so.6.4.3
) This introduces the need to enforce each dependency is only added once. Of course, there are ways to work around this problem, but it would be easily solved with aset
.