Open tetromino opened 2 years ago
Interested parties: @kkress @brandjon @adonovan
(Alternative proposals that satisfy the use cases are welcome - @kkress suggested an ispredefined(name)
builtin - but a zero-argument dir()
seems the most "starlarkic" way to go.)
In our setup we have native
global, which exposes different attrs depending on the interpreter. When conditional logic is needed, it can be made dependent on the fields and field values.
I guess you meant vars()
function, not dir()
. Because dir()
only provide names, and not many things are possible with dirs()
. E. g. you can conditionally check if some global exists, but you cannot obtain this value because of static analysis in Starlark: referencing non-existing global is a static error.
The issue with no-argument vars()
is that it might make certain optimizations much harder. For example, consider this code:
def f():
x = 1
return x
optimizer can replace it with:
def f():
return 1
However, if code is:
def f():
x = 1
something(vars())
return x
the optimization cannot be done, because analysing this middle statement is virtually impossible. Basically, many optimizations need to disabled if there's no-argument vars()
call.
It is even worse than that. Consider this example:
def f(g):
x = 1
something(g())
return x
If there's no-argument vars()
call, we cannot inline this x = 1
statement because we don't know that g
is not vars
.
I was thinking, Starlark may need to have starlark
global, but default it may have fields like vendor
and version
, but other embedders may add different attrs to it.
I feel like we discussed this in the past, but I can't find the thread. I initially thought this seemed reasonable but @stepancheg reminded me of the reason this is problematic: static scope resolution.
Consider:
if 'foo' in dir():
print(foo)
If foo is predeclared, then the print statement resolves to the predeclared binding, the condition is true, and the value of foo is printed. But if foo is not predeclared, then print(foo) gives an 'undefined' error.
Stepan suggests vars
as the solution. I agree vars
is problematic for the reasons he gives, but that's because vars includes the complete environment at the point of the call. If instead we defined an operator predeclared()
(or __builtins__
) that returned an immutable struct of only the predeclared values (excluding itself!), it would allow the above code to be rewritten:
if hasattr(__builtins__, 'foo'):
print(__builtins__.foo)
It would be best to specify that the predeclared()
function result or __builtins__
constant is a frozen dict to avoid repeated allocation, and the question of what happens if you mutate __builtins__
.
In our setup we have native global, which exposes different attrs depending on the interpreter.
'native' isn't part of the Starlark spec; it's a Bazelism. You're right that it has different fields in different environments.
[Edited: changed to struct, not dict]
In general, the set of builtins should be fairly stable. I don't recommend trying to make the same code compatible with multiple languages, or multiple versions of a language. Things are harder to test, analyze, and reason about. Starlark was designed to be much more static than Python. This kind of feature request is like trying to use Starlark in a dynamic way - and it's going to conflict, as mentioned in the previous comments.
A Bazel project can specify which version is expected, e.g. in the ̀.bazelversion` file. Use the new features when you've updated your version of Bazel.
When new features are added as fields on an object (e.g. native.
or ctx.
in Bazel), you typically get more flexibility.
(a previous related discussion for Bazel was here: https://github.com/bazelbuild/bazel/issues/7428)
A feature like this would have come in handy for me a few months ago. I was adding a new global symbol to Google's internal build, and it was effectively impossible to write Starlark code in a forward and backwards compatible manner. I had to wait about a month for a new release to land before I could go any further.
Really, I would view this feature as a way for Starklark code not bundled with Bazel to be able to be compatible with multiple versions of Bazel. As it is today, if Bazel added a new global symbol, then a Starlark library simply can't use it until they're ready to entirely drop support and break all prior Bazel versions. This is rather overkill, especially for new features that are additive, and wouldn't otherwise invalidate supporting prior Bazel version.
Other cases, like native, or rules exporting a public struct interface, have a natural solution to just check hasattr(native, "Foo")
. This works pretty well and it's intuitive.
So, I would suggest having a builtins
object that is a frozen struct (not a dict). This effectively allows the same functionality for builtins that are available in other cases. It works around the issue of a undefined symbol, and using the builtins
name is a signal that there's some Bazel/Starlark runtime/version compatibility going on.
A Bazel project can specify which version is expected, e.g. in the ̀.bazelversion` file. Use the new features when you've updated your version of Bazel.
This doesn't really scale.
As a project owner, if any Starlark in my transitive closure even mentions the name, even if I don't use the functionality, then I'm broken and I have to upgrade Bazel. And then my entire transitive closure needs to support the new Bazel version. Stated another way: I don't actually control when Bazel is updated anymore, the library with the shortest Bazel support timeline does.
As a library owner, I'm forced to break somebody. Either I force the transitive closure of all my users to stay with the oldest Bazel version any of them use, or I force them all to a minimum Bazel version.
This isn't entirely hypothetical, either. This sort of problem has already occurred with skylib, with the feature to add the "name" argument to unittest.make(). That case is somewhat different and has a different solution, but it's still in the same vein: Starlark needing to be cross-Bazel version compatible. Given that there's plenty of room to improve testing Starlark, a new global being added to facilitate that seems very plausible.
Just echoing the concern that this is a very real problem that affects the health of the ecosystem. This topic spawned a whole design doc: https://docs.google.com/document/d/1HJf3gMYIrzmTRqbD4nWXH2eJRHXjLrOU0mmIeZplUzY/edit
In the end we implemented a "bazel_features" project (https://github.com/bazel-contrib/bazel_features) to work around this issue for Bazel itself. It includes a "globals" backdoor where we provide access to global symbols based on Bazel version detection. Worth noting that it's not a replacement for this issue, merely a workaround.
I propose adding a zero-argument form of
dir()
, matching that of Python 3: https://docs.python.org/3/library/functions.html#dirThere are two use cases, which are currently not very easy to work around otherwise: