bazelbuild / starlark

spec: scope of globals should not include preceding text #138

Open alandonovan opened 3 years ago

alandonovan commented 3 years ago

The Starlark spec states that the scope of a global binding is the entire file, including the portion before the binding. That is, this program is not statically rejected, but fails during evaluation:

print(x) # error: global 'x' not yet assigned
x = 1

And in this program, the built-in zip function cannot be used at the top of the file because of a later binding of the name to a boolean:

zip() # error: global 'zip' is not yet assigned

zip = True # enable compression
if zip:
   ...

A more realistic example from a Bazel BUILD file:

package(default_visibility = ...) # error: in function call, got string, want callable
...
package = "my/project"
print(package + "/" + name)

(This example only appeared as a problem this week because Bazel had a bug in which it incompletely enforced the scope rules: it did so in .bzl files, but not BUILD files. That bug is now fixed.)

The motivation for the spec's rule is that every use of a top-level name throughout a BUILD file should have the same meaning. But, fundamentally, BUILD files, like all Starlark files, are still imperative programs. Consider:

x=["hello"]
print(x[0]) # "hello"
x[0] = "goodbye"
print(x[0]) # "goodbye"

Both calls print x[0], yet they observe different values: giving a name one meaning throughout the file does not make every use of it behave the same.

The rule also hinders language evolution. Consider this existing Starlark file, and suppose that many other files load it:

# foo.bzl
def f(): ...some useful function...

Now imagine that the Starlark maintainers wish to add a new "universal" built-in function, or the maintainers of an application wish to add a new built-in function, also called f. Under the current rule, there is no way for foo.bzl to use the new built-in function while continuing to define its own function called f.

However, ~using Python's scoping rules~ if the scope were to start at the first binding of the name, one could write:

# foo.bzl
_builtin_f = f
def f(): ... some useful function...
use(_builtin_f)

and clients could continue to import the user-defined f function, renaming if necessary, in a load statement.

I think this rule is counterproductive, and that we should abolish it ~and follow Python~. In other words, the scope of a global should extend from its first binding to the end of the file.

EDIT: what I am proposing is not the same as Python.

ndmitchell commented 3 years ago

I wouldn't describe Python as having that scoping rule, more that module level assignments are totally dynamic. If you do:

if random.choice([True, False]):
    x = 100
print(x)

Half the time that prints 100, half the time it says x is not defined. There is no scoping, only the resultant semantics of poking things into a map dynamically.
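
The "map" can be made visible with a small Python sketch. It runs the same kind of conditional assignment through `exec` against an explicit dictionary and then checks whether `x` ended up in it. The helper name `run_module` and the variable `flag` are made up for illustration; `flag` stands in for the random choice so that the result is deterministic.

```python
def run_module(flag):
    """Execute a tiny 'module' whose only binding is conditional."""
    namespace = {"flag": flag}          # plays the role of the module's globals
    source = "if flag:\n    x = 100\n"  # same shape as the example above
    exec(source, namespace)
    return namespace

print("x" in run_module(True))   # True
print("x" in run_module(False))  # False
```

Whether `x` exists is decided only by which statements actually ran, not by anything the compiler worked out in advance.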

The place where Python does have static scopes is inside definitions, and there they operate as:

x = 1
def foo():
    print(x) # fails with UnboundLocalError: x is local to foo but not yet assigned
    x = 2

Given that Starlark requires the static scope at the module level, it seems strange to use three different types of scoping.

Also, if a global's scope extends from its definition onwards, what does this mean:

x = zip
if False:
  zip = 1
y = zip

In Python it means y == zip. Would the Starlark scope be similarly based on dynamic evaluation? Or would the first possible assignment cause the variable to be defined?

I can see the desire to access globals that are otherwise shadowed, but some kind of meta globals object might be an easier way to go - e.g. globals.f is the unshadowable version of f.
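
For comparison, Python already offers something in this spirit: its standard `builtins` module keeps built-in names reachable after a module-level binding shadows them. The sketch below is Python, not Starlark, and the `globals.f` object suggested above remains a hypothetical feature.

```python
import builtins

zip = True  # shadows the built-in at module level, e.g. "enable compression"

# The bare name now yields the boolean, but the built-in is still reachable
# through the builtins module.
pairs = list(builtins.zip([1, 2], ["a", "b"]))

print(zip)    # True
print(pairs)  # [(1, 'a'), (2, 'b')]
```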

alandonovan commented 3 years ago

You're absolutely right; thanks for the correction. Let us not follow Python then. ;-)

As for this litmus test:

x = zip
if False:
  zip = 1
y = zip

I am proposing that the first zip resolves to the built-in, and the second and third resolve to the global: lexical order determines scope. Execution would fail in y = zip because "global zip has not been assigned". Scope is still static (this is important both for comprehensibility and for optimization).

Consider these two examples of the proposed behavior:

x = zip
if cond:
   print(zip) # success, prints built-in zip function
else:
   zip = 1

Flipping the order of then/else cases changes the scoping:

x = zip
if not cond:
   zip = 1
else:
   print(zip) # dynamic error: global 'zip' referenced before assignment
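
The lexical-order rule behind these examples can be sketched as a toy resolver. This is an illustration only, not the Starlark implementation: it borrows Python's `ast` module as the parser (the snippets in this thread happen to be valid Python syntax), and the function name `classify_uses` and its `whole_file` flag are invented here. It handles only top-level assignments and name uses, so `def`, `load`, and comprehensions are out of scope. It also treats a use on the right-hand side of the first binding itself (`zip = zip`) as global, a case the thread does not settle.

```python
import ast

def classify_uses(source, universe, whole_file=False):
    """Label each top-level name use as 'global', 'builtin', or 'undefined'.

    whole_file=False: a global's scope starts at its first binding in
                      source order (the rule proposed in this thread).
    whole_file=True:  a global's scope is the entire file (current spec).
    """
    names = sorted(
        (n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Name)),
        key=lambda n: (n.lineno, n.col_offset),
    )
    # Source position of the first assignment to each name.
    first_binding = {}
    for n in names:
        if isinstance(n.ctx, ast.Store):
            first_binding.setdefault(n.id, (n.lineno, n.col_offset))

    uses = []
    for n in names:
        if not isinstance(n.ctx, ast.Load):
            continue
        bound_at = first_binding.get(n.id)
        in_scope = bound_at is not None and (
            whole_file or (n.lineno, n.col_offset) > bound_at
        )
        if in_scope:
            kind = "global"
        elif n.id in universe:
            kind = "builtin"
        else:
            kind = "undefined"
        uses.append((n.lineno, n.id, kind))
    return uses

LITMUS = "x = zip\nif False:\n    zip = 1\ny = zip\n"

print(classify_uses(LITMUS, {"zip"}))
# [(1, 'zip', 'builtin'), (4, 'zip', 'global')]
print(classify_uses(LITMUS, {"zip"}, whole_file=True))
# [(1, 'zip', 'global'), (4, 'zip', 'global')]
```

The classification is purely static. Whether a use labelled 'global' then fails at run time because the global has not been assigned yet is a separate, dynamic check that this sketch does not model.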

Yes, Python has three different scope regimes (module, function, comprehension), and the current Starlark spec eliminates one of them by making function-locals and globals behave the same, but I don't think that's a strong enough reason to choose the current semantics.