spec: scope of globals should not include preceding text

alandonovan commented 3 years ago

The Starlark spec states that the scope of a global binding is the entire file, including the portion before the binding. That is, this program is not statically rejected, but fails during evaluation:

print(x) # error: global 'x' not yet assigned
x = 1

And in this program, the built-in zip function cannot be used at the top of the file because of a later binding of the name to a boolean:

zip() # error: global 'zip' is not yet assigned

zip = True # enable compression
if zip:
   ...

A more realistic example from a Bazel BUILD file:

package(default_visibility = ...) # error: in function call, got string, want callable
...
package = "my/project"
print(package + "/" + name)

(This example only appeared as a problem this week because Bazel had a bug in which it incompletely enforced the scope rules: it did so in .bzl files, but not BUILD files. That bug is now fixed.)

The motivation for the spec's rule is that every use of a top-level name throughout a BUILD file should have the same meaning. But, fundamentally, BUILD files, like all Starlark files, are still imperative programs. Consider:

x = ["hello"]
print(x[0]) # "hello"
x[0] = "goodbye"
print(x[0]) # "goodbye"

The same expression, x[0], already evaluates differently at two points in the file, so a static scope rule alone cannot guarantee that a name means the same thing everywhere.

This rule hinders language evolution. Consider this existing Starlark file, and let's pretend it is loaded by many others:

# foo.bzl
def f(): ...some useful function...

Now imagine that the Starlark maintainers wish to add a new "universal" built-in function, or the maintainers of an application wish to add a new built-in function, also called f. There is literally no way that the file foo.bzl can use the new built-in function and continue to define its existing function called f.

However, ~using Python's scoping rules~ if the scope were to start at the first binding, one could write:

# foo.bzl
_builtin_f = f
def f(): ... some useful function...
use(_builtin_f)

and clients could continue to import the user-defined f function, renaming if necessary, in a load statement.

I think this rule is counterproductive, and that we should abolish it ~and follow Python~. In other words, the scope of a global should extend from its binding to the end of a file.

EDIT: what I am proposing is not the same as Python.

ndmitchell commented 3 years ago

I wouldn't describe Python as having that scoping rule; it's more that module-level assignments are totally dynamic. If you do:

if random.choice([True, False]):
    x = 100
print(x)

Half the time that prints 100, half the time it says x is not defined. There is no scoping, only the resultant semantics of poking things into a map dynamically.
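
A minimal Python sketch of this dynamic behaviour, using exec on an explicit namespace dict so the outcome is deterministic rather than random. The helper name run_module and the define_x flag are illustrative and not part of any library:

```python
def run_module(define_x):
    """Execute a tiny 'module' whose binding of x is conditional."""
    namespace = {"define_x": define_x}  # the module's globals are just a dict
    source = (
        "if define_x:\n"
        "    x = 100\n"
        "result = x\n"
    )
    try:
        exec(source, namespace)
    except NameError:
        # x was never poked into the dict, so the lookup fails at run time
        return "NameError"
    return namespace["result"]


print(run_module(True))   # 100
print(run_module(False))  # NameError
```

Nothing is decided before execution: whether x exists depends only on which statements actually ran.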

The place where Python does have static scopes is inside definitions, and there they operate as:

x = 1
def foo():
    print(x) # fails
    x = 2
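
For comparison, a small runnable Python version of the snippet above. The wrapper call_foo is a hypothetical helper added only to catch and report the exception:

```python
x = 1


def foo():
    print(x)  # x is local to foo because of the assignment below
    x = 2


def call_foo():
    """Call foo and report which exception, if any, it raises."""
    try:
        foo()
    except UnboundLocalError as err:
        return type(err).__name__
    return "ok"


print(call_foo())  # UnboundLocalError
```

The assignment anywhere in the body makes x local for the whole function, so the module-level x = 1 is never consulted.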

Given that Starlark requires the static scope at the module level, it seems strange to use three different types of scoping:

Module level: names come into scope in order, at their binding.
Inside a definition: everything comes into scope at once.
Comprehension level: nothing is in scope until some point, when everything comes into scope.

Also, if a global's scope extends from its definition onwards, what does this mean:

x = zip
if False:
  zip = 1
y = zip

In Python it means y == zip. Would the Starlark scope be similarly based on dynamic evaluation? Or would the first possible assignment cause the variable to be defined?

I can see the desire to access globals that are otherwise shadowed, but some kind of meta globals object might be an easier way to go - e.g. globals.f is the unshadowable version of f.

alandonovan commented 3 years ago

You're absolutely right; thanks for the correction. Let us not follow Python then. ;-)

As for this litmus test:

x = zip
if False:
  zip = 1
y = zip

I am proposing that the first zip resolves to the built-in, and the second and third resolve to the global: lexical order determines scope. Execution would fail in y = zip because "global zip has not been assigned". Scope is still static (this is important both for comprehensibility and for optimization).

Consider these two examples of the proposed behavior:

x = zip
if cond:
   print(zip) # success, prints built-in zip function
else:
   zip = 1

Flipping the order of then/else cases changes the scoping:

x = zip
if not cond:
   zip = 1
else:
   print(zip) # dynamic error: global 'zip' referenced before assignment
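
The static half of the proposed rule ("lexical order determines scope") can be modelled with a short Python sketch built on the standard ast module. This is only an illustration under stated assumptions, not how any Starlark implementation works: the function name resolve, the predeclared tuple, and the labels "predeclared" and "global" are all hypothetical, and the sketch handles only top-level names (function bodies are skipped). It says nothing about the run-time "not yet assigned" check.

```python
import ast


def resolve(source, predeclared=("zip", "print", "cond")):
    """Classify each top-level name use by lexical order (illustrative only)."""
    events = []  # (line, column, kind, name)

    class Collector(ast.NodeVisitor):
        def visit_Name(self, node):
            kind = "use" if isinstance(node.ctx, ast.Load) else "bind"
            events.append((node.lineno, node.col_offset, kind, node.id))

        def visit_FunctionDef(self, node):
            # A def binds its name; its body is a separate scope, so skip it.
            events.append((node.lineno, node.col_offset, "bind", node.name))

    Collector().visit(ast.parse(source))

    bound = set()   # globals whose binding has been seen so far, lexically
    result = []
    for line, _col, kind, name in sorted(events):
        if kind == "bind":
            bound.add(name)
        elif name in bound:
            result.append((line, name, "global"))
        elif name in predeclared:
            result.append((line, name, "predeclared"))
        else:
            result.append((line, name, "undefined"))
    return result


litmus = "x = zip\nif False:\n  zip = 1\ny = zip\n"
print(resolve(litmus))
# [(1, 'zip', 'predeclared'), (4, 'zip', 'global')]
```

On the litmus test, the first use of zip resolves to the built-in and the last to the global, regardless of whether the assignment on line 3 ever executes. Swapping the then/else branches in the two examples above changes the result for the same reason: only source position matters.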

Yes, Python has three different scope regimes (module, function, comprehension), and the current Starlark spec eliminates one of them by making function-locals and globals behave the same, but I don't think that's a strong enough reason to choose the current semantics.

bazelbuild / starlark

spec: scope of globals should not include preceding text #138