bazelbuild / starlark

Starlark Language
Apache License 2.0
2.48k stars 163 forks source link

Clarify which symbols are exported #37

Open laurentlb opened 5 years ago

laurentlb commented 5 years ago

A change in Bazel (https://github.com/bazelbuild/bazel/issues/5636) affects which symbols from a Starlark module are exposed and can be loaded. More specifically, with this change, symbols that are loaded in a module are not exported (another module cannot load it).

@alandonovan asked that we clarify what is considered as "exported".

The behavior we're going to use in Bazel is: a global value is exported if:

Example:

load("module", "a")

b, c, _d = 2, 3, 4

def e(): pass
def _f(): Pass

Symbols b, c, and e are exported. They can be loaded from another module. Symbols a, _d, and _f are not exported.

alandonovan commented 5 years ago

I agree that the current load statement is problematic and that limiting the re-export of loaded names is a good idea. But I don't think the submitted Bazel change is the right implementation nor is the proposed text the right specification.

For starters, it adds a new concept, the exportedness of a name. Previously "exported" was simply a shorthand for the expression name[0] != "_", not an independent concept, but now it requires an extra piece of state tacked on to a name. We already have all the concepts we need to explain the desired behavior: lexical blocks. All that is required is a new lexical block for file-local names.

A load statement would create a binding in the file's lexical block, which is distinct from the block containing the module globals. A def or assignment statement at top-level would continue to create bindings in the module block. It would be an error to declare the same name in both the file and module block. Name lookup walks up the lexical environment---the tree of blocks---as usual, passing through the file block before reaching the module block. This is exactly how name lookup works in Go, by the way.

The advantage of this scheme is that it adds no new concepts, only a new instance of an existing concept: the block. Furthermore, it enables a simpler and more efficient implementation. Instead of having to save an extra bit of information with every name in the module globals, as the Bazel change does, the sets of file-scoped vs globally scoped names are resolved entirely statically. The environment of file-scoped names can then be represented as a flat array indexed by small integers. The file-scoped names are never entered in the module dictionary, and access to them is more efficient.

You could even take this approach one step further and specify that top-level names starting with "_" create bindings in the file (not module) block. That would make access to these names more efficient too; also, the implementation of the load statement would not need to care about "_": those names would simply never be in the module dictionary.

brandjon commented 5 years ago

I think Alan's description is functionally equivalent to Laurent's (in effect, not in implementation), unless you consider a module-level binding construct other than =, def, or load(). The only one I can think of is module-level for statements, which aren't allowed in Bazel. If that case is allowed in any Starlark dialect, then for the sake of uniformity I would just treat it the same as any other non-load() binding construct.

I like the description of "exported-ness" as a simple consequence of splitting the module block into an outer publicly visible block and an inner locally visible block.

I don't think it's necessarily simpler to say that binding operations have different meaning depending on whether the name to be bound begins with an underscore. The alternative, that it's always the same block but the ones with an underscore are not made visible from that block, seems just as concise.

alandonovan commented 5 years ago

Yes, the distinction I was making affects the implementation and the programmatic API to the interpreter, but not the behavior observable to scripts. Sorry that wasn't clear.

laurentlb commented 5 years ago

I like the description of "exported-ness" as a simple consequence of splitting the module block into an outer publicly visible block and an inner locally visible block.

I agree. My implementation is messy due to the legacy code (we need to support both the old and the new behavior; we need to take care of existing users; the design of Environment.java makes it hard to implement changes as I originally wanted).

The focus was on the user-visible behavior (implementation can be cleaned up over time). I'm glad to see there's agreement on it.

Do you want to send a PR for the spec update?

alandonovan commented 5 years ago

Sure. Let me try an implementation in Go first, as code often uncovers surprises in our theories. :)

alandonovan commented 5 years ago

Go implementation and doc change at https://github.com/google/starlark-go/pull/178

ndmitchell commented 4 years ago

I was reading the spec and this problem in particular surprised me. @laurentlb - do you still have an intention to send a diff for the spec?

laurentlb commented 4 years ago

I'm happy to review a PR updating the spec.

ndmitchell commented 4 years ago

Looking closer, I think the spec would also need to nail down the behaviour of:

load("module", "a")
a = 1

Does that get exported or not?

laurentlb commented 4 years ago

It's not valid:

It is a static error to bind a global variable already explicitly bound in the file

alandonovan commented 4 years ago

@laurentlb It's more subtle than that.

a = 1 creates a global binding, but load(..., "a") creates a file-local binding, so the rule you quoted does not apply. Starlark/Java gives an error (cannot reassign global). Starlark/Go appears not to, at least in the context of herb, but that could be because herb sets various internal dialect options to try to match Bazel. But according to the spec it should: "it is an error for a load statement to bind the name of a global, or for a top-level statement to assign to a name bound by a load statement."

[Update: still no error from Starlark/Go after clearing the resolve.{LoadBindsGlobally,AllowGlobalReassign} options. Seems like a bug.]