Add grammar to permit type annotations

ndmitchell commented 4 years ago

Following up from https://github.com/bazelbuild/buildtools/issues/900, I think Starlark (or at least some extension thereof) should permit a grammar that allows inserting type annotations. Concretely (but informally) we should allow:

def myFunction(x : a_type) -> a_type:

Where a_type is any arbitrary expression. I think it would be useful to standardise where the types go, but not yet standardise what they mean or how they are evaluated. Some possibilities I can think of:

Something Python 3 compatible, potentially allowing tools like MyPy or Pyre to analyse it.
Something closer to runtime checking, where the annotations serve as assertions. I've experimented with this approach in a development branch, and while I can't share the code yet, it's been relatively effective.

Given that types restrict the meaning of a program, it should be perfectly compliant for "Standard Starlark" to parse but otherwise entirely ignore the type signatures. Starlark implementations could then have extensions that check (statically, as a lint, or at runtime) that the code is well-formed.
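For illustration, Python 3 already behaves in the "parse but ignore" way, and the runtime-assertion variant can be layered on top as an opt-in. The sketch below is plain Python rather than Starlark; the `checked` decorator is a hypothetical name, and it only handles annotations that are plain classes on ordinary (non-variadic) parameters.

```python
import functools
import inspect


def my_function(x: int) -> int:
    # The annotations are parsed and recorded, but the interpreter never enforces them.
    return x


def checked(fn):
    """Hypothetical opt-in extension: treat annotations as runtime assertions."""
    sig = inspect.signature(fn)

    def check(label, value, expected):
        # Skip missing annotations and anything that is not a plain class.
        if expected is inspect.Signature.empty or not isinstance(expected, type):
            return
        if not isinstance(value, expected):
            raise TypeError(
                f"{label}: expected {expected.__name__}, got {type(value).__name__}"
            )

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            check(name, value, sig.parameters[name].annotation)
        result = fn(*args, **kwargs)
        check("return value", result, sig.return_annotation)
        return result

    return wrapper


@checked
def double(x: int) -> int:
    return x * 2
```

Calling `my_function("text")` succeeds because nothing inspects the annotation, whereas `double("text")` raises `TypeError` before the body runs.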

If there's general acceptance, I'm happy to put together a diff to the spec.

alandonovan commented 4 years ago

The addition of types is an interesting idea---the lack of types is the biggest disadvantage of working in a Python-like language---but it is an enormous undertaking, and raises all kinds of questions, such as:

Should dynamic checks be optional? If so, users can't rely on them for precondition checking, and nor can compiler optimizations.
What is the performance cost of additional type checks, and can this be mitigated by compiler optimization?
How would separate compilation work? Python, unlike typed languages, allows each file to be compiled without any knowledge of the values imported from other packages. Would that change? If so, it has enormous architectural ramifications for Bazel. If not, what limits does that place on the type system?
To what extent have types been a success in Python3? Which parts are good and which less so?
Core Starlark has no struct type; that's a Bazelism. How would the type checker be made extensible to support application-defined data types with their own type-checking rules?
Can types be added without compromising the core benefit of Starlark, namely that it is very simple?

I don't wish to dampen your enthusiasm, but I think it is very unlikely that the Starlark team will have significant time to spend advancing any type proposal, at least in the next year.

illicitonion commented 4 years ago

This sounds really exciting, particularly being able to re-use existing tools like MyPy/Pyre.

In addition to the syntax proposed, some interesting cases to consider:

Representing generics in some way; in Python3 this looks like foo: List[str] or bar: Callable[[str, str], int]. This goes slightly beyond type annotations being represented by "any arbitrary expression", as these are pretty far from existing expressions (though I guess the hack Python does of saying "You can put anything the language wouldn't understand in a string" could be a reasonable workaround, at least while we flesh out detail). MyPy supports metaclass type variables too, though I feel like that's maybe getting into compromising the simplicity of Starlark that Alan cites...
What level of expression should be annotatable? I'd argue that at least variables of any scope, and functions in any scope, should allow annotating. Sub-expressions... Unclear.

ndmitchell commented 4 years ago

To set the context, I have added optional dynamic types to a Starlark dialect. It works quite well. But it's just one possible approach in a fairly large design space, so the details of my design matter less than the observation that there might be something worthwhile here if people could explore this area of the spec. Unfortunately, until tools like buildifier at least parse Starlark with type signatures, the amount of exploration and experimentation people can do without forking all of the Starlark tools is quite limited. Therefore @laurentlb suggested I bring it up on the Starlark issue tracker first.

My proposed approach would be:

Add parsing of type annotations to buildifier (roughly now?).
Add the location and syntax of types to the Starlark spec (in the near/medium term).
Let people experiment for a few years with how well types work, which forms of types are useful, etc.
If some types work well, standardise them in Starlark. We might want to standardise a semantics for types (e.g. what they mean), but not how they are checked (e.g. runtime, static checker, linter).

To answer the technical questions: I imagine this would be optional. It would have no impact on runtime unless explicitly opted in to with an extension. It would not be used for optimisation. It would not impact loading. It would not interact with struct or similar. My experiment took ~200 lines to Hack in, and the resulting Starlark code was easier to read, write and debug (for me). Not a proof it doesn't take away the simplicity, but a suggestion it's not fatal.

type annotations being represented by "any arbitrary expression", as these are pretty far from existing expressions

Note that List[str] is a perfectly legit expression in Python and Starlark. It's the List value, indexed by the str value. Yes, Python types look pretty funky, but the Python language allows you to write [1+2] there as well - any arbitrary expression will do. See tfpdef which permits test (which is roughly any expression).
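A small Python sketch of this point: the annotation slot holds an ordinary expression whose value is simply recorded. `TypeCtor` and this `List` are hypothetical stand-ins defined here for illustration, not the `typing` module or anything in Starlark.

```python
class TypeCtor:
    """Hypothetical stand-in for a value such as List; indexing records its arguments."""

    def __init__(self, name, args=()):
        self.name = name
        self.args = args

    def __getitem__(self, item):
        # List[str] is an ordinary index expression: the List value indexed by the str value.
        args = item if isinstance(item, tuple) else (item,)
        return TypeCtor(self.name, args)


List = TypeCtor("List")


def f(xs: List[str], y: [1 + 2]) -> None:
    # Both annotations are plain expressions; neither is checked when f is called.
    pass


annotations = f.__annotations__
```

Here `annotations["xs"]` is the object produced by indexing `List` with `str`, and `annotations["y"]` is the list `[3]`.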

What level of expression should be annotatable?

Same as Python - parameters and return types. We could allow PEP 526 if we wanted to. Allowing arbitrary expressions to be annotated feels like overkill (and has serious grammar ambiguity issues).

alandonovan commented 3 years ago

I imagine this would be optional. It would have no impact on runtime

If the types have no dynamic semantics, then my suggestion is that those who wish to experiment with types can embed the new syntax in comments or string literals, and write an external type-checking tool. This keeps all the complexity outside the core until we have extensive experience of the type system.

It would not interact with struct or similar.

This seems like a major lost opportunity. Structs (aka Provider instances in Bazel) are perhaps the most interesting case for types.

My experiment took ~200 lines to Hack in, and the resulting Starlark code was easier to read, write and debug (for me). Not a proof it doesn't take away the simplicity, but a suggestion it's not fatal.

I am less concerned about the complexity of the implementation than I am about the complexity of the spec changes.

laurentlb commented 3 years ago

Using docstrings is also my recommendation for now. Yes, it's more verbose (but it also encourages users to write documentation), but it seems safer to me as I'd rather not commit to a change in the grammar.

We already have some code to parse docstrings and the docstrings may provide additional information (e.g. for deprecation). If we consider that docstrings are parseable, you get the same value. You'll also be able to migrate the code from docstring annotations to type annotations, if we decide to accept it later.
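As a rough sketch of what an external tool reading types from docstrings might look like: the `name (type): description` line format below is a hypothetical convention chosen for illustration, not something DocstringUtils or buildifier is known to define, and `docstring_types` is an invented helper.

```python
import re

# Hypothetical convention: indented "name (type): description" lines under "Args:".
ARG_LINE = re.compile(r"^\s+(\w+)\s+\(([^)]+)\):")


def docstring_types(doc):
    """Return {parameter name: type string} found in the Args section of doc."""
    types = {}
    in_args = False
    for line in doc.splitlines():
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
            continue
        if in_args:
            match = ARG_LINE.match(line)
            if match:
                types[match.group(1)] = match.group(2)
            elif stripped.endswith(":"):
                # A new section header such as "Returns:" ends the Args section.
                in_args = False
    return types


DOC = """Compiles sources.

Args:
  srcs (list of string): source files.
  opt (bool): enable optimisation.

Returns:
  out (File): the output.
"""

parsed = docstring_types(DOC)
```

A checker built this way stays entirely outside the grammar, and the extracted strings could later be migrated mechanically into annotation syntax if one is adopted.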

https://github.com/bazelbuild/bazel/blob/master/src/tools/starlark/java/com/google/devtools/starlark/common/DocstringUtils.java https://github.com/bazelbuild/buildtools/blob/master/warn/warn_docstring.go

ndmitchell commented 3 years ago

I am less concerned about the complexity of the implementation than I am about the complexity of the spec changes.

I think it could be implemented with about 10 lines of spec change.

I'll consider whether to migrate my type annotations into docstrings or not. I definitely prefer them where Python 3 puts its types, but being able to interoperate with Buildifier is valuable.

alandonovan commented 3 years ago

I think it could be implemented with about 10 lines of spec change.

Sure, it's easy to make the grammar support "def f(id : expr)" in 10 lines, but expressing the rules for a workable type system is a huge task, and we don't want to open the door to all kinds of half-baked uses of this syntax that would create a clean-up problem when (if) we ever decide to support a type system.

nipunn1313 commented 3 years ago

Mypy itself uses comment-style type annotations for all code targeting Python older than 3.6; grammar for variable annotations was only introduced in Python 3.6 (PEP 526). See https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html (search for "type comment"). These standard comment-style type annotations are supported by mypy!

Though including support in the grammar is a nice eventual goal, for stability it's probably best to follow mypy's strategy: prove success via type comments first, then add grammar once there is confidence that the typing is successful and useful.
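To make the type-comment strategy concrete, here is a minimal sketch using Python's standard `ast` module (Python 3.8 or later), which can surface `# type:` comments without any grammar change. This runs on Python source, not through any Starlark tool; `type_comments` and `split_signature` are hypothetical helper names, and `split_signature` handles simple type names only.

```python
import ast

SOURCE = '''
def greet(name, times):
    # type: (str, int) -> str
    return name * times
'''


def type_comments(source):
    """Map each function name to its raw type comment (None if it has none)."""
    tree = ast.parse(source, type_comments=True)
    return {
        node.name: node.type_comment
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }


def split_signature(comment):
    """Split '(A, B) -> C' into argument type names and the return type name."""
    signature = ast.parse(comment, mode="func_type")
    return [arg.id for arg in signature.argtypes], signature.returns.id


comments = type_comments(SOURCE)
arg_types, return_type = split_signature(comments["greet"])
```

Because the comment is invisible to the interpreter, code annotated this way still runs unchanged on implementations that know nothing about types.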

For transparency, I work at Dropbox, which uses both Bazel and mypy extensively and has completed much of a py2 -> py3 migration.

fmeum commented 1 month ago

A relevant proposal by @comius: https://docs.google.com/document/d/1Sid7EAbBd_w_T7D94Li_f_bK3zMTztFbzIMvcpzo1wY/edit

bazelbuild / starlark

Add grammar to permit type annotations #106