bazelbuild / starlark

Starlark Language
Apache License 2.0

Add grammar to permit type annotations #106

Open ndmitchell opened 4 years ago

ndmitchell commented 4 years ago

Following up from https://github.com/bazelbuild/buildtools/issues/900, I think Starlark (or at least some extension thereof) should permit a grammar that allows inserting type annotations. Concretely (but informally) we should allow:

def myFunction(x : a_type) -> a_type:

Where a_type is any arbitrary expression. I think it would be useful to standardise where the types go, but not yet standardise what they mean or how they are evaluated. Some possibilities I can think of:

Given that types restrict the meaning of a program, it should be perfectly compliant for "Standard Starlark" to parse but otherwise entirely ignore the type signatures. Starlark implementations could then have extensions that check (statically, as a lint, or at runtime) that the code is well-formed.
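Python itself already treats annotations this way, which gives a feel for the "parse but otherwise ignore" behaviour. The following is a minimal sketch in Python, not Starlark; `my_function` is an illustrative name only.

```python
def my_function(x: int) -> int:
    # The annotations are parsed and recorded, but never enforced.
    return x


# Passing a value that contradicts the annotation raises no error.
result = my_function("not an int")

# The evaluated annotation expressions are stored on the function object,
# where an optional checker or linter could pick them up.
recorded = my_function.__annotations__
```

An implementation that ignores annotations entirely would behave like the call above; a checking extension would be free to reject it.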

If there's general acceptance, I'm happy to put together a diff to the spec.

alandonovan commented 4 years ago

The addition of types is an interesting idea---the lack of types is the biggest disadvantage of working in a Python-like language---but it is an enormous undertaking, and raises all kinds of questions, such as:

I don't wish to dampen your enthusiasm, but I think it is very unlikely that the Starlark team will have significant time to spend advancing any type proposal, at least in the next year.

illicitonion commented 4 years ago

This sounds really exciting, particularly being able to re-use existing tools like MyPy/Pyre.

In addition to the syntax proposed, some interesting cases to consider:

ndmitchell commented 4 years ago

To set the context: I have added optional dynamic types to a Starlark dialect, and it works quite well. But it is just one possible approach in a fairly large design space, so the details of my design matter less than what they suggest: there may be something worthwhile here if people could explore this area of the spec. Unfortunately, until tools like buildifier can at least parse Starlark with type signatures, people can do very little exploration or experimentation without forking all of the Starlark tools. @laurentlb therefore suggested I bring it up on the Starlark issue tracker first.

My proposed approach would be:

  1. Add to buildifier (roughly now?).
  2. Add the location and syntax of types to Starlark spec (in the near/medium term).
  3. Let people experiment with how well types work, what form the types take, etc., for a few years.
  4. If some types work well, standardise them in Starlark. We might want to standardise a semantics for types (e.g. what they mean), but not how they are checked (e.g. runtime, static checker, linter).

To answer the technical questions: I imagine this would be optional. It would have no impact on runtime unless explicitly opted into with an extension. It would not be used for optimisation. It would not impact loading. It would not interact with struct or similar. My experiment took ~200 lines to hack in, and the resulting Starlark code was easier to read, write and debug (for me). That is not proof that it preserves the simplicity, but it suggests the cost is not fatal.
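As a rough illustration of "no impact on runtime unless explicitly opted into", here is a minimal Python sketch of opt-in dynamic checking. It is not the experiment described above: the `checked` decorator and the `double` functions are hypothetical names, and only annotations that are plain classes are checked.

```python
import functools
import inspect


def _matches(value, expected):
    # Unannotated, or annotated with something that is not a class: not checked.
    if expected is inspect.Signature.empty or not isinstance(expected, type):
        return True
    return isinstance(value, expected)


def checked(func):
    """Hypothetical opt-in extension: enforce class annotations at call time."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            param = signature.parameters[name]
            # *args / **kwargs collect several values; skipped in this sketch.
            if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                continue
            if not _matches(value, param.annotation):
                raise TypeError(f"{name} must be {param.annotation.__name__}")
        result = func(*args, **kwargs)
        if not _matches(result, signature.return_annotation):
            raise TypeError("return value has the wrong type")
        return result

    return wrapper


@checked
def double(x: int) -> int:
    return x * 2


def double_unchecked(x: int) -> int:
    # Same annotations, but nothing opted in, so nothing is enforced.
    return x * 2
```

With this sketch, `double("ab")` raises `TypeError`, while `double_unchecked("ab")` runs as if the annotations were absent.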

type annotations being represented by "any arbitrary expression", as these are pretty far from existing expressions

Note that List[str] is a perfectly legitimate expression in Python and Starlark. It's the List value, indexed by the str value. Yes, Python types look pretty funky, but the Python language allows you to write [1+2] there as well - any arbitrary expression will do. See tfpdef in the Python grammar, which permits test (which is roughly any expression).
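This can be checked against Python's own parser. The sketch below uses the standard `ast` module to show that the annotation slots hold ordinary expression nodes; the function `f` and the names `a` and `b` are placeholders and are never evaluated.

```python
import ast

# Three different kinds of expression, all used in annotation position.
source = "def f(x: List[str], y: [1 + 2]) -> a or b:\n    pass\n"

func = ast.parse(source).body[0]

# Each annotation is an ordinary expression node, not a special "type" node.
param_kinds = [type(arg.annotation).__name__ for arg in func.args.args]
return_kind = type(func.returns).__name__
```

Here `param_kinds` is `["Subscript", "List"]` and `return_kind` is `"BoolOp"`, the same node types those expressions would produce anywhere else in a program.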

What level of expression should be annotatable?

Same as Python - parameters and return types. We could allow PEP 526 if we wanted to. Allowing arbitrary expressions to be annotated feels like overkill (and has serious grammar ambiguity issues).

alandonovan commented 3 years ago

I imagine this would be optional. It would have no impact on runtime

If the types have no dynamic semantics, then my suggestion is that those who wish to experiment with types can embed the new syntax in comments or string literals, and write an external type-checking tool. This keeps all the complexity outside the core until we have extensive experience of the type system.
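One way such an external tool could start is by reading signatures from comments. The sketch below is only an assumption about how that might look: it borrows the PEP 484 `# type:` comment form, relies on Python's `ast` module (Python 3.9+ for `ast.unparse`), and therefore only works on Starlark files that are also valid Python. `read_signature_comments` and the sample function are hypothetical.

```python
import ast

SOURCE = '''
def my_function(x, names):
    # type: (int, List[str]) -> bool
    return len(names) > x
'''


def read_signature_comments(source):
    """Map each function name to (argument type strings, return type string)."""
    tree = ast.parse(source, type_comments=True)
    signatures = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.type_comment:
            # A signature comment has its own small grammar: "(args) -> ret".
            sig = ast.parse(node.type_comment, mode="func_type")
            arg_types = [ast.unparse(t) for t in sig.argtypes]
            signatures[node.name] = (arg_types, ast.unparse(sig.returns))
    return signatures


signatures = read_signature_comments(SOURCE)
```

Because the types live in comments, existing Starlark tools would parse the file unchanged, and the checker can evolve independently of the spec.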

It would not interact with struct or similar.

This seems like a major lost opportunity. Structs (aka Provider instances in Bazel) are perhaps the most interesting case for types.

My experiment took ~200 lines to hack in, and the resulting Starlark code was easier to read, write and debug (for me). That is not proof that it preserves the simplicity, but it suggests the cost is not fatal.

I am less concerned about the complexity of the implementation than I am about the complexity of the spec changes.

laurentlb commented 3 years ago

Using docstrings is also my recommendation for now. Yes, it's more verbose (though it also encourages users to write documentation), but it seems safer to me, as I'd rather not commit to a change in the grammar.

We already have some code to parse docstrings and the docstrings may provide additional information (e.g. for deprecation). If we consider that docstrings are parseable, you get the same value. You'll also be able to migrate the code from docstring annotations to type annotations, if we decide to accept it later.

https://github.com/bazelbuild/bazel/blob/master/src/tools/starlark/java/com/google/devtools/starlark/common/DocstringUtils.java
https://github.com/bazelbuild/buildtools/blob/master/warn/warn_docstring.go
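To make the docstring route concrete, here is a small Python sketch that pulls types out of an `Args:` section. The `name (type): description` convention is an assumption made for illustration, not something the tools linked above are known to enforce, and `docstring_arg_types` is a hypothetical helper.

```python
import ast
import re

SOURCE = '''
def my_function(x, names):
    """Reports whether there are more names than x.

    Args:
      x (int): threshold.
      names (list of string): names to count.

    Returns:
      bool: True if len(names) > x.
    """
    return len(names) > x
'''

# Matches an indented "name (type):" entry (assumed convention).
ARG_LINE = re.compile(r"^\s*(\w+)\s*\(([^)]*)\):")


def docstring_arg_types(source):
    """Map each function name to {parameter name: type text} from its docstring."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        types = {}
        in_args = False
        for line in (ast.get_docstring(node) or "").splitlines():
            stripped = line.strip()
            if stripped == "Args:":
                in_args = True
            elif stripped.endswith(":") and not line.startswith(" "):
                in_args = False  # another section header, such as "Returns:"
            elif in_args:
                match = ARG_LINE.match(line)
                if match:
                    types[match.group(1)] = match.group(2)
        result[node.name] = types
    return result


arg_types = docstring_arg_types(SOURCE)
```

A later migration to real annotations could reuse the same extracted mapping to rewrite each `def` line mechanically.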

ndmitchell commented 3 years ago

I am less concerned about the complexity of the implementation than I am about the complexity of the spec changes.

I think it could be implemented with about 10 lines of spec change.

I'll consider whether to migrate my type annotations into docstrings or not. I definitely prefer them where Python 3 puts its types, but being able to interoperate with Buildifier is valuable.

alandonovan commented 3 years ago

I think it could be implemented with about 10 lines of spec change.

Sure, it's easy to make the grammar support "def f(id : expr)" in 10 lines, but expressing the rules for a workable type system is a huge task, and we don't want to open the door to all kinds of half-baked uses of this syntax that would create a clean-up problem when (if) we ever decide to support a type system.

nipunn1313 commented 3 years ago

Mypy itself supports comment-style type annotations for code that must run on Python versions before 3.6; the variable-annotation grammar (PEP 526) was only introduced in Python 3.6. See https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html (search for "type comment"). There are standard comment-style type annotations supported by mypy!

Though including support in the grammar is a nice eventual goal, for stability it's probably best to follow mypy's strategy: prove success via type comments first, then add grammar later once there is confidence that the typing is successful and useful.

For transparency: I work at Dropbox, which uses both Bazel and mypy extensively and has completed much of a py2 -> py3 migration.

fmeum commented 1 month ago

A relevant proposal by @comius: https://docs.google.com/document/d/1Sid7EAbBd_w_T7D94Li_f_bK3zMTztFbzIMvcpzo1wY/edit