dart-lang / language

Design of the Dart language

Escaped reserved words #271

Open munificent opened 5 years ago

munificent commented 5 years ago

This is a proposed solution for #270:

Evolving a programming language is always challenging in a number of ways. Users often want new features, but those features need syntax and adding syntax without breaking existing programs is difficult.

In particular, it is impossible to add new reserved words to the language without breaking someone. A reserved word, by definition, cannot be used by users as an identifier. This means that if, say, Dart 2.x turns foo into a reserved word, then any existing program using foo as a variable name, type name, member name, import prefix, etc. breaks with a syntax error.

The typical way Dart and other languages avoid this is by never adding new reserved words. Instead, they add "contextual keywords" or "built-in identifiers". These are identifiers that behave like keywords in some contexts but can be used by users as normal identifiers in other places.

For example, show behaves like a keyword when used after an import or export directive:

import 'foo.dart' show something;

But it can also be used as an identifier:

class Widget {
  void show() { print("I am visible now."); }
}

In addition to not breaking existing programs, this has another advantage:

But it carries a number of disadvantages:

In other words, contextual keywords make the language bigger, more confusing, and harder to change. They technically preserve compatibility, but with a high tax.

A Model for Evolving the Language

In the past, most programming languages evolved with a policy of 100% backwards compatibility. That's great for, well, compatibility, but the trade-off is that the language gets monotonically more complex over time.

The increasing complexity means people now avoid C++ completely because it's simply too large for a new user to learn. If you didn't get on the C++ train a decade ago, it's very difficult to catch up. (See 1, 2.)

The other problem is that language features have to be compromised from their ideal form in the name of compatibility. For example, if we wanted to add non-nullable types in a non-breaking way, then we'd have to treat every existing type annotation as nullable, since that's what they mean today.

In order to get a non-nullable type, you'd need some explicit marker like !. But that's the wrong default. Empirical analysis shows something like 90% of variables are non-nullable, so forcing users to opt in to it on the majority of their types makes for a strictly worse feature.

To avoid that, Dart, Rust, and other languages are moving to a model where compatibility is preserved through a combination of opting in to new features and migration tooling. Requiring an opt in means existing code continues to work as it does today.

At the point that you opt in, you can also run a tool that changes your existing code to get it to a form that makes the most sense in the context of the new feature. With non-nullable types, that lets us make non-nullable the default, leading to cleaner code post-migration without piles of pointless !. It's even theoretically possible to have migrations that purely remove deprecated features, giving us a way to simplify the language over time by removing functionality that no longer carries its weight.

This model generally works well for syntax changes, but one area where it breaks down is when the migration tooling would change the public API of a library. At that point, a user can't freely opt in to the change because it forces them to break their existing users.

An example that gets to the point of this proposal is reserving a new word. Let's say we want to turn async into a fully reserved word. We could write a tool that found any existing uses of async as an identifier and rewrote them to something like myAsync. The resulting code no longer has syntax errors. But if those identifiers are in public members, any library importing the migrated one is broken. In other words, migration isn't encapsulated.

Escaping Reserved Words

This proposal solves that for reserved words by providing a syntax that lets you explicitly use any reserved word (new or old) as an identifier. We borrow a feature from Swift and allow backticks around any reserved word or identifier:

var `for` = "a variable named 'for'";

This provides two main benefits:

We have a general goal of making the language easier to evolve, and this feature would give us one small mechanism to let us evolve the set of keywords in a mechanically-migratable way.

leafpetersen commented 5 years ago

To clarify, is the model that:

munificent commented 5 years ago

That's right. There's a level of separation here where this particular proposal is independent of any sort of opt in, migration, or new reserved words. It just says if you want to, say, name a variable "for", here's the syntax to do it.

That in turn happens to be a nice affordance because then if we want to reserve a new word, we can do so by:

  1. Defining an opt in to use the new reserved word as a keyword.
  2. Providing a tool to migrate existing code such that it can be opted in without breaking it or any code that calls it.

This feature makes it possible to implement step 2 because any existing uses of the new keyword can simply be wrapped in backticks and everything continues to work as before.
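
A rough sketch of the mechanical part of step 2, assuming the backtick syntax proposed above. The helper name escapeReservedWord is hypothetical and not part of any Dart tool. A real migration would presumably work on the parsed syntax tree so that strings and comments are left alone; this version only does a textual, whole-word rewrite to show the idea.

```dart
/// Wraps every standalone occurrence of [word] in backticks.
///
/// Hypothetical helper for illustration only: it rewrites plain text and
/// does not understand strings, comments, or any other Dart syntax.
String escapeReservedWord(String source, String word) {
  // Characters that can be part of a Dart identifier, plus the backtick so
  // that occurrences which are already escaped are skipped.
  const identifierOrTick = r'A-Za-z0-9_$`';

  // Group 1 captures the character before the word (or the start of the
  // input); the lookahead rejects matches inside a longer identifier.
  final pattern = RegExp(
    '(^|[^$identifierOrTick])' +
        RegExp.escape(word) +
        '(?![$identifierOrTick])',
  );

  return source.replaceAllMapped(
    pattern,
    (match) => '${match[1] ?? ''}`$word`',
  );
}

void main() {
  const before = 'var async = fetch(); print(async); var asyncCount = 1;';
  print(escapeReservedWord(before, 'async'));
  // var `async` = fetch(); print(`async`); var asyncCount = 1;
}
```

Because the names keep their spelling and only gain the escape, a library's public API is unchanged after such a rewrite, which is what lets the migration stay encapsulated.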

lrhn commented 5 years ago

There is precedent (ES6) for allowing unquoted reserved words after a dot (.). We probably won't do that because it would prevent us from doing postfix await as foo.await.

Other options for escaping could be:

var \escaped = o.\escaped;
var `escaped = o.`escaped;
var #escaped = o.#escaped;

The backslash works like JavaScript before the above-mentioned feature, where o.\if worked in most implementations (contrary to what the spec actually said, but it was rather vague). The ` is similar to Scheme-like symbol escapes, but probably not a good match for Dart. The # is reminiscent of a symbol, which is already a way to mention a source name. Since the following is always an identifier, there is no problem not having an end delimiter. Using a symbol literal would keep us inside the existing grammar, just allowing identifier | symbolLiteral in some places (we'd obviously have to check that we won't introduce ambiguities that way).

Using something else would keep `-quoted strings as an option in the grammar (and single-` quoting would still prevent that).

mit-mit commented 5 years ago

Just my 5 cents: I found ` familiar because of its use in markdown.

yjbanov commented 5 years ago

@mit-mit markdown familiarity might actually be an issue. Backticks appearing in code snippets inside dartdocs (which are markdown) could interfere with the syntax.

yjbanov commented 5 years ago

Having said that, I do like the backticks in code.

munificent commented 5 years ago

Since the following is always an identifier, there is no problem not having an end delimiter.

A minor point, but if we ever want to use this as a feature to enable interop with other languages that have different identifier lexical rules, having an end delimiter might be useful:

dom.css.`background-color` = red;

Using a symbol literal would keep us inside the existing grammar, just allowing identifier | symbolLiteral in some places (we'd obviously have to check that we won't introduce ambiguities that way).

If you want to use an escaped identifier to call a getter on the implicit this, then using symbol literal syntax collides with using a symbol literal as an expression:

class Foo {
  get #for => "weird, but whatever";
  baz() {
    #for // Symbol literal or getter call on this?
  }
}

@mit-mit markdown familiarity might actually be an issue. Backticks appearing in code snippets inside dartdocs (which are markdown) could interfere with the syntax.

That's a good point, though I imagine cases where you want some inline code in Markdown to reference an escaped Dart identifier are rare. When you do need to, Markdown has a way to escape the backticks inside inline code.

lrhn commented 5 years ago

If we allow "quoted identifiers" that are not just identifiers or reserved words, then we need something more, and an end quote character is a good idea.

If we do that, would we want to do it from the beginning, and allow any string as an "identifier":

int get `something or other` => 42;
void `throw 💩`() => throw "💩";

It seems ridiculous and exploitable, but it would also allow non-ASCII identifiers like

void get `blåbærgrød` => getBerries().boil();

so it might get used.

Hixie commented 5 years ago

I'm not a fan of backticks because several languages (e.g. bash, perl) use them for other purposes. Backslash seems pretty reasonable, though it doesn't let you use any arbitrary string. If we go with something like backticks, would escapes be allowed within the sequence?

   var `foo\nbar\`baz` = 2;

rakudrama commented 5 years ago

Don't forget that identifiers are used in various doc-comment contexts, so I would suggest finding something that works well with markdown, i.e. inside backticks and [] references.

Idle question - why is blåbærgrød not already an identifier?

lrhn commented 5 years ago

@rakudrama

Idle question - why is blåbærgrød not already an identifier?

Are you asking how the specification defines an identifier so it doesn't include blåbærgrød, or why we have not yet changed the spec to allow it?

Dart identifiers must be ASCII only. In fact, non-ASCII characters can only occur in Dart code inside strings or in comments.

Some languages, like Java and JavaScript, allow Unicode identifiers. That obviously has a cost for parsing (and security), but allows words in other languages to be used as source code identifiers. Other languages, like C++, do not. Dart has so far chosen not to go there.

DanTup commented 5 years ago

F# uses backticks, but they have to be doubled:

let ``let`` = 75

I'm not sure if that makes markdown escaping easier or harder though 😄

var \escaped = o.\escaped;

I think using backslashes or only having a marker at the start is a bit confusing since it looks like it might just be escaping a character rather than defining the whole word.

Also, I suspect it's not a goal here, but in F#'s double-backticks you can use characters that aren't normally valid, like spaces (and I think other punctuation):

let ``test that some thing happens with the thing``() = ...

Doing that raises other questions (like how to call from un-opted code), but it was a nice feature there to avoid munging descriptions into names_like_this. I'm not sure if it was often used outside of tests though, and Dart has its own way of handling them.

MisterJimson commented 5 years ago

Just wanted to chime in and note that C# uses an @ for this and I feel it works pretty well.

    public class @class
    {
        public int age;
    }
    class Program
    {
        static void Main(string[] args)
        {
            @class p1 = new @class();
            p1.age = 10;
            Console.WriteLine("Age: "+p1.age);
            Console.WriteLine("Press Enter Key to Exit..");
            Console.ReadLine();
        }
    }

Dokotela commented 4 years ago

Thanks so much for working on this! I'm pretty new, but I've enjoyed working in Dart so far. I'm working with some complex JSON currently, and it uses 'class', 'list', 'extends', 'for', and 'assert' as variable names, so I'm looking forward to being able to escape them at some point in the future. In the meantime, any suggestions on working with them?

munificent commented 4 years ago

In the meantime, any suggestions on working with them?

I would just pick a convention to tweak the name so that it doesn't collide with the reserved word. A trailing underscore would work, but looks funny in Dart where leading underscores are meaningful.

Note that unlike in JavaScript, JSON keys in Dart must always be quoted strings, not bare identifiers, so you won't have any collisions there.
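
As a minimal sketch of that workaround: the JSON keys stay as the original strings, and only the Dart field names are tweaked. The Node class, its fields, and the trailing-underscore convention are illustrative assumptions here, not a recommended or official style.

```dart
import 'dart:convert';

/// Hypothetical model for JSON whose keys collide with Dart reserved words.
class Node {
  Node({required this.class_, required this.extends_});

  /// The reserved words only ever appear as string keys, so they are legal.
  factory Node.fromJson(Map<String, dynamic> json) => Node(
        class_: json['class'] as String,
        extends_: json['extends'] as String,
      );

  // Trailing underscores are just one possible convention for the Dart side.
  final String class_;
  final String extends_;

  /// Writes the original key names back out.
  Map<String, dynamic> toJson() => {'class': class_, 'extends': extends_};
}

void main() {
  final decoded = jsonDecode('{"class": "Widget", "extends": "Object"}')
      as Map<String, dynamic>;
  final node = Node.fromJson(decoded);

  print(node.class_); // Widget
  print(jsonEncode(node.toJson())); // {"class":"Widget","extends":"Object"}
}
```

The renaming is confined to one class, so the rest of the code never has to spell the reserved word as an identifier.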