dart-lang / language

Design of the Dart language
Other
2.65k stars 202 forks source link

Eliminate symbol literals with several identifiers? #3659

Open eernstg opened 6 months ago

eernstg commented 6 months ago

Thanks to @modulovalue for bringing up this topic in https://github.com/dart-lang/language/issues/3749.

The Dart grammar has an obvious ambiguity in that #id.id can be parsed as a symbol literal #id.id (which is a symbol whose corresponding string is a sequence of identifiers separated by dots, which can be used to denote a library name), and it can be parsed as a symbol literal #id followed by a getter invocation .id.

This is a genuine ambiguity, not just a limitation of a specific parsing algorithm (no amount of lookahead would be able to rule out any of those two choices as a matter of syntax). We wouldn't even be able to rule out any of those two choices if we were able to use static semantic information in a program where id is an extension getter on Symbol returning dynamic.

I'd recommend that we eliminate the support for these symbol literals containing a dot separated sequence of identifiers from the language, thus eliminating this syntactic ambiguity entirely.

Note that symbol literals of the form #id1.id2. ... .idk can be used to denote library names, but library names are only used rarely today, and they may just as well be denoted by const Symbol('id1.id2. ... .idk'). In other words, this is expected to be a change which is breaking in principle, but essentially non-breaking in practice, and every breakage can trivially be resolved locally at the broken expression.

I've labeled this as a 'technical-debt' issue because the ambiguity in the grammar ought to be eliminated (and also because we don't have a label for "simplify-the-language" ;-).

@dart-lang/language-team, WDYT?

lrhn commented 6 months ago

I thought the ambiguity was handled by precedence, with continuing the symbol literal having higher precedence than property accesses. If so, the problem is handled, and if you really want to do #foo.toString(), you'll have to do (#foo).toString(). And that's fine, because ... like, why? And like, don't!

There are other uses for #foo.bar.baz in dart:mirrors. A "qualified name" like #library_name.ClassName.memberName can be used to denote a declaration. That has always been questionable, because that's not a source name, and symbols are supposed to represent source names. And it doesn't work with private names, because #library_name._private_name doesn't name-mangle to a private name in that library. (I have no idea what the qualifiedName getter returns for a private declaration.) Qualified names in dart:mirrors should probably be avoided in general, and forcing them to use const Symbol(...) instead shouldn't change anything.

Also, the specification is already that #_foo.bar and #foo._bar are unaffected by library privact, so using the Symbol constructor should be sufficient. The only real use for symbols in the language is for noSuchMethod and Function.apply.

So, LGTM.

munificent commented 6 months ago

I'd recommend that we eliminate the support for these symbol literals containing a dot separated sequence of identifiers from the language, thus eliminating this syntactic ambiguity entirely.

Agreed. Kill it with fire.

modulovalue commented 6 months ago

I would like to throw an alternative approach for removing this ambiguity into the room, which is to replace (and free up!) the #... syntax with some syntactically sane (#<<<<#<<<#<<<<#<< can be made a valid expression with extensions) symbol literal syntax like, for example, #"...".

There are other minor inconsistencies between the spec and the implementation.

(see https://github.com/dart-lang/sdk/issues/50776 for more info, where @eernstg provides some more context)

For example:

var x = #[ ] =;
 === pkg:analyzer (https://pub.dev/packages/analyzer) ===
Parsing failed with some errors:
Scan errors: 0
Parse errors: 3
file(9..9): Expected an identifier.
file(11..11): Expected an identifier.
file(14..14): Expected an identifier.
<CompilationUnitImpl> [0-15]
┗━ <TopLevelVariableDeclarationImpl> [0-15]
  ┣━ <VariableDeclarationListImpl> [0-14]
  ┃  ┣━ 'var' [0-3]
  ┃  ┗━ <VariableDeclarationImpl> [4-14]
  ┃    ┣━ 'x' [4-5]
  ┃    ┣━ '=' [6-7]
  ┃    ┗━ <AssignmentExpressionImpl> [8-14]
  ┃      ┣━ <IndexExpressionImpl> [8-12]
  ┃      ┃  ┣━ <SymbolLiteralImpl> [8-9]
  ┃      ┃  ┃  ┣━ '#' [8-9]
  ┃      ┃  ┃  ┗━ '' [9-9]
  ┃      ┃  ┣━ '[' [9-10]
  ┃      ┃  ┣━ <SimpleIdentifierImpl> [11-11]
  ┃      ┃  ┃  ┗━ '' [11-11]
  ┃      ┃  ┗━ ']' [11-12]
  ┃      ┣━ '=' [13-14]
  ┃      ┗━ <SimpleIdentifierImpl> [14-14]
  ┃        ┗━ '' [14-14]
  ┗━ ';' [14-15]
--------------------------------------------------------------------------------
 === DSP (https://github.com/dart-lang/sdk/blob/master/tools/spec_parser/dart_spec_parser/Dart.g4) dspVersion v0.41 ===
Parsing succeeded with no errors.
Errors of type 1: []
Errors of type 2: []
<startSymbol>
┗━ <libraryDefinition>
  ┣━ <metadata>
  ┣━ <topLevelDefinition>
  ┃  ┣━ <varOrType>
  ┃  ┃  ┗━ 'var'
  ┃  ┣━ <identifier>
  ┃  ┃  ┗━ 'x'
  ┃  ┣━ '='
  ┃  ┣━ <expression>
  ┃  ┃  ┗━ <conditionalExpression>
  ┃  ┃    ┗━ <ifNullExpression>
  ┃  ┃      ┗━ <logicalOrExpression>
  ┃  ┃        ┗━ <logicalAndExpression>
  ┃  ┃          ┗━ <equalityExpression>
  ┃  ┃            ┗━ <relationalExpression>
  ┃  ┃              ┗━ <bitwiseOrExpression>
  ┃  ┃                ┗━ <bitwiseXorExpression>
  ┃  ┃                  ┗━ <bitwiseAndExpression>
  ┃  ┃                    ┗━ <shiftExpression>
  ┃  ┃                      ┗━ <additiveExpression>
  ┃  ┃                        ┗━ <multiplicativeExpression>
  ┃  ┃                          ┗━ <unaryExpression>
  ┃  ┃                            ┗━ <postfixExpression>
  ┃  ┃                              ┗━ <primary>
  ┃  ┃                                ┗━ <literal>
  ┃  ┃                                  ┗━ <symbolLiteral>
  ┃  ┃                                    ┣━ '#'
  ┃  ┃                                    ┗━ <operator>
  ┃  ┃                                      ┣━ '['
  ┃  ┃                                      ┣━ ']'
  ┃  ┃                                      ┗━ '='
  ┃  ┗━ ';'
  ┗━ '<EOF>'

The implementation does not support whitespace there, but the spec appears to do. This is not a big deal, but what for? All of this is unnecessary complexity for, in my opinion, very little benefit.

Giving symbols a distinct literal that is properly terminated on the lexical level would be easier to parse, easier to specify and easier to understand all while supporting the old #id.id.id use cases as, e.g., #"id.id.id" and it could support interpolations to make it a complete replacement for the Symbol constructor (#"id.${my_id}.id").


Personally, I really dislike the # symbol for representing Symbols. C uses # to switch to what could be considered a "meta" level and I really like that idea. Dart could reserve # for similar uses in the future and replace symbol literals with, for example, s"id" or something like an instance of a tagged string (https://github.com/dart-lang/language/blob/main/working/tagged-strings/feature-specification.md).

munificent commented 6 months ago

Personally, I really dislike the # symbol for representing Symbols.

Agreed. I would remove symbol literals from the language entirely if it were up to me. They don't carry their weight.

lrhn commented 6 months ago

If it was only mirrors which needed symbols, I'd be happy to remove the syntax entirely, and have you to do MirrorSystem.getSymbol("_foo", libraryMirror) to get a private name's symbol.

Function.apply only needs public symbols, since named parameters cannot have private names. It's also so rare that using const Symbol("a") is probably fine. (I doubt most people even remember that Function.apply exists.)

The biggest issue is noSuchMethod and its Invocation.memberName, which can be a private member's name, and you may want to compare against it, even on platforms that don't have dart:mirrors. For that, we do need something (and not just strings.)

Or we could make it impossible to get noSuchMethod invocations for private names, even when they are in scope. (We did that for private names that were not in scope, making an implicit implementations into "nSM-throwers" instead of "nSM-forwarders".) And we could also make structurally or type-wise invalid dynamic invocations throw, instead of invoking noSuchMethod, so the only way to get a noSuchMethod invocation is through an unimplemented public interface member's nSM-forwarder. Nothing dynamic, that's just errors. (Currently a class Foo {noSuchMethod(i) => ...; void foo(int x);} will have (Foo() as dynamic).foo("A") throw a type error, but (Foo() as dynamic).foo("A", 0) invoke noSuchMethod. Good times! Much consistent!)

If we did that, then no user code outside of dart:mirrors would need private symbols, so we could remove the syntax and defer to const Symbol("foo") and const Symbol("[]=") for everything. (And get # free for other uses. I call dibs on code units literals, #"a" == 0x61! Surely everybody needs that!)

One other, non-essential, use of symbols is as unique non-forgable sentinel values. I use them, fx, as the keys for zone values. Any code in the same library can just write #_zoneKey as a constant, without needing to create a new class to make a unique instance of. Don't know if there is something we can do to help with that. The workaround is just enum _Token { token; } and using _Token.token as the unique, unforgable value.

modulovalue commented 6 months ago

The biggest issue is noSuchMethod and its Invocation.memberName, which can be a private member's name, and you may want to compare against it, even on platforms that don't have dart:mirrors. For that, we do need something (and not just strings.)

@lrhn I don't understand. I thought Symbol literals are just syntax sugar for invoking a constant Symbol constructor? That is, for all x, where #x is supported, identical(#x, const Symbol('x')) is always true?

Are there any limitations that prevent the constant Symbol constructor from accepting string literals that represent private member names? Wouldn't invoking the constant Symbol constructor be a complete replacement for symbol literals?

stereotype441 commented 6 months ago

If it was only mirrors which needed symbols, I'd be happy to remove the syntax entirely, and have you to do MirrorSystem.getSymbol("_foo", libraryMirror) to get a private name's symbol.

Function.apply only needs public symbols, since named parameters cannot have private names. It's also so rare that using const Symbol("a") is probably fine. (I doubt most people even remember that Function.apply exists.)

The biggest issue is noSuchMethod and its Invocation.memberName, which can be a private member's name, and you may want to compare against it, even on platforms that don't have dart:mirrors. For that, we do need something (and not just strings.)

FWIW, even if we remove # syntax, it's still possible to construct a Symbol that refers to a private name (but you can't do it in a const context):

class SymbolFactory {
  const SymbolFactory();
  Symbol noSuchMethod(Invocation invocation) {
    if (invocation.isGetter) return invocation.memberName;
    return super.noSuchMethod(invocation);
  }
}
const dynamic symbolFactory = SymbolFactory();
final symbol_foo = symbolFactory._foo;
main() {
  print(#_foo == symbol_foo); // prints `true`.
}
stereotype441 commented 6 months ago

The biggest issue is noSuchMethod and its Invocation.memberName, which can be a private member's name, and you may want to compare against it, even on platforms that don't have dart:mirrors. For that, we do need something (and not just strings.)

@lrhn I don't understand. I thought Symbol literals are just syntax sugar for invoking a constant Symbol constructor? That is, for all x, where #x is supported, identical(#x, const Symbol('x')) is always true?

Are there any limitations that prevent the constant Symbol constructor from accepting string literals that represent private member names? Wouldn't invoking the constant Symbol constructor be a complete replacement for symbol literals?

The issue is that since Dart uses library-scoped privacy, #_foo in one library is not the same as #_foo in another library. Whereas Symbol('_foo'), being a const expression, ought to produce the same result regardless of what library you invoke it from.

lrhn commented 6 months ago

FWIW, even if we remove # syntax, it's still possible to construct a Symbol that refers to a private name

True, but if we remove invoking noSuchMethod for private names, even if accessible, the need and ability would both go away.

natebosch commented 6 months ago

I don't think losing private name forwarding to noSuchMethod would be too disruptive. It does look like the #_uniqueKey pattern is used often enough that we'd want to have an automated fix if we remove that syntax.

lrhn commented 4 months ago

Agree on the #_uniqueKey being useful, because it's an unforgeable constant sentinel that you don't have to name, and don't have to come up with a unique nonce for.

Defining a private class _UniqueKey{ const _UniqueKey(); } and using const _UniqueKey() works too, but is verbose.

Just class _UniqueKey{} and using the Type object _UniqueKey would probably work, if you don't leak it. Same for void _uniqueKey(){} and using the function object _uniqueKey. Or enum _unique{key} and using _unique.key. All slightly less convenient than just #_uniqueKey. (But the enum can be reused if you need more than one unique object in the same library.)

Reusing a public class UniqueKey{final _id; const UniqueKey(this.id); } means coming up with an unforgeable ID object, which you could just use instead then.

I think I'd go with an enum as the fix, if we remove private symbols entirely from outside of dart:mirrors.

lrhn commented 4 months ago

So, here's a full proposal:

No symbol literals - Proposal (quite breaking)

We remove symbol literals from the language syntax. That also removes the ability to create privately named symbols (without dart:mirrors), so we remove the need for privately named symbols, and refer all other symbols to the const Symbol constructor or Symbol static constants.

The Symbol constructor allows any string as argument, which becomes the state of the object. Constant symbols are canonicalized to the same object if they take the same string argument.

BREAKING CHANGE: Symbol objects are only equal if they are identical. (That is: Symbols created using new are not equal to each other or to symbols created using const. The Symbol class will actually have primitive equality!) The symbols of Invocations passed to noSuchMethod by noSuchMethod-forwarders are constants, and the symbols needed by Function.apply to name named parameters must be constants. This change to equality means that Symbol objects don't actually need to store the string, as long as constants are still properly canonicalized and reused. They can choose to retain a string for toString when debugging, but don't have to. It also means that new Symbol(...) is pretty much useless, so we'll probably warn against it.

BREAKING CHANGE: Dynamic invocations no longer invoke noSuchMethod on failure to find a matching member, they throw a NoSuchMethodError directly. Only successfully called noSuchMethod-forwarders call noSuchMethod, which means that to forward a call to noSuchMethod, it must be a valid interface member invocation to begin with. A missing privately named member no longer gets a noSuchMethodForwarder, even if the name is accessible where the class is declared. All privately named members that are not implemented in a concrete class that has overridden noSuchMethod, gets a noSuchMethod-thrower instead, just like inaccessible ones already do. (It's inside the same library, you can just implement the member if that's a problem!)

This means that a dynamic privately named member invocation never needs to create an Invocation with a privately named symbol as memberName. Named parameters can already not be private. By not having privately named nSM-forwarders, there is no longer any need for private symbols.

Consequences

No symbol literals, the # character is free! (Dibs on #'a' == 0x61!)

No private symbols at all. (Outside of dart:mirrors, which still need to name private declarations. They can be created from a LibraryMirror. May want to ensure that non-private symbols created using dart:mirrors use canonicalized symbols, so they are equal to constant symbols, even if it means having a runtime registry of all the constant symbols of the program, or a way to cheat with ==. Private symbols can just use a subclass of Symbol, or store a secondary value in Symbol that is not settable by the public constructors. Private symbols won't be constants. All this only affects the VM when dart:mirrors is imported by the program.)

When new Symbol("a") != const Symbol("a"), compilers don't need to retain source names to figure out which string corresponds to the minified name for a. Meaning no reason to ever call new Symbol. We can't disallow it, but we can discourage it strongly. (Or we could disallow it, the same way some compilers disallow new String.fromEnvironment(...).)

Code that actually uses symbols must use const Symbol(...). If that's too cumbersome, I'm sure someone will just do

typedef _ = Symbol;
const $myName = _('myName'), $myName2 = _('myName2'), ..., $someOtherName = _('someOtherName');

and import that. Or have a macro to generate it for them:

@symbols(prefix: "$", ["myName", "myName2", ..., "someOtherName"]);
library;

It's just a finite number of constants, and won't be used much.

A dynamic invocation can only successfully invoke public interface methods of the receiver, and noSuchMethod is only ever called by nSM-forwarders. The (now completely inaccurately named) noSuchMethod can only be used to implement the actual interface of the object, not extend it with behavior outside of that interface. (I'm sure some compilers will be happy about that.)

Migration

Existing code that uses public symbols, #foo, #>> or #foo.bar.baz. are converted to const Symbol('foo'), const Symbol('>>') or const Symbol('foo.bar.baz') respectively.

Existing code that uses a private symbol: #_foo, and does not import dart:mirrors:

Any code that imports dart:mirrors and uses a private symbol, #_foo,

Any code that uses noSuchMethod to implement methods that are not defined by the interface of the underlying object, will stop working. No migration, just stop doing that.

tatumizer commented 4 months ago

What are the use cases for noSuchMethod? It's so slow, makes me wonder in what scenarios being that slow is OK. As soon as you impose extra restrictions on noSuchMethod (as proposed), limiting it to handling just "known" methods, won't it be easier to generate these missing methods with metaprogramming?

lrhn commented 4 months ago

Use-cases for noSuchMerhod is mocking, basically.