Closed pq closed 1 year ago
My first guess is that we're parsing the record type as an argument list for the annotation. Not sure whether the proposal says how we should disambiguate this case.
@munificent
Ouch, that is an ambiguity. Not sure there is any good solution.
@foo
(int, int) Function(int) bar;
is completely ambiguous.
There is no simple disambiguation. We can't add something to @constantName
to show that it doesn't want the argument.
We can't add something to (int, int) Function(int)
to show that it does want the (otherwise optional) return type.
The best hacks would be:
@foo
@DUMMY()
(int, int) Function(int) x;
or
typedef typeof<T> = T;
@foo
typeof<(int, int) Function(int)> x;
Neither is something we can reasonably ask people to do.
@stereotype441
@leafpetersen
I guess one (unfortunate) option is to disambiguate based on space. @foo(...)
is an annotation constructor call, @foo (...)
is an annotated record type. We'd want to be sure that dartfmt didn't change the spacing in that case though.
Moving this over to the language repo because this is an ambiguity in the actual language feature and not a parser bug. We'll have to change the proposal to handle this. :(
One idea is to put Record
at the beginning of record types, like:
class C {
@deprecated
Record(int, int) pair;
}
@foo
Record(int, int) Function(int) bar;
I don't like the verbosity, but it's consistent with function type syntax and resolves this ambiguity. It also makes the zero-element record type Record()
instead of ()
record, which I kind of like because the latter feels a little terse and opaque for a type annotation.
I'll think about this some more and see if I can come up with any other ideas.
So how about them types on the right?
Another (probably bad) idea: We could extend the language to support named type parameters. That could arguably be useful for other user-defined generic types because (as with optional named parameters) it would let use sites omit the type parameters can be inferred/defaulted and write only the ones the user cares about.
If we had that, then record types (and in principle function types) could look like Record<int, String s>
.
But given that we already have function types which use parentheses and users seem to be OK with, this feels like the wrong approach.
So we can either change the record type grammar, or the annotation grammar, or add something that users can use to disambiguate in cases where there is ambiguity.
The things we can do without changing the grammar are annoyances to users, and pretty much undiscoverable (like the typeof<T>
wrapper).
We could introduce <type>
as grouping parentheses for types, without the leading typeof
, so you can do @foo <(a, b)> bar(){}
and have (a, b)
be a type instead of a parameter list. Will probably cause other ambiguities instead, though.
We can change the annotation grammar. Lots of options, all of them breaking and requiring migration, but probably something that can be automated.
We could decide to disallow annotations without parentheses. It would be slightly annoying for a few annotations. Mainly @override
, which should be a language feature anyway.
And probably a bunch of others that I don't know because I never use them anyway.
We could change the annotation grammar entirely to use, say, the C#-style: [expression]
instead of @expression
.
If we are changing something, we might as well change everything 😁 .
Or we can introduce some sort of delimiter you can put after annotations to separate them from the following thing. Trailing comma? Optional semicolon after annotations? (Giving people optional semicolons, just not the ones they want!)
Changing the record type grammar is less breaking because it doesn't exist yet, so we won't break any code. It's just so. darn. simple. right now, and any change will make it worse.
Options like Record(current record type)
or #(current record type)
disambiguate a lot. They're also verbose and somewhat unintuitive. The #
is better than Record
, mainly from being shorter, and it won't conflict with symbols. Both are better than using Record<...>
, that looks too much like a generic thing. I'd use the same rules as for Function
: if followed by (
it's parsed specially, otherwise it's just the type Record
itself.
Also, then the type won't look like the destructuring syntax, or the literal syntax.
FWIW, I actually find Record(int, {String b})
to be intuitive, and I especially like the symmetry with function types because the two kinds of types have a lot of similarity.
I think my favorite of the suggestions above is using Record(int, {String b})
to denote a record type. I really like the similarity to what we do for function types.
One other random idea: add an optional :
to the end of the annotation syntax (which maybe we make required in Dart 3.0). Then the user disambiguates via the placement of the colon, e.g.:
@deprecated:
(int, int) pair;
vs.
@Deprecated('message'):
foo;
If the user leaves off the :
we should assume the thing in parentheses is an annotation argument, for backwards compatibility.
FWIW, I lean towards Record(int, {String b})
for record types too and agree that the symmetry with functions is nice.
I like the Record(actualRecordType)
syntax on principle, it's consistent, readable and easy to parse, but I worry it is too verbose.
When you want to use a record type, it's in a place where you don't want the ceremony of introducing a class. Adding extra verbosity to those places makes it less convenient, and might just remove some of the lure of having records. (Not all, but some.) Or maybe I'm over-thinking it.
Looking on the motivation in the specification, I see three reasons: (1) no behavior, (2) verbosity, and (3) coupled to a class definition.
Surprisingly, I don't see performance, i.e. I'd expect that because we don't need allocating a new object instance, and then GC it, and instead just return a few values on the stack, it is slightly faster.
I don't understand (3) as well. What is wrong with being coupled to a class name?
For (2) I think the biggest issue is not verbosity itself, but non-locality. I have to scroll down to the bottom of the file to declare a new class that I will return from this method. Not a huge work, I admit, but still. Typing Record
is still more lightweight than scrolling. Although of course adds some verbosity.
And I also like the symmetry with Function
.
I'll admit, I also find Function
too verbose.
And I can't even typedef my way around it, because even with typedef Fn = Function
, the int Function(int)
syntax is a special syntactic form unrelated to the similarly named class, so the alias doesn't apply.
Compare | Verbose | Concise | Overly Brief? |
---|---|---|---|
Record(int, int) | Rd(int, int) | #(int, int) | |
int Function(int) | int Fn(int) | (int)->int |
I think a point could be made for shorter keywords. :)
Could you make the word Record
optional perhaps? Allowing both Record(int, int)
and the shorter (int, int)
? The latter would probably be used in most cases, but if you needed to precede it with an annotation (or some other ambiguous case comes up down the line) or you just wanted to be more explicit, you could write out the full Record
.
And I can't even typedef my way around it, because even with
typedef Fn = Function
, theint Function(int)
syntax is a special syntactic form unrelated to the similarly named class, so the alias doesn't apply.
Could we fix this?
I prefer the more concise syntax for records types. I agree with @lrhn that already Function
is a bit verbose.
Compare:
Record(String, String) doSomething(Record(String, String) Function(Record(int, int), String Function(int)) transform) {
return transform(((1, 2), (i) => '$i'));
}
with the current (with allowing Fn)
(String, String) doSomething((String, String) Fn((int, int), String Fn(int)) transform) {
...
}
Or:
doSomething(transform: ((int, int), int -> String) -> (String, String)) -> (String, String) {
...
}
With disallowing :
for default named parameters I think @munificent hinted that would make it easier for types on the right, but that would be a hugely breaking change. Would that make it easier for optional semicolons and other requests though? :) One can hope.
In any case, I know this is a contrived example, but Record seems unnecessarily wordy. Especially for a language that has void main()
rather than public static void main(String[] args)
.
I like @stereotype441's suggestion of adding an optional :
after metadata (could even do a ;
) for disambiguation, but if you really want a prefix I liked @lrhn's suggestion of #(int, int)
Another idea that came up in this morning's language team meeting is to disambiguate the two types of annotations based on whether there's whitespace between the identifier and the (
. So e.g. @foo(...)
would parse as an annotation that constructs an instance of foo
, whereas @foo (...)
would parse as the annotation @foo
followed by a record type.
Although technically breaking, this disambiguation rule would probably do the right thing in 99.9% of cases.
Summarizing discussion from this morning.
There are two broad approaches we could take: do something with the annotation syntax to disambiguate, or change the record type syntax.
On the first approach:
@name:
or @name(...):
.
@name(
always starts an annotation constructor call, and @name (
is always an annotated record type.
On the second approach:
Record<type1, {type2 name}>
) was not appealing
Record(type1, {type2 name})
could be done.
Function
case#(...)
just for the type level would work, somewhat consistent with Javascript proposals which use #{
and #[
#(...)
syntax uniformly for types, expressions, and patterns could work
There isn't broad agreement on the team as to a direction yet, we'll keep discussing this.
One possibility that we didn't discuss yet is using a different infix operator for the type syntax. An idea which probably doesn't work is to use ,
without parentheses:
int,int foo(int, int) => (3, 3)
is a function returning a pair of integers.foo(int, int x)
a function expecting a single parameter named x
of type int, int
, or a function expecting a parameter named int
of type dynamic
and parameter named x
of type int
?The ML family of languages uses infix *
.
int * int foo(int x)
is a function returning a pair of integers@deprecated int * int
is fine?int * int Function(int)
: is it a function returning a pair, or a pair with a function typed component?
*
bind more tightly()
for empty, but we still have the same ambiguity to work aroundint * {int x, int y}
{int x, int y}
A variant of the above would be to use parentheses with an infix * as a delimiter:
(int * int) foo(int x)
is a function returning a pair(*)
?)*
trick for the singleton tuple: (int*)
Other delimiters we could use (with or without parentheses):
int & int
or (int & int)
int ^ int
or (int ^ int)
I had hoped to stay out of this discussion, but feel I have to voice my dislike of using spaces (and lack of spaces) as disambiguation. On an objective level, we've never done anything like that before and e.g. actively chose a potential exponential (as I recall) parsing for disambiguating between null-aware bracket or a conditional. As also mentioned elsewhere here it will also invalidate assumptions for everything having to do with dart code (say for instance code generation and code formatting). On a completely subjective level I also just don't like it.
(And now that I'm already writing a comment:)
As for, e.g., Rd
I'd point out that I'd probably not get through a code review with code including such a weird abbreviation.
As for having two different syntaxes so that one can use one in normal cases and the other if the normal case is ambiguous seems complete bonkers to me. To me, at least, having a single consistent syntax would be much preferable.
Also I personally don't mind the verbosity of Record
. My mind would have an easier time remembering that, than some (combination of) punctuation(s) or abbreviation(s). That it becomes too long when nesting enough of them doesn't sound like a good argument to me --- with enough nesting it's going to be completely unreadable no matter what we do.
The Record(...)
syntax is definitely the safe and rational choice.
It's consistent with Function
. It's safe in all the right ways, least likely to set us up for problems in the future, because the grammar is well known. Accepts all record types without problems.
Using significant whitespace may fix this particular problem, but it leaves us wide open to the next conflict with an untagged parenthesis occurring after something else. It locks us down with regard to what other syntax we can add.
I'd be happy using the significant whitespace solution, but I don't think it's the most forward-looking choice.
(Edit: And we already had another one: on (e, s) {}
after a try. Raw parentheses are darn dangerous.)
The problem with Record(...)
is that it doesn't bring me joy. And I know how much writing Function
annoys me today. It's a constant annoyance, something abrasive that adds to my daily amount of wear and tear every single time I write it. I doesn't go away. It doesn't get better. It's just "here we go again". It's a cost of using Dart that I have to write Function
when I want a function type.
And I think that matters.
In that comparison, I'd prefer #(...)
for the type, and not for the values, even if it takes up precious grammatical space, because it will keep records a low syntactic overhead. I won't mind writing it. It's still searchable (#(
is unique). And if we can't use our syntax for something as central as the product type, what can we use it for?
I really like the colon suggestion from @stereotype441. I think allowing it (not making it required) wouldn't be a breaking change right? The point is there needs to be some delimiter between annotations and what they're annotating. Of all the solutions presented, it feels like the cleanest.
Also this applies to annotated parameters as well. int foo(@annotation (int, int) latLong)
has the same ambiguity.
Sometimes I feel like the parenthesis are extra unnecessary code, let's say, compared to Golang.
Example: int, int getOrigin() { return 0, 0; }
vs (int, int) getOrigin() { return (0, 0); }
.
However, it's not too much code and may make the code more readable.
In my opinion, having to write Record defeats the main advantage of records by requiring so much boilerplate and making the code not look good.
As another consideration. Right now the Records feature does not break user written classes named Record
because the user written declaration shadows the dart:core Record, and the record type does not need the name Record
in it. If you make it mandatory to write Record
as part of the type, it might make the disambiguation a little weird.
// User written
class Record {}
Record(int, int) f0(Record(int, int) value);
Record f1(Record value);
I'm assuming you would do the right thing here and resolve to the dart:core Record type for f0, but the user written class for f1, (they are disambiguated because of the parentheses, but it definitely makes it confusing to read).
We'd have to be absolutely certain that we can distinguish whether Record(int)
should be a type or a constructor invocation.
We generally can do that. A lot of Dart would break down if we couldn't distinguish whether a position should be a type or an expression, since identifiers can be both. So it would probably work out. Like for Function
, the following (
force-disambiguates it into a type in a type-position, but not in an expression position.
I can't think of any true ambiguity (like you say the following (
disambiguates like Function
) but there will be a lot during editing. Here are a few I can think of.
Record(int, int) f0(Record(int, int), value);
// ^ record parameter missing a name, or is this a function parameter named Record
// (function parameter as written)
void main() {
Record(int, String)
{ // ^ is this a local function (yes), but also could be
// an object instantiation / function call with a missing semicolon
// a variable declaration with a missing name & semicolon
// (Makes it harder to be rid of semicolons), though I guess this is an existing problem with blocks
}
}
Would using #(
for types, but not for expressions, open up the possibility for record type literals? If we use either #(
or Record
do we plan to allow for record type literals?
Would using
#(
for types, but not for expressions, open up the possibility for record type literals?
Yes, it would.
If we use either
#(
orRecord
do we plan to allow for record type literals?
I don't think we support Function(...)
for function type type literals. I believe it's because relying on a built-in identifier (and not a true reserved word) makes that weird. So we probably wouldn't support record type literals if we use Record(...)
. Honestly, I think type literals are often more trouble than they're worth. I'm not inclined to double down on them because making the type annotation syntax overlap the expression syntax just paints us into a corner in a lot of ways. (Like the fact that we couldn't use class names as tear-offs for the unnamed constructor.) So I'm not inclined to make record type type literals a priority. But if we picked a syntax like #(...)
which made it clearly unambiguous, we definitely could allow them.
I too would avoid allowing type literals for anything new. If it doesn't follow the existing grammar (identifier, identifier + type parameters), then I won't introduce new syntax to allow the type literal. It takes syntax away from much more worthwhile features (aka. any other feature).
We can define typedef typeof<T> = T;
today and allow typeof<any type>
as a type literal for any type. Use $
if you want it even shorter. Creating a type literal for a type is not a problem which needs a syntactic solution.
Also #2393, for the extreme position.
We discussed this in the language meeting today. While the alternatives on the table have their merits, we prefer the original syntax that's currently in the proposal.
In order to resolve the ambiguity, we will take @leafpetersen's original suggestion to make whitespace significant. That means:
// Metadata with argument list on function with no return type:
@annotation(a, b) function() {}
// Metadata with no arguments on function returning a record:
@annotation (a, b) function() {}
Note that we'll make any whitespace significant, not just newlines. Making newlines significant would work for function and field declarations where idiomatic formatted code puts annotations on the previous line. But it doesn't work for annotations on function parameters which are kept inline, as in:
function({
@deprecated (int, int) pair
}) { ... }
The idea is that we'll try this approach out while the feature is still behind an experiment flag. During the implementation, we'll get a better sense of whether making the parser whitespace-sensitive in this way feels brittle or regretful. And we can try out this behavior to see how it feels in practice. If it feels like a bad choice, we can revisit it. But if it works (and we think it will), then it lets us keep the syntax we like the most.
We have not figured out exactly how whitespace is sensitive here yet. Is it only for cases that are actually ambiguous? Or do we never treat a parenthesized thing after a metadata annotation as its argument list if there is space before it?
Here's an initial suggestion:
When parsing a metadata annotation, ignore any whitespace and try to parse it as an argument list as usual.
If that leads to a parse where the metadata annotation precedes a location where a type annotation is optional and has been omitted, then look for whitespace before the (
. If found, instead treat the argument list as a record type. Do this unconditionally, even if it turns out that the parenthsized construct would be a valid metadata annotation argument list but not a valid record type. That means this would become an error:
@annotation (1) function() {}
Thoughts?
Here's an initial suggestion: ...
@jensjoha
I have to wonder how that will impact the performance of the parser. Today the parser is very eager when it comes to invoking methods on the listener, so this will effectively require it to parse every piece of metadata twice (once to decide how to parse it and once to actually parse it). It might not be a huge hit, but my understanding that the performance of the parser is fairly important.
There was some discussion in the past about building a general mechanism to support recovery (basically capture the events that would have been sent to the listener and then either actually send them or completely discard them based on whether the attempted parse succeeded. It might be time to revisit some of those discussions.
We have not figured out exactly how whitespace is sensitive here yet. Is it only for cases that are actually ambiguous? Or do we never treat a parenthesized thing after a metadata annotation as its argument list if there is space before it?
The latter is my preference, and I think it shouldn't have any performance implications at all. If you see @<identifier><whitespace>
you're done with the metadata, otherwise whatever is there instead of the whitespace is an argument list. Does that fall down somewhere?
That would certainly resolve the parsing issues. There's no need for backtracking under that definition.
That sounds good to me too.
We need to decide where whitespace can go when there is a type argument.
We can choose any of:
@foo<int>(bar) |
@foo <int>(bar) |
@foo<int> (bar) |
@foo <int> (bar) |
|
---|---|---|---|---|
No internal space | ✔️ | ❌ | ❌ | ❌ |
No space before arguments | ✔️ | ✔️ | ❌ | ❌ |
No space after identifier | ✔️ | ❌ | ✔️ | ❌ |
No ambiguity if type arguments | ✔️ | ✔️ | ✔️ | ✔️ |
The remaining combinations would be inconsistent.
We can definitely say that any whitespace after identifier
ends the metadatum, that reduces it to "no space after identifier" and "no internal space", both valid options.
The grammar production for those could be:
No internal whitespace:
'@' <identifier> ( ~<WHITESPACE> <typeArguments>? ~<WHITESPACE> <argumentList>)?
No whitespace after identifier:
'@' <identifier> ( ~<WHITESPACE> <typeArguments>? <argumentList>)?
No whitespace before arguments:
'@' <identifier> ( <typeArguments>? ~<WHITESPACE> <argumentList>)?
No ambiguity if type arguments:
'@' <identifier> ( <typeArguments> <argumentList> | ~<WHITESPACE> <argumentList>)?
(I'm fairly sure I'm stretching the grammar by using negative lexical tokens in grammatical rules as negative lookahead, where in lexical rules they match an actual character other than the negated set.)
We can definitely say that any whitespace after
identifier
ends the metadatum, that reduces it to "no space after identifier" and "no internal space", both valid options.
I don't have extremely strong opinions here, but I feel like "no space after the identifier" is easy to explain, and should be easy to implement (you could in principle do it in the tokenizer, I think).
The one risk of doing "no space after identifier" vs "no internal space" is that if we ever want to allow explicit generic instantiation as metadata, @foo<typeargs>
and nothing more, we would introduce another ambiguity.
If we go with "no internal space" now, we don't have to change it later.
On the other hand, if we never allow @foo<typeargs>
by itself (which would be weird, since we're then annotating with either a function or a Type
object), making our current life easier is also valid.
We can change it later. The formatter will have ensured that no existing formatted code at that time will be affected by such a change.
So, "no space after identifier" SGTM.
I hate myself for even asking this, but what about:
@foo/*comment*/(a, b) thing;
Don't hate yourself. Hate society for bringing us to this point!
Having a comment isn't even completely unreasonable at that point. Consider a generic annotation, and
@Marker/*<int|String>*/(const bool.hasEnvironment("flooNum")
? const int.fromEnvironment("flooNum")
: const String.fromEnvironment("floo"));
Ok, a little speculative maybe, it's in a position where no reader benefits from the comment.
I'm very tempted to say that if the next source token (ignoring comments and whitespace) after the identifier does not start right where the identifier ends, it's not part of the annotation. So no comments either.
If we allow comments, we get into cases like:
@foo// Nyah, nyah
(int, int) get foo => 1;
and we'll have to decide whether the newline is part of the comment or counts as a separate whitespace. (Or forget to consider some similar case, and have implementations do different things.)
The KISS is (
/<
must be right after the identifier. That's unambiguous. Allowing anything there opens us up to unforeseen complications and special cases.
There are cases of:
@deprecated // deprecated
in the world today, but I don't see any cases of the /
being flushed against the identifier (thank you Dart formatter!).
That brings up another way to consider it: What would the formatter do? It does:
@C /*x*/ ()
@C /*x*/ <int>()
@C<int> /*x*/ ()
@C<int>() /*x*/
@C /*x
y*/
()
@C /*x
y */
<int>()
@C<int> /*x
y*/
()
@C<int>() /*x
y*/
@C // x
<int>()
@C<int> // x
()
@C<int>() // x
void main() {}
The formatter wants to put spaces between comments and other things. All existing formatted code will have the space, which means that existing code with comments between identifier and <
/(
will stop working anyway. Making the comment itself breaking won't change that.
(If there was any such existing code,)
That brings up another way to consider it: What would the formatter do?
I want a bumper sticker that says "What would the formatter do?" 😃
The KISS is
(
/<
must be right after the identifier. That's unambiguous. Allowing anything there opens us up to unforeseen complications and special cases.
Agreed. And it's consistent with the way the scanner treats comments elsewhere in the language. E.g. int/* */x
means int x
, not intx
. And i-/* */-
means i- -
(a syntax error), not i--
.
I wouldn't say that int/* */x
means int x
, just that comments are token separators the same way as whitespace. It means int
and x
are separate tokens, and i-/**/-
is three tokens instead of two.
For @id()
, the id
and (
are going to be separate tokens anyway, with or without intermediate token separators. Adding or removing whitespace or comments between id
and (
doesn't currently affect parsing at all.
The difference for the "whitespace sensitive parsing" is that we make the parsing depend not only on the sequence of tokens, but also on their physical location, the presence or absence of non-token source characters between them. That's something we don't do anywhere else in the current grammar (as far as I know).
The formatter is guaranteed to work, because all it does is to change the position of tokens, but it makes sure to not change the order or the tokenization (no making two identifiers touch).
With whitespace sensitive parsing, the formatter must also make sure to not put space between the id
and (
, or remove such space, which is a safe operation everywhere else.
It's nothing insurmountable, but it is a new thing, and a special case that must be handled specifically by new code that doesn't already exist.
I guess you could say that we introduce new tokes IDLT : IDENTIFIER '<'
and IDPAREN : IDENTIFIER '('
and make the grammar of metadatum be:
'@' (<identifier> | IDLT <typeList> '>' <arguments> | IDPAREN (<argumentList> ','?)? ')'
(and then everywhere we otherwise accept identifier '<'
/identifier '('
we should also accept IDLT
/IDPAREN
too.)
Well, except that it's not IDENTIFIER
, it's qualified
, so we need to tweak that too.
That brings up another way to consider it: What would the formatter do?
I want a bumper sticker that says "What would the formatter do?" 😃
"But, doctor, I am Pagliacci!"
The formatter wants to put spaces between comments and other things.
It does have some logic to not put spaces before /*
and after */
in certain contexts, mainly to try to make the old generic method comments look right. That logic is heuristic-based and pretty brittle.
It's nothing insurmountable, but it is a new thing, and a special case that must be handled specifically by new code that doesn't already exist.
It's not entirely new. In addition to not adhering identifiers together (which is obvious), the formatter also has to take care to insert a space between nested unary minus so that it doesn't turn - -2
into --2
.
The formatter is probably more aware of whitespace than any other tool. It makes sense that it is already doing things with it. The new code I was talking about was also in the parser, which should not have anything today which checks whether two tokens are adjacent. Maybe it's easy, but it will be new.
The following code generates what look like parser errors.
/fyi @jensjoha