dart-lang / language

Design of the Dart language

Should null-aware subscripting use `?[` or `?.[` syntax? #376

Closed stereotype441 closed 4 years ago

stereotype441 commented 5 years ago

The draft proposal currently extends the grammar of selectors to allow null-aware subscripting using the syntax e1?.[e2], however we've had some e-mail discussions about possibly changing this to e1?[e2], which would be more intuitive but might be more difficult to parse unambiguously.

Which syntax do we want to go with?

bean5 commented 5 years ago

Would it be horrible to propose that linters should warn users that the following is ambiguous unless they clarify by typing the left side?

var set = { a ? [b] : c };  // Set literal
var map = { a?[b] : c}; // Map literal

I suppose using the left to disambiguate is possible, but probably not a great plan. It looks like good, safe code, when in reality you have an edge case where you have to specify type in a loosely typed language.
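For context, Dart already leans on the declared type in one related place: a bare `{}` is a map unless the surrounding type says otherwise. The sketch below only illustrates that existing rule. The function names are made up for this example, and it says nothing about how `?[` itself would be parsed.

```dart
// A bare `{}` takes its meaning from the declared type on the left.
Object emptySet() {
  final Set<int> s = {}; // context type makes this a set literal
  return s;
}

Object emptyMap() {
  final Map<int, int> m = {}; // same characters, now a map literal
  return m;
}

Object untyped() {
  final d = {}; // no context type: inferred as Map<dynamic, dynamic>
  return d;
}

void main() {
  print(emptySet() is Set<int>); // true
  print(emptyMap() is Map<int, int>); // true
  print(untyped() is Map); // true
}
```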

So this proposal seems dead on arrival, but I figured I'd mention it just in case. Dart was born in the age of IDEs and is gaining momentum in great part because of Flutter, which seems heavily tied to modern IDEs with tons of plugins, so perhaps it is a viable approach, although cumbersome and counter-intuitive when learning from the problems that languages had in the past. Even if we avoid the linter approach for null-aware subscripting, it is only a matter of time before we face higher-level questions that perhaps we start to lean on linter help and farther-reaching disambiguation techniques. Who knows.

bean5 commented 5 years ago

I don't know if any of you noticed, but I'm on the fence, which is why I've voted for both sides here.

Admittedly, I'd rather have it in some short form rather than talking about it into oblivion by inaction. Flutter seems to be progressing very quickly, but this feature is sitting. Every second counts. Every second gives other languages time to compete.

Ultimately, my intuition is that we should just pick ?.[. Not because it is not ambiguous or because I think it is a good choice, but because it seems to contain more information and would be forwards compatible if we decide to change it in a future version. That'll buy us time to try it out. Starting with the other would mean not being able to automatically shift to whatever we ended up running with in the long run. We won't know until users start to file complaints. I don't foresee many complaints since I think the audience for this will be power users. Novices will have to learn it only as quickly as it becomes mainstream in everyday use.

That being said, I've had peers complain when I use the ternary operator because "it is an advanced syntax", therefore decreasing maintainability. It is only advanced because we don't expect the newbies to learn it and because we don't use it often enough. It is a self-fulfilling prophecy I hope to avoid this time around with Dart. Let's build for the future, not the past.

munificent commented 5 years ago

Also, you mention regularity. Do you mean this in terms of language regularity or being a "regular language"?

In the general English sense of "internally consistent", not in the formal language sense of "non-recursive grammar".

hpoul commented 5 years ago

after following the TC 39 optional chaining discussions, i've got a serious Déjà vu, but they summed their proposal up in a pretty concise faq: https://github.com/tc39/proposal-optional-chaining#faq i guess it's not the worst outcome for ECMAScript and dart optional chaining to look similar..

lrhn commented 5 years ago

Thank you for the TC39 link. It is very relevant to Dart because both languages have C/Java style syntaxes. I think the only thing in the FAQ that I disagree with is that I do want long short-circuiting. Dart already has a notion of "chain of applications" from cascades, and it also makes a good delimiter for short-circuiting null-aware operators.

Cat-sushi commented 5 years ago

i've got a serious Déjà vu, but they summed their proposal up in a pretty concise faq:

Yes, it's quite similar, and it isn't convincing any more than the article in Medium.com or pro-?.[ comments here.

As for \<language X>, it has different syntactical constraints than JavaScript because of \<some construct not supported by X or working differently in X>.

Can anyone fill the blanks of \<language X> and \<some construct not supported by X or working differently in X> above specifically to make it convincing?

munificent commented 5 years ago

<some construct not supported by X or working differently in X>

The answer here is usually "C-style conditional operator and map/object/set literals", I believe.

Cat-sushi commented 5 years ago

@munificent What is the \<language X>? How differently work "C-style conditional operator and map/object/set literals" in \<language X>?

Aren't there any languages who have all of C-style ternary operator e1?e2:e3, Set literal with braces {e1,e2,e3}, Map literal with braces and colon {e1:e2} and null-aware subscripting with ?[ or ?.[ other than Dart?

Cat-sushi commented 5 years ago

My understanding of Set literal in languages. Correct the mistakes.

Dart: {1, 2, 3} // does have ternary operator e1?e2:e3
Python: {1, 2, 3} // doesn't have ternary operator
Swift: [1, 2, 3] // Array literal can be used as a Set literal in the context of Set
Kotlin: N/A // instantiated by constructor
Rust: N/A // instantiated by constructor
JavaScript: N/A // instantiated by constructor
Go: N/A // doesn't have Set

Is this an issue specific to Dart?

munificent commented 5 years ago

Aren't there any languages who have all of C-style ternary operator e1?e2:e3, Set literal with braces {e1,e2,e3}, Map literal with braces and colon {e1:e2} and null-aware subscripting with ?[ or ?.[ other than Dart?

The only one I know of offhand that has similar syntax is JavaScript, and they are also going with ?.[.

Cat-sushi commented 5 years ago

My question was that Dart is the only language who have all of ternary operator like e1?e2:e3, Set literal like {e1} and Map literal like {e1:e2}, which cause the ambiguity?

The proposal of ?.[ for JavaScript is at stage 3, anyway.

morisk commented 5 years ago

JavaScript is a mess, people escape to other languages like Dart and Go.

Please be better than TC39.

munificent commented 5 years ago

My question was that Dart is the only language who have all of ternary operator like e1?e2:e3, Set literal like {e1} and Map literal like {e1:e2}, which cause the ambiguity?

It's the only one I know of offhand. Ruby and Python have similar literals, but no ternary. Java and C# have ternary but no similar literals.

JavaScript is a mess, people escape to other languages like Dart and Go.

While I sympathize with the general sentiment, every language has many useful things we can learn from it. And, whether we may like it or not, many users coming to Dart will know JavaScript, so we lower the amount of new things they have to learn if we follow JavaScript in places where it makes sense to do so.

morisk commented 5 years ago

many users coming to Dart will know JavaScript

Hopefully the majority of people will come from iOS, Android, Xamarin. They all use ?[

aelayeb commented 5 years ago

I prefer the '?[' notation (and also '?+' with other operators). I know this is a tough decision but to me a language is a tool. And what we look for, when using a tool, is efficiency. To achieve that a language need to be as concise, readable and predictable as possible and the '?[' is closer to that than '?.['.

To summarize: the only "bad" point of using '?[' is whitespace awareness ? It just doesn't allow us to use it with a whitespace right ? To me it's by far better than having to add an unnatural dot between. I don't see the big deal in that.

In language, dots remind us of a method call. Our brain is used to that. Breaking with this habit and having to type an extra special character is a bit disappointing.

If there is no other limitation beside the whitespace sensitive issue (which is a VERY small one to me), I'd love you consider this option.

Edit: if other null-aware operators can't be implemented with the same notation for technical reasons I'd rather keep things consistent. So either '?[' and '?+' OR '?.[' and '?.+' but I'd be curious on what are the limitations for the operators.

munificent commented 5 years ago

To summarize: the only "bad" point of using '?[' is whitespace awareness ?

I think we on the language team also slightly lean towards preferring the look of ?.[, especially when it appears in a method chain (which is likely to happen for things like traversing JSON). In something like:

json?["some"]?["property"]?["chain"] ?? defaultValue;

I think it's easy to accidentally read that as conditional operators or if-null (??) operators mixed in with list literals. The ?[ doesn't look like a single token to my eye, especially given that there are cases where those two characters appear near each other today and are not a single token, as in: condition ? [someList] : another.

With:

json?.["some"]?.["property"]?.["chain"] ?? defaultValue;

To me, those . help it look more like a single method chain.
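To make the JSON case concrete, here is a minimal runnable sketch of such a chain. It is spelled with `?[` only because that is the form current Dart SDKs accept, so the code can actually be run; the readability comparison above is unaffected. `chainOrDefault` and the sample data are assumptions made up for this example.

```dart
// Walks json["some"]["property"]["chain"], falling back to a default
// when the map itself or any intermediate value is null.
Object chainOrDefault(Map<String, dynamic>? json, Object defaultValue) {
  return json?['some']?['property']?['chain'] ?? defaultValue;
}

void main() {
  final full = <String, dynamic>{
    'some': {
      'property': {'chain': 42},
    },
  };
  print(chainOrDefault(full, -1)); // 42
  print(chainOrDefault({'some': null}, -1)); // -1
  print(chainOrDefault(null, -1)); // -1
}
```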

To achieve that a language need to be as concise, readable and predictable as possible and the '?[' is closer to that than '?.['.

True, but part of predictability is ensuring that aspects that users don't expect to be meaningful are not meaningful. I don't know if many users would predict that ? [ means one thing while ?[ means something completely different. Spaces are meaningful in some cases, like - - versus --, but those are rare and have decades of history.
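The existing whitespace-sensitive case mentioned here can be seen in a few lines. This is just an illustration of `- -` versus `--`; `minusForms` is a made-up name.

```dart
// Returns [- -x, --x] for the same starting value.
List<int> minusForms(int start) {
  var x = start;
  final doubleNegation = - -x; // two unary minuses: value is unchanged
  final preDecrement = --x; // one token: decrements x, then yields it
  return [doubleNegation, preDecrement];
}

void main() {
  print(minusForms(5)); // [5, 4]
}
```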

In language, dots remind us of a method call.

Index operators are method calls. This is very important for users to understand because it affects short-circuiting. In Dart with NNBD, if the receiver of a null-aware method call or index operator evaluates to null, then the rest of the method chain gets short-circuited and skipped. Say you write:

foo?.a().b().c();

If foo is null, then Dart will skip not just a() but b() and c() too. This is equally true of:

foo?.[0].a().b().c();

If foo is null, the null-aware index operator is skipped, as is the rest of the method chain. So it's really important for users to be able to quickly identify what "the rest of the method chain" is when they look at some code. Using ?.[ helps reinforce that this is a method call, and one where this short-circuiting behavior is involved.
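A small sketch of that short-circuiting, assuming a current Dart SDK (where null-aware indexing is written `?[`). `firstNameLength` is a hypothetical helper, not an API from the proposal.

```dart
// If names is null, the index, trim() and length are all skipped
// and the whole expression evaluates to null.
int? firstNameLength(List<String>? names) => names?[0].trim().length;

void main() {
  print(firstNameLength(null)); // null
  print(firstNameLength([' Ada '])); // 3
}
```

Note that no `?.` is needed before `trim()` or `length`: they belong to the same chain, so they are only reached when the receiver was non-null.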

So either '?[' and '?+' OR '?.[' and '?.+' but I'd be curious on what are the limitations for the operators.

If we end up doing other null-aware operators, I believe we will likely use a dot, so a?.+(b), etc. Otherwise, we are opening up even more ambiguity problems. Also, one useful reason to support method call syntax for operators like this is because it could let you opt into the same null-aware short-circuiting behavior (which doesn't apply to infix operators by default). In that case, it would be useful to support forms like a.+(b). At that point, the . is mandatory because a +(b) is already valid syntax with established, non-short-circuiting behavior.
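The `a?.+(b)` form is hypothetical and not valid Dart. The closest approximation available today is probably a named extension method, which picks up the null-aware short-circuiting because it is called with method syntax. In this sketch `plus` and `addIfPresent` are invented names, not SDK members.

```dart
// Hypothetical helper: `plus` is not part of the Dart SDK.
extension NullAwareArithmetic on int {
  int plus(int other) => this + other;
}

// Method-call form: the addition is skipped entirely when a is null.
int? addIfPresent(int? a, int b) => a?.plus(b);

void main() {
  print(addIfPresent(null, 2)); // null
  print(addIfPresent(1, 2)); // 3
}
```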

aelayeb commented 5 years ago

This makes sense, especially the chain part. Thank you for taking time to explain.

Whatever you choose I'll be satisfied. I was just afraid you were thinking more from a language maker point of view than from a user perspective.

Cat-sushi commented 5 years ago

I've understood the importance of method call syntax to short-circuit null aware symbol ? for arithmetic operators, for which the order of operations is not always left to right. Now, the method call syntax of subscripting, if necessary, should be a.[](1), but I think it is not important, because chains of subscripting operators are always evaluated from left to right. On the other hand, as a normal syntax of subscripting operation, a?[1] seems more straightforward than a?.[1], as I mentioned. Likewise, both of a?.+(b) as a method call syntax and a ?+ b as an operation syntax should be acceptable.

I don't know if many users would predict that ? [ means one thing while ?[ means something completely different.

I don't think so, because ? [ is necessarily followed by ] :. And the widely used formatter always put space just after ? of ternary operator, as I mentioned.

Index operators are method calls.

It can't be a reason because foo[0][1][2][3][4] is already a popular method chain syntax sugar without dots, as I mentioned.

This is very important for users to understand because it affects short-circuiting. In Dart with NNBD, if the receiver of a null-aware method call or index operator evaluates to null, then the rest of the method chain gets short-circuited and skipped.

foo?[0]?[1][2][3]?[4] is OK for me, where the members of foo[] like foo[0][1] and the members of foo[][] like foo[0][1][2] have List<some non-nullable type> types. Having said that, the problem of preference could not be solved without popularity voting.

On the other hand, the problem of mental model seems quite fixed game, described below.

Q: Fill the blanks of XXX, YYY and ZZZ. (Select the correct answer 1 or 2)

a is non-nullable    a is nullable
a.b                  a?.b
a..b                 a?..b
a.b()                a?.b()
a..b()               a?..b()
[...a]               [?...a]
a[b]                 aXXX[b]
a(b)                 aYYY(b)
a + b                a ZZZ+ b

  1. XXX: ?, YYY: ?, ZZZ: ?
  2. XXX: ?., YYY: ?., ZZZ: ?.

Cat-sushi commented 5 years ago

To avoid repeated discussions, can anybody hopefully in Google summarize the above discussions?

Whatever the dart team chooses with convincing reasoning, I also will be satisfied.

lrhn commented 5 years ago

I think Bob (@munificent) is doing good work summarizing the trade-offs, which include existing syntax, potential future syntax, visual appearance, etc., for example in https://github.com/dart-lang/language/issues/376#issuecomment-534793712.

Code like x?[x]?[y]: 42 is very hard to read. I know that if we go with ?[ always parsed as a single token, the meaning is unambiguous (and I'll have to look further back for the ?, or if this is inside {...}, it's a map entry), and the formatter will help me by inserting spaces in reasonable places, but it is still not readable. A single ? simply has too much history as the conditional operator.

If we removed the ?/: operator entirely, say by using if (test) expr else expr as an expression, like we already kind-of do in collection literals, then the ? would be free, and I'd be less worried about the usability of ?[. I don't think that's realistic at the current time, though.

Cat-sushi commented 5 years ago

I had already read all the comments here, but I don't find them convincing. Even if the system simply makes <List<int>>{a?[1]:[2]} intending Set literal an error, does the system have to look back?

lrhn commented 5 years ago

I don't care (much) about the complexity for the parser, as long as the grammar is not ambiguous. I do care about the readability to users.

The compiler is always right. What it does is what the program means (because our compilers obviously have no bugs :smile:). The trick with syntax is not to make it convenient to compilers; as long as it's unambiguous, the compiler will do its job. We can make that job more or less expensive, but it's rarely the bottleneck ... as long as your type system is not Turing complete, or something.

The real requirement of good grammar is that users who read or write code must understand it the same way as the computer. Anything that is hard to read for users, no matter how unambiguous it technically is, is a usability problem. Having to look back too far is a problem for people, not compilers. It makes code harder to read.

So, good syntax means that users read the code the same way as the compiler. I currently think that ?[ is too hard to read (aka. too easy to misread) for it to be good syntax. That's mainly because a stand-alone ? already means something that is itself semi-hard to read, and the ? in ?[ does look stand-alone. I think ?.[ is easier to recognize as something distinct from the ? in ?/:.

Cat-sushi commented 5 years ago

Anything that is hard to read for users, no matter how unambiguous it technically is, is a usability problem.

I said that it is readable enough with the formatter for me. But, I understand it is your strong belief that {a?[1]:[2]} is ambiguous. I'm convinced, now.

eernstg commented 5 years ago

One more thing to think about when it comes to syntax design: Each new syntactic form can be an asset to us as developers, because we can now use a more expressive language and write more readable/powerful/concise programs; but it is also an expense in terms of future language enhancements, because more syntactic forms means more sources of ambiguity. So we pay every time we add something that allows for syntactic forms that we might want to give some new meaning in a future extension: That future extension must then have a different syntactic form.

jodinathan commented 5 years ago

Even though your claims just made me convinced that ?.[ is pretty much ok and let's move forward, I wonder, just for curiosity: C# has the ternary ? condition and also non-nullable access through ?[, what does an experienced C# programmer has to say about this ambiguity in its everyday use of the language?

Cat-sushi commented 5 years ago

@eernstg I understand that languages are not democracy, and I'm always ready to be satisfied with any decisions by the dart team. But, I always seek good reasoning for controversial decisions to keep loving the language.

Now, I almost believe that ? is already occupied by null-aware somethings, with the exception of the ternary operator. So, I can't easily imagine an occurrence of a combination of ? and [ in a future enhancement for which the token ?[, which is just lexical but not syntactical, would be an obstacle.

Could you illustrate some hypothetical examples, if you can?

munificent commented 5 years ago

C# has the ternary ? condition and also non-nullable access through ?[, what does an experienced C# programmer has to say about this ambiguity in its everyday use of the language?

C# doesn't have map and set literals using {}, which avoids the ambiguity we have in Dart.

mindplay-dk commented 5 years ago

I currently think that ?[ is too hard to read (aka. too easy to misread) for it to be good syntax. That's mainly because a stand-alone ? already means something that is itself semi-hard to read, and the ? in ?[ does look stand-alone. I think ?.[ is easier to recognize as something distinct from the ? in ?/:.

I find it's still confusing for me to look at - the . makes it look like some sort of property resolution is about to happen.

Since we have to disambiguate, how about a ?? as the operator? This is similar to dart's ??= operator, which similarly short-circuits the assignment depending on the presence of a null operand.

While this deviates from JS, perhaps it's visually/semantically more coherent with Dart's other null-dependent operator?

(or does this create another ambiguity I haven't thought of??)

jodinathan commented 4 years ago

C# has the ternary ? condition and also non-nullable access through ?[, what does an experienced C# programmer has to say about this ambiguity in its everyday use of the language?

C# doesn't have map and set literals using {}, which avoids the ambiguity we have in Dart.

I had to work with C# for a few months and, for so much time working with JS and dart, at a moment I forgot that C# didn't have map literals... conclusion: I've got myself into this ambiguity while trying to create some map and ternary condition. That can, in fact, be annoying to figure it out.

I've switched to team ]?.

altermark commented 4 years ago

If it were for me to decide, I would approach the problem from another angle.

Dart has ternary operator ?: and recently gained collection if, that is kind of similar but not exactly. In my opinion having ergonomic non-/nullable types in a language is much more important than ternary operator that causes parsing ambiguities.

My proposal is: Let's deprecate both ternary operator and collection if and replace it with conditional expression if <condition> then <value> else <alternative> (or python style <value> if <condition> else <alternative>), that can be used both in expressions and collections.

The language would be much cleaner IMHO.

bean5 commented 4 years ago

That would be more legible and expressive, but would be like ruby (readable if you want it to be).

munificent commented 4 years ago

Since we have to disambiguate, how about a ?? as the operator? This is similar to dart's ??= operator, which similarly short-circuits the assignment depending on the presence of a null operand.

While this deviates from JS, perhaps it's visually/semantically more coherent with Dart's other null-dependent operator?

(or does this create another ambiguity I haven't thought of??)

Yes:

var list = [1];
print(list??[0]);

Does this print [1] or 1? :)
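In Dart as it exists, `??` is already the if-null operator, so `list??[0]` reads as "list, or `[0]` if list is null". A minimal sketch of that existing meaning follows; `orDefault` is a made-up name.

```dart
// `??` is the if-null operator: the right-hand side is used
// only when the left-hand side evaluates to null.
List<int> orDefault(List<int>? list) => list ?? [0];

void main() {
  print(orDefault([1])); // [1]
  print(orDefault(null)); // [0]
}
```

Reusing `??[` for subscripting would therefore change what code like this means.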

lrhn commented 4 years ago

I would even go for list.[] and a.+, without the operator keyword, for tearing off the operator function. Because operator is allowed as a member name a.operator + b already has a meaning, but a.+(b) does not.

Now, if we also allowed parentheses-less application, f a instead of f(a), then a?.+ b would be the same as a?.+(b). That's probably going too far, so we may have to special-case a?.op b to work as a null-aware binary operator application. Doesn't work well for unary-, sadly, because again a?.unary is already a valid member access.

eernstg commented 4 years ago

Here's a more concise but still (hopefully) readable version of the above

That proposal would probably be more well-placed in #216, or in a new issue proposing support for tear-offs of "everything".

lhk commented 4 years ago

Do I understand it correctly, this discussion is only about removing ambiguity arising from the existence of the ternary operator? If that operator didn't exist, there would be no issue with ?[ and then I guess that no-one would prefer ?.[ over ?[.

The two desired syntaxes are mutually exclusive and so we are trying to come up with the second best to ?[?

In that case, I would argue that null checks are going to be used much more often than the ternary operator. So if something is compromised, it should be the ternary operator syntax. I very much like the proposal of @altermark: if <condition> then <value> else <alternative>.

If that doesn't align with the C-Style of the rest of the language I'm sure that we can find other versions of the ternary operator, like: ?(<condition>){<expression>}{<expression>}

A more radical suggestion (and what I would prefer personally) would be to convert if-statements to expressions (let them return a value) and just deprecate the ternary operator entirely.

jodinathan commented 4 years ago

I also like the idea of deprecating the ternary operator in favor of if-expressions. I guess it wouldn't be very hard to create an executable that reads dart files of a project and turn ternary operators into if-expressions.

lrhn commented 4 years ago

This is a tangent (consider opening a new request for if-expressions). The main issue with having if (e1) e2 else e3 as a conditional expression is ambiguity with the conditional statement or collection element, either for parsing or when read by humans. And maybe an issue with users expecting things to be different from what they are. And that it requires an else part.

Writing if (test) print("hello world") else print("not"); is one semicolon away from if (test) print("hello world"); else print("not"); The former is an expression statement with a conditional expression, the latter is a conditional statement.

You can change the latter to if (test) { var greet = "hello world"; print(greet); } else print("not");, but not the former. Would you expect to be able to? (One of the most common points of confusion around collection literals is that users want to put braces around conditional elements).

In a collection literal [if (test) e1 else e2] would need to be disambiguated because it can be either a conditional element or a conditional expression (we can always safely make it a conditional element, we just have to say that we do so, so the parser knows how to parse it).
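For reference, this is how the two existing forms behave inside a list literal today. It is a minimal sketch with made-up function names, showing that a conditional element and a conditional expression give the same result when an else is present, and that only the element form may omit it.

```dart
// Collection-if element with an else branch.
List<String> withElement(bool test) => [if (test) 'yes' else 'no'];

// Conditional expression: always needs both branches.
List<String> withConditional(bool test) => [test ? 'yes' : 'no'];

// Collection-if element without else: contributes nothing when false.
List<String> withoutElse(bool test) => [if (test) 'yes'];

void main() {
  print(withElement(false)); // [no]
  print(withConditional(false)); // [no]
  print(withoutElse(false)); // []
}
```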

I believe it to be possible to change e1 ? e2 : e3 to if (e1) e2 else e3 and still be able to parse it to mean the same thing.

Cat-sushi commented 4 years ago

In my opinion having ergonomic non-/nullable types in a language is much more important than ternary operator that causes parsing ambiguities.

IMHO, I agree with @altermark. If it is a consensus, should the ambiguity for humans be even negligible?

lhk commented 4 years ago

@lrhn , Thank you for the explanation, I see the complications.

lhk commented 4 years ago

It seems to me as if nnbd is the next big thing after the support for extension methods. I appreciate that this is a hugely non-trivial feature to implement, but as far as I can tell, it is being worked on very heavily and not that far away into the future.

And this operator is quite an integral part of nnbd. So, is there some internal consensus on how it should look? Does the team have a preference? Is there some status quo, which would be implemented, given no further input by the community?

Cat-sushi commented 4 years ago

A Summary of Discussions, so far (+ alpha)

leafpetersen commented 4 years ago

So, is there some internal consensus on how it should look? Does the team have a preference? Is there some status quo, which would be implemented, given no further input by the community?

I think the description of our thinking here still pretty much sums up the internal consensus. There's been a lot of good discussion here, and it's great to get a sense of where people in the community fall, but I don't really see anything that changes the balance on the decision.

Cat-sushi commented 4 years ago

@leafpetersen I think the article is the problem which is misleading with unconvincing reasons P2, P4, P7 and P9 I showed here.

lhk commented 4 years ago

I think @Cat-sushi is reading my mind :P Indeed, I read the article, felt entirely unconvinced and came here to see where the discussion stands.

Personally, I find the ?.[ syntax highly inelegant. As far as I understand it's a necessary evil, given the legacy of existing design decisions.

After reading the comment of @leafpetersen it sounds like this decision is pretty much made?

Out of curiosity: Has deprecating or restyling the ternary operator been on the table at some point, and if yes, why was it rejected? If I understand the grammar concerns correctly, converting condition ? a: b to something like condition | a:b would also solve the problem. And actually, I think this will even increase overall code readability. I always found the ternary operator kind of clunky to use. And with NNBD, question marks will be all over the place. Ternary operators will be even less readable then. If they have their own syntax, they would be easier to parse (for human readers). I guess this is not possible since it would break existing code.

Cat-sushi commented 4 years ago

With respect to conditional expression, NNBD could be a good chance to break it with dartfix which fixes it, I think.

Cat-sushi commented 4 years ago

@tatumizer We are talking about the spec of the language but not each piece of code.

leafpetersen commented 4 years ago

@leafpetersen I think the article is the problem which is misleading with unconvincing reasons P2, P4, P7 and P9 I showed here.

@Cat-sushi To a first approximation I think we agree on the pros and the cons. We just don't agree on their relative weight, sorry! :)

Out of curiosity: Has deprecating or restyling the ternary operator been on the table at some point, and if yes, why was it rejected?

@lhk Not it! :) I'd be happy to have different syntax for conditional expressions, so long as someone else is volunteering to manage the migration of millions of lines of code.

I think there's a way to disambiguate this without introducing new syntax.

@tatumizer There are lots of ways to disambiguate this - that's not the concern. The concern is users writing code that they think means X, but which the compiler interprets as Y.

Cat-sushi commented 4 years ago

@leafpetersen The comment of mine doesn't describe priority at all, and I understand that the Dart team puts much weight on P1 and P3.

lhk commented 4 years ago

@tatumizer, I think the problem with your disambiguation is that Dart has both set and map literals. The compiler will not raise an error if the user forgets to add the parentheses; it will simply parse a different type:

{"a": 1}; // map literal
{"a"} ; // set literal
{ (a ? [b]) : c }; // map literal
{ a ? [b] : c }; // set literal

If I understand @leafpetersen correctly, this is the kind of ambiguity (not for compiler but for human readers) he is talking about. Code both with and without the parentheses will be accepted by the compiler, but the two versions mean totally different things.
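The two readings can be written out with explicit parentheses so that neither depends on how the ambiguity is resolved. This is only a sketch: the function names are invented, `a` is a bool in one reading and a nullable list in the other, and the subscript uses the `?[` spelling that current Dart SDKs accept.

```dart
// Reading 1: `?` and `:` form a conditional, so the braces hold one
// element and the literal is a set.
Object conditionalInSet(bool a, int b, int c) => {(a ? [b] : c)};

// Reading 2: `?[` is a null-aware subscript, so the `:` separates a
// key from a value and the literal is a map.
Object subscriptAsKey(List<int>? a, int b, int c) => {(a?[b]): c};

void main() {
  print(conditionalInSet(true, 1, 2)); // {[1]}
  print(subscriptAsKey([10, 20], 1, 2)); // {20: 2}
}
```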

lhk commented 4 years ago

@tatumizer, @leafpetersen, For what it's worth, I would still very much prefer the approach taken by @tatumizer. Dart is a strongly typed language, so I don't think that this renegade set literal would get very far before causing a compiler error after all :) This kind of mistake should not propagate very far and be easy to find.

As soon as your sets or maps have more than one entry, it should become an immediate compiler error in any case. Well, at least as long as the other entries are not based on the NNBD syntax and can be unambiguously parsed.