[RFC] Use parenthesis instead of square brackets for new aggregates

Glacia commented 5 years ago

What were the reasons why square brackets were chosen? They don't look super ugly to me, but it makes sense not to introduce different syntax out of the blue.

mosteo commented 5 years ago

For sets/maps I would have expected curly braces before brackets. But I'm no mathematician. And we could finally say "but Ada has curly braces!" when C++ people complained ;-)

Joking aside, I'm totally in favor of not introducing a new symbol ("@", not looking at you) and of not messing with arrays unless it is unavoidable.

sttaft commented 5 years ago

The Ada Rapporteur Group (ARG) chose square brackets because they are often used for creating aggregate-like things in other languages, and by escaping the heavy overloading of parentheses, we are able to easily accommodate cases like empty aggregates, and single-element aggregates, without any strange incantations. For set constructors, this becomes particularly important, since there is no "key" analogous to the array index to use to produce an empty range or specify the index of a single element. As pointed out in another comment, curly braces were another possibility for sets, but we felt that distinguishing between sets and vectors or lists might be difficult in some cases, since from the interface point of view, there is a range of definable containers that represent various points on the spectrum between sets and lists (e.g. "bags"), and deciding which get to use {} and which use [] would not always be obvious. We also allowed arrays to use the [] syntax, as they are very similar to vectors.

raph-amiard commented 5 years ago

> The Ada Rapporteur Group (ARG) chose square brackets because they are often used for creating aggregate-like things in other languages, and by escaping the heavy overloading of parentheses, we are able to easily accommodate cases like empty aggregates, and single-element aggregates, without any strange incantations

FWIW, I agree with @QuentinOchem that this feels like an unneeded addition of syntax. The proposed syntax for null aggregates and one-element aggregates looks very coherent with current Ada, and doesn't look like strange incantations to me.

I would not be against introducing square brackets in the language, but for me this would need to be done in a more principled way, e.g. also considering being able to use them for indexing homogeneous structures instead of parentheses, as is done in those other languages you refer to, @sttaft.

sttaft commented 5 years ago

> > The Ada Rapporteur Group (ARG) chose square brackets because they are often used for creating aggregate-like things in other languages, and by escaping the heavy overloading of parentheses, we are able to easily accommodate cases like empty aggregates, and single-element aggregates, without any strange incantations
>
> FWIW, I agree with @QuentinOchem that this feels like an unneeded addition of syntax. The proposed syntax for null aggregates and one element aggregates looks very coherent with current Ada, and doesn't look like strange incantations to me.

I can see "pros" and "cons," and the ARG debated this heavily for several years. We concluded that the square bracket approach was cleaner and ultimately easier to read and understand, and by allowing array aggregates to use this new syntax as well, solved a long-standing annoyance and learning hurdle for convenient use of array aggregates for zero- and one-element arrays.
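
To make the zero- and one-element case concrete, here is a minimal sketch (not from the thread; the procedure, type, and object names are illustrative assumptions). It compares the parenthesized forms required up to Ada 2012 with the square-bracket forms discussed here, as they were later standardized in Ada 2022. It assumes a compiler with Ada 2022 support (for GNAT, the `-gnat2022` switch) and assertions enabled (`-gnata`).

```ada
procedure Aggregate_Demo is
   type Int_Array is array (Positive range <>) of Integer;

   --  Parentheses only: an empty array needs a null range, and a
   --  one-element array needs a named association, because "(42)"
   --  is just a parenthesized expression.
   Empty_Old  : constant Int_Array := (1 .. 0 => 0);
   Single_Old : constant Int_Array := (1 => 42);

   --  Square brackets: both cases can be written directly.
   Empty_New  : constant Int_Array := [];
   Single_New : constant Int_Array := [42];
begin
   pragma Assert (Empty_Old'Length = 0 and Empty_New'Length = 0);
   pragma Assert (Single_Old (1) = 42);
   pragma Assert (Single_New (Single_New'First) = 42);
end Aggregate_Demo;
```

The parenthesized declarations remain legal, so the sketch only illustrates the difference in notation, not a change in meaning.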

> I would not be against introducing square brackets in the language, but for me this would need to be done in a more principled way, eg. also considering being able to use them for indexing homogeneous structures instead of parentheses, like it is done in those other languages you refer to @sttaft.

I would also be happy to see [] used for indexing, but I believe that is a second step, and represents a bigger step away from the Ada view that indexing and calling a function can be usefully viewed as alternative ways of accomplishing the same thing. Now that we have user-defined indexing, there might be less need to try to allow a function call to look like indexing and vice versa. But in any case, trying to combine two relatively different issues such as aggregates and indexing is bound to make the discussion much more complicated and make consensus harder to reach. I do understand the notion of making bigger changes to justify smaller changes, but I have also seen that approach fail on a regular basis.
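
As a small illustration of the "indexing and calling look alike" view mentioned above, the following sketch (names are hypothetical, chosen only for this example) shows an array lookup and a function call that are written identically at the point of use. Assertions are assumed to be enabled.

```ada
procedure Index_Or_Call is
   type Table is array (1 .. 3) of Integer;

   --  A precomputed table and a function computing the same values.
   Squares : constant Table := (1, 4, 9);

   function Square (I : Integer) return Integer is (I * I);
begin
   --  Same surface syntax for both: Name (Argument).
   pragma Assert (Squares (2) = Square (2));
end Index_Or_Call;
```

A client can switch between the table and the function without touching the use sites, which is the property that would be lost if indexing moved to "X[...]" only.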

sttaft commented 5 years ago

One specific comment -- the singleton solution doesn't seem to work:

> The proposal is to introduce a new notation, available for record, arrays and containers, qualifying after the type of the array:
>
> X (array_of_integer'(1));
>
> The above case resolves any type issue.

The qualified_expression syntax is already defined for parentheses, so this suggestion doesn't distinguish between a qualified, parenthesized expression and a single-element aggregate.

sttaft commented 5 years ago

One other specific comment -- as we have progressed toward newer Ada standards, there has been a steady effort to reduce the number of times fully qualified names have to be repeated. The prefix notation is friendly because once you declare an object, you can use its operations without making sure that the package where the object's type and operations are declared is directly visible. The new object renaming syntax eliminates the need for repeating the type name multiple times. Loop parameters in iterators generally do not require the type name, as they can be inferred from the initializing expression. But for good reasons, when you actually do have to use a type name, you are willing to give a more fully qualified name, especially if it is coming from, say, a generic instance. So it is relatively unlikely that the type name of interest is going to be something as concise as "R". What this means is that everywhere "R" appears in the examples, it is not unlikely that you will have a relatively long type name. In particular, having to use a full type name inside a null aggregate can become quite heavy weight. If you also have to use the type name outside to disambiguate the type of the aggregate (because the general rule is you never look "inside" an aggregate until you know its exact type), it becomes even heavier.

sttaft commented 5 years ago

For what it is worth, Python (which has emerged as a very popular teaching language at the high school and college level) uses "(...)" for tuples, "[...]" for lists, and "{...}" for sets and maps.

This relates to Ada 202X in that tuples and records are similar, and both would use "(...)". We are proposing to use "[...]" for everything that is a relatively "homogeneous" collection of data such as lists, sets, bags, maps, etc.

yannickmoy commented 5 years ago

@sttaft there are tuples in Ada 202X? Outside of a record type defining the corresponding type?

sttaft commented 5 years ago

> @sttaft there are tuples in Ada 202X? Outside of a record type defining the corresponding type?

No tuples in Ada 202X. What I meant was that record types and tuple types are similar, in that they are a heterogeneous collection of individually named items of various types.

QuentinOchem commented 5 years ago

> > The proposal is to introduce a new notation, available for record, arrays and containers, qualifying after the type of the array:
> >
> > X (array_of_integer'(1));
> >
> > The above case resolves any type issue.
>
> The qualified_expression syntax is already defined for parentheses, so this suggestion doesn't distinguish between a qualified, parenthesized expression and a single-element aggregate.

From a user perspective, this is equivalent to all situations where a qualification accepts a literal of the type - in this case a literal is an aggregate.

> One other specific comment -- as we have progressed toward newer Ada standards, there has been a steady effort to reduce the number of times fully qualified names have to be repeated. The prefix notation is friendly because once you declare an object, you can use its operations without making sure that the package where the object's type and operations are declared is directly visible. The new object renaming syntax eliminates the need for repeating the type name multiple times. Loop parameters in iterators generally do not require the type name, as they can be inferred from the initializing expression. But for good reasons, when you actually do have to use a type name, you are willing to give a more fully qualified name, especially if it is coming from, say, a generic instance. So it is relatively unlikely that the type name of interest is going to be something as concise as "R". What this means is that everywhere "R" appears in the examples, it is not unlikely that you will have a relatively long type name. In particular, having to use a full type name inside a null aggregate can become quite heavy weight. If you also have to use the type name outside to disambiguate the type of the aggregate (because the general rule is you never look "inside" an aggregate until you know its exact type), it becomes even heavier.

I disagree with the long name expectation. There are many situations where the name of the type appears, such as allocation, object declaration, other cases of qualification, and so on. I don't think anyone would object to

X := new R; -- or whatever long name

as opposed to:

X := new; -- arguably though, type could be deduced from the context.

Let's also not forget that we're talking about a solution for rare cases. How often do you have to write an empty aggregate? Likely less often than an allocation.

There's also an argument to be made about the readability of the Ada language, and some redundancies are useful in this regard. This is certainly a trade-off between too much and too little, to be looked at on a case-by-case basis, as the examples you mention and the one above show.

To be clear on the last point of your comment, the proposal is to look inside of the aggregate for resolving ambiguities, in a way that is similar to what overloading does in the case of parameters. There is obviously an implementation cost mentioned in the drawback section.

> For what it is worth, Python (which has emerged as a very popular teaching language at the high school and college level) uses "(...)" for tuples, "[...]" for lists, and "{...}" for sets and maps.
>
> This relates to Ada 202X in that tuples and records are similar, and both would use "(...)". We are proposing to use "[...]" for everything that is a relatively "homogeneous" collection of data such as lists, sets, bags, maps, etc.

Yes, this is trading a homogeneous concept in Ada (the aggregate) for another one, which is one of the primary choices being debated here.

To be clear, the solution proposed in this RFC to the inconsistency problem is an attempt at being as conservative and as close to the current language design as possible. An alternative approach could be to extend the use of [] to record aggregates, and deprecate the () notation altogether. There is still the relatively mild and rare issue of ambiguity between record and array aggregates, which could be solved in a similar way, but this would address the same concerns.

sttaft commented 5 years ago

> > > The proposal is to introduce a new notation, available for record, arrays and containers, qualifying after the type of the array:
> > >
> > > X (array_of_integer'(1));
> > >
> > > The above case resolves any type issue.
> >
> > The qualified_expression syntax is already defined for parentheses, so this suggestion doesn't distinguish between a qualified, parenthesized expression and a single-element aggregate.
>
> From a user perspective, this is equivalent to all situations where a qualification accepts a literal of the type - in this case a literal is an aggregate.

Not sure I understand your point. What happens if we have an overloaded function that returns both a value of type R and a value of the element type of the container R? E.g.

function F return R;
function F return Element_Type;

R'(F) does not disambiguate. This is the fundamental challenge for singleton aggregates.
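
A minimal sketch of this situation follows (not from the thread). As an assumption for illustration, an unconstrained array type stands in for the container R, and the function bodies and values are made up. In the language as it stands, a parenthesized positional aggregate cannot have a single element, so `R'(F)` can only be a qualified expression. With the square-bracket syntax as later standardized in Ada 2022, the singleton aggregate has its own spelling. Ada 2022 support and enabled assertions are assumed.

```ada
procedure Singleton_Demo is
   type Element_Type is new Integer;
   type R is array (Positive range <>) of Element_Type;

   --  Two functions named F, overloaded on result type only.
   function F return R is ((1, 2));
   function F return Element_Type is (7);

   --  Qualified expression: resolves to the F that returns R.
   A : constant R := R'(F);

   --  One-element aggregate: resolves to the F that returns Element_Type.
   B : constant R := R'[F];
begin
   pragma Assert (A'Length = 2);
   pragma Assert (B'Length = 1 and then B (B'First) = 7);
end Singleton_Demo;
```

If `R'(F)` were also allowed to mean a one-element aggregate, both declarations would have the same spelling and the compiler could not choose between the two functions.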

...

> Let's also not forget that we're talking about solution for rare cases. How often do you have to write an empty aggregate? Likely less than an allocation.

Actually, in my experience, when building up the value of a container, creating an empty container is quite common -- just about as common as using "null" with pointer-based data structures. And if working mostly with containers, allocators are relatively rare. ...

-Tuck

Robert-Tice commented 5 years ago

Something that I would find confusing is this syntax:

X : My_Container := [1, 2, 3];

X (X'First) := 10;

where we initialize the container with the [ ] syntax and then index into it with ( ). Most people who come from other languages could find this confusing because homogeneous collections in Python, C, C++, Java, Go, Rust, etc. use [ ] for both indexing and initialization. Because Ada has already distinguished itself by using ( ) as the indexing syntax, adding [ ] seems unnecessarily confusing for those coming to Ada. This syntax would be more consistent:

X : My_Container := (1, 2, 3);

X (X'First) := 10;

If we were to accept this premise, we would have to distinguish between homogeneous and heterogeneous containers somehow. In cases of ambiguity, I do like @QuentinOchem's solution because it feels like idiomatic Ada and doesn't introduce an inconsistent syntax.

An alternative would be to use the syntax that C, C++, Java, Go, Rust, etc use:

X : My_Record := {1, 2, 3};

X.A := 10;

This introduces a new syntax but doesn't suffer from the same issue as homogeneous data structures, where we would use two kinds of brackets for one type.

sttaft commented 5 years ago

> Something that I would find confusing is this syntax:
>
> X : My_Container := [1, 2, 3];
>
> X (X'First) := 10;
>
> where we initialize the container with the [ ] syntax and then index into it with ( ). Most people who come from other languages could find this confusing because homogenous collections in Python, C, C++, Java, Go, Rust, etc are described with [ ] for indexing and initialization. Because Ada has already distinguished itself by using ( ) as the indexing syntax, adding [ ] seems unnecessarily confusing for those coming to Ada. ...

As indicated in my response to Raph, I am not unsupportive (double negative!) of allowing "X[...]" for indexing, but I think it should be a separate proposal. But I also don't see a necessary connection between aggregates and indexing, since they are pretty different beasts, and that is true at least syntactically in other languages as well (most of the languages you mention don't actually use the same symbol). Python uses "{...}" for creating maps and "[...]" for creating lists, but always "M[...]" for indexing them. Go, C, and C++ use "{...}" for initializers, but "X[...]" for indexing. Java uses "{...}" for array initializers, but calls on a constructor "C(...)" for initializing a non-array object, and "X[...]" for indexing. Rust uses "{...}" for struct aggregates, "[...]" for array aggregates, and "X[...]" for indexing.

So it seems that there is no strong connection in these languages between the aggregate and the indexing syntax. Be that as it may, I do see an advantage in allowing "X[...]" notation for indexing, but I don't think the rationale is particularly tied to this proposal. The main rationale in my mind is that essentially all other languages use "X[...]" for indexing, and so it would ease entry to Ada.

raph-amiard commented 5 years ago

I'm not strongly leaning one way or another. As I said live, I think both solutions are pretty good. However:

I do agree with Rob that using [] for array literals but then () for indexation will look weird. I see what you're saying Tuck about the fact that this notation is not particularly consistent in other languages either, but here we're mixing inconsistency with unfamiliarity (I don't know another language using () for indexation).

I do also agree with Quentin about the "more than one way to do it" problem. It really rubs me the wrong way that you'll now have two ways to express the same aggregate, which will be a pain that people will have to solve in coding standards and such, and seems really against the Ada philosophy.

I can see the other side of the argument: we're doing contortions in order to overcome the fact that we've been using parentheses for way too many things, and in my eyes, this proposal, and the fact that it would be harder to implement than the square bracket one, proves it.

Which is why I propose "the perfect solution" :)

Let's allow square/curly bracket aggregates for everything, and deprecate parenthesized aggregates in Ada 2020.

It has the obvious disadvantage of being backwards incompatible. But many benefits. See corresponding RFC here: https://github.com/AdaCore/ada-spark-rfcs/pull/24

QuentinOchem commented 5 years ago

> Not sure I understand your point. What happens if we have an overloaded function that returns both a value of type R and a value of the element type of the container R? E.g.
>
> function F return R;
> function F return Element_Type;
>
> R'(F) does not disambiguate. This is the fundamental challenge for singleton aggregates.

Thanks for the extra details. So we're talking about a case where you can't even decide the type of the contents of the qualified expression because of overloading. This is even more of a corner case than the one I had in mind. In this case, the proposal includes a notation qualifying only aggregates (R'Record and R'container), which could be used here. One could consider renaming these (e.g. to R'record_aggregate) to make that explicit, again on the previous grounds that these are for rare cases of disambiguation.

> > Let's also not forget that we're talking about solution for rare cases. How often do you have to write an empty aggregate? Likely less than an allocation.
>
> Actually, in my experience, when building up the value of a container, creating an empty container is quite common -- just about as common as using "null" with pointer-based data structures. And if working mostly with containers, allocators are relatively rare.

You and I have vastly different experience in the matter.

yannickmoy commented 5 years ago

My preference goes to Raph's proposal to use square/curly brackets everywhere, and deprecate the use of parentheses for aggregates. Note that it's not backwards incompatible, as we would keep it legal, even if deprecated.

Even better would be to do the same for array/container indexing at the same time, using here square brackets for both.

QuentinOchem commented 5 years ago

I agree with @raph-amiard and @yannickmoy and am withdrawing this RFC in favor of #24.

AdaCore / ada-spark-rfcs

[RFC] Use parenthesis instead of square brackets for new aggregates #21