GNAT-specific Iterable aspect should be considered for standardization

Ada-Rapporteur-Group / User-Community-Input

Ada User Community Input Working Group - Github Mirror Prototype

28 stars 1 forks source link

GNAT-specific Iterable aspect should be considered for standardization #21

Open sttaft opened 2 years ago

sttaft commented 2 years ago

GNAT supports a lighter-weight way to create an "iterable" container, using the Iterable aspect, as defined below. We should consider standardizing this aspect for Ada itself.

3.23 Aspect Iterable

This aspect provides a light-weight mechanism for loops and quantified expressions over container types, without the overhead imposed by the tampering checks of standard Ada 2012 iterators. The value of the aspect is an aggregate with six named components, of which the last three are optional: First, Next, Has_Element, Element, Last, and Previous. When only the first three components are specified, only the for .. in form of iteration over cursors is available. When Element is specified, both this form and the for .. of form of iteration over elements are available. If the last two components are specified, reverse iterations over the container can be specified (analogous to what can be done over predefined containers that support the Reverse_Iterator interface). The following is a typical example of use:

type List is private with
    Iterable => (First       => First_Cursor,
                 Next        => Advance,
                 Has_Element => Cursor_Has_Element,
                [Element     => Get_Element]);

The value denoted by First must denote a primitive operation of the container type that returns a Cursor, which must a be a type declared in the container package or visible from it. For example:

function First_Cursor (Cont : Container) return Cursor;

The value of Next is a primitive operation of the container type that takes both a container and a cursor and yields a cursor. For example:

function Advance (Cont : Container; Position : Cursor) return Cursor;

The value of Has_Element is a primitive operation of the container type that takes both a container and a cursor and yields a boolean. For example:

function Cursor_Has_Element (Cont : Container; Position : Cursor) return Boolean;

The value of Element is a primitive operation of the container type that takes both a container and a cursor and yields an Element_Type, which must be a type declared in the container package or visible from it. For example:

function Get_Element (Cont : Container; Position : Cursor) return Element_Type;

This aspect is used in the GNAT-defined formal container packages.

swbaird commented 2 years ago

It would be helpful to provide more motivation for this feature. What problem is it trying to solve? How does this compare to the solution that we already have for avoiding tampering checks during iteration (i.e., AI12-0111 and the nested "Stable" packages)? I think there are good arguments for this feature (similar to the arguments for the bounded containers), but these arguments need to be stated.

sttaft commented 2 years ago

The main advantage seems to be lowering the entry barrier to using the feature, and avoiding certain features that can give certain embedded-system programmers heartburn, such as tagged types, interfaces, class-wide types, etc. The original version of this feature was designed before we had fully embraced the power of aspects, and resulted in use of relatively complex OOP features. For newer features based on aspects, we have generally avoided the need for sophisticated OOP features to provide a given capability.

Raphaël Amiard has in the past expressed some concerns with standardizing this feature, so it would be interesting to understand his concerns.

mosteo commented 2 years ago

As a plain user what I can say is that my experience providing (standard) iterators feels heavyweight and complex (I have to check the ARM/examples every time). In short, a pain :-) so I skip it unless really motivated.

Richard-Wai commented 2 years ago

I think in practical terms needing to create your own iteratable container type is (or probably should be) rare. That being said I think there is clearly a lot of boilerplate when doing it the standard way, which can have negative impacts on readability.

It seems to me that, had aspects been more mature when this feature was first introduced, there is a good chance we would have converged on something closer to this proposal than what we ended up with.

I appreciate the potential for avoiding the need for dynamic dispatch and class-wide types, making this more available in constrained environments.

At a strategic level I'd usually consider just how "needed" this feature is compared to the tactical reality of realizing it. I'm not super convinced it is a "needed" change, given how generally rare the need is to "roll your own" container types. However, since this seems to already be a thing in GNAT, that makes the cost of implementation relatively low.

Having two ways of doing the same thing can be a bit awkward, but that's nothing new for most mature languages, provided it is well justified. I think in this case it comes with significant benefits that align with Ada's design goals.

briot commented 2 years ago

At a strategic level I'd usually consider just how "needed" this feature is compared to the tactical reality of realizing it. I'm not super convinced it is a "needed" change, given how generally rare the need is to "roll your own" container types. However, since this seems to already be a thing in GNAT, that makes the cost of implementation relatively low.

In my experience, it is on the contrary pretty often that we want to have custom iteration schemes. Perhaps this is because I also program in Python and Rust where iterators are first class objects, though.

Some examples where iterators have been useful:

GNATCOLL.SQL to iterate over the results of a query. There is no actual container associated, we just iterate over the results. It is nice to be able to use "for Row of Query(...)"
We also use them to iterate over data sent on sockets (we parse the next item, return it, and so on)
I was experimenting with traits containers a few years ago. It was nice to be able to add light-weight iterators on top of these new containers (though, because this library was heavily based on generics, it wasn't as useful as it could be because there is no way when one describes the formal parameters of a generic to say "this one supports iteration").
A quick check on our code base, at Deep Blue Capital, shows 52 cases where we declare the Iterable aspect. A large number of those seem to be iterator over multiple underlying tables (what python would call zip() I guess)

In practice, I would argue that even the GNAT iterator aspect could be improved upon. It is much more usable than the AARM iterators (I made an AdaCore gem on the subject quite a few years ago, and the few times where I wanted to implement full iterators I had to refer to it, because I find the AARM hard to follow here).

But at least for us it seems that our iterators are of the sort that would be best described as "consumers":

while Has_Element (Iterator) loop D := Consume (Iterator); -- return current data and move on to the next data ... end loop;

The Next is implicit, simply because there is no way to go backward (think of data coming from a database or a socket, in my examples above). It is possible to have that using the Iterable aspect, but it requires caching the current item, which might be expensive sometimes.

Emmanuel

Richard-Wai commented 2 years ago

@briot I'm not sure I understand using an iterator in a consumption pattern. It seems like the idea of iteration is taken a bit too far. I certainly wouldn't consider a loop that consumes objects to be "iterating", that to me is a queue, and one does not iterate over a queue. This might be an argument in semantics, but nonetheless I tie the idea of iteration to the idea of a container - some kind of data structure that maintains a collection of objects that you want to be able to iterate over. Ergo iteration is idempotent wrt the container itself. Iteration should not change the container. If we agree on that, to me your final example is perplexing because a for statement seems fit for the intention of the program, but with far less complexity. If the argument is that we might want an iterator to self-advance, that seems to be the purpose of for statements.

I do concede that I've used a fair number of iterator types myself in Ada, and maybe my position that they are rare is over-stated.

I'd be curious to know more about your thoughts on how this proposal could be improved.

sttaft commented 2 years ago

By "consumption" I don't believe @briot means you are modifying the underlying structure. Rather, you are modifying the iterator by the "get_next" operation, rather than separating the Element and Next operations ("Get_Element" and "Advance" in the example).

briot commented 2 years ago

On 2022-08-23 15:40, sttaft wrote:

By "consumption" I don't believe @briot https://github.com/briot means you are modifying the underlying structure. Rather, you are modifying the /iterator/ by the "get_next" operation, rather than separating the Element and Next operations ("Get_Element" and "Advance" in the example).

Hi Tucker,

That it correct. In my mind, a lot of iteration occurs on what could be called "virtual containers", i.e something that behaves like a container, but actually gets its data from other sources. Think of database queries, or sockets for instance, though it could also be calls to the operating system (reading the files from a directory for instance), or wrappers for third-party libraries.

In practice, these virtual containers provide the equivalent of a read() function, and that's it. If you are lucky, you can check whether there is some more to read, but this isn't always the case.

In most cases, iterating should not change the container (so that we can possibly have multiple concurrent iterators, for instance). In the case of these "virtual containers" though, the iteration can be considered as changing the container, because you cannot go back to previously read data. This is why I prefer the term "consume" here, rather than iterate. But in user code, the syntax would be the same, namely the "for...of" loop.

This syntax, of course, is just syntactic sugar on top of the (easy to write) explicit code. But it becomes interesting if these iterators/consumers are used in more places and maybe become full blown elements, with a number of standard operators on them. This is what Rust does, and for instance all iterators provide map(), filter(), for_each() and a large number of operations.

regards

Emmanuel

Message ID: @.***

clairedross commented 2 years ago

On Mon, Aug 22, 2022 at 7:21 PM sttaft @.***> wrote:

The main advantage seems to be lowering the entry barrier to using the feature, and avoiding certain features that can give certain embedded-system programmers heartburn, such as tagged types, interfaces, class-wide types, etc. The original version of this feature was designed before we had fully embraced the power of aspects, and resulted in use of relatively complex OOP features. For newer features based on aspects, we have generally avoided the need for sophisticated OOP features to provide a given capability.

Message ID: @.*** com>

Historically, the Iterable aspect was designed for SPARK. We wanted something simple that would allow SPARK users to benefit from custom iteration and (more importantly) quantification. It has been used inside formal containers for years. An important feature of this schema is that it does not do potentially complexe things under the hood to ensure safety (like tampering checks). Users know exactly what is happening. Basically, iteration is nothing more than syntactic sugar on top of the natural while loop:

declare Cu : Cursor := First (Container); begin while Has_Element (Container, Cursor) loop [declare E : constant Element_Type := Element (Container, Cursor); begin] ... [end;] Cursor := Next (Container, Cursor); end loop; end;

Note that modifying Container in such a loop is not a safety hazard. The semantic is perfectly clear. The only potential surprise for the user is the potential for such for loops to not terminate if Container is modified in the loop. We could consider emitting a compilation warning in that case when it is easy to detect. -- Claire Dross Senior Software Engineer, AdaCore

briot commented 2 years ago

Historically, the Iterable aspect was designed for SPARK. We wanted something simple that would allow SPARK users to benefit from custom

One difference that is worth highlighting between the AARM iterators and the ones defined via the GNAT aspect is that for the latter all operations receive the container as a parameter. This is definitely mandatory for SPARK, if I remember right, but also actually makes it easier to write light-weight iterators (no need for pointers to the container, for one).

The only potential surprise for the user is the potential for such for loops to not terminate if Container is modified in the loop. We could consider emitting a compilation warning in that case when it is easy to detect.

That would require special handling in the compiler though, right ? I must admit I do not see how to have a more general approach.

A runtime error is doable (via a generation marker in the container and in the iterator), though of course adds a minor performance cost.

Emmanuel

Message ID: @.***>

sttaft commented 2 years ago

In ParaSail, the primary iterator is a destructive iteration over a potentially-ordered set, by removing elements, either first, last, or "random". Non-destructive iterators are created using syntactic sugar, by copying the set initially for a set iteration, by extracting the set of keys from a map, or by constructing the set of index values of a vector or array.

The nice thing about destructive iterators is that they never need pointers into the middle of a tree structure to remember where they are in the iteration, and they essentially ignore changes to the original container, since the iteration is happening on a private copy. The iteration is fully determined by what remains in the (copied) set.

The performance challenge is to make the copying and destructive iteration cheap enough that it is competitive with the more complex and in some case less safe non-destructive iteration.

clairedross commented 2 years ago

On Thu, Sep 1, 2022 at 4:01 PM sttaft @.***> wrote:

In ParaSail, the primary iterator is a destructive iteration over a potentially-ordered set, by removing elements, either first, last, or "random". Non-destructive iterators are created using syntactic sugar, by copying the set initially for a set iteration, by extracting the set of keys from a map, or by constructing the set of index values of a vector or array.

So this only works for iteration over values (for of) not over cursors (for in) right? It does seem notably more complex as you need an unbounded data-structure to store the values.

-- Claire Dross Senior Software Engineer, AdaCore

sttaft commented 2 years ago

On Thu, Sep 1, 2022 at 11:18 AM Claire Dross @.***> wrote:

On Thu, Sep 1, 2022 at 4:01 PM sttaft @.***> wrote:

In ParaSail, the primary iterator is a destructive iteration over a potentially-ordered set, by removing elements, either first, last, or "random". Non-destructive iterators are created using syntactic sugar, by copying the set initially for a set iteration, by extracting the set of keys from a map, or by constructing the set of index values of a vector or array.

So this only works for iteration over values (for of) not over cursors (for in) right? It does seem notably more complex as you need an unbounded data-structure to store the values.

I am not recommending this particularly for Ada, more illustrating another way of doing things. A set of (contiguous) index values in ParaSail is actually represented with just a low bound and a high bound, so there is nothing unbounded about it, and taking an element shrinks the set by increasing the low bound (for a forward iterator), decreasing the high bound (for a reverse iterator), or picking one of the two bands pseudo-randomly (for an unordered iterator). Another set representation is a set of ranges, which might be "open" or "closed" on either end of each range. There are also hashed sets, tree sets, and bit sets, some of which support only unordered iterators, and others which support both forward and reverse iterators. For a given container type, it can represent its "index set" in any of these ways.

-- Claire Dross Senior Software Engineer, AdaCore

Take care, -Tuck

— Reply to this email directly, view it on GitHub https://github.com/Ada-Rapporteur-Group/User-Community-Input/issues/21#issuecomment-1234427135, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANZ4FOGPPKY4DN6L6MUGEDV4DCMNANCNFSM57HOJQXA . You are receiving this because you authored the thread.Message ID: @.*** com>

clairedross commented 2 years ago

That would require special handling in the compiler though, right ? I must admit I do not see how to have a more general approach.

Yes, but the compiler already has warnings which use the fact that things are (not) modified, like the one on variables which could be declared as constants, so it is probably doable. -- Claire Dross Senior Software Engineer, AdaCore

sttaft commented 2 years ago

The key point is that you call a single operation to retrieve the next element, and it has a second result which tells you whether there is such a next element. So you have two outputs -- one is a flag that indicates whether there is a next element, and the second output is the next element, if any. Equivalently, you can think of this as an "optional" or "maybe" construct optionally containing the next element. In ParaSail, since every type has a "null" value, iterators return an "optional T" meaning that it either returns a "normal" value of type T, or the "null" value of the type.

-Tuck

On Sun, Aug 28, 2022 at 4:23 AM Emmanuel Briot @.***> wrote:

On 2022-08-23 15:40, sttaft wrote:

By "consumption" I don't believe @briot https://github.com/briot means you are modifying the underlying structure. Rather, you are modifying the /iterator/ by the "get_next" operation, rather than separating the Element and Next operations ("Get_Element" and "Advance" in the example).

Hi Tucker,

That it correct. In my mind, a lot of iteration occurs on what could be called "virtual containers", i.e something that behaves like a container, but actually gets its data from other sources. Think of database queries, or sockets for instance, though it could also be calls to the operating system (reading the files from a directory for instance), or wrappers for third-party libraries.

In practice, these virtual containers provide the equivalent of a read() function, and that's it. If you are lucky, you can check whether there is some more to read, but this isn't always the case....Message ID: @.*** com>

jprosen commented 2 years ago

Le 23/08/2022 à 14:33, Richard Wai a écrit :

At a strategic level I'd usually consider just how "needed" this feature is compared to the tactical reality of realizing it. I'm not super convinced it is a "needed" change, given how generally rare the need is to "roll your own" container types. However, since this seems to already be a thing in GNAT, that makes the cost of implementation relatively low.

For Gnat. However, PTC also has an implementation of Ada 2012. We should not take decisions based on GNAT only.

-- J-P. Rosen Adalog 2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX Tel: +33 1 45 29 21 52 https://www.adalog.fr

ARG-Editor commented 2 years ago

Tuck wrote:

The value denoted by First must denote a primitive operation of the container type that returns a Cursor, which must a be a type declared in the container package or visible from it.

I don't get this. This sounds like there has to be a type named Cursor somewhere nearby, which would be incredibly limiting. Or do you mean that the type is just inferred and required to match across the various operations? That's how Aggregate works.

I note that this interface is extremely limited. There is no way to pass parameters to the iteration, for instance to select a starting point or otherwise to select a subset. (Copying containers is too much a performance hit to consider.) As such it is only good to iterate over an entire container, which is a pretty rare need outside of Q&D debugging code.

I see the concern about the interface inherent in iteration, but that is an interface in name only. (And it exists mainly because I was trying pander to the misguided souls who think interfaces are good for anything at all. :-) It should have been an abstract type at most.) Actual iteration should not do any dispatching. (I realize that the Ada Containers return class-wide types, but that is clearly a major design mistake; if the private types were defined in the spec, there would be no dispatching. I think we were trying to reduce the number of names in the specs, but that wasn't the way to do that.)

If we were starting Ada from scratch, perhaps this design would be considered better. (I'm dubious about that, though, as there is no way to pass iterators around in this design without copying everything.) But it seems too redundant and too underpowered to add to Ada on top of the existing facilities.

                           Randy.

ARG-Editor commented 2 years ago

Jean-Pierre Rosen writes:

Le 23/08/2022 à 14:33, Richard Wai a écrit : At a strategic level I'd usually consider just how "needed" this feature is compared to the tactical reality of realizing it. I'm not super convinced it is a "needed" change, given how generally rare the need is to "roll your own" container types. However, since this seems to already be a thing in GNAT, that makes the cost of implementation relatively low. For Gnat. However, PTC also has an implementation of Ada 2012. We should not take decisions based on GNAT only. …

Strongly agree. We do not want to get to a world where there is only one possible implementation, since that renders the standard (and much of Ada's focus on portability) irrelevant. And one never wants to give any one entity too much power over anything.

The existence of a feature in some existing implementation provides some proof of implementability, but that is about it. It certainly has little to do whether a given idea is worthy for standardization.

                         Randy.

sttaft commented 2 years ago

Randy wrote:

Tuck wrote:
The value denoted by First must denote a primitive operation of the container type that
returns a Cursor, which must a be a type declared in the container package or visible from it.
I don't get this. This sounds like there has to be a type named Cursor somewhere nearby, which would be incredibly limiting. Or do you mean that the type is just inferred and required to match across the various operations? That's how Aggregate works.

Note that this wording is not mine -- it comes from the GNAT reference manual. I would guess that the name can be anything. I presume it is talking about the type returned by the operation. Saying that the type must be declared in the container package or visible from it is pretty close to a vacuous statement in Ada, since you can't talk about a type that is not "visible." Perhaps they meant "directly visible," though I doubt it. The main point, I believe, was that the cursor type need not be declared in the same package as the container type.

If we were starting Ada from scratch, perhaps this design would be considered better. (I'm dubious about that, though, as there is no way to pass iterators around in this design without copying everything.) But it seems too redundant and too underpowered to add to Ada on top of the existing facilities.

This is not a question of replacing anything or starting from scratch. My belief is that we might have several different idioms that all map to the iterator syntax via syntactic sugaring. So the suggestion here is not to eliminate anything, but rather provide one or more alternative "de-sugarings" for the iterator syntax. The iterator syntax is an abstraction, and it is well understood that a given abstraction can be implemented in more than one concrete way.

The fact that GNAT implements something like this already is mostly an indicator of user demand. I agree it should not necessarily be adopted exactly as it, but rather it shows the fact that this same iterator syntax can be usable for multiple underlying idioms, without undue implementation burden.

ARG-Editor commented 2 years ago

Claire said that this was added for SPARK. Unless we want the SPARK tail to wag the Ada dog, that doesn't indicate any sort of "user demand".

I certainly agree that the way that iteration is defined in Ada is painful to use. But it does work. The last thing we need is another way to write the same thing (and not even all of the abilities of that).

I understand the argument that it is "just another way" to define a single feature. But it NEVER works out that way in Ada. Aspect specifications are supposed to be "just another way" to specify aspects, but the resolution is different, the place that they are evaluated is different, and in the end, hardly any of the code of an implementation can be shared. Of course, aspects support a lot of things that could never have been done via attribute definition clauses, so they are a net win. Even so, they add to the complication of implementations and educational materials.

The same is true here, but without the majority of the benefits. The only benefit is to make it easier to write something that few users should be writing anyway (and even those that do write it should only be doing it once per ADT). That does not seem to be enough for the definitional and implementation pain. (Remember all of the rules that we had to fix for aspect Aggregate??)

sttaft commented 1 year ago

I think it is probably a misnomer to consider SPARK the "tail" of the dog these days. In fact, our experience is that SPARK is more the "nose" of the dog, which gets into the tent (sorry to mix metaphors) and then the whole Ada caboodle comes in after the SPARK nose has been embraced (let me count the metaphors ...). In any case, "demand" comes from many different directions, and the information from @briot is from a most assuredly non-SPARK application (they are doing financial stuff with Ada). The other "demand" comes from the embedded world, who also want to minimize the use of "fancy" features. The current Ada iterator mechanism is overkill, given the new flexibility provided by aspects, and ends up being an entry barrier to defining new container structures.