Ada-Rapporteur-Group / User-Community-Input

Ada User Community Input Working Group - Github Mirror Prototype

Aliased return objects instead of access-discriminant-based references #32

Open sttaft opened 1 year ago

sttaft commented 1 year ago

Currently to implement user-defined indexing, such as:

Container(Index) := <expression>;

Ada relies on the Variable_Indexing aspect (see http://www.ada-auth.org/standards/22aarm/html/AA-4-1-6.html) which in turn relies on the Implicit_Dereference aspect (see http://www.ada-auth.org/standards/22aarm/html/AA-4-1-5.html). We would argue that this brings a lot of complexity to a feature that could be very useful in many contexts, but because of the complexity will typically be used only by "experts" (or by the language designers in the language-defined containers). Here we propose an alternative approach, namely "aliased" return objects.
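For readers who have not used the existing mechanism, here is a minimal sketch of what it takes today to make `My_Vec (I) := X` work, assuming Ada 2012 or later. The package name `Vectors`, the function name `Reference`, the type name `Reference_Type`, and the fixed ten-element layout are illustrative assumptions chosen to echo the language-defined containers; they are not taken from this proposal.

```ada
with Ada.Text_IO;

procedure Indexing_Demo is

   package Vectors is
      type Index_Type is range 1 .. 10;
      subtype Elem_Type is Integer;

      --  Variable_Indexing names the function that implements Vec (Idx).
      type Vector is tagged private
        with Variable_Indexing => Reference;

      --  The extra "reference type": a record whose access discriminant
      --  is dereferenced implicitly at the point of use.
      type Reference_Type (Element : not null access Elem_Type) is
        limited null record
        with Implicit_Dereference => Element;

      function Reference
        (Vec : aliased in out Vector; Idx : Index_Type)
         return Reference_Type;

   private
      type Elem_Array is array (Index_Type) of aliased Elem_Type;

      type Vector is tagged record
         Elements : Elem_Array := (others => 0);
      end record;
   end Vectors;

   package body Vectors is
      function Reference
        (Vec : aliased in out Vector; Idx : Index_Type)
         return Reference_Type is
      begin
         --  Legal because Vec is an explicitly aliased formal.
         return (Element => Vec.Elements (Idx)'Access);
      end Reference;
   end Vectors;

   My_Vec : Vectors.Vector;

begin
   --  Indexing plus implicit dereference.
   My_Vec (3) := 42;

   --  The same operation spelled out without either aspect.
   Vectors.Reference (My_Vec, 4).Element.all := 7;

   --  Checked only when assertions are enabled.
   pragma Assert (My_Vec (3) = 42);
   pragma Assert (My_Vec (4) = 7);

   Ada.Text_IO.Put_Line (Integer'Image (My_Vec (3)));
   Ada.Text_IO.Put_Line (Integer'Image (My_Vec (4)));
end Indexing_Demo;
```

Note that the abstraction needs two aspects, an auxiliary discriminated type, and an access discriminant just to support one indexing operation. That overhead is what the aliased-return proposal aims to remove.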

The suggested syntax would be:

parameter_and_result_profile ::=    [formal_part] return [null_exclusion|aliased [constant]] subtype_mark | ...

A call on a function with an aliased return object provides a variable view on some (aliased) part of the object passed to the function via an [in] out aliased formal parameter (or of some variable global to the function if there is no aliased formal), unless the return object is marked as aliased constant. If marked as an aliased constant, it provides a constant view on some aliased part of an object passed via an aliased formal parameter (or of some object that is global to the function). The (static) accessibility level of the result would be the same as that of the aliased formal parameter, or of the function itself if there is no such parameter. If there is more than one relevant aliased formal (presuming we even allow that), it would be the (statically) shortest-lived one.

The Variable_Indexing and Constant_Indexing aspects could now identify functions with aliased return objects. Furthermore, the accessibility-level checking associated with aliased formals would carry over directly to aliased return objects, both of which are designed for compile-time checking. The ultimate goal is to reduce or eliminate the need for (anonymous-type) access parameters and (anonymous-type) access results.

Functions with aliased return objects would also be useful for returning references to (parts of) global variables that are hidden from the view of the caller, even without bothering with the *_Indexing aspects.

There would be no need to remember to use an explicit ".all" when using aliased return objects, and little or no need for using the Implicit_Dereference aspect.

At the return statement for a function with an aliased return object, a compile-time check would be performed that the returned object has an accessibility level that is tied to that of one of the aliased formals, or to an object that is global to the function.

Here is an example:

   function Element (Vec : aliased in out Vector; Idx : Index_Type)
      return aliased Elem_Type is
   begin
      if Idx <= Vec.Local_Element_Count then
         --  Some elements are directly nested in Vec
         return Vec.Local_Elements (Idx);
      else
         --  Remaining elements are through a level of
         --  indirection (using a global named access-to-array type)
         return Vec.Rest_Of_Elements.all (Idx - Vec.Local_Element_Count);
      end if;
   end Element;

The first return statement is OK because the accessibility of the result is tied to the aliased formal. The second return statement is OK because the accessibility of the result is tied to a global access type.

Here are some examples of use:

   Element (My_Vec, I) := X;  -- Can assign to result of call
   ...
   My_Vec (I) := Z;  
      --  Presumes Variable_Indexing aspect
      --  of Vector identifies Element function
ARG-Editor commented 1 year ago

During the early part of the pandemic (spring 2020), I was contemplating the question of what I would change in Ada if compatibility was not a concern. One of the things I was thinking about was how to regularize resolution. One obvious way would be to treat objects similarly to enumeration literals, that is as a virtual function call. For variables, that would require a variable-returning function. Since I had a keyword "variable" sitting around (from some changes in object declarations that were on my list), that was easy to do. But of course I had to think about what it would mean.

I essentially came up with the model you have suggested here. (Well, along with extending the model for selecting the overloaded function to call based on the context as we already have defined for user-defined indexing.) I didn't think too much about accessibility since I had reduced it to a true/false check anyway. (More about that another time; it's not relevant to Ada.)

I realized later that it got rid of the need for the generalized reference, which was a nice side-effect.

So this does seem like a fruitful direction to pursue.

ARG-Editor commented 1 year ago

This issue is addressed in AI22-0075-1 (which will be available shortly).

jere-software commented 5 months ago

I am hugely in favor of this. It is a much more fleshed out idea than the one I had back in 2019. I think this would be great for Ada in general, reducing the need for access types for things like custom bounded containers, etc.

evanescente-ondine commented 2 weeks ago

Lowly beginner here, humbly giving his feedback... I suppose I represent your future customers that didn't make it out of school yet. I want to say that, from the moment I read about user-defined indexing and implicit dereference, I wondered what the rationale was behind introducing this form of so-called "reference types", which doesn't seem to have any particular background in programming at large. It seemed to me that returning a simple access type would be enough. If the pointer is hidden one way or another, all the better. I can only find in the AARM this:

We require these functions to return a reference type so that the object returned from the function can act like a variable. We need no similar rule for Constant_Indexing, since all functions return constant objects.

But... aren't both cases achieved with access types as well ? What were more intuitive solutions not considered first ? Also, with this explicit aliasing tagged types wouldn't be necessary just to get indexing.

ARG-Editor commented 1 week ago

But... aren't both cases achieved with access types as well ?

Surely. But bare access types are unsafe for any useful abstraction, because the user can copy them. Thus, the abstraction can never know when it is safe to clean up.

Perhaps I should back up a bit. User-defined indexing uses reference types because they already existed by the time user-defined indexing was added to Ada. Yes, both were added to Ada in Ada 2012, but they were designed independently and several years apart.

So your question ought to be why does Ada define Reference_Types in the first place?

The answer to that is rooted in the desire to have user-defined behavior associated with the dereference operation, and to add safety by having a form of dereference that the user cannot copy (at least not without the abstraction finding out).

We identified that user-defined behavior is needed at two points - before the actual dereference operation, and then when the dereference is completed. We also noted that copies of the pointer being dereferenced can't be allowed.

To see the need for these features, imagine a persistent storage library. The library needs to know when the program is done writing the object so it can write the object back to the persistent storage. And that has to be definitive; if it is not, some changes to the object might never get saved at all. That requires a callback when the client is done using the object, and also that the client have no way to access the object after the callback.

The original solution was a limited access type along with an enhanced storage pool with hooks before and after dereferencing. I forget why limited access types didn't work (you should read the various discussions on the design of Reference_Type if you care). The storage pool idea was weird when the actual access value was a handle of some sort. In that case, the handle was represented as an Address (since that's what Storage_Pools use), and then had to be converted to a real Address by the pre-dereference routine. The effect was to use a single type to represent two different concepts. We couldn't change the type used in Storage_Pools without breaking all existing uses of storage pools, and that was unacceptable.

When looking at that solution, someone noticed that the completion callback was called at the same places that finalization would occur of a very local temporary object. So the suggestion was made to actually define such an object. The Reference_Type was born. Then someone noticed that the rules to prevent copying the access type were very similar to the accessibility check on a very local access discriminant. Thus, the actual access value became a discriminant of the Reference_Type.
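A rough sketch of that idea, assuming Ada 2012 or later: the reference object is a limited controlled type, so its Finalize acts as the "done with the object" callback. The package `Store`, the function `Open`, and the `Working`/`Saved` variables are hypothetical names invented for this illustration; `Saved` merely stands in for persistent storage.

```ada
with Ada.Finalization;
with Ada.Text_IO;

procedure Write_Back_Demo is

   package Store is
      --  Data designates the in-memory copy the client may modify.
      type Reference_Type (Data : not null access Integer) is
        limited private
        with Implicit_Dereference => Data;

      function Open return Reference_Type;
      function Saved_Value return Integer;

   private
      type Reference_Type (Data : not null access Integer) is
        new Ada.Finalization.Limited_Controlled with null record;

      --  Runs when a reference object ceases to exist.
      overriding procedure Finalize (Ref : in out Reference_Type);
   end Store;

   package body Store is
      Working : aliased Integer := 0;  --  in-memory copy
      Saved   : Integer := 0;          --  stand-in for persistent storage

      function Open return Reference_Type is
      begin
         return (Ada.Finalization.Limited_Controlled with
                 Data => Working'Access);
      end Open;

      function Saved_Value return Integer is
      begin
         return Saved;
      end Saved_Value;

      overriding procedure Finalize (Ref : in out Reference_Type) is
      begin
         --  The completion callback: write the object back.
         Saved := Ref.Data.all;
      end Finalize;
   end Store;

begin
   declare
      Ref : Store.Reference_Type := Store.Open;
   begin
      Ref.Data.all := 5;
      --  Nothing has been written back while the reference is live.
      pragma Assert (Store.Saved_Value = 0);
   end;  --  Ref is finalized here, which triggers the write-back

   --  Assertions are checked only when enabled.
   pragma Assert (Store.Saved_Value = 5);
   Ada.Text_IO.Put_Line (Integer'Image (Store.Saved_Value));
end Write_Back_Demo;
```

Because the access value is an access discriminant of a limited object, the client cannot stash a longer-lived copy of it without tripping the accessibility rules, which is the second property described above.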

The advantage of this design was that little new semantics was needed, both in the RM, and in implementations. And the weirdness of the storage pool design was eliminated. Thus it was chosen over alternatives that we looked at at the time.

The "problem" with the use of Reference_Types is that if you don't need the callback, then the use of a separate object is overkill. But you still need (or should want) to prevent access to the object after the use is finished.

My personal feeling is that no abstraction should ever expose a raw access type; they are too unsafe. They should only be used to implement an abstraction, where they can be managed properly. If the abstraction needs a reference of some sort, it should be some sort of handle (presumably a private type). A Reference_Type fills this need well; you could roll-your-own handle but it would end up pretty similar to a Reference_Type.

So I think this entire concern is mostly a case of people wanting to build bad abstractions. It's pretty clear to me that there is no reason to help people do that; they have plenty of ways to do so without adding more.

OTOH, when I tried to design a "better Ada" during the pandemic, I ended up with variable-returning functions essentially for free. And they seemed to be a natural part of the model (my redesign converts objects into a pair of functions, much like enumeration literals are functions). But to actually get rid of Reference_Types, one would need a handler that gets called when the return object of a function is destroyed. That is a pretty weird idea which I didn't try to pursue (the pandemic ended before I got very far into the details).

What were more intuitive solutions not considered first ?

None, as far as I recall. Reference_Types already existed and work had finished on them by the time that user-defined indexing was defined. It was defined mostly for the use of the predefined containers. We did not want direct use of an access type, since the container could remove the underlying element while someone was holding onto it. We solved that with the tampering check, which is implemented in a reference type by making the reference type controlled. (See above.)

Since the containers never return access types anyway, we certainly didn't want user-defined indexing to open up the holes that we just plugged. So it too does not return access types (and should not, IMHO).

Also, with this explicit aliasing tagged types wouldn't be necessary just to get indexing.

The reason that user-defined indexing only works with tagged types has to do with inheritance and other properties; it's nothing to do with the use of reference types. There are very bad corner cases (especially associated with generics) that can't exist with tagged types. Also, in Ada 2012/2022, prefix notation only worked with tagged types, and user-defined indexing is a similar mechanism.

IMHO, all new abstractions should be defined with tagged types anyway. The overhead is minimal, and you are enabling future clients to create extensions (rather than having to change the original abstraction). It's better in almost all ways. Untagged types should be limited to "helpers" like enumerations and very simple non-private records (like a coordinate type).

One probably could allow untagged types as user-defined indexing, but it would require a whole new set of restrictions on usage, and in any case that is a very separate issue from the subject of this topic.

          Randy Brukardt, ARG Editor.
evanescente-ondine commented 1 week ago

I understand better now, and can't agree more on the necessity to hide access types as much as possible... They'd better lie buried in the depths of a package body, never to be bothered with again, as far as I'm concerned. I assume the discussions are in the RFCs. Thanks for this long explanation, I definitely bit off more than I can chew!