Ada-Rapporteur-Group / User-Community-Input

Ada User Community Input Working Group - Github Mirror Prototype

Finalization without tagged types #63

Open sttaft opened 11 months ago

sttaft commented 11 months ago

In Ada 95 we use tagged types as the basis for various features, but as Ada has evolved since then, we have begun to use aspects rather than extending a "magic" tagged type as a way to give special properties to certain types. There are upsides and downsides to using tagged types, but for resource-constrained embedded systems, aspect-based features can in some cases be easier to implement, and in some cases easier to verify, because class-wide types and run-time dispatching need not be part of the equation.

In any case, here is a proposal by Romain Beguet (AdaCore engineer) from about three years ago, for a "lighter weight" finalization approach. Essentially it is substituting the use of aspects for the use of the special Ada.Finalization.[Limited_]Controlled types. I am posting it here to re-open the discussion. One possibility would be to focus on the "limited" case first, where there is no user-defined assignment, but just user-defined initialization and finalization. On the other hand, if we make it a full capability feature, then probably the existing Ada.Finalization package should be relegated to the Obsolescent annex (Annex J).

Similar arguments might be made (and have been made) for Storage_Pools, Streams, and Iterators, but those would deserve their own Issue in any case. One could imagine a goal for the next version of Ada would be to move all features based on "magic" tagged types/interfaces over to aspects, if that were considered a generally useful shift.


Summary

We introduce a new finalization mechanism that does not rely on tagged types, has simpler semantics and weaker guarantees than today's controlled types in such a way that:

  1. It imposes fewer design constraints on users.
  2. It can be supported on a broader range of platforms (e.g. embedded).
  3. It allows for an efficient implementation.

Note: we abuse the term "finalization" throughout this RFC to denote control over the whole lifetime of an object, i.e. the same level of control that controlled objects provide today.

Motivation

First of all, current finalization based on controlled objects forces users to turn their untagged type into a tagged type in order for it to benefit from finalization. Thus, new legality rules apply (e.g. RM 3.9.2) which can break existing code in many different ways. One such example is illustrated below:

type T is tagged null record;
type U is null record;

function F (X : T) return U;

In this situation, one cannot simply turn U into a controlled type: the subprogram F would become a primitive of two tagged types, which is forbidden.
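To make the conflict concrete, here is a minimal sketch (package specification only) of what happens once U is made controlled, together with one conventional workaround. The names Demo and F_Any are hypothetical, and the workaround changes the profile of F, so it is not a drop-in fix for existing callers.

```ada
with Ada.Finalization;

package Demo is

   type T is tagged null record;

   --  U is made controlled so that it can be finalized; it is now tagged.
   type U is new Ada.Finalization.Controlled with null record;

   --  Illegal if uncommented: F would be a dispatching operation of both
   --  T and U, which RM 3.9.2(12) forbids.
   --  function F (X : T) return U;

   --  Possible workaround: a class-wide parameter is not a controlling
   --  parameter, so F_Any is a primitive operation of U only.
   function F_Any (X : T'Class) return U;

end Demo;
```

Moving F into a nested package, so that it is a primitive of neither type, is another option; both change the interface that clients see.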

Second, the guarantees provided by controlled types are very strong, requiring a complex implementation and incurring a substantial runtime performance penalty. On some platforms, it is extremely difficult (impossible?) to write an implementation that fulfills all those guarantees.

One such guarantee is that an access-to-controlled type should finalize all objects that have been heap-allocated through it once it goes out of scope (todo: link to RM). The compiler must therefore generate code to keep track of these objects, untrack them upon explicit deallocation, etc., which obviously induces a significant overhead at runtime.
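A small sketch of the guarantee described above, using today's controlled types. The names Collection_Demo, Resources and Resource_Access are made up for illustration. The object is allocated and never explicitly freed; under RM 7.6.1 it should nevertheless be finalized when the block that declares the access type is left, which is the bookkeeping the paragraph refers to.

```ada
with Ada.Finalization;
with Ada.Text_IO;

procedure Collection_Demo is

   package Resources is
      type Resource is new Ada.Finalization.Limited_Controlled
        with null record;
      overriding procedure Finalize (R : in out Resource);
   end Resources;

   package body Resources is
      overriding procedure Finalize (R : in out Resource) is
      begin
         Ada.Text_IO.Put_Line ("Finalize called");
      end Finalize;
   end Resources;

begin
   declare
      --  Nested access type: its collection is finalized at block exit.
      type Resource_Access is access Resources.Resource;
      Ptr : constant Resource_Access := new Resources.Resource;
   begin
      --  No Unchecked_Deallocation here: the object stays allocated.
      Ada.Text_IO.Put_Line ("Leaving the scope of Resource_Access");
   end;
   Ada.Text_IO.Put_Line ("After the block");
end Collection_Demo;
```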

For the record, GNAT already supports some custom pragmas that weaken the default guarantees mandated by the Ada specification, such as pragma No_Heap_Finalization and pragma Finalize_Storage_Only.

Guide-level explanation

We propose to introduce three new Ada 2012 aspects, analogous to the three controlled-type primitives, as in the following template:

type T is ...
   with Initialize => <Initialize_Procedure>,
        Adjust     => <Adjust_Procedure>,
        Finalize   => <Finalize_Procedure>;

The three procedures have the same profile, taking a single in out T parameter.

We follow the same dynamic semantics as controlled objects:

Examples

A simple example of a ref-counted type:

type T is record
   Value : Integer;
   Ref_Count : Natural := 0;
end record;

procedure Inc_Ref (X : in out T);
procedure Dec_Ref (X : in out T);

type T_Access is access all T;

type T_Ref is record
   Value : T_Access;
end record
   with Adjust   => Adjust,
        Finalize => Finalize;

procedure Adjust (Ref : in out T_Ref) is
begin
   --  Inc_Ref expects a T, so dereference the access value.
   Inc_Ref (Ref.Value.all);
end Adjust;

procedure Finalize (Ref : in out T_Ref) is
begin
   Dec_Ref (Ref.Value.all);
end Finalize;

A simple file handle that ensures resources are properly released (taken from a discussion in this RFC):

   type File (<>) is limited private;

   function Open (Path : String) return File;

   procedure Close (F : in out File);
private
   type File is limited record
      Handle : ...;
   end record
      with Finalize => Close;
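For comparison, here is a sketch of the same file handle written with today's Ada.Finalization.Limited_Controlled (package specification only, bodies omitted). The package name Files and the placeholder Handle_Type are assumptions that stand in for the parts elided in the example above.

```ada
with Ada.Finalization;

package Files is

   type File (<>) is limited private;

   function Open (Path : String) return File;

   procedure Close (F : in out File);

private

   --  Hypothetical placeholder for the elided handle type.
   type Handle_Type is new Integer;

   --  The full type must extend the "magic" tagged type, so File becomes
   --  tagged even though its partial view is not.
   type File is new Ada.Finalization.Limited_Controlled with record
      Handle : Handle_Type;
   end record;

   --  Finalize has to be overridden explicitly; its body would call Close.
   overriding procedure Finalize (F : in out File);

end Files;
```

The aspect-based version avoids both the tag and the forwarding Finalize, since Close is named directly in the aspect.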

Heap-allocated finalized types

As already mentioned, today's controlled objects allocated on the heap through an access type T must be finalized when T goes out of scope. First of all, we propose to completely drop this guarantee for library-level access types, meaning program termination will not require finalization of heap-allocated objects. The rationale for this is that in most cases, finalization is used to reclaim memory or release resources, which the underlying system (if any) generally does anyway upon program termination. As for bare-metal platforms, heap allocation is either not available/allowed (meaning this is a non-issue) or, if it is, we assume that manual deallocation is required and therefore finalization will be properly executed.

As for nested access-to-finalized types, there are at least two simple ways to reason about them:

Finalized tagged types

We propose that these aspects be inherited by derived types and optionally overridden by them. The compiler-generated calls to the user-defined operations should then be dispatching whenever it makes sense, i.e. when the object in question is of a class-wide type and the class includes at least one finalized type.

Composite types

When a finalized type is used as a component of a composite type, the latter should become finalized as well. The three primitives are derived automatically in order to call the primitives of their components. If that composite type was already user-finalized, we propose that the compiler calls the primitives of the components so as to stay consistent with the behavior of today's controlled types. So, Initialize and Adjust are called on components before they are called on the composite object, but Finalize is called on the composite object first. This is the easiest approach as it avoids confusing users and its semantics are already battle-tested, but it could still be revised.
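The ordering that the proposal wants to preserve can be observed with today's controlled types. This is a minimal sketch with made-up names (Order_Demo, Types, Inner, Outer); it shows Initialize and Finalize only, with Adjust following the same components-first rule as Initialize.

```ada
with Ada.Finalization;
with Ada.Text_IO;

procedure Order_Demo is

   package Types is
      type Inner is new Ada.Finalization.Controlled with null record;
      overriding procedure Initialize (X : in out Inner);
      overriding procedure Finalize   (X : in out Inner);

      --  Composite type that is itself controlled and has a controlled part.
      type Outer is new Ada.Finalization.Controlled with record
         Part : Inner;
      end record;
      overriding procedure Initialize (X : in out Outer);
      overriding procedure Finalize   (X : in out Outer);
   end Types;

   package body Types is
      overriding procedure Initialize (X : in out Inner) is
      begin
         Ada.Text_IO.Put_Line ("Initialize Inner");
      end Initialize;

      overriding procedure Finalize (X : in out Inner) is
      begin
         Ada.Text_IO.Put_Line ("Finalize Inner");
      end Finalize;

      overriding procedure Initialize (X : in out Outer) is
      begin
         Ada.Text_IO.Put_Line ("Initialize Outer");
      end Initialize;

      overriding procedure Finalize (X : in out Outer) is
      begin
         Ada.Text_IO.Put_Line ("Finalize Outer");
      end Finalize;
   end Types;

begin
   declare
      Obj : Types.Outer;  --  default-initialized, so Initialize is called
   begin
      Ada.Text_IO.Put_Line ("Obj is in scope");
   end;
end Order_Demo;
```

Per RM 7.6 and 7.6.1, the component is initialized before the enclosing object, and the enclosing object is finalized before the component.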

Interoperability with controlled types

In order to simplify implementation, we propose to initially forbid any of these new aspects on a controlled type, on components of a controlled type, on composite types of which any part is controlled, and on interfaces from which a controlled type is derived.

Constant objects with finalization

The profile suggested above for the three primitives takes an in out parameter. How should we handle constant objects of a finalized type?

First, note that Initialize is out of the equation since constant objects require explicit initialization. Adjust is also out because constant objects obviously cannot be reassigned. We are therefore left with Finalize. We could either take the same approach as controlled types and let the parameter be in out, or we could introduce a new aspect Finalize_Constant that is called in place of Finalize and takes an in parameter instead. In this scenario, we suggest a warning could be emitted if a type specifies Finalize but does not specify Finalize_Constant and a constant object of that type is declared.
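A sketch of how the second option might look. This uses the proposed aspect syntax, so it is not expected to compile with any current compiler; the names Tokens, Token, Release and Release_Constant are hypothetical.

```ada
package Tokens is

   type Token is limited private;

   --  Called when a variable of type Token is finalized.
   procedure Release (X : in out Token);

   --  Called instead of Release when a constant Token is finalized.
   procedure Release_Constant (X : in Token);

private

   type Token is limited record
      Id : Natural := 0;
   end record
     with Finalize          => Release,           --  proposed aspect
          Finalize_Constant => Release_Constant;  --  proposed aspect

end Tokens;
```

If Finalize_Constant were omitted here, declaring a constant Token is the situation in which the suggested warning would be emitted.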

Drawbacks

Since this partly overlaps with controlled-types, new users could get a bit lost.

Prior art

TBD. Talk about RAII in languages such as C++.

Richard-Wai commented 11 months ago

I think it's a bit hard to swallow what is clearly a duplicated mechanism, though I sympathize with the perceived need. However I feel doubtful that this would be the correct tactical approach to solving this problem.

If, at the end of the day, the real goal here is avoiding the potential dynamic dispatch of class-wide tagged types, then Ada 95 has already done a lot of heavy-lifting here. Since Ada's OOP model is already designed to allow the programmer to control if and when dynamic dispatching occurs (by limiting such things to specifically class-wide views), it seems like we should lean on this mechanism to get what we really want here.

Put another way, I don't think the required use of tagged types to achieve RAII semantics is the real problem in this context, rather it is the potential for dispatching during finalization. For a deeply embedded system where we still want user-defined finalization, it seems that a controlled type with only static dispatch should be functionally identical to this proposal.

Would it not be more consistent and more broadly useful, therefore, if instead we introduce a way to disable class-wide types for particular declarations of a tagged type? This would be similar to the No_Dispatch High Integrity restriction, but applied to a specific type. Maybe we could just have a No_Dispatch aspect for any tagged (sub)type declaration, which would then prohibit any occurrence of T'Class for a tagged subtype, and would further prohibit any view (conversion) of an entity of type T to a class-wide type of any ancestor. A regular old controlled type with this aspect would then, for all intents and purposes, be the same as what is proposed here, IIUC.
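A rough sketch of what such a per-type aspect might look like. No_Dispatch as an aspect is purely hypothetical here (no current compiler is assumed to accept it), as are the names Locks, Lock and Release_Any.

```ada
with Ada.Finalization;

package Locks is

   --  Ordinary controlled type, with the hypothetical aspect attached.
   type Lock is new Ada.Finalization.Limited_Controlled with record
      Held : Boolean := False;
   end record
     with No_Dispatch;  --  hypothetical aspect

   overriding procedure Finalize (L : in out Lock);

   --  Under the suggested rule, both of these would be rejected:
   --     procedure Release_Any (L : in out Lock'Class);
   --     Ada.Finalization.Limited_Controlled'Class (Some_Lock)

end Locks;
```

With no class-wide view of Lock possible, every call to Finalize could be bound statically.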

sttaft commented 11 months ago

I agree that specifying No_Dispatch on individual type hierarchies might be a nice solution.

Fabien-Chouteau commented 11 months ago

If, at the end of the day, the real goal here is avoiding the potential dynamic dispatch of class-wide tagged types, then Ada 95 has already done a lot of heavy-lifting here.

The proposal here provides more motivation points than just avoiding dynamic dispatch.

And I can add the following:

I think the point here is to assess the choice of using "inheritance from a special tagged type" for finalization. It would be interesting to see arguments in favor of the current finalization scheme vs the one proposed here. I can see how the current one makes the definition of the feature easier to handle in the RM, or actually how it made it easier before aspects were a thing. But from the point of view of users and implementors, I don't see the benefit of the current solution.

sttaft commented 11 months ago

Inheritance: because of Ada's single inheritance, if I want my type to be controlled then I cannot inherit from another tagged type.

One typical solution to this is to use a controlled component, rather than directly extending Controlled. Then the enclosing type can be of any sort of composite type, including an extension of another tagged type. And of course, like Java, Ada does support multiple inheritance, but admittedly only when implementing interfaces.
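A minimal sketch of the controlled-component technique, with made-up names (Component_Demo, Shapes, Shape, Tracked_Shape, Cleanup). It assumes the parent type can be limited, which lets the component refer back to its enclosing object through an access discriminant.

```ada
with Ada.Finalization;
with Ada.Text_IO;

procedure Component_Demo is

   package Shapes is
      --  An ordinary tagged type that is not controlled.
      type Shape is tagged limited record
         Sides : Natural := 0;
      end record;

      type Tracked_Shape;

      --  Controlled helper that points back at its enclosing object.
      type Cleanup (Owner : access Tracked_Shape) is
        new Ada.Finalization.Limited_Controlled with null record;
      overriding procedure Finalize (C : in out Cleanup);

      --  Extends Shape and gains finalization through a component.
      type Tracked_Shape is new Shape with record
         Guard : Cleanup (Tracked_Shape'Access);
      end record;
   end Shapes;

   package body Shapes is
      overriding procedure Finalize (C : in out Cleanup) is
      begin
         Ada.Text_IO.Put_Line
           ("Finalizing shape with"
            & Natural'Image (C.Owner.Sides) & " sides");
      end Finalize;
   end Shapes;

begin
   declare
      S : Shapes.Tracked_Shape;
   begin
      S.Sides := 3;
   end;  --  S.Guard is finalized here, which runs the cleanup for S
end Component_Demo;
```

The cost is visible in the sketch: an extra helper type, a discriminant, and an indirection, all to attach one Finalize procedure.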

But the tag is overhead, and as you say, might not map well to hardware, and if the type is not a library-level type, you can end up with a dynamic tag.

So there are clearly some advantages to using an aspect-based mechanism. The question remains whether we want to systematically replace extension-based mechanisms with aspect-based mechanisms (moving the extension-based mechanisms to the Obsolescent Annex), or pick and choose which ones to replace, or have partial redundancy between the extension-based mechanism and the aspect-based mechanism, with the attendant added user-level complexity.

ARG-Editor commented 11 months ago

Tucker wrote:

...

But the tag is overhead, and as you say, might not map well to hardware, and if the type is not a library-level type, you can end up with a dynamic tag.

For finalization specifically (I haven't thought about any of the other cases), the tag overhead is only a small part of the overhead of finalization.

Unless you are willing to have different representations for controlled objects that are allocated vs. those that are statically declared, you have to have a way to put controlled objects on chains (and those chains have to be doubly linked so that arbitrary objects can be easily removed for Unchecked_Deallocation and when the discriminants are changed for an object [Ada doesn't allow that at the outer level, but it's easy to create components that can have the discriminants modified]). You also have to have a way to figure out which subprogram to call when those objects are finalized. The latter here doesn't have to be a tag per se, but it seems that some sort of pointer is needed.

Moreover, the sort of static finalization that GNAT implements is fiendishly complicated. (It had a steady parade of bugs for years after it was originally implemented.) It's certainly not the sort of thing that other Ada implementers are likely to adopt absent a business case. For the implementers that do most or all finalization following the canonical (chain-based) model, a tag or something like it is absolutely necessary. And I don't think people are going to be rushing to invest a huge amount of effort to replace that model (possibly outside of a few "easy" cases of stand-alone controlled objects).

So I don't buy the overhead argument. The nested type issue is more interesting, especially as Ada should never have allowed nested type extensions (Janus/Ada never did, and won't on my watch -- too much work, too little value).

As with nested extensions, I'm rather dubious that there is any significant value to controlled objects of a nested type (that is, a type declared inside of a subprogram). I've sometimes seen a need for "last wishes" code in subprograms, but that often requires access to a number of objects. An "at end" clause would be a better solution to that particular need than any amount of changes to controlled types.

                Randy.

joshua-c-fletcher commented 11 months ago

A different approach would be to redefine the Ada.Finalization types and Root_Stream_Type in Ada.Streams as interface types rather than as abstract tagged types.

Interface types didn't exist in Ada when these types were introduced, and abstract tagged types were the closest thing available. Looking at the definition of these abstract tagged types, there isn't anything obvious that says you couldn't just replace "abstract tagged private" with "interface" in their public definition, as long as some under-the-hood functionality recognized these special interfaces like it currently does these special abstract tagged types.

If we're considering adding the capability anywhere with aspects, that suggests to me that there isn't a whole lot of concrete content needed in the base types in Ada.Finalization that couldn't allow for an implementation as an interface.

If these root types were interfaces, they would still be tagged, so this approach doesn't get rid of the tag... but it does add flexibility - you could make a type that was both controlled and a stream type rather than having to wrap one in the other.

Also, existing Ada code (besides compiler code of course) wouldn't have to change to support this, it would just add more options to how they could be used due to the multiple inheritance available to types implementing interfaces.

For example, a tagged type that was not controlled could have a descendant that adds finalization to the type.

This would really just add some consistent flexibility to the existing solution using existing mechanisms (i.e. interfaces), and is a more direct and familiar evolution of the existing design than the aspect approach which would cause a lot of existing code to be using newly obsolescent types.

I suppose any further discussion on this should go into a new issue, if others think this is a good idea, but we wouldn't choose to do both approaches, so they can't be evaluated independently.

Joshua

sttaft commented 11 months ago

Alas, making Controlled an interface type doesn't work as well as you might hope. When designing Ada 2005, we considered changing Controlled to be an interface type, but we ran into a problem because other considerations required that a private tagged type or private extension cannot be allowed to hide the fact that it implements a particular interface. This rule appears in the Ada RM 7.3(7.3/2):

  • the partial view shall be a descendant of an interface type (see 3.9.4) if and only if the full type is a descendant of the interface type.

The rationale for this rule is given in the Annotated Ada RM 7.3(7.p/2 - 7.bb/2). The basic problem is that if a type T could "privately" implement an interface, then nothing would prevent some extension T1 of T implementing the same interface again, thereby having one type with two implementations of the operations of a given interface, which wouldn't really make sense. So this "no private interfaces" rule would mean implementing a Controlled interface type would have to be visible, but the Ada 95 feature allowed privately extending Controlled, so we worried about compatibility.
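The effect of the rule can be sketched as follows (package specification only). Cleanable stands in for a would-be Controlled interface; it and the names Interface_Demo, Visible and Hidden are hypothetical.

```ada
package Interface_Demo is

   type Cleanable is limited interface;
   procedure Clean_Up (X : in out Cleanable) is abstract;

   --  Legal: the partial view reveals that Visible implements Cleanable.
   type Visible is limited new Cleanable with private;
   overriding procedure Clean_Up (X : in out Visible);

   --  Illegal if uncommented together with its full type below: the
   --  partial view would hide the fact that Hidden implements Cleanable.
   --  type Hidden is tagged limited private;

private

   type Visible is limited new Cleanable with null record;

   --  type Hidden is limited new Cleanable with null record;

end Interface_Demo;
```

The Hidden pattern is the interface analogue of what Ada 95 code routinely does when it privately extends Controlled, which is where the compatibility concern comes from.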

In retrospect, perhaps we should have made Controlled into an interface in Ada 2005, and if you really wanted to hide finalization, you could use a finalized component instead of making the whole object Controlled. Making components finalizable composes more cleanly, because each component can have its own Finalize procedure.

sttaft commented 11 months ago

Unless you are willing to have different representations for controlled objects that are allocated vs. those that are statically declared, you have to have a way to put controlled objects on chains (and those chains have to be doubly linked so that arbitrary objects can be easily removed for Unchecked_Deallocation and when the discriminants are changed for an object [Ada doesn't allow that at the outer level, but it's easy to create components that can have the discriminants modified]). You also have to have a way to figure out which subprogram to call when those objects are finalized. The latter here doesn't have to be a tag per se, but it seems that some sort of pointer is needed.

I agree with you that trying to use a fully static approach to finalization is very complex and fragile. AdaMagic has a good strategy, I believe, where the links are outside the representation of the object, each link providing a pointer to the object, and a pointer to a cleanup procedure. There is only one link per top-level object, no matter how many components of the object require finalization. The cleanup procedure cleans up all of the components that need it. The initialization procedure for a top-level object worries about failure during initialization, which might require partial cleanup.

For heap allocated objects that require finalization, each (top-level) allocated object has additional space for the links of a doubly-linked list, hooking it into a "collection" list for each access type. Unchecked-deallocation removes the object from this list. At end of scope of the access type, the collection list is walked and the object-as-a-whole cleanup procedure is invoked.

These same strategies are used for protected objects (or objects that contain them), which have their own finalization requirements.

In any case, there is no need for the additional space associated with finalization having to be contiguous with a particular stand-alone object, thanks to the level of indirection. This also means that arrays of finalizable objects do not need per-component overhead, since the cleanup procedure is specialized to the problem of finalizing a whole array. The initialization procedure worries about an exception resulting in a partially-initialized array of finalizable components.
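The external-link idea can be modeled in a few lines. This is only an illustrative sketch under simplifying assumptions (a single chain, a singly linked list, no exception handling), not AdaMagic's actual data structures; all names (Link_Demo, Link, Register, Finalize_Scope, Pair, Cleanup_Pair) are made up.

```ada
with Ada.Text_IO;
with System;

procedure Link_Demo is

   type Cleanup_Procedure is access procedure (Object : System.Address);

   --  One link per top-level object, stored outside the object itself.
   type Link;
   type Link_Access is access all Link;
   type Link is record
      Object  : System.Address;     --  pointer to the object
      Cleanup : Cleanup_Procedure;  --  cleans up the object as a whole
      Next    : Link_Access;
   end record;

   Chain : Link_Access := null;  --  chain for the current scope

   procedure Register (L : Link_Access) is
   begin
      L.Next := Chain;
      Chain  := L;
   end Register;

   --  Run the cleanups, most recently registered object first.
   procedure Finalize_Scope is
   begin
      while Chain /= null loop
         Chain.Cleanup.all (Chain.Object);
         Chain := Chain.Next;
      end loop;
   end Finalize_Scope;

   type Pair is record
      A, B : Integer;
   end record;

   --  A single procedure handles every part of a Pair that needs cleanup.
   procedure Cleanup_Pair (Object : System.Address) is
      P : Pair;
      for P'Address use Object;
      pragma Import (Ada, P);  --  overlay only, no initialization
   begin
      Ada.Text_IO.Put_Line
        ("Cleaning up Pair with A =" & Integer'Image (P.A));
   end Cleanup_Pair;

   Obj      : aliased Pair := (A => 1, B => 2);
   Obj_Link : aliased Link :=
     (Object  => Obj'Address,
      Cleanup => Cleanup_Pair'Access,
      Next    => null);

begin
   Register (Obj_Link'Access);
   Finalize_Scope;
end Link_Demo;
```

Because the link lives outside Obj, the representation of Pair is unchanged, and an array of such objects could share one link whose cleanup procedure loops over the elements.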

ARG-Editor commented 11 months ago

Tucker writes:

AdaMagic has a good strategy, I believe, ...

I'm sure that everyone thinks their strategy is best. :-)

...where the links are outside the representation of the object, each link providing a pointer to the object, and a pointer to a cleanup procedure.

I tried having a universal cleanup procedure briefly, but handling the failure cases for initialization and finalization got so complex in those cases that I gave up. It's not too bad in the simple usual cases, but compilers have to work in any case, and that becomes extremely messy -- it's essentially a shadow record for any controlled components. (It gets messy because of the possibility of partial failure of initialization, the fact that the number/order of components depends on the discriminants and thus isn't known at compile-time -- so a simple integer doesn't work -- and the need to finalize only the objects that successfully initialized.)

We deal with initialization failure and the normal case the same way: each controlled object (subcomponent) is initialized and registered individually, so we only have to finalize the objects that are registered.

I was less worried about overhead than actually being able to get it right (without a never-ending stream of bug reports). Perhaps that focus was a mistake (it seems that most implementers worried less about that, and it doesn't seem to have helped passing ACATS tests), but it is what it is.

Finalization is one of those features where your "bump under the carpet" analogy applies -- there is always something hard regardless of the method chosen. In the Janus/Ada case, it is reallocation upon assignment to mutable objects where the discriminants change. That requires deregistering and re-registering the individual components. Of course, we might have reallocated the memory in such a case, so one better not leave anything pointing to the original place anyway. (A compiler that didn't try to reallocate memory would have an easier time.)

Anyway, I think given the wildly different implementation approaches, any discussion of overhead for finalization is extremely implementation-specific, and thus is a very weak argument for doing anything for the language in this area. An implementation-specific pragma or aspect would be more appropriate for reducing overhead if that is needed (because it is unlikely to have the same effect on other implementations).

                       Randy.

ARG-Editor commented 11 months ago

Tucker noted:

Alas, making Controlled an interface type doesn't work as well as you might hope. When designing Ada 2005, we considered changing Controlled to be an interface type, but we ran into a problem because other considerations required that a private tagged type or private extension cannot be allowed to hide the fact that it implements a particular interface.

Right; it would be highly incompatible to make finalization or streams an interface without any other changes.

There was enough interest in the idea (post Ada 2012) that we tried to find a set of rules that would allow some of these types to be interfaces. That requires quite a bit of mechanism and the idea was shelved (where it remains today - it was never formally abandoned). See AI12-0023-1 and the various discussions on it (both in the !appendix and the Stockholm meeting minutes of June 2012).

                Randy.