Ada-Rapporteur-Group / User-Community-Input

Ada User Community Input Working Group - Github Mirror Prototype

Capacity of Bounded Container Aggregates #112

Open ARG-Editor opened 9 hours ago

ARG-Editor commented 9 hours ago

Bounded containers have a Capacity discriminant (and sometimes others). Some Ada operations require discriminants to match, and that makes using bounded containers more difficult than unbounded containers, especially when aggregates are involved.

Imagine we have the following declarations:

    package BVI is new Ada.Containers.Bounded_Vectors
        (Index_Type => Natural, Element_Type => Integer);
    A : BVI.Vector(100);

Now, consider the following statements:

    A := [];              -- (1)
    A := BVI.Empty;       -- (2)
    A := BVI.Empty_Vector; -- (3)

All three of these statements probably raise Constraint_Error. (1) must raise Constraint_Error, as the capacity of the temporary object created by the aggregate is defined to be 0, and that certainly does not match the capacity of A. The capacity of the result of the function call in (2) is implementation-defined; it probably isn't 100, and if it isn't, Constraint_Error would be raised.

The capacity of the object Empty_Vector does not seem to be defined either by A.18.2 or A.18.19. [Aside: This seems to be a hole. A.18.2 does not define the initial Capacity of a Vector object. I believe the intent was that it be unspecified, and that works fine for unbounded vectors. For bounded vectors, we need to know because of the discriminant checks on the capacities. The discriminant value suffices for explicit declarations, but we also need to specify the capacity for deferred constants and for function results other than function Empty. Otherwise, we can't determine the behavior with certainty. End aside.] For this reason, we can't tell for certain whether this assignment raises Constraint_Error, but it probably does (the value is unlikely to be exactly the correct one, 100 in this case).
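Since the value is implementation-defined, the only way to see what a particular compiler chose is to probe it. A minimal sketch (Probe_Capacity is just an illustrative name; the output is implementation-specific):

    with Ada.Text_IO;
    with Ada.Containers.Bounded_Vectors;
    procedure Probe_Capacity is
        package BVI is new Ada.Containers.Bounded_Vectors
            (Index_Type => Natural, Element_Type => Integer);
    begin
        --  Nothing in the RM pins down what Capacity returns for
        --  Empty_Vector; print what this implementation chose.
        Ada.Text_IO.Put_Line
            ("Empty_Vector capacity:" &
             Ada.Containers.Count_Type'Image
                 (BVI.Capacity (BVI.Empty_Vector)));
    end Probe_Capacity;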

One can accomplish the above assignment in three ways:

    BVI.Assign (Target => A, Source => []);
    A := BVI.Empty (Capacity => 100);
    A := BVI.Copy (Source => [], Capacity => 100);

These forms are less intuitive than any of the others above.

If we wanted to assign a non-empty value, the same pattern holds:

    A := [1, 2];                                        -- Raises Constraint_Error always
    BVI.Assign (Target => A, Source => [1, 2]);         -- Works as expected
    A := BVI.Copy (Source => [1, 2], Capacity => 100);  -- Works as expected

When using bounded containers, it is always necessary to determine an appropriate bound for the containers. The analysis to do that is often non-trivial. After doing so, one wants to ensure that all objects use the calculated bound (both to avoid too-small or wastefully too-large objects, and to reduce discriminant check failures). One way to accomplish that is to make the primary type used a constrained subtype. This could look something like:

    package BVI is new Ada.Containers.Bounded_Vectors
        (Index_Type => Natural, Element_Type => Integer);
    Max_Elements : constant Ada.Containers.Count_Type := 100;
    subtype My_Vector is BVI.Vector (Max_Elements);

If we also have a directive to initialize all objects (a commonly used style rule), declarations become challenging:

    A : My_Vector := [];                                                  -- (1)
    B : My_Vector := My_Vector'[];                                        -- (2)
    C : My_Vector := BVI.Copy (Source => [], Capacity => Max_Elements);   -- (3)
    D : My_Vector := BVI.Empty (Capacity => Max_Elements);                -- (4)

(1) always raises Constraint_Error. (2) also always raises Constraint_Error, as the qualification does not affect the capacity, and the aggregate does not have the same constraints as the qualifying subtype. (3) and (4) have the correct effect, but note that we have to repeat the capacity value in these calls. That somewhat defeats the purpose of giving the subtype a name in the first place.
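One (hypothetical) way to avoid the repetition is to wrap the capacity in expression functions declared next to the subtype, so the value is written exactly once; the helper names here (Empty_My_Vector, To_My_Vector) are invented for illustration:

    function Empty_My_Vector return My_Vector is
        (BVI.Empty (Capacity => Max_Elements));
    function To_My_Vector (Source : BVI.Vector) return My_Vector is
        (BVI.Copy (Source, Capacity => Max_Elements));

    E : My_Vector := Empty_My_Vector;        -- initialized, capacity not repeated
    F : My_Vector := To_My_Vector ([1, 2]);  -- likewise for non-empty values

Of course, this is still far more ceremony than the plain aggregates that work for unbounded containers.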

Similarly, if we declare a subprogram with a My_Vector parameter:

    procedure Put (A : in My_Vector);

Calls on this procedure also become annoyingly complex:

    Put ([1, 2]);                                                 -- (1)
    Put (My_Vector'[1, 2]);                                       -- (2)
    Put (BVI.Copy (Source => [1, 2], Capacity => Max_Elements));  -- (3)

(1) and (2) always raise Constraint_Error for the same reasons as noted above. (3) works as expected, but again we have to repeat the capacity value.

And, of course, as noted at the beginning of this Issue, assignment statements have the same problem: a simple aggregate does not work.

We avoided this problem for array aggregates by using an "applicable index constraint" determined from context to set the bounds. It seems that a similar solution for setting the capacity would solve most of these problems with aggregates.
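For comparison, here is the array case, which is existing Ada semantics (the declarations are just illustrative):

    type Int_Array is array (Positive range <>) of Integer;
    subtype Five_Ints is Int_Array (1 .. 5);
    V : Five_Ints := (others => 0);
    --  Legal: the applicable index constraint from V's subtype gives the
    --  aggregate the bounds 1 .. 5; an others choice with no applicable
    --  index constraint would be illegal.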

In the next note, I'll outline what that could look like.

ARG-Editor commented 9 hours ago

In the previous note, I outlined the problem with using aggregates with a constrained bounded container. Here, I'll outline a possible solution, some alternatives for the details, and some notes on the solution.

The solution has three parts. The first part is a way to identify the discriminant that gives the capacity (or to determine that there is no such discriminant).

That could be done with an aspect Aggregate_Capacity, which identifies a discriminant of the enclosing type. The legality/static semantic rules for such an aspect would be very similar to those for aspect Implicit_Dereference (used for generalized references, see 4.1.5), which also identifies a discriminant of the enclosing type.

That could look something like:

    type Vector (Capacity : Count_Type) is tagged private
        with Aggregate_Capacity => Capacity,
             Aggregate => ...;

We need an aspect to determine this for two reasons. First, we need to know which discriminant is the capacity in case there is more than one (Bounded_Hashed_Maps has two discriminants, so this is not a theoretical concern). We certainly don't want to depend upon the name of the discriminant in such a case. Second, we need to know whether to trigger the "applicable discriminant constraint" rules at all. A user-defined container might have a discriminant that has nothing to do with capacity; the presence of a discriminant alone should not be the trigger for these rules. The Bounded_Hashed_Maps case is sketched below.
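The Map type of Ada.Containers.Bounded_Hashed_Maps has both Capacity and Modulus discriminants; a hypothetical use of the aspect there would look like:

    type Map (Capacity : Count_Type;
              Modulus  : Hash_Type) is tagged private
        with Aggregate_Capacity => Capacity,  -- picks out the capacity discriminant
             Aggregate => (Empty     => Empty,
                           Add_Named => Insert);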

An alternative would be to add this as an additional optional component in the aggregate that specifies the Aggregate aspect, rather than as a separate aspect. I think the effect and rules would be pretty much the same for that approach, so I won't discuss it further.

The second part of the solution is the rules determining the "applicable discriminant constraint". These could be fairly simple, but it seems that consistency would suggest applying the complete set of rules found in 4.3.3(10-16). The existing rules are written specifically for array subtypes. We could just duplicate the rules in 4.3.5 with an appropriate rewrite.

But a better approach probably would be to define a set of more general rules in 4.3 for an "applicable constraint", and then use that term to define "applicable index constraint" in 4.3.3 and "applicable discriminant constraint" in 4.3.5. Doing that would ensure consistency between aggregates, and also would help when maintenance is needed in the future.

The third part of the solution is to add a new bullet in front of 4.3.5(36/5). "If the type of the aggregate has aspect Aggregate_Capacity defined, and the context identifies an applicable discriminant constraint, then the value given for the discriminant specified by the aspect Aggregate_Capacity in the applicable discriminant constraint."

This means if the above conditions are met, the capacity of the temporary object comes from the applicable discriminant constraint and none of the other rules apply.

This third rule means that aggregates will always match capacities in assignment statements. For other cases, they always will match capacities in constrained contexts (it might still be needed to qualify aggregates in unconstrained contexts if a specific capacity is needed, but that is easier and more natural than wrapping the aggregate in a call to the Copy function).
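To make that concrete, here is how the earlier examples would behave under the proposed rules (hypothetical semantics, assuming the aspect is specified for Vector):

    A := [1, 2];   -- capacity 100 taken from A, the target of the assignment
    Put ([1, 2]);  -- capacity Max_Elements taken from Put's formal parameter
    G : BVI.Vector := My_Vector'[1, 2];
    --  unconstrained context: the qualification supplies Max_Elements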

This change would be inconsistent with the definitions in Ada 2022, but only in that one exception (Constraint_Error) might become a different exception (Capacity_Error) [and what is evaluated before the exception is raised would change]. This is unlikely to matter in practice.

For example, if we have:

    declare
        E : BVI.Vector := []; -- (1)
    begin
        E := [1, 2];          -- (2)

(1) will have capacity 0, and that would not change with these proposed changes. With the current Ada 2022 rules, (2) will raise Constraint_Error (as 2 /= 0). With these new rules, the aggregate would get the capacity of object E (the target of the assignment), which is 0, so the construction of the aggregate would raise Capacity_Error. The new rules would only evaluate one of the aggregate components, not both. It seems pretty bizarre that anyone would care about these differences.

Note that this fix does not help with the results of unconstrained function calls; those still would have to be passed through Copy or assigned with Assign.
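That is, given some hypothetical function F returning BVI.Vector:

    A := F;                                 -- may still fail the discriminant check
    BVI.Assign (Target => A, Source => F);  -- works if F's result fits in A
    A := BVI.Copy (Source => F, Capacity => 100);  -- works as expected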


Looking at this solution shows a huge hole: nothing ever specifies the capacity of the result of New_Vector (which is the routine that implements New_Indexed). For indexed aggregates, one imagines the capacity would match that determined by the bounds. But (A) nothing seems to actually require that (the only requirement is that it be big enough for the bounds; nothing prevents it from being excessively large), and (B) that is totally wrong for these cases where the capacity is best determined by context.

We need a version of New_Vector that takes an optional Capacity parameter (similarly to what was done for Empty). And that new version would be the one used for New_Indexed (for all vectors). Finally, if the capacity is determined from context, that value is used rather than the one calculated from the bounds.

That would look like:

    function New_Vector (First, Last : Index_Type;
                         Capacity    : Count_Type) return Vector is
        (Copy (To_Vector (Count_Type (Last - First + 1)), Capacity => Capacity))
        with Pre => First = Index_Type'First and then
                    (Capacity <= Maximum_Length
                         or else raise Constraint_Error) and then
                    (Capacity >= Count_Type (Last - First + 1)
                         or else raise Capacity_Error);

We could also define this without the expression function (it presumably would be faster, since one wouldn't need to copy an object).
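A minimal sketch of such a body (assuming it is written inside the package, where the representation is visible; the extended return builds the result in place instead of copying a temporary):

    function New_Vector (First, Last : Index_Type;
                         Capacity    : Count_Type) return Vector is
    begin
        --  The Pre shown above still applies; only the capacity checks,
        --  which map to specific exceptions, are repeated here.
        if Capacity > Maximum_Length then
            raise Constraint_Error;
        elsif Capacity < Count_Type (Last - First + 1) then
            raise Capacity_Error;
        end if;
        return Result : Vector (Capacity => Capacity) do
            Result.Set_Length (Count_Type (Last - First + 1));
        end return;
    end New_Vector;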