Ada-Rapporteur-Group / User-Community-Input

Ada User Community Input Working Group - Github Mirror Prototype

Generalizing enumeration representations #52

Open ARG-Editor opened 1 year ago

ARG-Editor commented 1 year ago

This issue continues an unfinished topic from Ada 2022. This issue was created to fulfill the ARG resolution of November 10, 2022.


Setting enumeration representation values is done separately from the type declaration, and is subject to rather restrictive rules.

When the representation value is an integral part of the meaning of the type (as often happens when mapping to hardware), this duplication is annoying, especially when many enumeration literals are involved.

Similarly, the restrictive rules associated with enumeration representation values can make it hard to map hardware values to enumeration literals. Less restrictive rules would be helpful.
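
For reference, the duplication described above looks like this in current Ada. This is only a minimal sketch: the type Opcode and its codes are borrowed from the proposals later in this issue, the procedure name is made up for illustration, and 'Enum_Rep is assumed to be available (it is an Ada 2022 attribute, long supported by GNAT as an implementation-defined one).

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Show_Current_Style is
   --  The literals are listed once in the type declaration ...
   type Opcode is (Add, Sub, Mul);

   --  ... and a second time in a separate enumeration representation
   --  clause. The codes must currently be given in increasing order.
   for Opcode use (Add => 10, Sub => 11, Mul => 20);
begin
   --  Print each literal with its position number and its internal code.
   for Op in Opcode loop
      Put_Line (Opcode'Image (Op)
                & " position:" & Integer'Image (Opcode'Pos (Op))
                & " code:" & Integer'Image (Opcode'Enum_Rep (Op)));
   end loop;
end Show_Current_Style;
```

With many literals, every name appears twice, which is the annoyance this issue is about.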

[Editor's note: This issue was raised by Ada practitioners.]

AI12-0365-1 was created from the original comments; the comments arrived too late to include in Ada 2022.

The following proposals were made in that AI; they include an analysis by the editor as to the feasibility and consistency with Ada semantics.

(1) Allow aspects on enumeration literals, and define one for the representation value. This would look like:

type Opcode is (Add with Rep_Value => 10, Sub with Rep_Value => 11);

Enumeration literals are one of very few declarations in Ada that don't allow aspect specifications, and it certainly would be consistent to allow them there. Unfortunately, the result is a bit wordy.

(2) Allow representation values to be in any order (as opposed to the current requirement that the "codes ... satisfy the predefined ordering relation of the type").

This would make ordering operations more expensive (as the representation values would have to be converted to position values before comparing), but otherwise would not change the needed code. (Note: It could change the cost of a range check, but only in cases where the representation values are currently contiguous; this case is already a special case [a general range check has to use position values] and could still be special-cased if desired).

Note that the representations still would have to be different for each literal - there needs to be a one-to-one mapping (so that operations like 'Pos, 'Val, 'Image, and 'Value work consistently), but that mapping would no longer need to preserve order.

(3) Allow alternative names to be declared in the enumeration declaration:

type Opcode is (Add with Rep_Value => 10, Alt_Name => Sub, Mul with Rep_Value => 20);

Semantically, this would be the same as the renames:

 function Sub return Opcode renames Add;

The oddity here is having a declaration in an aspect specification. We don't currently have anything like that.

(4) Merge the semantics of the enumeration representation clause with the enumeration type. This would be a less wordy version of (1). This would look something like:

type Opcode is (Add => 10, Sub => 11);

Currently, an enumeration representation clause is resolved as an array aggregate, and any syntax of an array aggregate can be used. That would not be an appropriate definition for this combined declaration, as the aggregate choices don't exist yet, and in any case the type Opcode can't be resolved in its own declaration.

This necessarily means that some custom resolution rules would have to be developed. One possibility would be to define this as a shorthand for the longer aspect specification version given above.

This could be done, but it would be a substantial amount of work.

(5) Instead of (2), decouple position values from the order of declaration, and assign the position numbers based on the order of the representation values.

This could be done semantically, but it hides a very important property which would no longer be obvious from the declaration. In particular, inclusion in (or exclusion from) enumeration ranges depends on the position numbers.

(6) Add unordered enumerations. No ordering operators could be used with these types.

This fixes some of the problems of (5) (and could be combined with (5)), but it brings up issues of its own. One obvious one is that ranges have to be banned for unordered enumerations. Set memberships could replace ranges in most uses (in predicates for subtypes, for instance). However, the basic semantics of all discrete types are based on the base range of the type, and a type without a base range would be a problem. Moreover, no ranges means no arrays indexed by them, eliminating a significant use of enumerations.

Another obvious problem is generic matching; an unordered enumeration shouldn't match a generic formal type with ordering (otherwise, the generic body might depend upon ordering operations of the type). That would mean that an unordered enumeration couldn't be used with a formal discrete type, at least not an ordinary one (one could define an unordered formal discrete type, but it would necessarily have to differ from the existing ones in some way, such as by an aspect).

(7) Allow overlapping representations, that is, allow more than one enumeration literal to have the same representation value. For instance:

type Opcode is (Add with Rep_Value => 10, Sub with Rep_Value => 10, Mul with Rep_Value => 20);

A one-to-one mapping is fundamental to the Ada semantics. If two values have the same representation value, but necessarily have different position values, how can the compiler tell them apart? How could Add /= Sub if they have the same code?

If they have the same position value, then the issues of (3) occur. Moreover, what is the result of Opcode'Image(Sub)? Note that this makes the Image lookup table more complex, as one would expect Opcode'Value("SUB") to produce Sub (or Add).

Note that in this latter case, a renaming (either as a function or a constant) is a better option that doesn't mess up the language semantics:

type Opcode is (Add with Rep_Value => 10, Mul with Rep_Value => 20);

Sub : constant renames Add; -- Ada 2022 short-form object renaming. But doesn't overload.

or simply

Sub : constant Opcode := Add; -- Static constant. Doesn't overload, either.

or

function Sub return Opcode renames Add; -- Does overload.

or even

function Sub return Opcode is (Add); -- Also overloads.
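
As a rough illustration of the function-renaming alternative in current Ada, the sketch below uses an ordinary representation clause in place of the proposed Rep_Value aspect. The procedure name and the object Op are made up for the example.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Show_Alias is
   type Opcode is (Add, Mul);
   for Opcode use (Add => 10, Mul => 20);

   --  Alternative name for Add. Like a literal, it is overloadable.
   function Sub return Opcode renames Add;

   Op : constant Opcode := Sub;
begin
   Put_Line (Boolean'Image (Op = Add));  --  prints TRUE
   Put_Line (Opcode'Image (Op));         --  prints ADD, the only real name
end Show_Alias;
```

Because Sub is not a literal of the type, 'Image never produces "SUB" and Opcode'Value ("SUB") raises Constraint_Error, so the one-to-one mapping between literals and values is preserved.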

(8) Allow the use of an earlier enumeration literal in a later representation. For instance:

type Opcode is (Add with Rep_Value => 10, Sub with Rep_Value => Add+1, Mul with Rep_Value => Add*2);

Ada, however, hides literals from all visibility until the enumeration declaration is finished. Moreover, this proposal seems to assume that the literals are some sort of integer type, while they really are enumeration values with no math operations. Thus, these things violate the strong typing of Ada.

Declaring a constant works for this:

Base_Rep : constant := 10;

type Opcode is (Add with Rep_Value => Base_Rep, Sub with Rep_Value => Base_Rep+1, Mul with Rep_Value => Base_Rep*2);
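
The same idea already works today with a separate representation clause, since the codes only need to be static integer expressions. A minimal sketch, assuming the same names as above (the procedure name is illustrative, and 'Enum_Rep is the Ada 2022 attribute):

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Show_Named_Number is
   Base_Rep : constant := 10;  --  named number, usable in static expressions

   type Opcode is (Add, Sub, Mul);

   --  Each code is a static expression built from the named number.
   for Opcode use
     (Add => Base_Rep,
      Sub => Base_Rep + 1,
      Mul => Base_Rep * 2);
begin
   --  Show the code that was computed for Mul.
   Put_Line (Integer'Image (Opcode'Enum_Rep (Mul)));
end Show_Named_Number;
```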


It's unclear which of these ideas are worth the effort. We would like feedback from Ada practitioners on whether any of these ideas would be valuable in your projects.

joshua-c-fletcher commented 1 year ago

I think (5) "decouple position values from the order of declaration, assign the position numbers based on the order of the representation values." would be a problem in particular for descendant types.

For example, you can define an enum, and then define a new type that has the same values and a different representation clause. The position values remain the same, but the representation values can be different. Re-arranging the position values based on the representation values would result in the position values changing for descendant types that use different representations.

The representations can be overridden by descendants, just like the layout and size allotted to elements of a record can, whereas the position values are a more essential part of the type that doesn't get overridden by descendants.
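
A small sketch of that situation, under the assumption that the derived type is given its own representation clause as described. The type names and codes are invented for illustration, and 'Enum_Rep is the Ada 2022 attribute.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Show_Derived_Rep is
   type Command is (Start, Stop, Reset);
   for Command use (Start => 1, Stop => 2, Reset => 3);

   --  Same literals and same position numbers, different internal codes.
   type Wire_Command is new Command;
   for Wire_Command use (Start => 16#10#, Stop => 16#20#, Reset => 16#40#);

   C : constant Command      := Stop;
   W : constant Wire_Command := Wire_Command (C);  --  changes representation
begin
   --  The position number is shared by both types.
   Put_Line ("Positions:" & Integer'Image (Command'Pos (C))
             & Integer'Image (Wire_Command'Pos (W)));
   --  The internal codes differ between parent and derived type.
   Put_Line ("Codes:" & Integer'Image (Command'Enum_Rep (C))
             & Integer'Image (Wire_Command'Enum_Rep (W)));
end Show_Derived_Rep;
```

If position numbers were derived from the codes as in (5), a derived type whose codes sort differently from its parent's would end up with different position numbers, which is the problem described above.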

I like (4) in that it is a concise way of mapping an enum to its representation, but it blurs the definition of an enum in Ada. Enums are not numbers, they're simply represented by numbers, under the hood, and sometimes we need to set and/or interact with these numbers, especially when working with hardware interfaces and certain ICDs. Having the representation clause separate from the type definition (wordy though it is) helps to make that clear.

There are definitely some interesting ideas in this set.

Idea (2), allowing representations in any order, could be appealing. I don't want to add cost to working with enums, but it always seemed a bit arbitrary that the representation had to be in the same order as the enum values themselves. As you described, though, it makes sense: operators on the enum type need to behave as if they use the position values, and while both are in ascending order, comparisons using the representation values are equivalent.

steven-bellock commented 11 months ago

I am in support of (1) since it reduces the size of the declaration by half. Could it be (something like)

type Opcode is (Add => 10, Sub => 11);

?

sttaft commented 11 months ago

On Tue, Nov 7, 2023 at 6:28 PM Steven Bellock wrote:

I am in support of (1) since it reduces the size of the declaration by half. Could it be (something like)

type Opcode is (Add => 10, Sub => 11);

?

That was option (4) in the original note. And I agree that seems like the simplest, most intuitive enhancement.

Take care, -Tuck


CKWG commented 6 months ago

I am in support of (1) since it reduces the size of the declaration by half. Could it be (something like)

It has never been Ada's goal to avoid writing!

While I can see a small advantage with this proposal, I think Ada already has a perfect solution for internal representation. Enums are not numbers, and the literals carry the semantics. I've spent many years writing safety critical software, and I've never needed to care about internals. Sure, for Milbus or ARINC transfers, they are necessary, but well hidden. If we go in this direction to make the rep more visible, the next wish will be to have arithmetic on enums. If you want unordered enums, make the ordering functions abstract.
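
A minimal sketch of what making the ordering functions abstract could look like. The package, type, and object names are made up for illustration.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Show_Unordered is
   package Colors is
      type Color is (Red, Green, Blue);

      --  Hide the predefined ordering operators by overriding them
      --  with abstract functions. Calls to these become illegal.
      function "<"  (Left, Right : Color) return Boolean is abstract;
      function "<=" (Left, Right : Color) return Boolean is abstract;
      function ">"  (Left, Right : Color) return Boolean is abstract;
      function ">=" (Left, Right : Color) return Boolean is abstract;
   end Colors;

   use Colors;

   C : constant Color := Green;
begin
   Put_Line (Boolean'Image (C = Green));          --  equality is untouched
   Put_Line (Boolean'Image (C in Red .. Green));  --  ranges remain legal
   --  Put_Line (Boolean'Image (C < Blue));       --  illegal if uncommented
end Show_Unordered;
```

Note that this only removes the operators: ranges, 'Succ, 'Pred and array indexing still rely on the underlying order, so it is lighter than the unordered enumerations of proposal (6).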

Rant: I see more and more proposals to shorten Ada syntax: Omitting declare blocks in if-statements and what not. Next we propose to omit the begin end brackets and rely only on indentation... Vae victis if you lose the indentation when sending code... Rant end

jprosen commented 6 months ago

If you really need to decouple the low level from the high level, use a representation table:

Representation : constant array (my_enumeration) of low_level_values := (...);

OK, it costs one indexing operation, but you have complete freedom over the representation, it needs no change to the language, and it makes the representation stuff fully visible.
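
A possible fleshed-out version of such a table, as a sketch only. Opcode, Low_Level_Value, To_Opcode, and the codes are assumptions standing in for my_enumeration and low_level_values above.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Show_Rep_Table is
   type Opcode is (Add, Sub, Mul);
   type Low_Level_Value is range 0 .. 255;

   --  The codes may appear in any order, unlike in a representation clause.
   Representation : constant array (Opcode) of Low_Level_Value :=
     (Add => 20, Sub => 10, Mul => 15);

   --  Reverse lookup by linear search; adequate for small enumerations.
   function To_Opcode (Code : Low_Level_Value) return Opcode is
   begin
      for Op in Opcode loop
         if Representation (Op) = Code then
            return Op;
         end if;
      end loop;
      raise Constraint_Error with "no Opcode for this code";
   end To_Opcode;
begin
   Put_Line (Low_Level_Value'Image (Representation (Sub)));  --  code for Sub
   Put_Line (Opcode'Image (To_Opcode (15)));                 --  prints MUL
end Show_Rep_Table;
```

The forward direction costs one array index; the reverse direction needs a search or a second table.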

OneWingedShark commented 1 month ago

Why not add some aspect to delay freezing, which in my experience has been the real pain-point?

Package Example is
  Type Something is (This, That, The_Other)
   with Private_Representation;
  --Stuff that would normally make freezing happen
Private
   For Something use
     (This => 2,
      That => 8,
      The_Other => 11
      );
End Example;

ARG-Editor commented 1 month ago

This doesn't address any of the initial concerns of this issue: a desire to have an option to put the declaration and representation together (as we have with aspect specifications for most other types), a desire to make the representation values more flexible, and a desire for an easier way to define alternative names for literals.

Moreover, you are ignoring the implementation cost of such a feature. Freezing of a type implies that the compiler needs to generate code for an entity of that type, and for that to make sense, the representation has to be known. Abandoning that model could cause a huge amount of implementation work; one would hope that any such change would be for something widely used (which representation specifications are not).

The original AI (AI12-0365-1) had a number of suggestions which were responsive to the original issues, and the e-mail thread has some additional ones. The question in my view is whether the issues are important enough to address at all (in particular, to eliminate the requirement that the representation values are in the same order as the position numbers, and whether to allow the representation to be specified as an aspect on the enumeration literals) as opposed to what solutions to apply. (IMHO, the issue isn't important enough to make any significant changes to the Ada model of semantics, so only the two ideas noted in this paragraph really are worth considering. YMMV.)

           Randy.

