[RFC] Range Integer Types

JoseRuizAdaCore commented 9 months ago

First draft for removing the need for a symmetric base range for signed integer types.

raph-amiard commented 9 months ago

@dkm @jklmnn Guys could you review this RFC and tell me if you find any glaring holes in it ? It seems sound and simple enough to me but I would like a second read before handing that to the front-end team :) Thanks in advance!

clairedross commented 9 months ago

Hi Jose, I am a bit surprised that you say this RFC has no drawbacks and is backward compatible. Users can access the underlying type used by the compiler using T'Base. They could safely assume that this type was symmetrical before. I am not entirely sure of why you need this, but if it is only for unsigned type and the issue is that you do not want the wrap-around semantics, I believe it would be simpler and more backward compatible to introduce a new annotation No_Wrap_Around on modular types. We have something similar in SPARK to instruct GNATprove that we want overflow checks instead of wraparound semantics.

JoseRuizAdaCore commented 9 months ago

Hi Claire. What I meant when I say no drawbacks and backward compatible is that compilers can choose to do what they do today (symmetric base type) and they would respect the proposed wording.

The reason why I want it is to simplify the life of embedded developers. Forcing a larger base type has implications in the run-time capabilities needed.

Now your suggestion of No_Wrap_Around on modular types is very interesting. I think it gives me what I want and it is better for compatibility. Thanks!

raph-amiard commented 9 months ago

@dkm @jklmnn Never mind then :)

jklmnn commented 9 months ago

While I agree that for full range integer types modular types with No_Wrap_Around could be a solution I still think this change would be a good improvement. The forced symmetry is something that I have seen being a problem many times and e.g. (@treiher correct me if I'm wrong) is one of the reasons we currently only support 32 bit types in RecordFlux.

At least to me adding No_Wrap_Around to modular types look a bit like a workaround. We're changing the semantic of one type to make it work like another type because that other type has the correct semantic but a limitation that doesn't allow us to use its full range.

Additionally I think modular types don't fill all the use cases, even with No_Wrap_Around. I still cannot define a 64 bit positive type:

type Positive_64 is range 1 .. 2 ** 64 - 1;

clairedross commented 9 months ago

The proposed modification seems small, but I would expect it to be a big change in the compiler and not backward compatible for users if the compiler was to use this opportunity in general. So I think it might be a good idea to get an opinion from compiler/language experts before going in this direction.

yannickmoy commented 9 months ago

@jklmnn I think you can define the type you want:

type Natural_64 is mode 2**64 with No_Wrap_Around;
type Positive_64 is Natural_64 range 1 .. 2**64-1;

clairedross commented 9 months ago

If you define it as Yannick proposed, you have the advantage of having the 0 in the base type, so for example you will be able to use it to index an array without worrying about invalid empty array aggregates...

treiher commented 9 months ago

While I agree that for full range integer types modular types with No_Wrap_Around could be a solution I still think this change would be a good improvement. The forced symmetry is something that I have seen being a problem many times and e.g. (@treiher correct me if I'm wrong) is one of the reasons we currently only support 32 bit types in RecordFlux.

We only support up to 63-bit integer types in RecordFlux at the moment. The reason is that SPARK uses another representation for modular integer types during proofs, even with No_Wrap_Around, which makes the generated code much more complicated to prove.

@jklmnn I think you can define the type you want:
type Natural_64 is mod 2**64 with No_Wrap_Around;
type Positive_64 is Natural_64 range 1 .. 2**64-1;

@yannickmoy Interesting. So we could use this approach to define a 64-bit integer type that is handled like other signed integers in SPARK proofs?

jklmnn commented 9 months ago

@jklmnn I think you can define the type you want:
type Natural_64 is mode 2**64 with No_Wrap_Around;
type Positive_64 is Natural_64 range 1 .. 2**64-1;

To be honest I think that is really confusing and I don't think this is a good solution. If you look at it as someone without any prior knowledge of Ada what would you expect? You would think that modular types have a wrap around and are to be used e.g. for opaque integers or calculations where you want to have that behavior. And you have range types that have overflow checks and whose range you can define. This was the expectation I had when I started with Ada and I don't find it unreasonable to have this expectation coming from other languages. With @JoseRuizAdaCore's RFC this would be it.

Now if you use No_Wrap_Around for that you get a different situation. You can still use modular types as is, except if you add this aspect then they're overflow checked. That's fine so far. Now range types are different. They behave like integers with overflow checks unless you want them to be unsigned and fit into the size you'd expect (say they're not symmetrical). In this special case you can't use a range type. Instead you need a modular type with an aspect and additionally define yet another type from that modular type that contains the range you want to have. I don't see how this is an improvement over the first situation.

If you define it as Yannick proposed, you have the advantage of having the 0 in the base type, so for example you will be able to use it to index an array without worrying about invalid empty array aggregates...

Don't I have this problem already with symmetric integers if I don't include 0? As from my understanding you can't freely define your range (@JoseRuizAdaCore please correct me if I'm wrong), e.g. you can't define

type Pos_32 is range 1 .. 2 ** 32 with Size => 32;

~~but you can define~~

type Pos_32 is range 1 .. 2 ** 32 - 1 with Size => 32;

in which case 0 would still be 0 if mapped to Universal_Integer. Also it's not clear to me what invalid empty array aggregates are. Does the compiler try to use 0 as a default value in this case and fails if 0 is not within the type?

I got a bit confused. One can already define an integer with the size aspect and with an arbitrary range as long as it doesn't contain more values than can fit into the given size. But I still think my point stands and that this RFC is an improvement for two reasons:

It could simply be a compiler optimization
Using the approach with a modular type is still confusing. Even more so unless you know about this specific behavior of the compiler I'd say people will simply define their 64 bit range type without noticing that it's going to use 128 bit arithmetic (unless your platform doesn't support that and it breaks and nobody knows why).

clairedross commented 9 months ago

If you define it as Yannick proposed, you have the advantage of having the 0 in the base type, so for example you will be able to use it to index an array without worrying about invalid empty array aggregates...

Don't I have this problem already with symmetric integers if I don't include 0? As from my understanding you can't freely define your range ( @JoseRuizAdaCore https://github.com/JoseRuizAdaCore please correct me if I'm wrong), e.g. you can't define

type Pos_32 is range 1 .. 2 ** 32 with Size => 32;

but you can define

type Pos_32 is range 1 .. 2 ** 32 - 1 with Size => 32;

in which case 0 would still be 0 if mapped to Universal_Integer. Also it's not clear to me what invalid empty array aggregates are. Does the compiler try to use 0 as a default value in this case and fails if 0 is not within the type?

You can test it already (using modular types so that you do not have a symmetric base type):

— Reply to this email directly, view it on GitHub https://github.com/AdaCore/ada-spark-rfcs/pull/109#issuecomment-1827805081, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB6NCMPD3RXFCKYVDIJCGLLYGSGKJAVCNFSM6AAAAAA7GNZH6WVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRXHAYDKMBYGE . You are receiving this because you commented.Message ID: @.***>

-- Claire Dross (she/her) Software Engineer, AdaCore

clairedross commented 9 months ago

On Mon, Nov 27, 2023 at 2:41 PM Claire Dross @.***> wrote:

If you define it as Yannick proposed, you have the advantage of having the

0 in the base type, so for example you will be able to use it to index an array without worrying about invalid empty array aggregates...

Don't I have this problem already with symmetric integers if I don't include 0? As from my understanding you can't freely define your range ( @JoseRuizAdaCore https://github.com/JoseRuizAdaCore please correct me if I'm wrong), e.g. you can't define

type Pos_32 is range 1 .. 2 ** 32 with Size => 32;

but you can define

type Pos_32 is range 1 .. 2 ** 32 - 1 with Size => 32;

in which case 0 would still be 0 if mapped to Universal_Integer. Also it's not clear to me what invalid empty array aggregates are. Does the compiler try to use 0 as a default value in this case and fails if 0 is not within the type?

You can test it already (using modular types so that you do not have a symmetric base type):

procedure Main with SPARK_Mode is type Nat_32 is mod 2 ** 32;

type My_Array is array (Nat_32 range <>) of Integer;

X : My_Array := [];

begin null; end Main;

raised CONSTRAINT_ERROR : main.adb:6 range check failed -- Claire Dross (she/her) Software Engineer, AdaCore

JoseRuizAdaCore commented 8 months ago

To me, the main point behind this RFC is that imposing a symmetric base range seems like an overlook to me. I don't see any advantage.

So the situation we have is that:

either we implement the No_Wrap_Around attribute for modular types. Then we can define: type Natural_64 is mod 2**64 with No_Wrap_Around, Size => 64; type US_64 is new Natural_64 range 0 .. 2**64-1 with Size => 64;
or we implement this RFC and we can define: type US_64 is range 0 .. 2**64-1 with Size => 64;

I think the semantics of both options would be equivalent, but the second is less confusing.

Implementing this RFC changes the semantics of the base type, but we could keep the symmetric base type as the default and the non-symmetric base type controlled by a compiler option.

yannickmoy commented 8 months ago

@JoseRuizAdaCore you can already define with GNAT a 64-bits signed type from 0 to 2**64-1, so GNAT accepts:

type US_64 is range 0 .. 2**64-1 with Size => 64;

This is what the GNAT frontend calls an "unsigned" type. But its base type is still the 128-bits signed integer type.

yannickmoy commented 8 months ago

So if you want to be able to specify that its base type is only 64-bits, you need a new annotation, like:

type US_64 is range 0 .. 2**64-1 with Size => 64, Base'Size => 64;

Now, what are the consequences for compilation, in particular for bound checking?

ebotcazou commented 8 months ago

Note that overflow checking is implemented differently for signed and unsigned types: the former uses the Overflow flag on most processors, whereas the latter uses the Carry flag for addition but is potentially more involved for multiplication.

yannickmoy commented 8 months ago

@ebotcazou so it would not be an issue to have a type like US_64 above whose base type has also 64-bits?

JoseRuizAdaCore commented 8 months ago

So if you want to be able to specify that its base type is only 64-bits, you need a new annotation, like:
type US_64 is range 0 .. 2**64-1 with Size => 64, Base'Size => 64;
Now, what are the consequences for compilation, in particular for bound checking?

But if you don't change the Ada definition of signed integer types (i.e., if you still force a symmetric base range) the Base'Size => 64 would be rejected.

sttaft commented 8 months ago

I am not entirely sure what José means by an overlook but there are pretty good reasons why Ada has always required symmetric signed integer types. It will require significantly more care to use an asymmetric signed integer type properly. For example, simple things like "if X - Y in -5 .. +5" will fail if Y is greater than X. Even "if abs(X - Y) in 0 .. +5" will fail if Y > X, since you can get Constraint_Error if an intermediate in some computation goes outside the base range of the type, and for such an asymmetric type, there would be no negative values in the base range of the type, so any intermediate that is negative could cause a Constraint_Error.

sttaft commented 8 months ago

But if you don't change the Ada definition of signed integer types (i.e., if you still force a symmetric base range) the Base'Size => 64 would be rejected.

I would agree with José that specifying a size is perhaps hiding the issue. The fundamental issue is that you are changing the effective base range of the type. As indicated in my prior comment, that has implications for the ease of use of the type, or at least, the ease of use of the subtraction operator of the type. One might almost want to say such types don't have a subtraction operator because it is so likely to result in trouble, and clearly unary minus is useless except on the zero value ;-) So I would say this deserves more than just a subtle Base'Size clause. I would say it deserves an aspect such as No_Negative_Intermediates => True.

JoseRuizAdaCore commented 8 months ago

I see Tuck, thanks, very enlightening.

ebotcazou commented 8 months ago

@yannickmoy Overflow checking would change, but I think that both GCC and LLVM have primitives to do it whatever the signedness now, so this would be minimal.

raph-amiard commented 8 months ago

@JoseRuizAdaCore what's the status in your opinion ? Is this close to converging or does this need a rewrite ?

JoseRuizAdaCore commented 8 months ago

@raph-amiard we are not converging. There are big questions about backward incompatibilities.

AdaCore / ada-spark-rfcs

[RFC] Range Integer Types #109