AdaCore / ada-spark-rfcs

Platform to submit RFCs for the Ada & SPARK languages
62 stars 28 forks source link

[feature] Scales of measurement #119

Open suruena opened 1 month ago

suruena commented 1 month ago

Summary

Allow the programmer to limit the operators available for user-defined scalar types using the scales of measurement concept (from statistics):

  1. Ratio scale: Regular numeric type, no limitation on the operations available. For example, a duration in seconds can be multiplied or divided by other durations / constants / other physical magnitudes. Typical physical magnitudes have this scale, and every statistical function can be used with a set of these values (e.g. geometric mean, arithmetic mean, relative standard deviation...)
  2. Interval scale: Some physical magnitudes in which the absolute value has an arbitrary origin (e.g. ambient temperature in degrees Celsius, or Unix timestamp), so cannot be multiplied or divided (by elements of this type or any other type), or to add them together (it is possible to subtract them, but the result would be a relative magnitude). For example, two Unix timestamps can be subtracted (i.e. so the arbitrary origin is cancelled), giving a duration in seconds (but has no sense to add two Unix timestamps, or to divide it by a constant or other physical magnitude). It is possible to use the arithmetic mean and standard deviation (but not the geometric mean or relative standard deviation, as interval values cannot be multiplied/divided together).
  3. Ordinal scale: Typical enumerate values, which can be compared together for equality or ordering. For example, a message priority can be low, medium, and high. It is not possible to get the arithmetic mean or standard deviation (as each enum value has not a clear value), but for a set of values it can be obtained for example the median / mode / interquartile range / variation ratio...
  4. Nominal scale: Restricted enumerate values without order between them, so only equality / inequality allowed (but not less than, greater than, etc.). For example, a set of colors. It is possible to get the mode or the variation ratio, but is not possible for example to get the median (as the set of values cannot be ordered).

Reference-level explanation

A possible way to add scales of measurement to the language is to create a dedicated aspect, for example:

type Relative_Temperature is digits 6 range -1_000_000.0 .. +1_000_000.0 with Measurement => Ratio_Scale;

type Absolute_Temperature_Celsius is new Relative_Temperature range -273.15 .. 1_000_000.0 with Measurement => Interval_Scale;

type Absolute_Temperature_Kelvin is new Relative_Temperature range 0.0 .. 1_000_273.15 with Measurement => Interval_Scale;

A derived type is needed because a subtype will not avoid assigning by mistake a relative temperature into an absolute temperature. It is possible to use physical units with these types (e.g. using GNAT-specific aspect Dimension_System), but only relative temperatures can be mixed with other magnitudes, it has no sense to multiply / divide / add / subtract an absolute temperature with other physical values. Note that even if Kelvin degrees has no arbitrary origin (e.g. has physical sense to multiply an absolute temperature in Kelvin by two as its origin is zero), the programmer in this case represented it with an Interval_Scale type to ensure the program doesn't mix by mistake absolute temperatures and relative temperatures.

By default, all numeric types have the Ratio_Scale (i.e. all relational and arithmetic operators allowed), but in a type with Interval_Scale is allowed to use relational operators but not arithmetical operators. To perform the few allowed arithmetical operations in variables with an interval scale, one possibility is to have dedicated attributes (as they require mixing relational and absolute variables, and it would be weird to operate / assign variables that are not subtypes).


-- Rel := Abs1 - Abs2;
Temp_Increase : Relative_Temperature := Absolute_Temperature_Celsius'Interval_Diff(Current_Abs_Temp, Previous_Abs_Temp);

-- Abs2 := Abs1 + Rel;
Max_Temp_Limit : Absolute_Temperature_Celsius := Absolute_Temperature_Celsius'Interval_Add(Current_Abs_Temp, Temp_Increase);

In this proposed syntax, the compiler must check that the type with an Interval_Scale has a parent type with a Ratio_Scale, and also that the 'Interval_Diff attribute is used only with a Interval_Scale type, receiving two values of that type, and the result has the type of the parent type (i.e. has a Ratio_Scale); and similarly for the 'Interval_Add attribute (but accepting first a parameter of that Interval_Scale type, and second a parameter of the associated parent type, i.e. with a Ratio_Scale).

In the case of enumerate types, by default the Measurement aspect is "Ordinal_Scale" (i.e. regular enum, all operations accepted). But if "Nominal_Scale" is used in an enumeration type, the compiler will reject using relational operators (except equality and inequality) and some attributes (like 'Max and 'Min). It is allowed to iterate in loops variables with Nominal_Scale (or to use 'Pred or 'Succ).

Note that it is possible to convert explicitly an interval / nominal type to a base type so the programmer can "escape" from the limitations, for example to compute the arithmetic mean value of a set of absolute temperatures it would be needed to convert them to a generic Real to add them and divide the result by the number of values (something not allowed with values of an Interval_Scale)

Finally, it would be important that generic functions / packages also support the Measurement aspect, to indicate which types are allowed when instantiating it (and which operations are allowed inside the generic implementation). For example:

generic
  -- Allowed any scalar type (can have Nominal_Scale, Ordinal_Scale, Interval_Scale, or Ratio_Scale)
  type T is (<>) with Measurement => Scale_Nominal;
  type T_Array is array(Positive range <>) of T;
function Mode(values : in T_Array) return T;

generic
  -- Allowed all scalar types except those with Nominal_Scale
  type T is (<>) with Measurement => Scale_Ordinal;
  type T_Array is array(Positive range <>) of T;
function Median(values : in T_Array) return T;

Motivation

Embedded software typically has to operate with absolute / relative quantities, as well as with enumerate values without any order. The Ada philosophy is to give the programmer tools to explicitly indicate which are the properties of each data type, so the compiler ensures at compilation time that the code has no typing mistakes.

And the scales of measurement are not only for physical quantities, for example it can be useful to define a Device_Address type of Interval_Scale (with parent type Byte_Offset of Ratio_Scale) for example to ensure addresses are not added together.

Caveats and alternatives

Currently the Ada standard library uses private types for absolute times and relative times, and a set of functions with all the allowed operators (in Ada.Calendar and Ada.Real_Time, e.g. function "+" (Left : Time; Right : Duration) return Time;). This ensures that the relative and absolute times cannot be operated incorrectly, but for the programmer it is a lot of effort to define all these functions (as well as error-prone, and with a code overhead). In addition, it is not possible that the standard library defines all possible absolute and relative types, as in some cases depends on the context (e.g. a color type can be unordered if primary colors, or ordered if following a specific colorimetry; or a company profit in dollars has an interval scale in case negative values accepted, or ratio scale if only zero or positive values accepted).