CategoricalData / hydra

Transformations transformed
Apache License 2.0

Parameterize literal types #129

Open joshsh opened 1 month ago

joshsh commented 1 month ago

Once upon a time, Dragon had a set of built-in literal types, of which the numeric types were parameterized by bit precision and signedness. Hydra, from the very beginning (https://bit.ly/hydra-design-doc), has instead used a simple nested enumeration of literal types, with built-in bigfloat, float32, float64, bigint, int8, ..., int64, uint8, ..., uint64. This enumeration hasn't changed since the doc was written. IIRC there was a good reason for choosing this design, but I didn't record the reason and can't recall it now. Most likely, the parameterized types in Dragon had been inconvenient in some way.

However, there is now pressure to add even more literal types, including decimal numbers of varying bit precision, and I think the enums are going to become unwieldy. I propose to investigate parameterized literal types once again, in a branch, and see how they work out. If that goes ahead, the literal value grammar will likely break symmetry with the literal type grammar for the first time.
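To make the contrast concrete, here is a minimal sketch of the two styles of integer type. It is written in Python purely for brevity, and every name in it (`IntegerType`, `IntType`, `parameterize`) is hypothetical and not taken from Hydra's actual API. The enumeration lists one member per concrete type, while the parameterized form covers the same types with two fields and could be extended to other precisions without new members.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IntegerType(Enum):
    """Enumerated style: one member per concrete integer type (hypothetical names)."""
    BIGINT = "bigint"
    INT8 = "int8"
    INT16 = "int16"
    INT32 = "int32"
    INT64 = "int64"
    UINT8 = "uint8"
    UINT16 = "uint16"
    UINT32 = "uint32"
    UINT64 = "uint64"


@dataclass(frozen=True)
class IntType:
    """Parameterized style: bit precision and signedness are parameters.

    bits=None stands for arbitrary precision (an assumption of this sketch).
    """
    bits: Optional[int]
    signed: bool


def parameterize(t: IntegerType) -> IntType:
    """Map an enumerated integer type onto the parameterized form."""
    name = t.value
    if name == "bigint":
        return IntType(bits=None, signed=True)
    signed = not name.startswith("u")
    # The digits after "int" give the bit precision, e.g. "uint16" -> 16.
    bits = int(name[name.index("int") + 3:])
    return IntType(bits=bits, signed=signed)
```

Under these assumptions, `parameterize(IntegerType.UINT16)` yields `IntType(bits=16, signed=False)`, and a new precision would be a new parameter value rather than a new enum member.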

joshsh commented 1 month ago

Actually, I did describe the reasons in the design doc; parameters were dropped for the sake of inference:

No parameterized primitive types: Hydra does not currently parameterize primitive types like Dragon does. For example, there is no "precision" parameter for integer or floating-point types, no "signedness" parameter for integer types, and no "maximumLength" parameter for string types. In Hydra, the complete type of an atomic value can be inferred from the value itself, whereas Dragon's primitive type cannot. In Hydra we relieve some of the need for parameters by providing additional value constructors like int16, int32, uint64, float32, float64, etc. as well as bigint and bigfloat types. Dependent types like integers with bounds, strings with bounded length regex, etc. are possible using metadata, but are not part of the Hydra Core type system.

That's a valid reason, but dropping the parameters is not the only solution. Another is simply to include the literal type within each literal value representation, so that the type does not need to be inferred. The more verbose values shouldn't be a problem for typical applications, in which Hydra Core is used as an intermediate representation rather than as a representation for data exchange. TBD what the best compromise is between verbosity of data vs. schema representations.
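As a rough illustration of that alternative, the sketch below pairs each integer value with its parameterized type, so the type is read off the value rather than inferred. It is in Python for brevity, and all names (`IntType`, `TypedInt`, `fits`) are hypothetical rather than Hydra's API. The range check is an extra assumption of this sketch, not something proposed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntType:
    """Parameterized integer type; bits=None means arbitrary precision."""
    bits: Optional[int]
    signed: bool


def fits(t: IntType, n: int) -> bool:
    """Check whether n is representable in type t."""
    if t.bits is None:
        return t.signed or n >= 0
    if t.signed:
        half = 1 << (t.bits - 1)
        return -half <= n < half
    return 0 <= n < (1 << t.bits)


@dataclass(frozen=True)
class TypedInt:
    """A literal value that carries its own type, so no inference is needed."""
    type: IntType
    value: int

    def __post_init__(self) -> None:
        # Reject values that do not fit the declared type.
        if not fits(self.type, self.value):
            raise ValueError(f"{self.value} does not fit {self.type}")


# A bare 42 is compatible with many types; the typed value is unambiguous.
answer = TypedInt(IntType(bits=16, signed=False), 42)
```

The cost is visible in the last line: every value repeats its type, which is the verbosity trade-off mentioned above.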