Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
(Adapted from my comment here at @peterdesmet's request!)
(Note that I would consider this proposal "low-priority" at present, because it depends on the acceptance of the categorical field type in #875.)
Sometimes tabular data are produced in a "long" format that combines data of multiple different types into a single field / column. I see this form of data a lot in event-driven sensor data. For example:
| measurementType | measurementValue |
| --- | --- |
| cloudiness | partly cloudy |
| cloudiness | cloudy |
| temperature | 1 |
| wind force | 5 |
| temperature | 10 |
Where:

- If `measurementType = cloudiness`, then `measurementValue` has:
  - `type = categorical`
  - `categories = ["clear", "mostly clear", "partly cloudy", "mostly cloudy", "cloudy", "unknown"]`
- If `measurementType = temperature`, then `measurementValue` has:
  - `type = number`
  - `constraints.min = 0`
  - `constraints.max = 20`
- If `measurementType = wind force`, then `measurementValue` has:
  - `type = categorical`
  - `categories = [0, 1, 2, 3, 4, 5]`

(Example adapted from @peterdesmet's work here)
Here, `measurementValue` is not a single type, but actually a union of three types: either a `cloudiness` measurement, a `temperature` measurement, or a `wind force` measurement, each with its own type definition and constraints. More specifically, this is a tagged union (aka discriminated union) compound type, where the type of `measurementValue` depends on the "tag" or "discriminator" found in `measurementType`.

Tagged union types are a well-established, well-understood abstraction already implemented in many programming languages (e.g. Python, Rust) and semantic data parsing / validation libraries (e.g. Python's pydantic and TypeScript's zod).
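
To make the idea concrete, here is a minimal sketch in plain Python (standard library only, not pydantic) of how the tag could select the parser for the value. The function names and the `PARSERS` table are hypothetical; the allowed levels and the 0 to 20 range are copied from the rules listed above, and raw values are assumed to arrive as strings, as they would from a CSV.

```python
from typing import Callable, Dict, Union

# Allowed levels, transcribed from the per-type rules in the example above.
CLOUDINESS = {"clear", "mostly clear", "partly cloudy",
              "mostly cloudy", "cloudy", "unknown"}
WIND_FORCE = {0, 1, 2, 3, 4, 5}

Parsed = Union[str, float, int]


def parse_cloudiness(raw: str) -> str:
    if raw not in CLOUDINESS:
        raise ValueError(f"unknown cloudiness level: {raw!r}")
    return raw


def parse_temperature(raw: str) -> float:
    value = float(raw)  # raises ValueError on non-numeric input
    if not 0 <= value <= 20:
        raise ValueError(f"temperature outside [0, 20]: {value}")
    return value


def parse_wind_force(raw: str) -> int:
    value = int(raw)  # raises ValueError on non-integer input
    if value not in WIND_FORCE:
        raise ValueError(f"unknown wind force level: {value}")
    return value


# The tag (measurementType) decides which parser applies to measurementValue.
PARSERS: Dict[str, Callable[[str], Parsed]] = {
    "cloudiness": parse_cloudiness,
    "temperature": parse_temperature,
    "wind force": parse_wind_force,
}


def parse_row(measurement_type: str, measurement_value: str) -> Parsed:
    """Parse one long-format row by dispatching on its tag."""
    try:
        parser = PARSERS[measurement_type]
    except KeyError:
        raise ValueError(f"unknown measurementType: {measurement_type!r}") from None
    return parser(measurement_value)


rows = [
    ("cloudiness", "partly cloudy"),
    ("cloudiness", "cloudy"),
    ("temperature", "1"),
    ("wind force", "5"),
    ("temperature", "10"),
]
parsed = [parse_row(tag, value) for tag, value in rows]
# parsed == ['partly cloudy', 'cloudy', 1.0, 5, 10.0]
```

A row such as `("temperature", "25")` would raise `ValueError` here, even though `25` on its own is a perfectly valid number; the constraint only makes sense once the tag is known.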
Implementing this behavior as a tagged union field type would allow implementations to validate this kind of field by parsing its underlying types. Implementations could also perform exhaustiveness checks on the definition (ensuring that every level of the categorical `measurementType` has a corresponding type definition). It would also make it easier for implementations to pivot into wider table formats, because the dependent type definitions would translate into the column types of the resulting wide columns.

Here's an example of how a tagged union field type might look in frictionless (using the proposed categorical syntax in #875):
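
As a rough illustration of the exhaustiveness check and the pivot-friendly column types, here is a hypothetical sketch written as Python dictionaries rather than descriptor JSON. It is not the proposal's actual syntax: only the `match` property and the `union` type name come from this issue, while the `tag` key, the `categories` layout, and both helper functions are assumptions made for the sake of a runnable example.

```python
from typing import Any, Dict, List

# Hypothetical definition of the discriminator ("tag") field.
MEASUREMENT_TYPE: Dict[str, Any] = {
    "name": "measurementType",
    "type": "categorical",
    "categories": ["cloudiness", "temperature", "wind force"],
}

# Hypothetical definition of the tagged union field.
MEASUREMENT_VALUE: Dict[str, Any] = {
    "name": "measurementValue",
    "type": "union",
    "tag": "measurementType",  # assumed key naming the discriminator field
    "match": {
        "cloudiness": {
            "type": "categorical",
            "categories": ["clear", "mostly clear", "partly cloudy",
                           "mostly cloudy", "cloudy", "unknown"],
        },
        "temperature": {"type": "number", "constraints": {"min": 0, "max": 20}},
        "wind force": {"type": "categorical", "categories": [0, 1, 2, 3, 4, 5]},
    },
}


def missing_match_keys(tag_field: Dict[str, Any],
                       union_field: Dict[str, Any]) -> List[str]:
    """Return the tag levels that have no entry in the union field's `match`."""
    return [level for level in tag_field["categories"]
            if level not in union_field["match"]]


def wide_fields(union_field: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Derive one wide-format field definition per key of `match`."""
    return [{"name": level, **definition}
            for level, definition in union_field["match"].items()]


# Exhaustive: every level of measurementType has a type definition.
assert missing_match_keys(MEASUREMENT_TYPE, MEASUREMENT_VALUE) == []

# After pivoting, each match entry becomes the type of its own column.
wide_names = [field["name"] for field in wide_fields(MEASUREMENT_VALUE)]
# wide_names == ['cloudiness', 'temperature', 'wind force']
```

Under these assumptions the exhaustiveness check is a simple key comparison, and the wide-table schema falls out of `match` directly, which is the main reason to keep the per-tag definitions as full field definitions rather than ad hoc rules.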
Note that field-level validation on this type would ensure that all the levels of the `measurementType` categorical field are represented as keys of the `match` property in the `measurementValue` field. For example, if `temperature` weren't defined as a key in the `match` property, this would trigger a validation error, because `temperature` is one of the levels of the `measurementType` field. As mentioned earlier, this is a common feature of tagged union types.

If there is interest in this type, I can put together a more formal definition of the proposed `union` field's type signature (and RFC language).