frictionlessdata / datapackage

Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
https://datapackage.org
The Unlicense

Exclusive minimum and maximum values are not supported as numeric constraints #856

Closed: rjgladish closed this issue 5 months ago.

rjgladish commented 6 months ago

In version 1 of the Frictionless Table Schema, numeric constraints are limited to closed intervals, i.e. [,], [minimum,], [,maximum], or [minimum, maximum].

However, some observational measurement limits are best expressed as minimum and maximum values that approach, but never reach, a limit. For example, minimum temperature measurements should be greater than absolute zero: (0,] K, (−273.15,] °C, (−459.67,] °F.

Without open-interval support, such limits must be approximated using an arbitrary finite precision, such as 0.000000000001 K, -273.149999999999 °C, and so on. Expressing SQL "greater than" or "less than" checks as Frictionless min/max constraints also gets messy. This workaround is mostly adequate for integer min/max (where > 0 is exactly >= 1), but for decimal values it is awkward.
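For IEEE-754 doubles an exclusive bound does have an exact inclusive equivalent: the next representable value above the limit. The sketch below uses Python's `math.nextafter` (Python 3.9+) to show this and to show why a hand-picked epsilon such as 1e-12 is only an approximation. The helper name `exclusive_to_inclusive_min` is hypothetical, and the exact equivalent depends on the number type a validator uses, so it is not a portable schema value either.

```python
import math

def exclusive_to_inclusive_min(limit: float) -> float:
    """Smallest IEEE-754 double strictly greater than a finite `limit`.

    For doubles, `x > limit` holds exactly when `x >= exclusive_to_inclusive_min(limit)`.
    """
    return math.nextafter(limit, math.inf)

# Integers: "> 0" is exactly ">= 1", so the workaround is lossless.
assert all((n > 0) == (n >= 1) for n in range(-5, 6))

# Doubles: the exact inclusive equivalent of "> 0" is the smallest subnormal.
print(exclusive_to_inclusive_min(0.0))  # 5e-324

# A hand-picked epsilon instead rejects valid values between the limit and the epsilon.
epsilon_min = 1e-12
value = 1e-13
print(value > 0.0, value >= epsilon_min)  # True False
```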

Adding two optional constraint properties to express exclusive minimum and maximum values would permit a schema to describe open-interval constraint conditions (minimumEx,], [,maximumEx), and (minimumEx, maximumEx), as well as half-open intervals, e.g. (minimumEx, maximum] and [minimum, maximumEx).
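The proposed properties map directly onto SQL comparison operators, which is part of their appeal. A minimal sketch, assuming the property names minimumEx and maximumEx from this proposal (they are not part of Table Schema v1), a hypothetical helper `to_sql_check`, and a trusted, unquoted column name:

```python
from typing import Mapping, Union

Number = Union[int, float]

# SQL comparison operator implied by each constraint property.
# minimumEx/maximumEx are the names proposed in this issue, not Table Schema v1 properties.
OPERATORS = {
    "minimum": ">=",
    "minimumEx": ">",
    "maximum": "<=",
    "maximumEx": "<",
}

def to_sql_check(column: str, constraints: Mapping[str, Number]) -> str:
    """Render numeric constraints as the body of a SQL CHECK expression."""
    clauses = [
        f"{column} {op} {constraints[name]!r}"
        for name, op in OPERATORS.items()
        if name in constraints
    ]
    # No bounds at all: the check always passes.
    return " AND ".join(clauses) if clauses else "TRUE"

# Half-open interval (0, 1000] for a temperature in Kelvin.
print(to_sql_check("temp_k", {"minimumEx": 0, "maximum": 1000}))
# temp_k > 0 AND temp_k <= 1000
```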

Because the properties would be optional, they would minimally impact older software, although they would require a new schema version. On the negative side, software that is unaware of exclusive constraints would look only for the inclusive constraints, so the exclusive bounds would go unchecked.

peterdesmet commented 6 months ago

I had a similar use case, where I wanted to indicate that the width of a bounding box can be close to 0, but not 0. As you describe, I solved this by providing a very small number with an arbitrarily fine precision:

"minimum": 1e-15

where 1e-15 is the same as 0.000000000000001.

While two new properties, minimumEx and maximumEx, would express this better, I fear they might lead to even less validation (due to a lack of software support), so I don't prefer them over the above "hack".

rjgladish commented 6 months ago

Thanks for the feedback. The approach you suggest will work, and realistically it is the only reasonable interoperable approach using the V1 schema, but it may introduce implementation-specific rounding behaviors. If the precision of the min/max values exceeds the precision supported by the underlying validation implementation, rounding (a minimum rounding down to 0, or a maximum rounding up to the limit itself) may introduce unexpected behaviors that can be difficult to debug.
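The rounding hazard can be reproduced in plain Python. This sketch assumes two hypothetical validators, one that stores values and bounds as 32-bit floats and one that rounds bounds to 6 decimal places; neither describes a specific Frictionless implementation.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (IEEE-754 double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

absolute_zero_c = -273.15
approx_min_c = -273.149999999999  # "exclusive" minimum approximated with 1e-12

# Double precision keeps the two values apart, so absolute zero is rejected.
print(absolute_zero_c >= approx_min_c)  # False

# Hypothetical float32 validator: both values collapse to the same float32,
# so absolute zero now passes the "minimum" check.
print(to_float32(absolute_zero_c) >= to_float32(approx_min_c))  # True

# Hypothetical validator rounding bounds to 6 decimal places:
# a minimum of 1e-15 becomes 0, so a width of exactly 0 passes.
print(0.0 >= round(1e-15, 6))  # True
```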

Perhaps this can be revisited in V2, which was recently opened for discussion?

Because adding exclusiveMinimum and exclusiveMaximum as constraints would create better alignment with the OAS (OpenAPI) V2.0 and V3.0 schemas (e.g. https://github.com/OAI/OpenAPI-Specification/blob/main/schemas/v3.0/schema.json), could leverage JSON Schema Draft 4 (http://json-schema.org/draft-04/schema#/properties/exclusiveMinimum and .../exclusiveMaximum), and would address the issues raised above, I think it may be worth another look in the next MAJOR schema revision.

rjgladish commented 5 months ago

To clarify my earlier comments of Dec 11 and Dec 23:

In JSON Schema Draft 4 and the OAS schemas, the properties exclusiveMinimum and exclusiveMaximum are Boolean. The default value, FALSE, indicates an inclusive minimum and inclusive maximum, respectively, which preserves the V1 interpretation. In contrast, a value of TRUE specifies that minimum and maximum are interpreted as an exclusive minimum and exclusive maximum, respectively.
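A minimal sketch of the Draft 4 style semantics described above, where the Boolean flags modify minimum and maximum rather than carrying a bound themselves. The function `check_draft4` is hypothetical and covers only these four keywords.

```python
from typing import Any, Mapping

def check_draft4(value: float, schema: Mapping[str, Any]) -> bool:
    """Check `value` against Draft 4 style numeric keywords.

    exclusiveMinimum/exclusiveMaximum are booleans (default False) that turn the
    matching minimum/maximum from an inclusive into an exclusive bound.
    """
    if "minimum" in schema:
        exclusive = schema.get("exclusiveMinimum", False)
        below = value <= schema["minimum"] if exclusive else value < schema["minimum"]
        if below:
            return False
    if "maximum" in schema:
        exclusive = schema.get("exclusiveMaximum", False)
        above = value >= schema["maximum"] if exclusive else value > schema["maximum"]
        if above:
            return False
    return True

kelvin = {"minimum": 0, "exclusiveMinimum": True}
print(check_draft4(0, kelvin), check_draft4(0.5, kelvin))  # False True
print(check_draft4(0, {"minimum": 0}))  # True (the FALSE default keeps the V1 meaning)
```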

peterdesmet commented 5 months ago

@roll, the accepted https://github.com/frictionlessdata/datapackage/pull/11 implementation adds exclusiveMinimum and exclusiveMaximum as constraints that behave exactly like minimum and maximum (but exclude the actual value).
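A minimal sketch of the semantics described here, where each bound is a plain number, as it might appear in a field's constraints object, e.g. {"name": "width", "type": "number", "constraints": {"exclusiveMinimum": 0}}. The function `satisfies` is hypothetical and is not the Frictionless validator.

```python
import operator
from typing import Mapping, Union

Number = Union[int, float]

# Each constraint holds a number; this is the comparison the value must pass against it.
COMPARISONS = {
    "minimum": operator.ge,
    "exclusiveMinimum": operator.gt,
    "maximum": operator.le,
    "exclusiveMaximum": operator.lt,
}

def satisfies(value: Number, constraints: Mapping[str, Number]) -> bool:
    """Return True if `value` meets every numeric bound present in `constraints`."""
    return all(
        compare(value, constraints[name])
        for name, compare in COMPARISONS.items()
        if name in constraints
    )

# Bounding-box width: strictly positive, no upper limit.
width = {"exclusiveMinimum": 0}
print(satisfies(0, width), satisfies(1e-300, width))  # False True
```

With numeric bounds, no epsilon has to be chosen, so the rounding issues discussed above do not arise for the bound itself.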

I personally think this is a good implementation, but it differs from what @rjgladish proposes in https://github.com/frictionlessdata/specs/issues/856#issuecomment-1904161419, where those fields are boolean types that affect the behaviour of minimum and maximum. It would be good to explain why the former implementation was chosen.

roll commented 5 months ago

@peterdesmet JSON Schema has since fixed the Draft 4 behavior: https://json-schema.org/understanding-json-schema/reference/numeric#range

In JSON Schema Draft 4, exclusiveMinimum and exclusiveMaximum work differently. There they are boolean values that indicate whether minimum and maximum are exclusive of the value.

So basically we just stick to the current JSON Schema version (as Table Schema inherits a lot from there).
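Since Draft 6, JSON Schema defines exclusiveMinimum and exclusiveMaximum as numbers, which is the form the accepted implementation follows. Descriptors written in the Draft 4 Boolean style can be migrated mechanically; a minimal sketch, where the function name `draft4_to_current` is hypothetical and only these two keyword pairs are handled:

```python
from typing import Any, Dict, Mapping

def draft4_to_current(schema: Mapping[str, Any]) -> Dict[str, Any]:
    """Rewrite Draft 4 boolean exclusive bounds into the numeric form used since Draft 6."""
    result = dict(schema)
    for bound, flag in (("minimum", "exclusiveMinimum"), ("maximum", "exclusiveMaximum")):
        flag_value = result.get(flag)
        if isinstance(flag_value, bool):  # Draft 4 style; numeric values are left alone
            del result[flag]
            if flag_value and bound in result:
                # {"minimum": 0, "exclusiveMinimum": true} -> {"exclusiveMinimum": 0}
                result[flag] = result.pop(bound)
    return result

old = {"type": "number", "minimum": 0, "exclusiveMinimum": True, "maximum": 100}
print(draft4_to_current(old))
# {'type': 'number', 'maximum': 100, 'exclusiveMinimum': 0}
```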