Closed vpetrovykh closed 1 year ago
A bunch of questions in no particular order:
set of range<x>
is fine. We can have range_canonicalize
function to sort and unionize sub-ranges. What are very common use cases for multirange?..
and ..=
operators. No parenthesis for ranges. Seems to work fine, why we need parenthesis?range_unpack
will behave on unbounded ranges? Does it crash? Does it crash only when lower bound is unbounded (since it should be possible to produce unbounded sequence in SQL, right?)0..10
is much better than =0..10
or =0..=9
..name between "aaa".."aac"
works nicely as well as .name between <range<str>>$0
(although between
isn't great operator for that) A bunch of questions in no particular order:
- Are multi-ranges needed? I think
set of range<x>
is fine. We can haverange_canonicalize
function to sort and unionize sub-ranges. What are very common use cases for multirange?
Multirange is a non-contiguous interval. A very simple use case: hours of operation of a shop. We want a scalar type and not a set of range<x>
because we want to be able to use Postgres' multirange
type. Important for performance.
- Rust has
..
and..=
operators. No parenthesis for ranges. Seems to work fine, why we need parenthesis?
I think we decided to postpone the decision on range literals and only add function constructors initially.
- How
range_unpack
will behave on unbounded ranges? Does it crash? Does it crash only when lower bound is unbounded (since it should be possible to produce unbounded sequence in SQL, right?)
Unbounded range_unpack
will raise an error.
- Rust has ranges for any orderable type (which are all of them in EdgeDB, right?). So in our terms it means that
.name between "aaa".."aac"
works nicely as well as.name between <range<str>>$0
(althoughbetween
isn't great operator for that)
No, in EdgeDB we'll have an anypoint
type which represents data types that support vectors, which is numerics and dates.
- Rust has ranges for any orderable type (which are all of them in EdgeDB, right?). So in our terms it means that
.name between "aaa".."aac"
works nicely as well as.name between <range<str>>$0
(althoughbetween
isn't great operator for that)No, in EdgeDB we'll have an
anypoint
type which represents data types that support vectors, which is numerics and dates.
Sorry, I don't understand this answer. Will EdgeDB support range<anypoint>
? I'm not sure range<vector>
makes sense, but range<str>
or range<tuple<str, int32>>
can easily be used for various queries, like pagination.
Range types
Range types can exist as a sort of a scalar value, akin to arrays. Although they represent a set of values, but the way we manipulate these values is different from a pure EdgeQL set. It's a special case that is a little more structured.
Other programming languages seem to mostly use range types to loop over indexes. This biases the way ranges are defined. In EdgeDB range types are expected to be used in constraints or even as values unto themselves.
Base types
It seems like we need the following abstract types:
anypoint
- any scalar that has "points" which can be used to construct a range (likeint64
andfloat64
, but not, say,json
andstr
)anydiscrete extending anypoint
- any scalar that conceptually represents discrete values (likeint64
ordate
)anycontiguous extending anypoint
- any scalar that conceptually represents contiguous values (likefloat64
ordatetime
)range<sometype>
wheresometype
is a concrete type extendinganypoint
.Boundaries
A range value should have an upper and lower bound and an indicator for each of them whether they are included in the values the range represents.
It should be possible to omit one or both of the boundaries to represent an open-ended interval. The
{}
seems like a reasonable representation of such an omitted boundary. It should be an error to omit the boundary while indicating that it is inclusive.Constructors
One option is to have a special literal to construct a range. A particularly neat format can draw inspiration from the way ranges are indicated in math by using round and square brackets to indicate boundary inclusivity. E.g.
[5..10)
or(..-10]
. The upside is that this format is familiar from math. The downside is that this messes with any parentheses matching processing (auto-complete, highlighting, etc.)Another option is to have matching parentheses and use an extra symbol to indicate inclusivity. E.g.
(=5..10)
or(..=-10)
. It seems that exclusive boundary is the more natural default for this approach, largely because it works naturally with omitted boundaries. The upside of this format is that it plays nice with parentheses matching. The downside is that it's hard to search for and it's not necessarily intuitive when seeing it.Finally, we can have a function constructor. To make things succinct the signature should be:
Basically, we want to be able to use positional arguments for brevity. We also want the common case of using ranges for some kind of iteration to only need the boundaries and have the inclusivity match how we use indexes in slicing.
The constructor function is compatible with also having a dedicated literal, so it can be implemented first.
Accessing values
We can have functions to access the 4 different components of a range value. E.g.
get_range_component(range_val, 'lower')
or 4 dedicated functionsget_lower_bound(range_val)
.Alternatively, we can have
.
accessors directly on the range values. E.g.range_val.upper
orrange_val.includes_lower
.We can have
[]
accessors directly on the range values. E.g.range_val['upper']
orrange_val['includes_lower']
.Perhaps instead we can have a converter function that takes a range and produces a named tuple (need to deal with
{}
boundaries) or it can producejson
.Multi-ranges
Performing operations with ranges should include ways of combining ranges into ranges with gaps. Basically, it's like having a set of ranges, but a bit more specialized than basic sets for the same reasons that ranges need to be specialized. We call these multi-ranges.
Base type
It seems that the type for multi-range would be something like:
multirange<sometype>
wheresometype
is a concrete type extendinganypoint
.Constructor
A multi-range could be constructed using a function that takes a set of basic
range
values:It is probably important to normalize the resulting multi-ranges to always be composed of non-overlapping ranges regardless of whether the set of input ranges overlaps.
Interoperability
Many of the range operations should also work on combination of range and multi-range as well as multi-ranges. It is desirable to have an implicit cast from
range
tomultirange
and probably an assignment cast formultirange
torange
.Accessing sub-ranges
It may be useful to organize individual sub-ranges in a multi-range like an array. This way we can access them by indexing and even use
len
to determine how many sub-range pieces there are in a multi-range. If we break down every multi-range into non-overlapping sub-ranges then we can always order them in a consistent way by their boundaries and use that ordering for index values.Alternatively we can have a function
get_subranges
that produces an array of range components of a multi-range ordered by a choice of several properties like "boundary" and "range size". This would be useful if more than one way of ordering sub-ranges is useful in practice.Usages
Both ranges and multi-ranges can be used to determine constraints.
Both ranges and multi-ranges can be used as properties directly and potentially those properties can be indexed to speed up filters, etc.
Both ranges and multi-ranges can be expanded as a set of values to be potentially iterated over: