GlareDB / glaredb

GlareDB: An analytics DBMS for distributed data
https://glaredb.com
GNU Affero General Public License v3.0

support sized list syntax: `int[3]` #2679

Open universalmind303 opened 3 months ago

universalmind303 commented 3 months ago

Description

Fixed-size lists have second-class support for almost everything in our current SQL syntax. I propose we expose some syntax for working with fixed-size lists as first-class citizens.

Looking to programming languages for guidance, I think the Go-style syntax would be the most appropriate. The SQL syntax for lists is already <datatype>[], with [] being the list qualifier. It seems natural to add a size in there to declare a fixed-size list: <datatype>[<size>]

Create table

create table fsl (v1 double[3]);

Casting

select v1::float[3] from fsl;
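
To make the proposed type syntax concrete, here is a minimal Python sketch of how a `<datatype>[]` or `<datatype>[<size>]` string could be split into an element type and an optional size. It is not GlareDB code: the names `ListType` and `parse_list_type` and the regular expression are assumptions made purely for illustration.

```python
import re
from typing import NamedTuple, Optional


class ListType(NamedTuple):
    element: str          # element type name, e.g. "double"
    size: Optional[int]   # None means a variable-length list


# Hypothetical pattern: a type name followed by [] or [<size>].
_LIST_TYPE_RE = re.compile(r"^\s*([A-Za-z_]\w*)\s*\[\s*(\d*)\s*\]\s*$")


def parse_list_type(text: str) -> ListType:
    """Parse 'double[]' or 'double[3]' into (element, size)."""
    m = _LIST_TYPE_RE.match(text)
    if m is None:
        raise ValueError(f"not a list type: {text!r}")
    name, size = m.group(1), m.group(2)
    # An empty size means the existing variable-length list syntax.
    return ListType(name, int(size) if size else None)
```

Under these assumptions, `parse_list_type("double[3]")` yields a sized list of `double`, while `parse_list_type("float[]")` keeps today's variable-length meaning.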

tychoish commented 3 months ago

Should we also include some related way to create other (non-fixed-size) lists? Like <datatype>[:] or <datatype>[...]

universalmind303 commented 3 months ago

@tychoish we already support this.

create table list_type (v1 double[]);

select v1::float[] from list_type;

universalmind303 commented 3 months ago

I think we'd also want to expose a function for creating fixed-size lists directly instead of casting.

We already have ways of doing this for regular lists:

-- the array literal syntax
select [1,2,3];
-- the function form
select make_array(1, 2, 3);

It'd be nice to have a fixed-size equivalent too.

Some potential ideas:

select make_sized_array(1, 2, 3); -- FixedSizeList(Int, 3)
select fsl([1,2,3], 3);
select make_sized_array([1,2,3], 3);

Not sure what a literal syntax for that would look like, though.

select [1,2,3][3]; -- similar to the current syntax
select [1,2,3; 3]; -- rust style
select {1,2,3}; -- ??
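
The function ideas above differ mainly in where the size comes from. A minimal Python sketch of the two shapes, assuming the variadic form infers the size from its argument count and the two-argument form checks a declared size against the list. The return value, a tuple paired with its size, is only a stand-in for a fixed-size list and does not reflect how GlareDB or DataFusion represent one.

```python
from typing import Any, Sequence, Tuple

# Stand-in for a fixed-size list value: (items, size).
SizedList = Tuple[Tuple[Any, ...], int]


def make_sized_array(*values: Any) -> SizedList:
    """Variadic shape: the size is inferred from the number of arguments."""
    return tuple(values), len(values)


def fsl(values: Sequence[Any], size: int) -> SizedList:
    """Two-argument shape: the declared size must match the list length."""
    if len(values) != size:
        raise ValueError(f"expected {size} items, got {len(values)}")
    return tuple(values), size
```

In this sketch the variadic form can never disagree with itself, whereas the two-argument form needs an error path for a mismatched size.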

tychoish commented 3 months ago

Is there a way to use make_list (or array) for both but have the size be optional, like the make function in go?

universalmind303 commented 3 months ago

> Is there a way to use make_list (or array) for both but have the size be optional, like the make function in go?

Not very easily. The make_list function is provided by DataFusion. We'd likely have to expose our own function for this, or try to upstream some changes to DataFusion. I also don't think exprs can handle types as inputs.

tychoish commented 3 months ago

> I also don't think exprs can handle types as inputs.

Isn't that a problem regardless?

I think having a wrapper make_list or make_array that just takes an extra (optional) argument and otherwise passes off to the DF function wouldn't be terribly bad, but that might be harder than we want.

I think I'd go for [size][item,item,item] as a literal (non-function appearing) syntax for fixed size lists.
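
A rough Python sketch of the wrapper idea, assuming a single entry point whose size argument is optional and which otherwise hands off to the existing function. Here `df_make_array` is a hypothetical stand-in for the DataFusion-provided function, not its real interface, and a tuple stands in for a fixed-size list.

```python
from typing import Any, List, Optional, Sequence, Tuple, Union


def df_make_array(*values: Any) -> List[Any]:
    """Hypothetical stand-in for the function DataFusion provides."""
    return list(values)


def make_list(
    values: Sequence[Any], size: Optional[int] = None
) -> Union[List[Any], Tuple[Any, ...]]:
    """Build a regular list, or a fixed-size one when `size` is given."""
    if size is None:
        # No size requested: pass straight through to the existing function.
        return df_make_array(*values)
    if len(values) != size:
        raise ValueError(f"expected {size} items, got {len(values)}")
    # A tuple stands in for a fixed-size list in this sketch.
    return tuple(values)
```

With this shape, `make_list([1, 2, 3])` behaves as before and `make_list([1, 2, 3], 3)` opts into the fixed-size variant.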

universalmind303 commented 3 months ago

> Isn't that a problem regardless?

To not conflate things, there are two distinct paths: one for the type system (int[3]), and one for the expr syntax, whether literal or function ([1,2,3], make_array(1,2,3)).

For the type system, this will have to be done before DataFusion regardless, similar to how we handle other custom SQL syntax.

For the exprs, it's a bit different: creating a function that plugs directly into DataFusion is a lot easier than the former. So something like make_sized_array would be much lower effort than modifying make_array before DataFusion.