MechanicalRabbit / FunSQL.jl

Julia library for compositional construction of SQL queries
https://mechanicalrabbit.github.io/FunSQL.jl
Other
144 stars 5 forks source link

define facet slots (e.g. hidden columns discoverable by listing facets) #65

Open clarkevans opened 1 month ago

clarkevans commented 1 month ago

In some cases there is good reason to have slots in a query frame that are available to be included, but would not want to be added by default. For example, https://github.com/MechanicalRabbit/FunSQL.jl/issues/61, would let a rowid column be available but not selected by default. There are other use cases for this, for example source detail as part of an ETL run, that isn't normally desired but on occasion may be helpful. The problem with this feature is that it hinders discoverability.

Perhaps we could use facet slots. Currently, if you have a query of the form, from(table).define(calc=>...).as(facet).define(a=>facet.a, b=>facet.b).undefine(a,b) you could construct a query with columns a and b from the table that would have a facet facet with a column calc. However, it'd be much more convenient to just write from(table).define(facet.calc=>...). In this case, facet wouldn't be selected by default, but one could either use facet.calc in computations, or could do define(facet.calc) or select(..., facet.calc) to include calc in the output columns. Overall, I think I prefer using facet based approach over having a flag on each column. At least facet could be somewhat discoverable, and we could make define(facet) be a synonym for define(x) for x in facet to lift all of the facet columns into the current. This way we also don't need a flag on each slot to indicate if it's visible or not. It'd also be nice to have a way to introspect a query frame to list all of the names (e.g. facets) that are available, e.g. joins, including ones that are only included in the SQL if they are actually used.

How could this address rowid? We could have a mssql facet that is provided on every table during TSQL introspection, with various calculations that are specific to TSQL based tables, such as rowid. This scales rather well, PostgreSQL might have pgsql.oid for example. Someone could try and normalize this across vendors, but that's another problem and not related to this specific feature request.