bitemyapp / esqueleto

New home of Esqueleto, please file issues so we can get things caught up!
BSD 3-Clause "New" or "Revised" License
370 stars 107 forks source link

Window functions and experimental syntax promotion #325

Closed belevy closed 1 year ago

belevy commented 1 year ago

Before submitting your PR, check that you've:

There are several notable changes here that should be addressed

  1. SqlSelect went through a major transformation

    • It was split into SqlSelectCols and SqlSelect; SqlSelectCols was made to remove the result argument from places where it was irrelevant, namely subquery results
    • SqlSelect had its functional dependencies weakened by removing the reverse r -> a fun dep. This change is useful to allow multiple SQL types to return the same value, though I do not believe that the changes in this PR are using that as of now. Ultimately it will be useful for limiting aggregations.
  2. Window functions were added in the Postgres modules, this was done primarily to avoid having to worry about any incompatibilities across db engines and to simplify the initial implementation. It is very possible that the window function modules can be used on MySQL 8 and SQLite as well but that is untested.

  3. The SqlExpr type was extended to use a context type variable, this is used primarily by the window function stuff but isn't exclusively useful there. One open question is whether there is a good way to transition the existing aggregation functions to the AggregateContext explicitly instead of leaving the type variable open to inference.

belevy commented 1 year ago

I'm unsure about adding the type parameter to SqlExpr. This feels like a big change that may not even cover the aggregate case. ...

-- but this one breaks:
SELECT sum(id), name FROM foo;

Is that true, would that not just return 1 for every row?

Can we accommodate that with SqlExpr_ ctx typ?

We can restrict the basic SqlSelect to not accept anything with an AggregateContext and have a different function selectAggregate that manages the aggregation validation.

On the other hand, we already have a bunch of annoyances with Value and Entity being subtly different (Value (Maybe a) vs Maybe (Entity a)). Adding Aggregate x and Window x will make this problem worse, especially since we really want to be able to compare across the types.

    having_ $ sum_ foo.age >=. val 100

We'd want val to be able to promote into SqlExpr (f a), which seems awful from a type inference perspective. Whereas right now it can just be SqlExpr_ cxt (Value a).

Having rediscovered exactly why I abandoned that approach after exploring it previously. The goal with the context variable was to make the breaking change as simple to upgrade to as possible (this includes not having to have a whole seperate hierarchy for aggregate values). For example the mercury web backend code only requires two updates to compile and those are uses of coerce which is already a known hole #305

belevy commented 1 year ago

Just merged and fixed the new record code to handle the changes to ToAliasReference. The biggest issue is that records with their aliase reference class derived will be unable to hold anything other than SqlExpr_ ValueContext. In general @parsonsmatt you didnt seem particularly enthused with the complications arising for the extra context variable.

I am unclear where we are at with regards to how to move forward.

As it stands we have two options.

  1. Leave the type system holes surrounding window functions
  2. Add major breaking changes to the implementation to fix the type system (albeit with most user facing code being untouched).
belevy commented 1 year ago

To clarify and expand on where the core issues are with regards to the type system.

Currently the type system makes no distinction between values that are pulled from tables, values that are provided as raw values to the query, values that are the result of aggregations and values that are uniquely identified by the GROUP BY clause.

Further, with the addition of window functions we have yet another type of value with restrictions that are limited differently from the others (i.e. you can't add a window to a value that was produced by a window function and the windowness of the value infects other values).

To patch this hole we need a way to tag values with their origins. Following the existing pattern would lead to newtype wrappers such as AggregateValue and WindowValue to mirror Value. However, we would need some sort of way for the different values to combine. Additionally an injected value should have some sort of polymorphic context since it logically can be used with all 3. Additionally we are left with more newtype wrappers that logically all have the same way of being extracted from the db and have no distinction in Haskell land just in SQL land.

Solution two which is what this PR is currently using adds a new type variable to the core expression type and turns SqlExpr into a type alias for SqlExpr_ ValueContext. This neatly solves the idea of a polymorphic context and converting between the context types can selectively allowed (i.e. lifting a value in to the window context, lifting a value into the aggregate context via grouping). This however has a major implication in that every existing use of SqlExpr is effected and any uses of internal functionality (i.e. abusing the Data.Coerce.Coercible instance of SqlExpr instead of using veryUnsafeCoerceSqlExprValue) can break. It is most definitely a very breaking change though the cost to upgrade should still be fairly cheap.

Additionally this change to the typesystem is not enough to fix aggregates but it is a necessary prerequisite so that we can distinguish between regular values and aggregate context values. To fix aggregates the SqlSelect instance would have to be limited to ValueContext and WindowContext values. An aggregate function actually should be able to generate a value context value in mysql 5.7 but in postgres and mysql 8 it looks like it always is an aggregate or a window function. This type of restriction is probably fine since the behavior in mysql 5.7 is not defined super well but in order to support the existing semantics we leave the aggregate functions as polymorphic over context, this would need to be restricted to the aggregate context (the window functionality implemented here is already forcing the function to select the Aggregate context via the (ctx ~ AggregateContext) constraint in the type class providing the over_ function.

There is a fine balancing act that were trying to play here, to patch the type holes necessarily means breaking existing code as we have to choose stricter semantics than can actually be handled in some dbs. At the same time, we want to provide a safe transition experience, this necessarily means that the old way and the new way need to coexist and some people will not realize that the old way is deemed unsafe or know to update to the new way. All that said I have an idea for how I would want to support safe aggregations but it would require a new select function(as the current one would seem to struggle mightly with enforcing this restriction) that seems out of context for this update.