bitemyapp / esqueleto

New home of Esqueleto, please file issues so we can get things caught up!
BSD 3-Clause "New" or "Revised" License
372 stars 107 forks source link

Implement Window Functions #229

Open belevy opened 3 years ago

belevy commented 3 years ago

This is my stream of consciousness thoughts on Window functions.

Window functions should be restricted in use. Therefore they cant be SqlExpr (Value a) since that would allow for their use wherever a Value is allowed. Solution: SqlExpr (AggregateValue a) This would use a new newtype to support SqlSelect. I am not a fan of the new wrapper type but it is forced on us by the fundeps of SqlSelect. I would much rather Value and AggregateValue disappeared when we go from the SqlExpr language to Haskell.

Currently we have a number of aggregate functions that can also be used with a window. These functions return types of SqlExpr (Value a). This seems wrong since you cant use aggregate functions in a where clause. Rather where_ expressions should only be in terms of SqlExpr (Value a) and having should be in terms of SqlExpr (AggregateValue a). This change would require work on generalizing the comparison operators.

Should there be two different count_(or other aggregate) functions or should we use a type class to overload the function. The typeclass would be.

class SqlCount a where
   count_ :: SqlExpr (Value v) -> a
instance SqlCount (SqlExpr (AggregateValue a)) where
   count_ = unsafeSqlFunction "COUNT"
instance SqlCount(WindowContext -> SqlExpr (AggregateValue a)) where
   count_ = unsafeWindowFunction "COUNT"

I am thinking about how to make the best SQL like syntax and my current idea is:

count_ (user ^. UserId) over_ (partitionBy (user ^. UserGroup) <> orderBy (user ^. UserName))

to achieve this we could have

data Over = Over
type WindowContext = Over -> Window
over_ = Over

optionally we can just use () instead of the new Over unit type. This is actually how the nice case syntax works `then = ()`

The window would be a Monoid

   data Window = Window
      { windowPartition :: Monoid.Last WindowPartition
      , windowOrder :: Monoid.Last OrderBy
      , windowFrame :: Monoid.Last WindowFrame
      }

so instead of writing

COUNT(*) OVER ()

you would have to write

countRows_ over_ mempty

Maybe we should make an AsWindow class

class AsWindow a where
   asWindow :: a -> Window
instance AsWindow Window where
   asWindow = id
instance AsWindow () where 
   asWindow = const mempty

which could be used to enable

countRows_ over_ ()

This update would require a change to the SqlExpr to create a new constructor

EAggregate :: (IdentInfo -> (TLB.Builder, [PersistValue])) -> SqlExpr (AggregateValue a)

This is theoretically not required but will increase the type safety without breaking backwards compat. Then unsafeWindowFunction would handle the logic of converting a Window into an EAggregate value.

Need to come up with a good way to generate WindowFrames. We don't need named windows since we have let statements.

Relevant Issues:

196

parsonsmatt commented 3 years ago

If over_ is instead an infix function, then we can keep countRows_ as-is and write:


countRows_
countRows_ `over_` someWindow
belevy commented 3 years ago

how does one forbid

 countRows_ `over_` someWindow `over_` anotherWindow

I guess over_ :: SqlExpr (Value a) -> Window -> SqlExpr (AggregateValue a)?

EDIT: This is significantly more permissive

val 5 `over_` aWindow
belevy commented 3 years ago

Current working approach is on https://github.com/bitemyapp/esqueleto/tree/4.x The changes are based on https://github.com/bitemyapp/esqueleto/pull/228

First insight, aggregate expressions are much more limited than regular expressions. Additionally there is a distinction between kinds of aggregate expressions, (i.e. aggregate function vs value that has been grouped).

This leads to the creation of a new expression type

newtype SqlAggregate s a = SqlAggregate { unSqlAggregate :: SqlExpr a }

where s is the source for that aggregate. All standard aggregation functions will remain universally quantified over s but values created by groupBy (#185) are tagged with a GroupedValue source.

With the relaxation of the constraints on SqlSelect a r | a -> r we can create an instance SqlSelect (SqlAggregate a) a that defers to SqlExpr a

Window functions are even less flexible than regular aggregation functions as they require a window, as such we create a new expression type

newtype WindowExpr a = WindowExpr { unWindowExpr :: SqlExpr a }

With this in place we can then define an Over_ type class that abstracts over the expression type

data WindowAggregate
class Over expr where
    over_ :: RenderWindow window => expr a -> window -> SqlAggregate WindowAggregate a

and then we can define our instances for WindowExpr (details elided) as well as an instance for the universally quantified SqlAggregate

data NoWindow
deriving via WindowExpr instance (s ~ NoWindow) => Over (SqlAggregate s)

The issue that remains is that SqlAggregate and SqlExpr are different types but share many operations, for example it might make sense to say SUM(field) > 1 in a having clause. To Support this we need to create type classes for having as well as the comparison operators that abstracts over the expression type. This has been done for ^., ?., and having so far

friedbrice commented 2 years ago

Should there be two different count_(or other aggregate) functions or should we use a type class to overload the function. The typeclass would be.

Naw. The aggregating functions return a SqlExpr (AggregateValue a) (and that's the only way to construct an AggregateValue), and there's a function unAgg :: SqlExpr (AggregateValue a) -> SqlExpr (Value a). Apply unAgg to things so that you can use them in where_ clauses.

Lines will be noisy, but I think this will work.

Edit: I think you're not supposed to use WHERE and OVER together, though...

parsonsmatt commented 2 years ago

I believe you can put a predicate on an OVER with HAVING

lfborjas commented 2 years ago

I ran into a neat use case for a predicate on aggregates as window functions, as permitted by Postgres at least; only works for aggregates (not other window functions like lag, rank, etc.) but will constrain the window to only rows that satisfy the filter; not sure if this addresses what Daniel was mentioning but I reckon it's worth preserving:

select created_at, amount, 
       count (*) over () as all_txns,
       -- transactions can have other statuses, but we only want to sum the approved ones 
       sum(amount) filter (where status = 'Approved') over () as total_approved
from my_transactions_table 
where created_at >= now () - interval '1 month';