circuithub / rel8

Hey! Hey! Can u rel8?
https://rel8.readthedocs.io
Other
150 stars 38 forks source link

Statements overhaul (support for statement-level `WITH`) #250

Closed shane-circuithub closed 12 months ago

shane-circuithub commented 12 months ago

The motivation behind this PR is to add support for PostreSQL's WITH syntax at the statement level, which gives the ability to, e.g., delete some rows from a table and then re-insert those deleted rows into another table, without any round-trips between the application and the database.

To support this, this PR introduces a new type called Statement, which represents a single PostgreSQL statement. It has a Monad instance which allows sub-statements (such as DELETE and INSERT statements) to be composed together and their results bound to values that can be referenced in subsequent sub-statements. These "compound" statements are then rendered as a WITH statement.

select, insert, update and delete have all been altered to produce the Statement type described above instead of the Hasql.Statement type.

Some changes were necessary to the Returning type. Returning previously bundled two different concepts together: whether or not to generate a RETURNING clause in the SQL for a manipulation statement, and how to decode the returned rows (if any). It was necessary to break these concepts apart because with WITH we need the ability to generate manipulation statements with RETURNING clauses that are never actually decoded at all (the results just get passed to the next statement without touching →

Now, the Returning type is only concerned with whether or not to generate a RETURNING clause, and the question of how to decode the returned the result of the statement is handled by the run functions. run converts a Statement into a runnable Hasql.Statement, decoding the result of the statement as a list of rows. The other variations, run_, runN, run1, runMaybe and runVector can be used when you want to decode as something other than a list of rows.

This also gains us support for decoding the result of a query directly to a Vector for the first time, which brings a performance improvement over lists for those who need it.

shane-circuithub commented 12 months ago

@ocharles This is what I had in mind when I first mentioned turning Statement into a proper Monad. It turns out it is possible to go all the way. It's interesting in that it seems to converge in some ways with your stab at this — we both end up with the exact same Returning type for example (that contrasts between Query a and () instead of Just a and Nothing).

I do insist on Rows though. If you don't like the name then I'm happy to call it something else if you have a suggestion, but I want something like that before we do this. If you insist that it's "orthogonal" to this then I'll make a PR where I add that first and then rebase this on top of that, but that feels pedantic.

I don't like that in your approach we end up with three versions of everything (insert, insert_ and withInsert) where one (insert) would suffice. And there are things you can express with run and insert that you can't express with either of your three versions. And even if you want to ignore the Single, Maybe or Vector constructors of Rows for now, it still bothers me that there is no way in your API to not-care about the result of an INSERT. Before now we allowed somebody to write returning = pure () when they don't care about the number of rows affected (which is the vast majority of the time). This had to change here to decouple the statement from the decoding of the result of that statement, but I think it's important to continue to provide a way for a user to indicate that they don't care about the result or the number of rows returned. With your insert_, you need to wrap the whole statement in void $ to suppress warnings (even though the _ suffix usually indicates returning ()). Rows gives an explicit way to not care about the number of rows returned with Void.

Also note that your insert_ function doesn't do anything to protect against somebody supplying an Insert with a Returning clause. Hasql's rowsAffected decoder will fail there because it's expecting to see the number of rows affected but instead it's encountering the rows themselves.

Also, your API doesn't provide a way to get the number of rows affected by an INSERT that comes under a WITH. For example, WITH rows AS (SELECT * FROM foo) INSERT INTO bar (SELECT * FROM rows) is perfectly valid, and in psql it will return the number of rows affected, but it's not possible to construct that in your API.

But this does address your other concerns in your comment on the first PR — there's no IsStatement typeclass anymore, run is not polymorphic, and Returning doesn't use a kind-level Maybe.

ocharles commented 12 months ago

First, let me quickly reply to some of your comments.

Also note that your insert_ function doesn't do anything to protect against somebody supplying an Insert with a Returning clause. Hasql's rowsAffected decoder will fail there because it's expecting to see the number of rows affected but instead it's encountering the rows themselves.

I did consider this but actually thought that it would work, but didn't test it. I was kind of hoping that the internal rowsAffected decoder was reading some different information rather than the literal query results, but I never tested that. If that's true, that's a much more significant drawback as I definitely don't want to introduce runtime errors.

Also, your API doesn't provide a way to get the number of rows affected by an INSERT that comes under a WITH.

Not directly, but people could just countRows if they wanted this.

For example, WITH rows AS (SELECT FROM foo) INSERT INTO bar (SELECT FROM rows) is perfectly valid, and in psql it will return the number of rows affected, but it's not possible to construct that in your API.

There's another slightly different point here. I again considered this, but considered it a pretty fringe scenario. Your example is better written with just INSERT (no need for WITH). The more realistic one is something like WITH rows AS (DELETE FROM .. RETURNING) INSERT INTO ..., but my compromise there was to just chain a SELECT on at the end (as PostgreSQL guarantees all modifying queries will be ran to completion regardless of if they are referenced).

In summary, your comments about my last approach all valid, though I did consider them and basically considered them an acceptable trade-off. But they are a trade-off, and as your PR shows, we don't have to make that trade off.


Now, on to this PR. Thanks for spending a bit more time on it. I like this approach a lot more. A legitimate Monad Statement is wonderful, the loss of type-level Maybe is much nicer, and I'm also happy you were OK losing the polymorphism. The only thing I still don't quite like is Rows. Not the concept, but the constructor itself. My only remaining proposal would be to change:

data Rows = Void | ..
run :: Rows a -> ...

into

run_ :: Statement exprs -> Hasql.Statement () ()
runN :: Statement () -> Hasql.Statement () Int64
runOne :: Serializable exprs a => Statement (Query exprs) -> Hasql.Statement () a
runMaybe :: Serializable exprs a => Statement (Query exprs) -> Hasql.Statement () (Maybe a)
run :: Serializable exprs a => Statement (Query exprs) -> Hasql.Statement () [a]
runVector :: Serializable exprs a => Statement (Query exprs) -> Hasql.Statement () (Vector a)

(We could also lose the uniform run prefix and maybe go with execute_, execute for run_ and runN - maybe there are better names for the remaining four).

What do you think about this? I suggest this because I still think the most common use-case of Rel8 is a single select, and I would much prefer to write

run $ select do
  ...

over

run List $ select do
  ...

Minor, but a tiny bit more convenient. I don't really see anything else that a user would care about Rows.

Other than deciding on this last point, I think we're all good!

shane-circuithub commented 12 months ago

Yeah, I can live with hiding the explicit Rows type and just exporting different versions of run instead.