Feature request: Expose an endpoint that returns the sql query generated instead of the result of the query

viiicky commented 4 years ago

Environment

PostgreSQL version: PostgreSQL 9.5.10
PostgREST version: postgrest/postgrest:v6.0.2

Feature request

We use the above-mentioned postgrest docker image with no problems.

Now, we need to write a new web service. We have decided not to use PostgREST in this new service because most of the SQL generated by this service will be of complex nature, including lots of CTEs, group-bys, aggregate functions etc. We, however, wish to keep the API contract of this service similar to that of PostgREST.

We are thinking to reuse the "filters" part of the SQL generated by PostgREST somehow. That way, we can keep the contract for query params of APIs of this new service same as PostgREST. Thus, if PostgREST exposes an endpoint, where I can make a request with these query params and get the SQL generated, I can build my complex SQL on top of this SQL without worrying about the "where" part of the SQL.

For example, below request to PostgREST: GET /query/people?amount=gte.1000&bill=eq.true&date=gte.2020-06-30T18:30:00.000Z&state=in.(DONE)&everified=eq.true&limit=10&offset=0&or=(status.ilike.*ACTIVE*)&total_amount=gte.2000 HTTP/1.1 should return: select * from people where amount >= 1000 and bill is true and date >= '2020-06-30T18:30:00.000Z' and state in ('DONE') and verified is true and (status ilike '%ACTIVE%') and total_amount >= 2000 offset 0 limit 10

I am going to go through the code, and based on my findings, will put my suggestions here on how to go about it. Right now struggling to setup Haskell dev setup - never worked on it.

Any directions on which part of the code I should be looking at would be helpful. Thanks.

wolfgangwalther commented 4 years ago

I'm not sure whether your approach is really the best way to go about this, but personally I would love to have this for debugging purposes. If I could easily have the query returned instead of executed, that might help development a lot. One way to do this would be, by just making the same request with a different Accept header. So instead of Accept: application/json, you could do something like Accept: application/sql and get the query-to-be-run.

Now, not really answering your question: Have you considered putting those CTEs, GROUP BYs and aggregates into VIEWs or FUNCTIONs and then query those through PostgREST? This way you would still get the PostgREST filters for free. What kind of problems / limitations would you face when doing it this way?

ruslantalpa commented 4 years ago

@wolfgangwalther for debugging (locally) just log all queries and look at the database logs, no need for special endpoints for this

wolfgangwalther commented 4 years ago

@wolfgangwalther for debugging (locally) just log all queries and look at the database logs, no need for special endpoints for this

Sure, that's what I am doing right now. But sometimes you would first need to create that local environment to replicate. If I face something weird in production (or on a server where I don't have access to the database logs...), instead of manually replicating the environment and issue first, I could immediately repeat the same query with the other header and get the query. I guess this could also improve bug reports, if people can easily provide the executed query in the issue as well.

viiicky commented 4 years ago

Have you considered putting those CTEs, GROUP BYs and aggregates into VIEWs or FUNCTIONs and then query those through PostgREST? This way you would still get the PostgREST filters for free. What kind of problems / limitations would you face when doing it this way?

Hmmm, in order to keep the question concise and to the point, I simplified it too much @wolfgangwalther . The actual use-case is much complex, and the view approach would not work out. In a nutshell, we have to create dynamic SQL based on user input. It is this dynamic SQL which I was talking about in my earlier response that becomes complex because of group bys, aggregates, ctes etc. What cte(s) to create, what dimensions to have in group bys, what measures to aggregate - everything gets decided based on client request.

Good news is, have written all of this already, the only part left is the "where" clause, which instead of writing on my own, I am hoping to re-use from this library.

viiicky commented 4 years ago

Also @wolfgangwalther , do you have any inputs on which place to look into for going the Accept header way? With the help of @ruslantalpa , I have already managed to get the query out, but I am now more inclined to identify such request based on the header instead of the prefix query in the path.

I will keep this thread updated if I find something from my side before you reply back. Also, I am new to Haskell, so please excuse any wrong terminologies used by me. Thanks.

wolfgangwalther commented 4 years ago

Hmmm, in order to keep the question concise and to the point, I simplified it too much @wolfgangwalther . The actual use-case is much complex, and the view approach would not work out. In a nutshell, we have to create dynamic SQL based on user input. It is this dynamic SQL which I was talking about in my earlier response that becomes complex because of group bys, aggregates, ctes etc. What cte(s) to create, what dimensions to have in group bys, what measures to aggregate - everything gets decided based on client request.

Good news is, have written all of this already, the only part left is the "where" clause, which instead of writing on my own, I am hoping to re-use from this library.

I guessed you were going to say something like that. In theory, that's nothing (yet) that wouldn't be possible with an RPC/FUNCTION as well. Of course, this would be limited, if:

you would need to have those filters on the source table and not on the result (pushing down those conditions into the function is not possible)
you don't know the possibly returned columns in advance (you could return JSON from the function, but couldn't use filtering properly anymore)

Still, there's one fundamental problem I see: If we were to introduce either a query endpoint or the same thing through Accept headers, I think it would be reasonable to expect that this request should throw errors for requests, where you e.g. name some columns to filter on that don't exist. It wouldn't make much sense for me, if those requests would then return the wrong SQL query without validating it at all (e.g. the explain query could still be run, so that postgres still parses the query). Now if the query is parsed indeed - you will need to create a table/view (endpoint) that has all the possible columns names you might want to filter on, to be able to get those queries you are looking for. Not sure whether that's possible in your case.

wolfgangwalther commented 4 years ago

Also @wolfgangwalther , do you have any inputs on which place to look into for going the Accept header way? With the help of @ruslantalpa , I have already managed to get the query out, but I am now more inclined to identify such request based on the header instead of the prefix query in the path.

I'm assuming you're getting the query somewhere around here?

https://github.com/PostgREST/postgrest/blob/bd2160db26210d41821825925b7149bf30566069/src/PostgREST/App.hs#L137

It's already stm and contentType on the same line, so that should be good starting point for both, I guess.

I will keep this thread updated if I find something from my side before you reply back. Also, I am new to Haskell, so please excuse any wrong terminologies used by me. Thanks.

I am new as well :). @steve-chavez will be able to help you much better.

steve-chavez commented 4 years ago

@viiicky Have you tried creating a stored procedure that does dynamic SQL?

Here's an example: https://github.com/PostgREST/postgrest/issues/915#issuecomment-553188824. With that you can reuse all of the PostgREST filters(including the WHERE part),.

steve-chavez commented 4 years ago

For example, below request to PostgREST: GET /query/people?amount=gte.1000&bill=eq.true&date=gte.2020-06-30T18:30:00.000Z&state=in.(DONE)&everified=eq.true&limit=10&offset=0&or=(status.ilike.ACTIVE)&total_amount=gte.2000 HTTP/1.1 should return: select * from people where amount >= 1000 and bill is true and date >= '2020-06-30T18:30:00.000Z' and state in ('DONE') and verified is true and (status ilike '%ACTIVE%') and total_amount >= 2000 offset 0 limit 10

Still, there's one fundamental problem I see: If we were to introduce either a query endpoint or the same thing through Accept headers,

One problem with this direction, the query structure might change later when we parametrize the values. The WHERE part could change to: where amount >= $1 and bill is $2 and date >= $3 and state in ($4) and total_amount >= $5. I'm guessing that would not be helpful for reusing the query in another client unless we somehow expose the parametrized values.

viiicky commented 4 years ago

@wolfgangwalther @steve-chavez yes, I have written a bunch of postgres function only - that helps first create the dynamic SQL and then execute it. I cannot use the /rpc option by PostgRest though as it requires POST requests.

viiicky commented 4 years ago

btw, I just finished up my requirement. This unblocks me! Here is the diff: https://github.com/PostgREST/postgrest/compare/master...viiicky:master in case you guys are interested @ruslantalpa @wolfgangwalther @steve-chavez

Let me know if you guys have any comments. Also, let me know if you think this feature can make it to upstream, in which case, happy to contribute after fixing comments, if any.

This is how it works: Usual requests, nothing changes:

% curl http://localhost:3000/todos -i                                                                   
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Date: Thu, 20 Aug 2020 06:27:37 GMT
Server: postgrest/7.0.1 (dfa1fca)
Content-Type: application/json; charset=utf-8
Content-Range: 0-1/*
Content-Location: /todos

[{"id":1,"done":false,"task":"finish tutorial 0","due":null}, 
 {"id":2,"done":false,"task":"pat self on back","due":null}]%

Same request with the new header:

% curl http://localhost:3000/todos -i -H "Accept: application/sql"
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Date: Thu, 20 Aug 2020 06:27:45 GMT
Server: postgrest/7.0.1 (dfa1fca)
Content-Type: application/sql; charset=utf-8

SELECT "api"."todos".* FROM "api"."todos"    %

Have of course added this header support for just ActionRead kind of Action.

viiicky commented 4 years ago

I think it would be reasonable to expect that this request should throw errors for requests, where you e.g. name some columns to filter on that don't exist. It wouldn't make much sense for me, if those requests would then return the wrong SQL query without validating it at all

@wolfgangwalther I did not solve for this problem, as this is not super-worrying for my use-case.

viiicky commented 4 years ago

I'm assuming you're getting the query somewhere around here?

https://github.com/PostgREST/postgrest/blob/bd2160db26210d41821825925b7149bf30566069/src/PostgREST/App.hs#L137

It's already stm and contentType on the same line, so that should be good starting point for both, I guess.

This was helpful @wolfgangwalther . Thanks for the direction.

viiicky commented 4 years ago

One problem with this direction, the query structure might change later when we parametrize the values. The WHERE part could change to: where amount >= $1 and bill is $2 and date >= $3 and state in ($4) and total_amount >= $5. I'm guessing that would not be helpful for reusing the query in another client unless we somehow expose the parametrized values.

Yes, parameterized query isn't going to be helpful. But, now that, I understand the area, and how it is working, I am good to go. Can always write my own stuff.

viiicky commented 4 years ago

Also, since my issue is resolved, let me know if I should be closing this issue now. Thank you all /\

wolfgangwalther commented 4 years ago

@wolfgangwalther @steve-chavez yes, I have written a bunch of postgres function only - that helps first create the dynamic SQL and then execute it. I cannot use the /rpc option by PostgRest though as it requires POST requests.

That's not true. As long as you're not mutating the database, you can call it with GET: http://postgrest.org/en/latest/api.html#immutable-and-stable-functions

steve-chavez commented 4 years ago

Hmm.. I still think that the use case could have been solved by dynamic SQL + func(as Wolfgang mentions, it also works with GET).

That being said, besides the SQL query, sometime ago I had the request to expose the metadata for how pgrst does the JOINs on resource embedding(see Embedding disambiguation), this currently only gets exposed when there's an error.

So for now I'm not sure if it's worth to expose the SQL or other information.

Also wanted to mention that application/sql is indeed a IANA media type: https://tools.ietf.org/html/rfc6922.

wolfgangwalther commented 4 years ago

Also wanted to mention that application/sql is indeed a IANA media type: https://tools.ietf.org/html/rfc6922.

Not a coincidence - I looked that up before suggesting ;)

wolfgangwalther commented 4 years ago

That being said, besides the SQL query, sometime ago I had the request to expose the metadata for how pgrst does the JOINs on resource embedding(see Embedding disambiguation), this currently only gets exposed when there's an error.

I can imagine this being very helpful. It feels like this could be in the OpenAPI output as well. Couldn't find much on that, only that such relations seem not to be a part of the OpenAPI spec currently: https://github.com/OAI/OpenAPI-Specification/issues/1646

If not in the OpenAPI, the OPTIONS request method comes immediately to mind... seems like quite a match.

So for now I'm not sure if it's worth to expose the SQL or other information.

Same here.

Still, some thoughts, if we were to implement this. If we want to do this for modifying statements (INSERT/UPDATE/DELETE) and RPC as well - just returning the SQL with 2xx status code would imply that the request was successful and the changes were made. So maybe it should just be a status code in the 4xx range, which would indicate, that the request was not actually executed. Not sure which status code exactly, though. 418 maybe :D.

Another approach could be to use the http method TRACE to do this. If we think of postgrest as just a proxy between the client and the postgres server, the proper response to a TRACE request from postgres should be the query (= the "SQL request"). We could probably pass the http method to create the statement for (GET, POST, ...) via custom header. Not sure about the request body. The query for POST and PATCH changes with a different request body, but the RFC explicitely states, that a TRACE request should not have one...: https://tools.ietf.org/html/rfc7231#section-4.3.8

Implementing TRACE in general with the Max-Fowards header to return the http request received by postgrest, could be helpful in a proxy setting as well.

viiicky commented 4 years ago

Okay folks - as I am no longer blocked, I am closing this off from my side. Feel free to re-open this if you guys want to continue the discussion or if it's revisited in future.

Thanks to all of you for quick responses and directions to the code-base. It was extremely helpful. :party_parrot: Adios.

steve-chavez commented 4 years ago

I've noted that viiicky/postgrest only gives the main SQL query(SELECT .. FROM .. WHERE) and not the full wrapper SQL query(with the json_agg part). Still looks useful though.

viiicky commented 4 years ago

Yeah, I just needed the main SQL query for my need, and I figured out we don't need this feature in the near future in the upstream, so kept it custom to my reqs.

Also, I have already released the docker image for this. In case future readers find it helpful: https://hub.docker.com/r/fylehq/postgrest

PostgREST / postgrest

Feature request: Expose an endpoint that returns the sql query generated instead of the result of the query #1573

Environment

Feature request