Open steve-chavez opened 1 year ago
@wolfgangwalther Any feedback on this one? Doing it will involve a big refactor. So I'm wondering if you see anything wrong before I start.
I did not have the time, yet, to follow along with what exactly has been implemented for Accept already. Am I right that currently it's only possible to say RETURNS "media/type" for functions using a domain media/type? So we don't support any of the CAST idea yet, but all the pieces are in place to be able to extend that with casts later on?
Assuming the above is correct, then:
The answer to those questions in the second paragraph is: Casts. Those functions are effectively implementing a cast, so we should create proper casts for them and detect them via casts. Then they can be located anywhere, don't conflict with RPCs, etc.
I would suggest to do the function interface first, and then extend later on.
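For context, the domain-based handler interface mentioned above looks roughly like this (a sketch based on the media type handlers docs; the domain name is the media type itself):

```sql
-- sketch: a media type is declared as a domain over bytea,
-- and a handler function signals its media type via its return type
create domain "application/vnd.twkb" as bytea;
```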
Am I right that currently it's only possible to say RETURNS "media/type" for functions using a domain media/type? So we don't support any of the CAST idea, yet, but all the pieces are in place to be able to extend that with casts later on?
Yes (for aggregates too (https://github.com/PostgREST/postgrest/pull/2825#issue-1758144649)) and yes.
I would suggest to do the function interface first, and then extend later on.
Agree, that sounds good.
The answer to those questions in the second paragraph is: Casts. Those functions are effectively implementing a cast, so we should create proper casts for them and detect them via casts. Then they can be located anywhere, don't conflict with RPCs, etc.
Oh, thinking more about the above. Wouldn't the interface for media type handlers be simpler too if it were defined via CASTs?
So the example on https://postgrest.org/en/stable/references/api/media_type_handlers.html#handlers-for-tables-views would be:
```sql
create or replace function twkb_handler_transition (state bytea, next lines)
returns bytea as $$
  select state || st_astwkb(next.geom);
$$ language sql;

create or replace aggregate twkb_agg (lines) (
  initcond = ''
, stype = bytea
, sfunc = twkb_handler_transition
);

CREATE OR REPLACE FUNCTION twkb(lines) RETURNS "application/vnd.twkb" AS $$
  select twkb_agg($1);
$$ LANGUAGE SQL IMMUTABLE;

CREATE CAST (lines AS "application/vnd.twkb") WITH FUNCTION twkb(lines) AS IMPLICIT;
```
That way the aggregate is an internal detail and the media type is not involved in its definition at all. I believe it's much clearer, despite having to define one more function.
@wolfgangwalther WDYT? This would be a breaking change for current media type handlers, but only for handlers on tables, not for functions (and table handlers don't seem to be used that much yet).
Damn, the above makes sense but the current query doesn't work:
```sql
WITH pgrst_source AS ( SELECT "test"."lines".* FROM "test"."lines")
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  twkb(_postgrest_t::"test"."lines") AS body,
  nullif(current_setting('response.headers', true), '') AS response_headers,
  nullif(current_setting('response.status', true), '') AS response_status,
  '' AS response_inserted
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;
```
```
ERROR:  column "_postgrest_t.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 5:   twkb(_postgrest_t::"test"."lines") AS body,
```
And an aggregate cannot be used as a cast function:
```sql
CREATE CAST (lines AS "application/vnd.twkb") WITH FUNCTION twkb_agg(lines) AS IMPLICIT;
```
```
ERROR:  cast function must be a normal function
```
The only way I see to make the CAST abstraction work is to scan the body of the CREATE FUNCTION twkb and apply the internal function ourselves. Too bad that pg can't see that there's an aggregate inside the SQL function.
and apply the internal function ourselves
Maybe that's not too bad, we can apply the function body directly like:
```sql
WITH pgrst_source AS ( SELECT "test"."lines".* FROM "test"."lines")
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  (select twkb_agg(_postgrest_t::"test"."lines")) AS body,
  nullif(current_setting('response.headers', true), '') AS response_headers,
  nullif(current_setting('response.status', true), '') AS response_status,
  '' AS response_inserted
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;
```
That also lets us apply functions on top of the aggregation, which is not possible with the current interface. This makes it possible to do select coalesce(json_agg($1), '[]')->0 to override application/vnd.pgrst.object.
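As a sketch, such an override could look like the following (hypothetical; the function name, the projects type, and the exact domain name are assumptions for illustration):

```sql
-- sketch: a handler overriding application/vnd.pgrst.object behavior;
-- aggregate the rows to JSON and return the first element,
-- falling back to an empty array when there are no rows
CREATE OR REPLACE FUNCTION first_object(projects)
RETURNS "application/vnd.pgrst.object+json" AS $$
  select coalesce(json_agg($1), '[]')->0;
$$ LANGUAGE SQL;
```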
The any handler interface would have to change though, since the aggregate final function would no longer be used for setting the response headers.
The "correct" way to do this would be to use array_agg(...) all the time, once you hit a CAST, so something like this:
```sql
-- note the argument type is now an array
CREATE OR REPLACE FUNCTION twkb(lines[]) RETURNS "application/vnd.twkb" AS $$
  -- whatever you need to do to the array
$$ LANGUAGE SQL IMMUTABLE;

CREATE CAST (lines[] AS "application/vnd.twkb") WITH FUNCTION twkb(lines[]) AS IMPLICIT;
```
```sql
WITH pgrst_source AS ( SELECT "test"."lines".* FROM "test"."lines")
SELECT
  null::bigint AS total_result_set,
  pg_catalog.count(_postgrest_t) AS page_total,
  twkb(array_agg(_postgrest_t::"test"."lines")) AS body,
  nullif(current_setting('response.headers', true), '') AS response_headers,
  nullif(current_setting('response.status', true), '') AS response_status,
  '' AS response_inserted
FROM ( SELECT * FROM pgrst_source ) _postgrest_t;
```
That means the user can either provide custom aggregates, probably mostly useful when implemented in C for performance (i.e. pass through the full response once instead of twice with the cast). Or they can provide the cast, in which case we still do the aggregation for them beforehand, but in a type-preserving way via array_agg(). This way you can still create different casts for different endpoints.
The "correct" way to do this, would be to use array_agg(...) all the time, once you hit a CAST, so something like this:
How would that work? I'm guessing like:
```sql
CREATE OR REPLACE FUNCTION twkb(lines[]) RETURNS "application/vnd.twkb" AS $$
  select twkb_agg(x) from unnest($1) x;
$$ LANGUAGE SQL IMMUTABLE;
```
Wouldn't it be wasteful to aggregate into an array (twkb(array_agg(_postgrest_t::"test"."lines"))) and unnest it again?
AFAICT, operations over arrays on pg always require unnesting; there's no array.map() function (ref).
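To illustrate the unnest-then-reaggregate pattern (plain SQL, unrelated to PostgREST internals):

```sql
-- the idiomatic "map" over an array: unnest, transform each element, array_agg back
-- (add WITH ORDINALITY + ORDER BY if element order must be guaranteed)
select array_agg(x * 2) from unnest(array[1, 2, 3]) as x;
-- {2,4,6}
```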
That means the user can either provide custom aggregates, probably mostly useful when implemented in C for performance. Or they can provide the cast
Hm, ideally handling both content-type and accept would use the same interface with CASTs. Otherwise this complicates the docs and the implementation.
Right now, the way the media type is mixed into the aggregate definition doesn't look good.
Wouldn't it be wasteful to aggregate the array (twkb(array_agg(_postgrest_t::"test"."lines"))) and unnest it again?
AFAICT, operations over arrays on pg always require unnesting, there's no array.map() function (ref).
What makes you think that unnest/agg is less efficient than a map function if that was implemented? I have no idea whether it's inefficient, but unnest/agg just seems to be idiomatic SQL code. So certainly not bad. Also note that this is a special case for a cast function written in SQL. For example the function given in https://github.com/PostgREST/postgrest/issues/3160#issue-2082417765 would be much more efficient with array_agg. Currently it uses a poor-man's aggregation via json/jsonb, but that's just because that's the only way to do it right now. An aggregation via array_agg would be much better - and then the python code doesn't care how it receives the data.
Hm, ideally handling both content-type and accept would use the same interface with CASTs.
Agreed. But the example in https://github.com/PostgREST/postgrest/issues/2826#issue-1758188980 is insufficient:
```sql
CREATE FUNCTION convert_projects("application/json") RETURNS projects AS $$
  -- ...
$$ LANGUAGE SQL;
```
This would only ever allow a single project to be returned. What if the json document contained multiple projects? I'm quite sure that a cast can't return a SETOF. So the only thing you could return would be projects[] - and then PostgREST would have to unnest that to insert etc.
This is exactly the opposite of array_agg.
So basically: the input side (i.e. "how to transform a scalar into a set?") is unnest into a SETOF. The corresponding output (i.e. "how to transform a set into a scalar?") are... aggregates. Very symmetric.
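The symmetry can be seen in plain SQL (illustrative only):

```sql
-- input direction: scalar (array) -> set of rows
select * from unnest(array[1, 2, 3]) as x;         -- 3 rows
-- output direction: set of rows -> scalar (array)
select array_agg(x) from generate_series(1, 3) x;  -- {1,2,3}
```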
Problem
Currently we allow handling raw input for application/json, text/xml, text/plain, application/octet-stream on functions (ref). However it's inflexible: no other custom media types can be handled. It also doesn't work for tables/views.
Solution
Use a domain-based media type as an unnamed function parameter to do custom conversion. Unlike ref, this can work for both functions and tables/views.
Functions interface
Tables/views interface
e.g. the first function will be used when doing:
This means we'll use the unnamed parameter + return type to apply a conversion function for a particular table/view.
Inside the function a json_to_recordset can be used (this is what we do internally). With this interface the user can override our default behavior.
Thanks to overloaded functions, the same function name can be used to handle different media types.
Notes
Related