PostgREST / postgrest

REST API for any Postgres database
https://postgrest.org
MIT License
23.11k stars 1.02k forks source link

Computed relationship changes the shape of main query when selecting it as a computed column #2480

Open laurenceisla opened 1 year ago

laurenceisla commented 1 year ago
create schema api;

create table api.premieres (
  id text primary key,
  film_id text -- references api.films(id)
);
insert into api.premieres values
  ('P1', 'F1'),
  ('P2', 'F1'),
  ('P3', 'F2'),
  ('P4', 'F3')
 ;

create table api.films(id text primary key);
insert into api.films values('F1'), ('F2'), ('F4');
create or replace function api.computed_films_m2o(api.premieres) returns setof api.films as $$
  select * from api.films where id = $1.film_id
$$ stable language sql rows 1;

I see that taking a set-returning function as a computed column is acceptable, but the result is still unexpected (to me). Intuitively (and conventionally with PostgREST), a query with computed columns returns the same number of rows as the main query, as in a left join. That is, if we accept taking the computed_films_m2o as a computed column, it should return

GET /premieres?select=*,computed_films_m2o

[{"id":"P1","film_id":"F1","computed_films_m2o":{"id":"F1"}},
 {"id":"P2","film_id":"F1","computed_films_m2o":{"id":"F1"}},
 {"id":"P3","film_id":"F2","computed_films_m2o":{"id":"F2"}},
 {"id":"P4","film_id":"F3","computed_films_m2o":null}]

instead. Otherwise, we have to break the current behevior of that a computed column never change the shape of the main query.

But the presented inner join is not a bug, but a postgresql feature, because PostgreSQL's behavior for a set-returning function in a query's select list is almost exactly the same as if the set-returning function had been written in a LATERAL FROM-clause item instead, i.e.,

select *,  computed_films_m2o(premieres) from premieres

is almost the same as

select premieres.*,  films from premieres, lateral computed_films_m2o(premieres) as films

Regarding that, we could either exploit it as a postgrest feature that a set-returning function as computed column is for inner-join and a rowtype-returning scalar function as computed column is for left-join, or simply deny a set-returning function taken as a computed column for less headache.

Personally, I think the former option is very interesting and makes postgrest more flexible, although it requires explicit documentation and more tests and experiments.

Originally posted by @Iced-Sun in https://github.com/PostgREST/postgrest/issues/2475#issuecomment-1251843074

laurenceisla commented 1 year ago

This was my take of the issue:

From my POV I see it more using a PostgreSQL perspective. Also, this was allowed even before the computed relationships were implemented. I just tested GET /premieres?select=*,computed_films_m2o on PostgREST v9.0.1 and got the same result, returning a different number of rows than the main query (it may work the same way for older versions), so changing this behavior could be considered a breaking change. Still, it needs consensus, I think.

steve-chavez commented 1 year ago

So since computed relationships are computed columns they can be used on select and thus filter the same way as an !inner.

I think we should keep the convention of select not filtering the rows. !inner broke this but it was a mistake that we can correct with the exists filter(https://github.com/PostgREST/postgrest/issues/2340)

or simply deny a set-returning function taken as a computed column for less headache.

I would prefer that option. This would require considerable work though because we don't have computed columns in the schema cache.

wolfgangwalther commented 1 year ago

or simply deny a set-returning function taken as a computed column for less headache.

I would prefer that option. This would require considerable work though because we don't have computed columns in the schema cache.

I think we should make sure to make the computed relationships features able to call all kinds of computed columns (i.e. scalar types with empty parens select=*,computed() and inline support for all types) and then deprecate the "implicit" way of calling computed columns without brackets. A next step could be to check all column names against the real columns of the queried relation and to reject calling computed columns without parens - i.e. disabling the deprecated feature.

This would give us a way out of the "select affects the number of rows returned" mess we have right now.


Summarizing this and the other issue:

We currently have two problems:

We can fix both by making the new syntax cover all cases and then remove the old syntax.