PostgREST / postgrest

REST API for any Postgres database
https://postgrest.org
MIT License
23.25k stars 1.02k forks source link

Spread operator for many-to-many relationships, aliases, and aggregations #3041

Open jdgamble555 opened 10 months ago

jdgamble555 commented 10 months ago

Currently the spread operator works on one-to-one and many-to-one relationships. We need to get it to work in all situations like many-to-many relationships:

Schema:

CREATE TABLE posts (
  id uuid NOT NULL PRIMARY KEY DEFAULT uuid_generate_v4(),
  title text NOT NULL CHECK (char_length(title) > 2),
  ...
)
CREATE TABLE tags (
  name text,
  pid uuid REFERENCES posts(id) ON DELETE CASCADE ON UPDATE CASCADE,
  created_at timestamptz NOT NULL DEFAULT now(),
  PRIMARY KEY (name, pid)
);

-- for later example

CREATE TABLE hearts (
  pid uuid NOT NULL REFERENCES posts(id) ON DELETE CASCADE ON UPDATE CASCADE,
  uid uuid NOT NULL REFERENCES profiles(id) ON DELETE CASCADE ON UPDATE CASCADE,
  created_at timestamptz NOT NULL DEFAULT now(),
  PRIMARY KEY (pid, uid)
);

Spread on Item

GET /posts?select=title,tags(...name)

Response:

[
  {
    "title": "post title",
    "tags": ["technology", "buddy"]
   }
]

Spread on Count

GET /posts?select=title,...hearts(hearts_count:count)

Response:

[
  {
    "title": "post title",
    "hearts_count": 10
  }
]

Sort by Foreign Column (with alias and spread)

Would help with the sorting by foreign column:

GET /posts?select=title,...hearts(hearts_count:count)&order=hearts_count.desc

Multiple Levels

Of course if you model your tags with a junction table, you need something like this that works for multiple levels:

GET /posts?select=title,...posts_tags(tags(...name))

Response:

[
  {
    "title": "post title",
    "tags": ["technology", "food"]
   }
]

Spread on Table (with duplicate column name)

I'm not sure how useful this one would be, as this may could be translated differently.

GET /posts?select=title,created_at,...tags(name,tag_created_at:created_at)

Response:

[
  {
    "title": "post title",
    "name": ["technology", "buddy"],
    "tag_created_at": ["1-1-01", "1-2-02"]
  }
]

I think the spread operator has so much potential to simplify things. It could also simplify the queries under the hood.

J

jdgamble555 commented 10 months ago

Just thought of a complex example here. Imagine that you want to get all the related posts. Related posts are posts that share the same tags as the current posts sorted by relevance (number of duplicate posts found).

Imagine if you could just do this:

GET /posts?select=...tags(posts(*,count))&id=eq.0x123&posts.id=not.eq.0x123&order=count

Instead of a custom function like this:

DROP FUNCTION IF EXISTS related_posts;
CREATE OR REPLACE FUNCTION related_posts(id uuid)
RETURNS SETOF posts
LANGUAGE sql
AS $$
  SELECT p.* 
  FROM posts p
  JOIN tags t1 ON p.id = t1.pid
  JOIN tags t2 ON t1.name = t2.name 
    AND t2.pid = related_posts.id
  WHERE p.id != related_posts.id
    AND p.status = 'published'
  GROUP BY 
    p.id, 
    p.parent, 
    p.status,
    p.title, 
    p.author, 
    p.content, 
    p.image, 
    p.slug, 
    p.minutes, 
    p.created_at, 
    p.updated_at, 
    p.published_at
  ORDER BY count(*) DESC;
$$;

This would technically require you to be able to put count and another column in the same request. This is just some ideas of where postgREST could be one day.

This would also solve a lot of Group By problems:

https://github.com/PostgREST/postgrest/issues/158

J

steve-chavez commented 10 months ago

Spread on Count GET /posts?select=title,...hearts(hearts_count:count)

@jdgamble555 Sounds like a good idea. We've been wanting this for aggregate functions https://github.com/PostgREST/postgrest/pull/2925.

I'm thinking the rule should be to allow to-many spreads if there's only aggregates (no other columns) on the embedded relation. This to ensure there's only one row present.

GET /posts?select=title,tags(...name) [ { "title": "post title", "tags": ["technology", "buddy"] } ]

To be consistent with the current syntax:

GET /posts?select=title,...tags(name)

The restriction for to-many spreading for this case would be to have a single column on the embedded relation.

Sort by Foreign Column (with alias and spread) Would help with the sorting by foreign column:

Looks related to https://github.com/PostgREST/postgrest/issues/2730. If we solve spread on aggregates I believe this will come for free.

Spread on Table (with duplicate column name) I'm not sure how useful this one would be, as this may could be translated differently. GET /posts?select=title,created_at,...tags(name,tag_created_at:created_at) [ { "title": "post title", "name": ["technology", "buddy"], "tag_created_at": ["1-1-01", "1-2-02"] } ]

That one doesn't seem that useful. The user would have to match both arrays to get the complete data and we'd need to ensure the order somehow. Seems hard to implement too.


Not sure yet of the shape of the queries.

philmetzger commented 6 months ago

I would love this to be a thing! Have the same issue where i want to do a filter for overlap on a joined tags table.

laurenceisla commented 2 months ago

Not sure yet of the shape of the queries.

I'm giving this a shot in #3640. My approach is to wrap all the selected columns in json_agg(). This would allow us to select more than one column in the embedded spread. For example, this request:

curl 'localhost:3000/clients?select=name,...projects(name,id)'

Would roughly translate to:

SELECT "test"."clients"."name",
       "clients_projects_1"."name",
       "clients_projects_1"."id"
FROM "test"."clients"
LEFT JOIN LATERAL (
  SELECT json_agg("projects_1"."name") AS "name",
         json_agg("projects_1"."id") AS "id"
  FROM "test"."projects" AS "projects_1"
  WHERE "projects_1"."client_id" = "test"."clients"."id"
) AS "clients_projects_1" ON TRUE

Then, the response would be:

[
  {"name":"Microsoft","name":["Windows 7", "Windows 10"],"id":[1, 2]},
  {"name":"Apple","name":["IOS", "OSX"],"id":[3, 4]}
]

I'll be updating any caveats I encounter in the PR.