PostgREST / postgrest

REST API for any Postgres database
https://postgrest.org
MIT License

Support proper bulk update via PATCH #1959

Open ghost opened 3 years ago

ghost commented 3 years ago

Environment

Description of issue

Schema:

drop table if exists example;
create table if not exists example (
    pk integer PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    name   varchar(40) NOT NULL CHECK (name <> ''),
    value   varchar(40) NOT NULL CHECK (value <> '')
);

insert into example (name, value) values ('one', '1');
insert into example (name, value) values ('two', '2');
insert into example (name, value) values ('three', '3');

PostgREST usage:

#!/bin/bash

set -e

data='
    [
        {
            "pk": 1,
            "name": "example"
        },
        {
            "pk": 2,
            "name": "second"
        }
    ]
'

# returns all three
curl \
    -X GET "http://localhost:9000/example"

echo -e ""
echo -e "--"

# null value in column \"value\" violates not-null constraint
curl \
    -X POST "http://localhost:9000/example" \
    -H "Prefer: resolution=merge-duplicates" \
    -d "${data}"

echo -e ""
echo -e "--"

# null value in column \"value\" violates not-null constraint
curl \
    -X POST "http://localhost:9000/example?columns=name" \
    -H "Prefer: resolution=merge-duplicates" \
    -d "${data}"

A bulk update on a partial set of columns triggers the NOT NULL constraints of the remaining NOT NULL columns. For now we have relaxed the constraints in our database to work around this, and we also looked at doing it via a stored procedure, but it would be nice to have an official suggestion / fix for this.

wolfgangwalther commented 3 years ago

The reason is that you are not doing a bulk update at all. You are doing an INSERT. I guess you wanted to use PUT instead of POST to turn it into an INSERT ... ON CONFLICT DO UPDATE ..., but that would still yield the same error. The error happens at the INSERT stage, before PostgreSQL ever reaches the UPDATE part.
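To illustrate with the example table from above: the candidate row is materialized at the INSERT stage, before ON CONFLICT is evaluated, so the missing value column arrives as NULL and trips its NOT NULL constraint. A sketch of the upsert such a PUT would map to (assuming pk as the conflict target):

-- the row must be complete at INSERT time; ON CONFLICT never gets to run
INSERT INTO example (pk, name)
VALUES (1, 'example'), (2, 'second')
ON CONFLICT (pk) DO UPDATE
    SET name = EXCLUDED.name;
-- ERROR: null value in column "value" violates not-null constraint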

PUT is not a proper "bulk update".

Related: #1945.

ghost commented 3 years ago

Thanks for the clarification, makes perfect sense. Is PATCH the HTTP verb that should be used for this? Should I close this issue and move the discussion over to the Q&A?

wolfgangwalther commented 3 years ago

Is PATCH the HTTP verb that should be used for this?

PATCH is the proper way to do a real UPDATE. However, it does not have bulk capabilities.
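For comparison, a single-object PATCH corresponds to an UPDATE with one fixed SET clause, roughly like the sketch below (not the exact query PostgREST builds), which is why per-row values can't be expressed:

-- PATCH /example?pk=eq.1 with body {"name": "example"} is roughly:
UPDATE example
SET name = 'example'   -- one value for all matched rows
WHERE pk = 1;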

Should I close this issue and move the discussion over to the Q&A?

Since we already have a Q&A item for that topic, I'd keep this issue. It highlights why PUT is not a full replacement for a bulk update, so we can track this feature request here.

wolfgangwalther commented 3 years ago

This is currently only an idea. I can see the benefits of being able to do a bulk UPDATE - but I don't see what the REST API interface would look like, yet.

Maybe the query could look something like this:

UPDATE table
SET column = pgrst_body.column
FROM (SELECT * FROM json_populate_recordset(NULL::table, $1)) AS pgrst_body
WHERE table.pk_column = pgrst_body.pk_column
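As a sanity check, here's that shape instantiated against the example table from the opening post, with $1 replaced by an inline JSON literal (a sketch):

UPDATE example
SET name = pgrst_body.name
FROM (SELECT *
      FROM json_populate_recordset(
             NULL::example,
             '[{"pk": 1, "name": "example"}, {"pk": 2, "name": "second"}]'))
     AS pgrst_body
WHERE example.pk = pgrst_body.pk;
-- only "name" is touched, so the NOT NULL constraint on "value" never fires
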
wolfgangwalther commented 3 years ago

but I don't see what the REST API interface would look like, yet.

Actually... Do we currently support PATCHing with an array body? I don't think so. If not, that could be the non-breaking interface.

steve-chavez commented 3 years ago

Actually... Do we currently support PATCHing with an array body? I don't think so.

An array is accepted, but only the first element is taken into account for the UPDATE.

There was also another option for the query with an UPDATE..FROM, mentioned on https://github.com/PostgREST/postgrest/issues/1816

(Not sure which query would be best atm)

wolfgangwalther commented 3 years ago

There was also another option for the query with an UPDATE..FROM, mentioned on #1816

(Not sure which query would be best atm)

Ah, thanks for linking that other issue, too. I think the queries are basically the same, except for when the JSON body is parsed. The proposal in #1816 looks like it would be done in Haskell, while the one here does it on the database side. I think we generally prefer the latter and try to avoid the former, right?

steve-chavez commented 2 years ago

Bulk UPDATE with filters taken from PK columns

This could work as a first iteration of this feature. If a different payload column is desired as the filter, perhaps we could reuse the on_conflict param. Or, as requested on https://github.com/PostgREST/postgrest/issues/2123, perhaps the clearer on_constraint param?

steve-chavez commented 2 years ago

Our query already mostly works for this; it's only a matter of deciding when to apply the payload fields as filters.

WITH
pgrst_payload AS ( SELECT '[{"id": 4, "name": "xxx"}, {"id": 5, "name": "yyy"}]'::json AS json_data),
pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload)
UPDATE "test"."projects"
SET "name" = x."name"
FROM (SELECT * FROM json_populate_recordset(null::"test"."projects", (SELECT val FROM pgrst_body))) x
WHERE "test"."projects"."id" = x.id
RETURNING "test"."projects".*;

 id | name | client_id
----+------+-----------
  4 | xxx  |         2
  5 | yyy  |

TBD: What happens with querystring filters in this case? They could either be ignored or applied in addition.

We could add the filter (id=gt.4):

WITH
pgrst_payload AS ( SELECT '[{"id": 4, "name": "xxx"}, {"id": 5, "name": "yyy"}]'::json AS json_data),
pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload)
UPDATE "test"."projects"
SET "name" = x."name"
FROM (SELECT * FROM json_populate_recordset(null::"test"."projects", (SELECT val FROM pgrst_body))) x
WHERE "test"."projects"."id" = x.id AND "test"."projects"."id" > 4
RETURNING "test"."projects".*;

 id | name | client_id
----+------+-----------
  5 | yyy  |

But that doesn't help with knowing when to apply the payload filters. We cannot always apply that condition; if we did, it could cause confusing situations like:

-- the payload body has no id
WITH
pgrst_payload AS ( SELECT '[{"name": "xxx"}, {"name": "yyy"}]'::json AS json_data),
pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload)
UPDATE "test"."projects"
SET "name" = x."name"
FROM (SELECT * FROM json_populate_recordset(null::"test"."projects", (SELECT val FROM pgrst_body))) x
WHERE "test"."projects"."id" = x.id AND "test"."projects"."id" > 4
RETURNING "test"."projects".*;

 id | name | client_id
----+------+-----------
(0 rows)

-- Nothing gets updated

That would also break existing PATCH requests.


I think we should apply the payload filter whenever there are no query param filters. So:

PATCH /projects

[
  {"id": 5, "name": "xxx"},
  {"id": 6, "name": "yyy"}
]

Would generate the query with the id filter applied.
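Under this rule, the request above would presumably generate the earlier CTE query with the PK condition derived from the payload (a sketch, mirroring the query shown before):

UPDATE "test"."projects"
SET "name" = x."name"
FROM (SELECT * FROM json_populate_recordset(
        null::"test"."projects",
        '[{"id": 5, "name": "xxx"}, {"id": 6, "name": "yyy"}]')) x
WHERE "test"."projects"."id" = x.id  -- applied because the request has no query param filters
RETURNING "test"."projects".*;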

This would break the ability to PATCH the whole table in one request (deliberately or accidentally), but I'd argue that was always unwanted behavior (ref). So this:

PATCH /projects

{"name": "xxx"}

Will always result in 0 updates (with no error from the db) and a 404 will be returned, the same as doing a PATCH /projects?id=eq.non-existent request.

To make that operation possible, we can enforce that a limit is present (taking advantage of https://github.com/PostgREST/postgrest/pull/2211):

PATCH /projects?limit=10&order=id

{"name": "xxx"}

Only 10 updates will happen.
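Since PostgreSQL's UPDATE has no LIMIT clause, the limit has to be pushed into a subquery that selects the target rows in a deterministic order. A sketch of what the limited form could boil down to (not necessarily the exact query #2211 generates):

-- order=id makes the affected row set deterministic
UPDATE "test"."projects"
SET "name" = 'xxx'
WHERE "id" IN (SELECT "id"
               FROM "test"."projects"
               ORDER BY "id"
               LIMIT 10);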

steve-chavez commented 2 years ago

Another good thing is that a full-table UPDATE is an expensive operation (it can take a long time on big tables), so by requiring a limit we prevent it from happening.

laurenceisla commented 2 years ago

Reopened, now that #2424 is merged.

steve-chavez commented 1 year ago

So this was reopened for the reasons mentioned on https://github.com/PostgREST/postgrest/pull/2414#issuecomment-1212647377: it basically broke existing functionality. There's an alternate implementation with underscore/dollar operators, but that's a more complicated feature.

How about making this simpler: we just do the bulk update based on PKs when Prefer: params=multiple-objects (currently only used for RPC) is specified?

Later on it could be extended to have Prefer: params=multiple-objects; unique_keys=id.

This header syntax is also in line with the improvements proposed on https://github.com/PostgREST/postgrest/issues/2066#issue-1070332545

I believe a better syntax for this is now the one posted on https://github.com/PostgREST/postgrest/issues/465#issuecomment-1566188612.

steve-chavez commented 1 year ago

Reusing the syntax from the discussion on https://github.com/PostgREST/postgrest/issues/465#issuecomment-1722382293, I think this feature should be implemented in the following way.

Having:

GET /items?id=lte.3

[
  { "id": 1, "name": "Item 1" },
  { "id": 2, "name": "Item 2" },
  { "id": 3, "name": "Item 3" }
]

Doing:

PATCH /items?id=lte.3&name=set.$body->name
Prefer: return=representation

[
  { "id": 1, "name": "Updated Item 1" },
  { "id": 2, "name": "Updated Item 2" },
  { "id": 3, "name": "Updated Item 3" }
]

Will result in:

[
  { "id": 1, "name": "Updated Item 1" },
  { "id": 2, "name": "Updated Item 2" },
  { "id": 3, "name": "Updated Item 3" }
]

After this, #465 should be easier to implement too.
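A rough idea of the SQL such a request could translate to, assuming the body rows are matched to table rows by id (a sketch, not a committed design):

UPDATE items
SET name = pgrst_body.name              -- name=set.$body->name
FROM (SELECT *
      FROM json_populate_recordset(
             NULL::items,
             '[{"id": 1, "name": "Updated Item 1"},
               {"id": 2, "name": "Updated Item 2"},
               {"id": 3, "name": "Updated Item 3"}]')) pgrst_body
WHERE items.id = pgrst_body.id          -- match payload rows to table rows
  AND items.id <= 3                     -- querystring filter id=lte.3
RETURNING items.*;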

taimoorzaeem commented 12 months ago

@steve-chavez @laurenceisla Now that #2926 is merged, POST returns 200 on an update. So maybe this can be done with POST, e.g.:

POST /items
Prefer: return=representation, resolution=merge-duplicates

[
  { "id": 1, "name": "Updated Item 1" },
  { "id": 2, "name": "Updated Item 2" },
  { "id": 3, "name": "Updated Item 3" }
]

HTTP/1.1 200 OK

[
  { "id": 1, "name": "Updated Item 1" },
  { "id": 2, "name": "Updated Item 2" },
  { "id": 3, "name": "Updated Item 3" }
]