PostgREST / postgrest

REST API for any Postgres database
https://postgrest.org
MIT License

Require criteria for PATCH and DELETE #663

Closed eradman closed 8 years ago

eradman commented 8 years ago

The documentation acknowledges that updating or deleting records is dangerous, but it is only so because postgrest accepts PATCH and DELETE without criteria:

curl -s -X PATCH $url/employees \
    -H "Content-Type: application/json"  -d '{"first_name": "John"}'

This is never what is intended, and it's the default behavior when experimenting at the command line or when testing a new code path. In some environments configuring an HTTP proxy with a set of rules to guard against this would add significant complexity. Would it be reasonable to simply throw an error if no criteria is supplied?
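
For contrast, the kind of scoped request one presumably intends carries a filter in the query string (the id value here is just an example):

curl -s -X PATCH "$url/employees?id=eq.7" \
    -H "Content-Type: application/json" -d '{"first_name": "John"}'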

begriffs commented 8 years ago

That is sensible. You might argue that a proxy could add that protection...but like you said, when is deleting or updating everything ever intended? Seldom. And there would still be a way to confirm the intent, like ?id=gt.1, just a token filter.

It's a weird default policy that we have now. I can imagine changing the default to deny those kinds of requests and maybe include a command-line flag like --unrestricted-deletes in case somebody does want that behavior.

ruslantalpa commented 8 years ago

I could say ... because PostgreSQL accepts DELETE and UPDATE without criteria, and it would be the same point as above. Still, that's how SQL works: you are responsible for what you request/execute, and tests should not be run on live datasets.

I also don't agree with the statement that enforcing parameters adds significant complexity; I think it's literally 2-3 lines of config in nginx to restrict DELETE/PATCH to only work on URLs like /endpoint/:id

Parameters like --unrestricted-deletes can never work, since you can still provide a filter parameter for the delete and give it a value that matches most of the rows; what's worse, it will give a false sense of security that there is some kind of protection against abuse.

The only thing that comes to mind is something like --disable-multirow-delete-update, a thing that would allow a delete/update only if you provide a specific primary key (which, as I said, is actually easy to implement in nginx config)

jackfirth commented 8 years ago

For the record, I actually do use unrestricted deletes in a legitimate fashion. I've made a postgrest seeding docker image which reads data out of csv files, deletes the tables they correspond to, then posts the data in the files to postgrest. I use this to make it easy to write example or testing data in csv files and stick them into a seeding image; then I can seed the database with docker and docker compose. I'd much rather enforce table deletion restrictions through a proxy than in postgrest itself.

begriffs commented 8 years ago

Well I stand corrected!

@eradman what do you think about trying to guard your routes with nginx? @ruslantalpa do you have an example nginx config to do this?

ruslantalpa commented 8 years ago

not tested but along the lines of

set $is_bad "";
if ( $request_method = PATCH )  { set $is_bad "PATCH"; }
if ( $request_method = DELETE ) { set $is_bad "DELETE"; }
if ( $arg_id = "" )             { set $is_bad "${is_bad}:noid"; }
if ( $is_bad = "PATCH:noid" )   { return 400; }
if ( $is_bad = "DELETE:noid" )  { return 400; }

ruslantalpa commented 8 years ago

much nicer would be in openresty

access_by_lua_block {
if( ngx.var.request_method == "DELETE" and ngx.var.arg_id == nil ) then ngx.exit(400) end
}

again, not tested but close to the real thing

eradman commented 8 years ago

I appreciate the examples. In a short span of time I was not able to figure out how to do this with Varnish or Apache, so I'll stand up Nginx instead.

ruslantalpa commented 8 years ago

i would suggest going for openresty directly, same nginx + ability to script the config with no performance penalty

eradman commented 8 years ago

Here is the final config that I tested:

set $is_bad "";
if ( $request_method = PATCH ) { set $is_bad "PATCH"; }
if ( $request_method = DELETE ) { set $is_bad "DELETE"; }
if ( $args = "" ) { set $is_bad "${is_bad}:noid"; }
if ( $is_bad = "PATCH:noid" ) { rewrite ^ $scheme://$host$request_uri?id=; }
if ( $is_bad = "DELETE:noid" ) { rewrite ^ $scheme://$host$request_uri?id=; }

This causes postgrest to return a JSON error that the client might expect:

{"details":"unexpected end of input expecting \"not.\" or operator (eq, gt, 
...)","message":"\"failed to parse filter ()\" (line 1, column 1)"}

I am able to proceed now that I have a workaround, but I consider this to be a design bug. SQL needs to be backward compatible, but there is no such restriction on a REST interface. There is also a difference--at an SQL prompt the DBA will use the commands

BEGIN;
UPDATE employees SET first_name='' ...;

And if you see UPDATE 5039 as the reply, you will not type COMMIT! With an HTTP interface you don't have the opportunity to ABORT a mistake.

ruslantalpa commented 8 years ago

first, you should do it with return 400 "whatever response body you want"; and not with a rewrite
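
For example, something along these lines (again, not tested; the message text is just a placeholder):

if ( $is_bad = "PATCH:noid" )  { return 400 '{"message":"a filter such as id=eq.N is required"}'; }
if ( $is_bad = "DELETE:noid" ) { return 400 '{"message":"a filter such as id=eq.N is required"}'; }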

then, this is not a workaround, it's the way it was designed to be implemented :) You can do delete/update without parameters not because SQL needs to be backward compatible but because it is useful, and it's the same with (our) REST interface: it's useful, and @jackfirth gave you an example. And last :) a REST interface is not a tool to query your db by hand and play with the dataset. You figure out the exact requests you need to make for your app (with a sample dataset), then you use those queries/requests in your app.

PS: also, your $args = "" condition only checks that some parameter is present; I can still add a dummy parameter, or a parameter that matches all your records, and drop them. It feels like you added this only to catch your mistakes, but this is not good for production.

eradman commented 8 years ago

I should mention that I'm only fussing over this because PostgREST is awesome and I would like to deploy it. Not only is it really clean, but its worst-case performance is still 10x faster than the Django web app I'm attempting to replace.

I'm not trying to catch my mistakes, I'm trying to prevent any one of 20 developers from hitting a trap. Think of it as a guard-rail around the edge of a roof--you can deliberately step over it, but you won't accidentally step backwards off of it and fall to your death.

Our developers can destroy the production database using the Django RPC interface we built, but only if they make a deliberate effort to do so. @jackfirth gave us an example of how not to build an internal API. There is no reason he can't unlock the safety with ?pk=gt.0.

Your observation that $args = "" does not catch non-column arguments is astute. The query string ?select=name bypasses these nginx rules. Doesn't this demonstrate that properly implementing this rule in the proxy is impossible?

Edit: even though my nginx rules are flawed, they are probably good enough to qualify as a fence. Feel free to close this issue if the default behavior is judged to be acceptable.

ruslantalpa commented 8 years ago

Intelligent discussion/debate leads to better things, so let's continue. Let's first understand the problem: I'll ask some questions to understand what you are trying to build and what you want to guard against.

eradman commented 8 years ago

Sure, here's a rough outline of the environment I'm working with:

The majority of database interactions are single inserts and updates. When multiple records need to be updated at once the logic follows this pattern:

@servicemethod
def set_fan_speed(self, speed, rack_ids):
    racks = Rack.objects.filter(id__in=rack_ids)
    racks.update(fan_speed = speed)
    return to_json(racks)

The weakness of this scheme is that every single action needs a corresponding function on the server. I estimate that a REST interface that emerges from the database schema could eliminate 90% of an inconsistent API that was built up over years. (wow)

These RPCs are so specialized that it's hard to use them incorrectly. In the example above, rack_ids must be a list of values, so it takes significant effort to update every row in the Rack table. Because the interface is rigid, a naive unit test will usually catch egregious errors, because no criteria means update nothing:

def test_set_fan_speed_rpc(self):
    self.rpc.set_fan_speed(66, [])
    self.assertEqual(self.rpc.get_fan_speed(102), 66) # fail

Now after switching to an interface where no criteria means update everything, simple client tests no longer prove enough:

def test_set_fan_speed_rpc(self):
    requests.patch(self.url + "/rack", json={'fan_speed': 66})
    r = requests.get(self.url + "/rack", params={'id': 'eq.102', 'select': 'fan_speed'})
    self.assertEqual(r.json()[0]['fan_speed'], 66) # pass

Of course everyone should write more thorough tests, but I'm concerned about mistakes that are both easy to make and have a severe impact.

ruslantalpa commented 8 years ago

OK, it seems to me that, as you said, most of the time you update single rows, and even when you update multiple rows at once you do it by providing all their ids. Why not do as I suggested before and force the presence of the id column on update/delete paths? Want to update 1 record -> id=eq.10; want to update multiple records -> id=in.10,20,30. In the rare cases where you need to update/delete multiple rows but not by their ids you can do: a) a custom rpc with a specific list of parameters, or b) a custom location/config that does not enforce the id column but checks some other condition.
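
For example (untested, ids made up), the requests would look something like:

curl -s -X PATCH "$url/rack?id=eq.10" \
    -H "Content-Type: application/json" -d '{"fan_speed": 66}'
curl -s -X PATCH "$url/rack?id=in.10,20,30" \
    -H "Content-Type: application/json" -d '{"fan_speed": 66}'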

On a general note: I notice a tendency (not just you, everybody) to think of nginx only as the thing that takes in a request and forwards it to postgrest, and then postgrest is supposed to solve everything (maybe with the help of the db). I think it stems from how APIs are built without postgrest: a request comes in, a node/php/ruby entrypoint is invoked that does everything and treats the db as a dumb data store.

In a system built with postgrest, each component is equally important and has its part of the responsibility. PostgREST - think of it as doing exclusively the translation of http (with a specific format) to sql. PostgreSQL - enforces security, executes the sql and generates the final version of the output (the other components just forward the thing generated in the db).

The role of nginx (but again, I would suggest going directly for openresty) is also very important in this setup. Consider the old ways: in your bootstrap, before the router is invoked and the request is routed to the appropriate controller/action, you probably have a bunch of hooks that check things like: is the user authenticated, can he access this controller/action at all, does the request have the right "format". Think of nginx as the place where you add these hooks that prevalidate (or transform) the request before forwarding it to postgrest. Your specific case is easy to solve with just plain nginx config that enforces arg_id when doing delete/update, but go for openresty and you will have a complete programming language (lua) running in nginx directly at about C speed, and you will be able to do all kinds of crazy stuff with the incoming request: enforce a specific order, limit the depth of the tree, take in a call from a 3rd party (webhook), transform it to a postgrest url and execute it ... things like that.

eradman commented 8 years ago

I think I see why you are recommending openresty. Vanilla nginx isn't nearly as good at complicated imperative checks. You are right that using the web proxy in this capacity is a mental shift; I normally use Nginx or Apache to map URLs to resources and little else.

Doing these checks in the proxy is also error-prone. Compare this with extracting a value from an XML document using regular expressions--you can do it--but it's hard to declare your resulting solution "correct". Even with the power of Lua, the web proxy still doesn't know the difference between a column name (fan_speed=eq.51) and parameters that control the results (select=fan_speed).

The statement that PostgreSQL "enforces security" sparked an idea from another angle: there is probably a way to teach Postgres to require a WHERE clause for UPDATE and DELETE on a per-schema basis. This capability is not built in, but it's probably not hard to write an extension that does this...

ruslantalpa commented 8 years ago

About that other angle, you are probably looking for this (forgot to mention it earlier): http://stackoverflow.com/questions/2560737/count-number-of-rows-to-be-affected-before-update-in-trigger

Now about openresty :) I am going to go out on a limb and say that you just skimmed over it and assumed that the only tool available would be vanilla Lua (and you would be wrong). Just like in any other env (node/ruby/php), you can have libs that parse json/xml, connect to the db, .... You also have an FFI to C land (which you don't have in those other places).

What if i told you that openresty is exactly how https://www.cloudflare.com/waf/ and http://graphqlapi.com/ are implemented, would this change your mind? :)

ruslantalpa commented 8 years ago

Another point on implementing the protection in the db: since everything in PostgREST is done inside a transaction, you can have an AFTER trigger on each table where you just check the number of rows that were updated/deleted, and if it's above some constant you raise an error so everything is rolled back (maybe couple this with checking that the request is coming from postgrest by looking at some claim, so that manual update/delete queries can go through).
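
A rough sketch of what I mean (not tested; the table name, the 100-row limit, and the use of transition tables, which need PostgreSQL 10+, are just assumptions for illustration):

-- statement-level AFTER trigger: abort the transaction when a single
-- UPDATE or DELETE touches more than 100 rows (arbitrary limit)
CREATE OR REPLACE FUNCTION guard_bulk_write() RETURNS trigger AS $$
DECLARE
  affected bigint;
BEGIN
  SELECT count(*) INTO affected FROM changed_rows;
  IF affected > 100 THEN
    RAISE EXCEPTION '% would affect % rows, refusing', TG_OP, affected;
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER rack_guard_update
  AFTER UPDATE ON rack
  REFERENCING OLD TABLE AS changed_rows
  FOR EACH STATEMENT EXECUTE PROCEDURE guard_bulk_write();

CREATE TRIGGER rack_guard_delete
  AFTER DELETE ON rack
  REFERENCING OLD TABLE AS changed_rows
  FOR EACH STATEMENT EXECUTE PROCEDURE guard_bulk_write();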

eradman commented 8 years ago

I did assume that openresty was mostly an enhanced package (like nginx-lua or nginx-passenger). Using an AFTER trigger to fire an assertion is also interesting.

I went ahead and built a simple PG extension that prevents running UPDATE or DELETE without a WHERE clause:

https://bitbucket.org/eradman/pg-safeupdate/

Now I can sit back and enjoy an error message from the SQL prompt as well as the HTTP interface!

begriffs commented 8 years ago

Creating an extension, now that's smart! You should register it here http://pgxn.org/

begriffs commented 8 years ago

By the way there is discussion in the postgresql forum about adding a GUC variable to prevent deletes or updates lacking a where clause.

http://postgresql.nabble.com/PoC-Make-it-possible-to-disallow-WHERE-less-UPDATE-and-DELETE-td5912808.html

eradman commented 8 years ago

Glad you pointed that out--this thread is a great read. Interesting to see that this topic is being evaluated more seriously at the database level.