PostgREST / postgrest

REST API for any Postgres database
https://postgrest.org
MIT License
23.4k stars 1.03k forks source link

Require schema to be specified in request URL #2157

Open robertsosinski opened 2 years ago

robertsosinski commented 2 years ago

Small low-priority feature request but interested if others think this would be useful: A config flag to require the schema be specified in the request via either Accept-Profile or Content-Profile headers.

Right now, when you specify db-schema=v1,v2,v3, postgrest will automatically assume v1 if no schema is specified in the headers (as it is the first specified). I want to always require the schema be specified, perhaps via a config such as db-schema-always-required or something like that.

I think I have a workaround now, where I specify the config as db-schema=null,v1,v2,v3. This way, if the user does not specify a schema postgrest tries to find the null schema (which does not exist) and throws an error. Interested if there could be any security/performance considerations around this though.

Thanks!

steve-chavez commented 2 years ago

I want to always require the schema be specified

What's the main motivation for the above?

steve-chavez commented 2 years ago

I think I have a workaround now, where I specify the config as db-schema=null,v1,v2,v3 Interested if there could be any security/performance considerations around this though.

Not really, there's a bit of extra work the schema cache must do for searching the "null" schema but that should be negligible.

wolfgangwalther commented 2 years ago

One way to do it would be to use #1909 to customize the URI in a way that the schema needs to be passed in the URI. This would render any request without that part of the URI useless. I think this (nested routes based on schemas) was discussed somewhere else, but I can't find the issue right now.

In general, I think the URI-based schema selection would be easier for schema-based versioning stuff, compared to using the profile headers - especially now that I realized I need two of them? Uh.

robertsosinski commented 2 years ago

I want to always require the schema be specified

What's the main motivation for the above?

Good question. I'm thinking more on endpoint design for end-users, who will not be interested in understanding postgrest and just look for a consistent api.

The main reason I want end-users to always specify the schema, is because I will want to deprecate versions at some point. In the case above, v2 would need the headers, until I fully deprecate v1, and then the headers suddenly become optional (even though the latest version I want people to use would be v3). I guess, I could order the schemas differently, but it just seems like another thing to communicate.

Another use case is when the schemas are more lateral and used just for endpoint organization. Seeing that one namespace is magically optional while the others are required can seem like odd behavior for end-users (and then remembering which one is optional, for now). Forcing end-users to just consistently add the headers seems to reduce confusion.

In general, communicating to end-users that the headers are only "kinda sorta" needed, depending on which one, and that this might change, makes it more complicated for support imho.

In general, I think the URI-based schema selection would be easier for schema-based versioning stuff, compared to using the profile headers - especially now that I realized I need two of them? Uh.

This is another thing I've been thinking about. Just use a proxy to add the headers based on a path: eg. POST /v1/things -> add the Accept-Profile header v1 and be done. Then, the end-user does not even have to think about adding headers.

One other challenge I'm seeing with end-users is the confusion on when to use Accept-Profile vs Content-Profile. Someone saying an endpoint does not work, and its because they didn't realize the POST needed the Content-Profile header has bit me before. Proxying the schema in the route makes it mandatory (thus consistent), is pretty common place, and easy to reason about.

Interested in any thoughts. I think I'm preferring the proxy path approach at the moment.

wolfgangwalther commented 2 years ago

The main reason I want end-users to always specify the schema, is because I will want to deprecate versions at some point. In the case above, v2 would need the headers, until I fully deprecate v1, and then the headers suddenly become optional (even though the latest version I want people to use would be v3). I guess, I could order the schemas differently, but it just seems like another thing to communicate.

I thought about schema-based versioning lately, and I don't think using the Profile headers is a very good idea to do so. First of all, when I update my api to a new version, I might need to change the postgrest configuration. That's not possible when running one instance only. Secondly, it's annoying to set two different headers for different kinds of requests.

When you're ok to run multiple postgrest instances, one for each version, you can do the following:

CREATE ROLE authenticator; -- as usual / before
CREATE ROLE v1 IN ROLE authenticator LOGIN; -- the versioned authenticator role
ALTER ROLE v1 SET pgrst.db_schemas = 'v1';

CREATE SCHEMA v1;
-- ...

Then set up postgrest to use the v1 user to connect to the DB. You can still grant all the api roles to authenticator. This will ensure all versioned authenticator roles can switch to those.

Then use a reverse proxy to map URI prefixes to your different postgrest instances. Now, because each postgrest instance only uses one schema, you are forcing your users to specify a schema - without headers.

I plan to use this kind of setup in a kubernetes cluster setup, too - for rolling API updates. Multiple versions / instances can live next to each other easily. And upgrading PostgREST itselt, which could be a breaking change for your API, too, would also be covered by that. You could run v1 with PostgREST 8.0 and v2 with PostgREST 9.0, etc...

Another use case is when the schemas are more lateral and used just for endpoint organization.

Yes, I'd like to do that, too. This would really work nicely with passing the schema in the URI - but I have not found a nice way to combine this with schema/user-based versioning, yet. Assume I organize my api in two schemas admin and user and want to expose those at /admin/... and /user/...... now how do I go about versioning this?

Suddenly, I need v1_admin, v1_user, v2_admin, v2_user etc... - but then, of course, I don't want to map those to /v1/v1_admin/... etc., but to /v1/admin/... (where the first part is handled by the reverse proxy).

robertsosinski commented 2 years ago

Hey everyone,

Just wanted to give an update on how I'm doing my path based schema routing, if it helps anyone if they are trying to do the same thing.

First off, I'm using Caddy (https://caddyserver.com/). I've used nginx for over a decade now and decided to give caddy a try, and really liking how to config proxy rules with a Caddyfile.

Next, I'm using the handle_path directive, which does three things, 1. allows me to filter rules based on route; 2. (which I like the most) it automatically "snips" the prefix off for me (thus I do not need to do a rewrite); and 3. they are mutually exclusive, so once one handle_path is activated, other handle_path or handle directives are bypassed.

I can also add the appropriate Accept/Content-Profile header based on the HTTP method using a matcher (begins with @). I also have compression enabled (defaults to kicking in when the response is 512 bytes or more). Finally, I get free https termination, and I can have the host specified via an env variable as {$HTTPS_HOST} in the config.

# Launch the server and use the ip address for auto-https provided to docker via an env variable.
{$HTTPS_HOST}:443 {
  # Matchers for using the correct header based on HTTP method.
  @accept_profile {
    method HEAD GET OPTIONS
  }

  @content_profile {
    method POST PATCH PUT DELETE
  }

  # Compress the response if larger than 512 bytes. Choose zstd first and use gzip as a backup.
  encode zstd gzip

  # Look for the first path name to derive the correct PostgREST schema and add the correct headers.
  # Also, add any other headers necessary for that schema as well.
  handle_path /admin/* {
    request_header Prefer params=single-object
    request_header @accept_profile Accept-Profile api_admin_v1
    request_header @content_profile Content-Profile api_admin_v1
  }

  handle_path /v1/* {
    request_header Prefer count=estimated
    request_header @accept_profile Accept-Profile api_v1
    request_header @content_profile Content-Profile api_v1
  }

  handle_path /v2/* {
    request_header @accept_profile Accept-Profile api_v2
    request_header @content_profile Content-Profile api_v2
  }

  # If no specified schema path is provided, return a 404.
  handle {
    respond 404
  }

  # Send to PostgREST unless stopped by the 404 handler above.
  reverse_proxy postgrest:3000
}

I have this setup with caddy running as a sidecar container. Requests come from the load balancer into the running task, goes first into Caddy, and then Caddy proxys into the running PostgREST container next to it. Interested in any thoughts.

steve-chavez commented 2 years ago

First off, I'm using Caddy (https://caddyserver.com/). I've used nginx for over a decade now and decided to give caddy a try, and really liking how to config proxy rules with a Caddyfile.

Nice one @robertsosinski, thanks for sharing.

I think your snippet could fit in an "integrations" page in our docs as mentioned in https://github.com/PostgREST/postgrest-docs/issues/467#issuecomment-981183434.

steve-chavez commented 1 year ago

This could be implemented with the idea on https://github.com/PostgREST/postgrest/issues/1086#issuecomment-1513910590, basically use the URL as the view or function name:

-- "/" would append the schema name
-- final URL "/api/projects"
CREATE VIEW api."/projects" AS
SELECT * FROM projects

-- "./" would not append the schema to the url, this is the default we've always had.
-- final URL "/projects"
CREATE VIEW api."./projects"
wolfgangwalther commented 1 year ago

Imho, your examples are exactly the wrong way around.

I create my views like this:

SET search_path TO api;
CREATE VIEW "/projects" ...;

Now for this (leading slash), I expect this to be available at /projects - and not at /api/projects (similar to how relative and absolute paths work in http).

I do expect the following to work the same. though:

SET search_path TO api;
CREATE VIEW "projects" ...;
CREATE VIEW "./projects" ...;

They should both be relative to my current schema, so available at /api/projects. We don't need ./projects support, though. /projects and projects is enough.

Yes, this is breaking, but consistent.


There is one more thing to think about: We should make sure to avoid any naming collisions. This means, that we should only allow a single leading / in endpoint names. Otherwise CREATE VIEW api."/api/projects" .. will quickly collide. Or CREATE VIEW "./projects" .. collides with CREATE VIEW "projects" ... etc.