Open wolfgangwalther opened 2 years ago
I like many of the ideas from an HTTP standpoint, but they'd be such a big breaking change(consider all client libraries we have now) that we might as well change our name if we did them. Many users will wonder why they should not just try to migrate to a GraphQL/OData(which are behind standards) solution if we do such a change.
or just invent a query language v2
Yes, this is why it's important to document our REST syntax(ref), maybe we could do a RESTQL/RSQL/RESQL V2 if we'd ever adopt such a change. But even if we were to support both our v1 and v2 at the same time, I think that would still create doubts about our stability.
So I think before doing any major breaking changes to our REST syntax, we should offer extensibility(https://github.com/PostgREST/postgrest/issues/1909#issuecomment-952375549) to provide a way to be compatible with other standards(GraphQL/Odata).
Some comments:
GET /people?uid.eq=123 Accept: application/json;select=*
This looks better from a REST/HTTP standpoint, but there are many crippled http clients out there that can't send headers, we would leave those out or require even more use cases for proxies.
GET /people,jobs;jobs_fk?age.eq=15&jobs.title.eq=Boss
Ah, isn't ;
required to be urlencoded? I'm not sure, but it seems too likely that we'd be creating impedance with http clients out there, causing similar problems to the ones our current or/and
operators generated. Also, IMHO ;
somehow looks bad in the URL path.
GET /people?exists=jobs!job_fk
This one looks good and is actionable, will take it to a new issue.
Ah, isn't
;
required to be urlencoded? I'm not sure, but it seems too likely that we'd be creating impedance with http clients out there, causing similar problems to the ones our currentor/and
operators generated. Also, IMHO;
somehow looks bad in the URL path.
No, they are specifically allowed for that purpose: https://datatracker.ietf.org/doc/html/rfc3986#section-3.3
Aside from dot-segments in hierarchical paths, a path segment is considered opaque by the generic syntax. URI producing applications often use the reserved characters allowed in a segment to delimit scheme-specific or dereference-handler-specific subcomponents. For example, the semicolon (";") and equals ("=") reserved characters are often used to delimit parameters and parameter values applicable to that segment. The comma (",") reserved character is often used for similar purposes. For example, one URI producer might use a segment such as "name;v=1.1" to indicate a reference to version 1.1 of "name", whereas another might use a segment such as "name,1.1" to indicate the same. Parameter types may be defined by scheme-specific semantics, but in most cases the syntax of a parameter is specific to the implementation of the URI's dereferencing algorithm.
I like many of the ideas from an HTTP standpoint, but they'd be such a big breaking change(consider all client libraries we have now) that we might as well change our name if we did them. Many users will wonder why they should not just try to migrate to a GraphQL/OData(which are behind standards) solution if we do such a change.
I certainly don't want to propose a general breaking change in the API here. I think there is value in "providing the most HTTP-standard compliant interface" in itself, but also many of those proposals solve actual problems we have.
Yes, this is why it's important to document our REST syntax(ref), maybe we could do a RESTQL/RSQL/RESQL V2 if we'd ever adopt such a change. But even if we were to support both our v1 and v2 at the same time, I think that would still create doubts about our stability.
So I think before doing any major breaking changes to our REST syntax, we should offer extensibility(#1909 (comment)) to provide a way to be compatible with other standards(GraphQL/Odata).
If we offer an extensible interface for the query language and have both our V1 and V2 to make use of that interface / prove that it's working well - I don't see us creating doubts about stability. It's the other way around, I think. It increases confidence in a stable non-breaking interface, because new features, that would imply breaking changes can much easier be developed while keeping backwards compatibility in other versions of our language.
Accept: application/json;select=lastname,firstname
Now that we're closer to having XML support for tables, the select
above seems redundant for each mimetype.
I was wondering if there's a more standard way for embedded resources. I've been looking at HAL, but its verbosity makes me think it's not adequate for big result sets. Also users would need HAL-specific clients to really use the data.
Following the line of thought of changing the path syntax and that embedding resources have a different type, I thought of representing the result of an embed as a different resource - GET /usersEmbedClients
- but that of course wouldn't work for many embeds and tree-structured data.
An embed query parameter looks more "standard", if we'd adopt it I guess we could do:
GET /projects?select=id,name,clients(name,tasks(name))
As
GET /projects?embed=clients&clients.embed=tasks&select=id,name&clients.select=name&clients.tasks.select=name
Which has the disavantage of being more verbose and more importantly it would lose the ability of having an order for the fields in the resulting body(since proxies don't guarantee the order of the query parameters IIRC).
With the http QUERY method, IMO the http working group likely has given up on representing embedded data cleanly within the URL syntax restrictions(there's even SQLish examples in the spec). With that, it feels futile trying to improve our embedded resource syntax.
I do agree that changing the operator syntax would be a win though.
Once https://github.com/PostgREST/postgrest/issues/1970#issuecomment-1382613004 is implemented, I think we could support both the old and new filter syntax without a breaking change.
Change operator syntax from col=op.val to col.op=val
Looking at the parsers I think it'd be easier to do op.col=val
than col.op=val
.
col=val
would be eq.col=val
implicitly.
Some ideas..
Embedding other resources really changes the entity we are requesting, too - so they should not be moved to the header. What about using the flexibility the path specification of a URI gives us for embedding? GET /people,jobs GET /people,jobs;jobs_fk?age.eq=15&jobs.title.eq=Boss Accept: application/json;select=name,age,jobs.title
Doesn't embedding just add attributes to the entity? You still do a FROM base_entity JOIN other
on SQL after all. We could just say an embedding is a different representation of a resource(table).
Edit: Totally misread the RFC. It was related to media type parameters not query parameters. Still the new media type below might be worth considering.
Also RFC 2046 says:
Parameters are modifiers of the media subtype, and as such do not fundamentally affect the nature of the content. The set of meaningful parameters depends on the media type and subtype.
So the query parameters depend on the media type, we don't need to put the select
on the header. Also, omitting columns from select
doesn't fundamentally change the nature of the content - it's still json, we're just leaving some fields as 'undefined'.
Considering the above, it sounds possible to document the embedding possibilities for a resource. In OpenAPI we can list the Accept media types. Using films
in resource embedding, we could say it accepts:
paths:
/films:
get:
...
content:
application/vnd.pgrst.related+json; rel=films>-directors
application/vnd.pgrst.related+json; rel=films-<nominations
application/vnd.pgrst.related+json; rel=films-<roles
application/vnd.pgrst.related+json; rel=films>-<actors
application/vnd.pgrst.related+json; rel=films>-<competitions
>-
= many-to-one. -<
= one-to-many. >-<
many-to-many. one-to-one could be --
.
So say we get the following request(with nesting):
GET /films?select=*,roles(*, actors(*)),directors(*)
We would respond with our usual json, but with the added header:
Content-Type: application/vnd.pgrst.related+json; rel=films-<roles,films>-directors,roles>-actors
The gain here is that we could document the relationships a resource has. Which was previously impossible in OpenAPI.
Considering the above, it sounds possible to document the embedding possibilities for a resource. In OpenAPI we can list the Accept media types. Using films in resource embedding, we could say it accepts:
This would only make sense if the embedding were to take place via accept header / mediatype. I don't think that would be correct. Adding an embedding will return a new entity - it will not merely change the return format. That means it does not belong in the header.
The gain here is that we could document the relationships a resource has. Which was previously impossible in OpenAPI.
Really what I suggested above about embeddings and the path component would be the way forward. Just requesting people is different from just requesting jobs is different from requesting a combination of both. If we had something like GET /people,jobs
, then it would also be possible to add those new paths to the OpenAPI output.
Change operator syntax from
col=op.val
tocol.op=val
As another datapoint, this also helps when user input is passed to a stored procedure via GET
.
Currently, query=foobar
is processed correctly for CREATE FUNCTION search(query TEXT)
. But with query=in.foobar
, Postgrest no longer interprets query
as a function param, but as a filter.
There doesn't seem to be a simple way to force the function param interpretation (quoting required preprocessing in the stored procedure to remove the quotes).
Adding an embedding will return a new entity - it will not merely change the return format. That means it does not belong in the header.
Just noticed that even the Prefer: exact=count
hints that the above is true. The count is different when embedding is done (see https://github.com/PostgREST/postgrest/issues/2009 and https://github.com/PostgREST/postgrest/issues/2846). So indeed it's a new entity and doesn't belong in the header, makes sense because people,jobs
can be represented in json
, csv
or other media types.
GET /people,jobs;jobs_fk?age.eq=15&jobs.title.eq=Boss
This would also play nicely with the ;single or ;one parameter for requesting a single entity, which is kind of similar to a hint of relationship type to embedd !m2m, !o2m, etc.: GET /people;one,jobs;o2m?...
Now that we're moving on from doing disambiguation on the URL(https://github.com/PostgREST/postgrest/issues/2863), the syntax can be much simpler so we don't need the FK or the cardinality in the path. So we could just go with GET /people,jobs
.
There's a problem in expressing nested embedding on the path, since it needs to shape the JSON output. For this I believe we should keep the select
parameter.
GET /actors,roles,films?select=roles(character,films(title,year))
Or later as a media type parameter too:
GET /actors,roles,films
Accept: application/json;select=roles(character,films(title,year))
If we had something like GET /people,jobs, then it would also be possible to add those new paths to the OpenAPI output.
Yes, and we could do this now without a breaking change.
Another syntax:
GET /users(projects,tasks(subtasks))
That would allow us to express joins on resources.
Adding an embedding will return a new entity - it will not merely change the return format.
If doing an embedding (a foreign key join) creates a new entity, what about an aggregate like SUM, AVG, etc? I mean it has the potential to reduce a REST collection to a single resource.
I'm mostly trying to see if we could do away without adding GROUP BY.
Been checking project m36 aggregates and they say:
SQL supports aggregate queries using aggregate functions (SUM(), COUNT(), AVERAGE(), etc.) and GROUP BY while the aggregate functions in the relational algebra simply accept relation values as arguments.
But I haven't had the chance to play it with yet (their nix-shell
took ages), so not sure if they don't really need GROUP BY.
PRQL has a good approach for this (ref): aggregates don't change the number of rows by default but only add a new column using window functions:
https://prql-lang.org/book/reference/stdlib/transforms/aggregate.html#aggregate-is-required
This will prevent failure from clients whenever they forget to add GROUP BY.
So I think the above answers the previous question:
If doing an embedding (a foreign key join) creates a new entity, what about an aggregate like SUM, AVG, etc? I mean it has the potential to reduce a REST collection to a single resource.
An aggregate doesn't have to change the amount of rows. We can say that GROUP BY is a filter in a way.
Another improvement for the query language: use ~
for not
. It's shorter/clearer (likely easier for the parsers) and it doesn't require urlencoding: https://perishablepress.com/stop-using-unsafe-characters-in-urls/
/a?col.~eq=3
We have discussed a few shortcomings of our current query language (expressions in the query string). Taking those suggestions together - and adding a few more points, that seem to be inconsistent to me in the current implementation.
I wonder whether we should try to implement all those improvements as non-breaking separated changes - or just invent a query language v2, where we can chose via config option which language to use. Maybe #1804 can lead us to have a cleanly separated
QueryLanguage
module that we can replace with a new module.Here we go:
Change operator syntax from
col=op.val
tocol.op=val
: #1970. This will allow to useswagger-ui
to send requests. It will also allow plain-html requests without javascript like this:This was not possible before, because the value would have to be constructed dynamically.
Separating filter logic from values: #1970. This will allow doing more complex queries in
swagger-ui
involving logic conditions, too. It will also simplify a lot of our qouting requirements already. Let's put all logic into alogic=...
parameter. Maybefilters=
is even better.Replace current
in
withcol.eq.any=<array>
as suggested in #1943. I'd go one step further and removein
completely. We are usingANY
under the hood anyway and together withlogic=
, we would remove all manual quoting we'd need to do. We'd rely only on PG array quoting.Move
select
andorder
to parameters of the mimetype in theAccept
header. Both parameters only shape the response - but the underlying entity is the same. As such, I think they don't really fit in the query string, which should describe which entity to receive. This would also remove the strange interaction we have between theselect=
and theAccept
header already: When you request a custom mimetype, e.g.application/octet-stream
ortext/plain
, the wholeselect=
doesn't make any sense. So the availability ofselect
is already dependent on the mediatype. Example: Both of the following requests will return the same person. Returning fewer or more columns is not much different to just returning the same entity in a different format.would become
I'm not sure whether we can still use the mimetype
application/json
here. The registration mentions no parameters at all: https://www.iana.org/assignments/media-types/application/json. It specifically hints thatcharset
is not a valid parameter, but we are returning it anyway. At the same time RFC6838 says:As we would still be returning valid json, I think we would not change existing functionality, so adding parameters should be fine. If we end up not wanting to do this, we could still do so with our own custom media type, though.
Remove
limit
andoffset
parameters entirely - this can be done via the respective range headers.Replace
!inner
on the embedding with anexists
filter as suggested in https://github.com/PostgREST/postgrest/issues/2009#issuecomment-981202986. Note, this does not really play well with moving the embedding to theAccept
header, yet. I don't think referencing a definition that is sent in the header from the query string is a wise idea. But given how the SQL for an exists query actually looks like (EXISTS (... sub query...)
), maybe we can do something likeexists=<embedding-spec>
:This means that in some cases the embedding + hint needs to be repeated between query string and header, but this makes thing much more explicit. Later on, we could even support true filtering via
EXISTS (...)
, which could then be combined with other filters in thelogic=
part. And only if it's part of a top-levelAND
and requested in theselect=
part, then we would make it aINNER JOIN
.Move
columns
to a parameter of the mimetype in theContent-Type
header, which describes the request body. Same arguments apply as above.Move
on_conflict
to a parameter of thePrefer: resolution
header. POSTing the same request once withon_conflict
and once without really does not change the returned entity. Parameters on prefer headers can be given like this, according to the RFC:Change the
Accept: application/vnd.pgrst.object+json
way to request a single resource to a path parameter something like this:This is because, the ability to query for a single row instead of multiple rows is really not inherent to the media type requested. Right now, it's not possible to query for a single row in
text/csv
,application/octet-stream
,text/plain
and other formats, which is a severe limitation. Changing it like this would also allow to people to rewrite those PK-equality requests easily via nginx from something like/people/123
. And since there is no interaction with the mimetype, this will not conflict with custom accept headers that might be sent.This would leave only filters (identified by at least one
.
before the=
), arguments (without.
) andlogic=
query parameters, reducing the amount of "reserved keywords" by a fair bit.Using parameters to all kinds of things etc. is also much more consistent with what the RFCs allow for those, instead of cramping everything into the query string.
Not sure whether everything in here can really be done. I haven't found a good solution for filters / limits / offsets on embedded resources, yet. It doesn't make much sense to move those to the accept header - but keeping them in the query string would create a strange interaction between query string and accept header once again.
One idea, that I haven't fully thought through, yet, could be to split the embeddings from the
select
part. If the embeddings were defined in the query string, but theselect=
in the accept header would reference those, that would be fine, I think. Having an interaction in this direction between header and query string would make sense, imho.Embedding other resources really changes the entity we are requesting, too - so they should not be moved to the header. What about using the flexibility the path specification of a URI gives us for embedding?
This would also play nicely with the
;single
or;one
parameter for requesting a single entity, which is kind of similar to a hint of relationship type to embedd!m2m
,!o2m
, etc.:They both do kind of the same thing in returning an object instead of an array.