Open steve-chavez opened 2 years ago
like we did for timescale.
Why is this only in the old v5.2 docs? Any specific reason for removing it from v6 onwards?
&@~ is
groonga_query
for now
I was wondering, whether we could somehow add metadata to custom operators in the database, to make PostgREST be able to identify and map them automatically. Kind of similar to what we're doing with CREATE FUNCTION ... SET ...
in #1582. Of course, SET
does not work on operators. I'm pretty sure this was discussed before and turned down for good reason, but what about using COMMENT
to add metadata to database objects?
We could use a yaml front matter block to add metadata, while keeping the current openapi comments support backwards compatible. This would also prevent issues with inlining, when using SET
on functions as mentioned in https://github.com/PostgREST/postgrest/pull/1582#issuecomment-730457425.
COMMENT ON OPERATOR "&@~" $$
---
pgrst:
filter:
name: groonga_query
requires_index: pgroonga
---
This operator will provide fulltext magic!
$$;
-- similar to the original idea in #1582
COMMENT ON FUNCTION ret_image(name text) $$
---
pgrst:
accept:
- image/png
---
This function will respond to Accept: image/png requests!
$$;
-- related to improved routing
COMMENT ON TABLE really_long_and_inaccessible_name $$
---
pgrst:
route: short_name
---
This table can now be queried at /short_name instead.
$$;
-- moving more of the config closer to database objects
COMMENT ON SCHEMA my_extensions $$
---
pgrst:
search_path: true
---
This replaces db-extra-search-path in the config.
$$;
I'm pretty sure this was discussed before and turned down for good reason, but what about using COMMENT to add metadata to database objects?
It was not thoroughly discussed IIRC. I thought it would not be a good idea because it "pollutes" the comments with postgrest-specific metadata(output of slash commands like \d+
would be weird).
Creating a DSL on top of comments, like shown on https://github.com/PostgREST/postgrest/issues/1179#issuecomment-460008785, was also described as "icky" on a hacker news comment. I somewhat agree, as it feels "invasive" and prone to errors(also no syntax highlight).
Another problem is that if other tools also (ab)use the comments then we'd "clash" with their DSL.
We could use a yaml front matter block to add metadata
yaml certainly looks better. I guess we'd have to reach consensus with other tools though.
I still think the correct way to do this would be the way mentioned in https://github.com/PostgREST/postgrest/pull/1710#issuecomment-808788905, whether is our in-db config or postgrest-contrib functions, that way our metadata would be namespaced.
Since pgroonga operators work on regular data types like text and jsonb, we can conditionally enable them on a column if it's indexed
Do you see a problem with this interface? I like the idea of being strict and requiring indexes for enabling certain filters.
I also think we'll need a more generic interface eventually, but for now we don't have too many operators for this to be a problem.
Why is this only in the old v5.2 docs? Any specific reason for removing it from v6 onwards?
It was an agreement between me and Timescale(former backer) sometime ago, where we'd show their logo + integration doc for the duration of their sponsorship on Patreon, which only lasted for a portion of v5.2 lifetime.
I think we can reincorporate the timescale doc though, it's good content and an extension section(with pgroonga and postgis later on) would look good.
&@~ is groonga_query for now
One thought, now that we have many client libraries, we can expose the original operator and leave the urlencoding to them, so we don't need to make an alias. This was discussed before on https://github.com/PostgREST/postgrest/pull/938#issuecomment-323504486:
However I don't think we should have aliases for all operators, having to remember e.g. that to use -|- I have to tell posgret adj is pointless trivia that will only trip me up; and more importantly: make it something else you have to know to use a postgrest API. If I'm reading misc http client, I'm not goint to be able to infer what adj means. The less you have to know the better.
Or, we could also offer both, the original operator and the url-safe alias.
One thought, now that we have many client libraries, we can expose the original operator and leave the urlencoding to them, so we don't need to make an alias. This was discussed before on #938 (comment):
Hm. I don't buy the argument "it's hard to remember aliases" very much. I doubt people really know all those PG operators out of their head, because the regularly use them. For me, it's more a "I know there is something that does this... let's look up the operator in the docs.". Also the client-side code is probably read by a lot more people that don't know PostgreSQL well, and the aliased operators are much more meaningful to them.
If you really want to support the original operator (where I don't really see much value in), we absolutely have to keep supporting the url-safe aliases.
It was not thoroughly discussed IIRC. I thought it would not be a good idea because it "pollutes" the comments with postgrest-specific metadata(output of slash commands like \d+ would be weird).
I guess that depends on what you're expecting to see in that case. Assuming we are reading comments of objects in the public (or other postgrest specific) schema only, it could also be helpful. The objects in the api schema should only be used for postgrest anyway, so having the postgrest specific information readily in one place would be much better than having them stored away in some big json config setting.
Of course we can try to see what kind of format we can put the metadata comments in to have them show in a nice way in \d+
etc.
Creating a DSL on top of comments, like shown on #1179 (comment) [...] Another problem is that if other tools also (ab)use the comments then we'd "clash" with their DSL.
I would assume all other tools to be somewhat resistant against stuff that doesn't match their DSL - and the same for us. As long as we are prefixing this on our side, we should be fine.
Do you see a problem with this interface? I like the idea of being strict and requiring indexes for enabling certain filters.
I think it's a great idea to be strict about requiring indexes. I like it so much, that I added requires_index: ...
to my example above, too ;)
I think we can reincorporate the timescale doc though, it's good content and an extension section(with pgroonga and postgis later on) would look good.
Fully agree.
I also think we'll need a more generic interface eventually, but for now we don't have too many operators for this to be a problem.
Looks like an extensible interface could already be useful, as new operators have now been asked for repeatedly:
@wolfgangwalther (Ab)using function names, how about this for exposing new operators
CREATE FUNCTION "pgrst.projects.name.groonga_query"(test.projects, test.projects.name%TYPE) RETURNS bool
AS $_$
SELECT $1.name &@~ $2
$_$ LANGUAGE SQL IMMUTABLE;
This would allow calling
GET /projects?name=groonga_query.stuff
I've checked and the resulting queries are inlinable too
select * from test.projects x where "pgrst.projects.name.groonga_query"(x, 2);
Related to https://github.com/PostgREST/postgrest/issues/2442#issuecomment-1224930491
Might be possible to make this more generic(for all columns) with some conventions on the function naming
I was mixing the concern about restricting the operator columns on https://github.com/PostgREST/postgrest/issues/2442, there's a more plausible idea there now.
Adding new operators could be simply:
CREATE FUNCTION "pgrst.groonga_query"(text, text) RETURNS bool
AS $_$
SELECT $1 &@~ $2
$_$ LANGUAGE SQL IMMUTABLE;
Meaning any function name with our prefix pgrst
would be exposed as a filter operator.
Ab)using function names, how about this for exposing new operators
In the end this is just a function wrapper around an operator to attach metadata to it via the name prefix. You could achieve the same by creating a custom operator with the same name prefix. Why the function wrapper?
But both of those approaches are very limited in terms of how much metadata we can attach. The example in https://github.com/PostgREST/postgrest/issues/2028#issuecomment-973832177 already mentions something like requires_index
, which wouldn't be possible this way.
I think we should sort out the "metadata" problem once and for good - and then we can build a lot of things on top of that.
You could achieve the same by creating a custom operator with the same name prefix
You can't create an operator with letters, no "eq" or "lt" https://www.postgresql.org/docs/current/sql-createoperator.html. The interface is much more complex too.
The example in https://github.com/PostgREST/postgrest/issues/2028#issuecomment-973832177 already mentions something like requires_index, which wouldn't be possible this way.
Yeah, that I think we'll not need anymore. As discussed on https://github.com/PostgREST/postgrest/issues/2249, it's not certain that an index will be applied. What we should do instead:
You could achieve the same by creating a custom operator with the same name prefix
You can't create an operator with letters, no "eq" or "lt" https://www.postgresql.org/docs/current/sql-createoperator.html. The interface is much more complex too.
The example in #2028 (comment) already mentions something like requires_index, which wouldn't be possible this way.
Yeah, that I think we'll not need anymore.
To put it the other way around: We do need to solve the "metadata for database objects" problems. Once we do that, adding the information on which operators to expose through this metadata is much nicer than through function names as proposed here.
So no matter whether we need requires_index
right now or not, or whether the operator prefix would in fact need to be a schema name... - let's solve the metadata stuff first and then a lot of pieces will fall into place here and in other issues.
Now that the allowlist on https://github.com/PostgREST/postgrest/issues/2442#issuecomment-1289520787 seems to be the way to go to restrict operators on columns, I think this can be kept simpler.
We could just have a config that maps keywords to operators:
db-extra-operators = "groonga_query:&@~, similarity:%, geo_within:st_within"
ALTER ROLE authenticator SET pgrst.extra_operators = 'groonga_query:&@~, similarity:%, geo_within:st_within'
Later one we might have a COMMENT ON or SECURITY LABEL for metadata but for now this looks like the simplest way to add operators.
ALTER ROLE authenticator SET pgrst.extra_operators = 'groonga_query:&@~, similarity:%, geo_within:st_within'
A problem with the above proposal is that it cannot express the FTS operators, unless we support something like:
ALTER ROLE postgrest_test_authenticator
SET pgrst.extra_operators = 'my_fts:@@ to_tsquery, my_plain_fts: @@ plainto_tsquery'
Which is not that clean or expressive IMO.
The above function interface is much more expressive. To make it cleaner, how about removing the naming convention:
CREATE FUNCTION groonga_query(text, text) RETURNS bool
AS $_$
SELECT $1 &@~ $2
$_$ LANGUAGE SQL IMMUTABLE;
So we expose a function as an operator if:
db-schema
/rpc
as well, doesn't seem like a problem. Or we could try to hide it from rpc.db-extra-search-path
because the extensions/public
schemas would be here and we would expose many functions as operatorsbool
The function interface is complex, I'm not convinced by it.
I'm in need of the pgroonga operators though.
Until we settle on an interface, how about just adding the db-extra-operators
mentioned above. It's not complex at all to add that config and could be removed later if we have a better inteface. This reminds me of raw-media-types
, which is not ideal but it has been really low maintenance since it required few lines of code.
Regarding aliasing to avoid urlencoding on operator syntax, maybe that's something that should be done in core. It doesn't feel right to let customers break their APIs by changing an alias on the config. I feel that design is brittle.
Could we be smart and transform symbols to letters? Like @=at
, then @@
would be col=atat.txt
.. that doesn't look good, so no.
Another option could be having a special raw
operator:
col=raw(@@).content
That does look more consistent with our syntax. Plus aliases doesn't seem necessary now.
To allowlist the exposed operators:
raw-operators = "&&, @@"
I think this can also keep the QueryParams
parsers a bit more simple. Disallowed operators can be rejected at the Plan
level.
The syntax for the raw operators could be:
SEARCH /table?column=$op.@@.$body.val
{"..."}
Related to the dollar operators proposal https://github.com/PostgREST/postgrest/issues/2125#issuecomment-1376533758
(Kinda looks like the OPERATOR(schema.@@)
syntax in PostgreSQL.)
Could also be
SEARCH /table?column=$op(@@).$body.val
{"..."}
I've realized this is somewhat the same problem as https://github.com/PostgREST/postgrest/issues/2578. We also need a way to call filters with multiple inputs.
Could also be
SEARCH /table?column=$op(@@).$body.val {"..."}
I like this the most of the suggestions above.
How about this for custom operators.
Since IMMUTABLE is really a pure function, we could expose these as operators.
These can be enabled per table/view and column like this:
-- name is a type for a pg identifier
CREATE FUNCTION groonga_query(test.projects, text, text, name default 'id,name,client_id') RETURNS bool
AS $_$
SELECT $2 &@~ $3
$_$ LANGUAGE SQL IMMUTABLE;
They can be enabled per table/view and type like this:
CREATE FUNCTION groonga_query(test.projects, text, text) RETURNS bool
AS $_$
SELECT $2 &@~ $3
$_$ LANGUAGE SQL IMMUTABLE;
They can be enabled for all table/view and type like this:
CREATE FUNCTION groonga_query(anyelement, text, text) RETURNS bool
AS $_$
SELECT $2 &@~ $3
$_$ LANGUAGE SQL IMMUTABLE;
Inlining advantages apply as mentioned above.
We would need to scan the filters and search them in the schema cache for this, which would incur in perf loss.
Or we could somehow mark them as custom, like:
GET /projects?name=$groonga_query.stuff
Come to think of it, the above might solve https://github.com/PostgREST/postgrest/issues/2442 as well.
We could create:
CREATE FUNCTION eq(anyelement, anyelement) RETURNS bool
AS $_$
SELECT $1 = $2
$_$ LANGUAGE SQL IMMUTABLE;
Which would represent our default eq
filter.
To disallow it globally:
REVOKE EXECUTE ON eq FROM anon;
REVOKE EXECUTE ON eq FROM webuser;
Then the user can enable it per column or per table as mentioned above.
It might leave a big footprint in the db but I think it could work.
Using unnamed parameters for custom operator functions is key to differentiate them from https://github.com/PostgREST/postgrest/issues/2578
These can be enabled per table/view and column like this:
-- name is a type for a pg identifier CREATE FUNCTION groonga_query(test.projects, text, text, name default 'id,name,clientid') RETURNS bool AS $$ SELECT $2 &@~ $3 $_$ LANGUAGE SQL IMMUTABLE;
An improvement on the above would be:
CREATE FUNCTION groonga_query(test.projects, text, text, id name) RETURNS bool
AS $_$
SELECT $2 &@~ $3
$_$ LANGUAGE SQL IMMUTABLE;
Where the named parameter at the end indicates the column for the operator.
So far looks like https://github.com/PostgREST/postgrest/issues/2805#issuecomment-1575705654 is the simplest implementation for custom operators.
@wolfgangwalther I was wondering if you have any concerns about that?
I guess one disadvantage is that we lose type checking when adding new operators through a custom config. It will also be harder to infer correct typescript types later.
Hm, perhaps we leave the configurable query grammar just for restricting operators and not adding them?
One alternative now that we have the pre-config
and hence the postgrest
schema(docs). Is that we could just add immutable functions to the postgrest
schema:
CREATE FUNCTION postgrest.eq(anyelement, anyelement) RETURNS bool
AS $_$
SELECT $1 = $2
$_$ LANGUAGE SQL IMMUTABLE;
CREATE FUNCTION postgrest.fts(anyelement, anyelement) RETURNS bool
AS $_$
SELECT $1 @@ to_tsquery($2)
$_$ LANGUAGE SQL IMMUTABLE;
-- later on my thinking is that we could support
CREATE FUNCTION postgrest.fts(text, anyelement) RETURNS bool
AS $_$
SELECT $1 @@ to_tsquery($2)
$_$ LANGUAGE SQL IMMUTABLE;
-- that's basically saying that clients can only use the `fts` operators on text columns
And then we enable them as operators for the query grammar. That doesn't sound bad actually.
So far looks like https://github.com/PostgREST/postgrest/issues/2805#issuecomment-1575705654 is the simplest implementation for custom operators.
@wolfgangwalther I was wondering if you have any concerns about that?
I don't like it. We should not do stuff that should be done via database objects, via config.
PostgREST's goal is to "expose a, potentially already existing, database schema". So the first thing to do should be do interpret the existing schema and expose it in a way that makes sense.
You showed multiple examples of how we could use IMMUTABLE
functions to map operators. Maybe the right question to ask is this: Given the following function in the exposed schema - is there any way to interpret it's "meaning" other than as an operator?
create function eq(text, text) returns bool
immutable
return $1 = $2;
We can't use it as an RPC, because we don't support multiple unnamed arguments for those. It's returning a bool, which is a strong indication, that it's supposed to be used in a WHERE
clause.
Our query/request syntax has the following "identifiers" it can use:
select=
None of those are applicable to this here. Short of some constants and keywords for syntax, we only have operators left in our query syntax. So it's pretty natural to use this.
One alternative now that we have the pre-config and hence the postgrestschema(docs).
I don't see how "the postgrest
schema" is a thing there. I could put the pre-config (same as pre-request) function in any schema - but there is not a separate config setting for the schema those functions are in. So we certainly can't use this schema for anything else.
Also: We want to expose those operators.. so they should be in the exposed schema, right?
Hm, perhaps we leave the configurable query grammar just for restricting operators and not adding them?
I like the suggestion you made above more:
An improvement on the above would be:
CREATE FUNCTION groonga_query(test.projects, text, text, id name) RETURNS bool AS $_$ SELECT $2 &@~ $3 $_$ LANGUAGE SQL IMMUTABLE;
Where the named parameter at the end indicates the column for the operator.
I don't like that either. That's a lot of magic.
I didn't have the use-case to disable some operators (or other things) for specific routes (or enable them only for one etc.). If we do anything remotely like that, we should make it some kind of general solution that would work with much more than just this specific case.
Operators should not depend on context. They should only depend on their two arguments.
Agree overall. If anything adding new operators via functions looks like the way forward.
However this part:
Restrictions can then be made via permissions - the most natural way to do so.
Still doesn't solve enabling operators per column, say only enabling eq
for PKs. With https://github.com/PostgREST/postgrest/issues/2805 that's doable.
But we can leave that for later. I'm not sure if #2805 is the best way.
Seems implementing this is not trivial. Right now we validate operators on QueyParams.hs
, which doesn't involve the schema cache. We need to move operator parsing to the Plan.hs
level.
Implementation is kinda similar to https://github.com/PostgREST/postgrest/pull/2825 in that we have builtins and we can override/enhance them.
When implementing this we might also consider supporting using the operator on the left side of the query param https://github.com/PostgREST/postgrest/issues/2066
You showed multiple examples of how we could use IMMUTABLE functions to map operators. Maybe the right question to ask is this: Given the following function in the exposed schema - is there any way to interpret it's "meaning" other than as an operator?
create function eq(text, text) returns bool immutable return $1 = $2;
We can't use it as an RPC, because we don't support multiple unnamed arguments for those. It's returning a bool, which is a strong indication, that it's supposed to be used in a WHERE clause.
Looks like the above holds up pretty good. I've been testing:
WITH
x AS (
SELECT n.nspname as "Schema",
p.proname as "Name",
pg_catalog.pg_get_function_result(p.oid) as "Result data type",
pg_catalog.pg_get_function_arguments(p.oid) as "Argument data types",
CASE p.prokind
WHEN 'a' THEN 'agg'
WHEN 'w' THEN 'window'
WHEN 'p' THEN 'proc'
ELSE 'func'
END as "Type",
CASE
WHEN p.provolatile = 'i' THEN 'immutable'
WHEN p.provolatile = 's' THEN 'stable'
WHEN p.provolatile = 'v' THEN 'volatile'
END as "Volatility",
CASE
WHEN p.proparallel = 'r' THEN 'restricted'
WHEN p.proparallel = 's' THEN 'safe'
WHEN p.proparallel = 'u' THEN 'unsafe'
END as "Parallel",
pg_catalog.pg_get_userbyid(p.proowner) as "Owner",
CASE WHEN prosecdef THEN 'definer' ELSE 'invoker' END AS "Security",
pg_catalog.array_to_string(p.proacl, E'\n') AS "Access privileges",
l.lanname as "Language",
COALESCE(pg_catalog.pg_get_function_sqlbody(p.oid), p.prosrc) as "Source code",
pg_catalog.obj_description(p.oid, 'pg_proc') as "Description"
FROM pg_catalog.pg_proc p
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
LEFT JOIN pg_catalog.pg_language l ON l.oid = p.prolang
WHERE pg_catalog.pg_function_is_visible(p.oid)
ORDER BY 1, 2, 4)
SELECT * FROM x
WHERE "Description" ILIKE '%operator%' AND "Result data type" = 'boolean' AND "Volatility" = 'immutable';
-[ RECORD 1 ]-------+---------------------------------------------------------
Schema | pg_catalog
Name | aclitemeq
Result data type | boolean
Argument data types | aclitem, aclitem
Type | func
Volatility | immutable
Parallel | safe
Owner | postgres
Security | invoker
Access privileges |
Language | internal
Source code | aclitem_eq
Description | implementation of = operator
-[ RECORD 2 ]-------+---------------------------------------------------------
Schema | pg_catalog
Name | array_eq
Result data type | boolean
Argument data types | anyarray, anyarray
Type | func
Volatility | immutable
Parallel | safe
Owner | postgres
Security | invoker
Access privileges |
Language | internal
Source code | array_eq
Description | implementation of = operator
-[ RECORD 3 ]-------+---------------------------------------------------------
Schema | pg_catalog
Name | array_ge
Result data type | boolean
Argument data types | anyarray, anyarray
Type | func
Volatility | immutable
Parallel | safe
Owner | postgres
Security | invoker
Access privileges |
Language | internal
Source code | array_ge
Description | implementation of >= operator
...
And every operator we expose is inside that list. Some like below
, above
, might be useful to expose too.
The main problem is the naming, most functions contain a good name but it's usually concatenated with the types.
If we could get a good name from the internal functions then we wouldn't need to hardcode the operators in the code nor we would need to instruct the user to create new functions.
Considering the that it's not easy to add support for custom operators (ref). I'm tempted to just hardcode the pgroonga operators for now.
They recently released a how-to for postgREST that uses dynamic SQL which doesn't look good at all: https://pgroonga.github.io/ja/how-to/postgrest.html
pgroonga is on nixpkgs too, which should make things simpler.
Idea: Thinking more about restricting operators and set returning functions. Aren't function parameters filters that only can be eq
?
CREATE FUNCTION getproject(id int) RETURNS SETOF projects AS $$
SELECT * FROM test.projects WHERE id = $1;
$$ LANGUAGE sql;
GET /rpc/getproject?id=3
Using a different name for the function and assuming we can have their path at the root (related to https://github.com/PostgREST/postgrest/issues/1086)
CREATE FUNCTION projects(id int) RETURNS SETOF projects AS $$
SELECT * FROM test.projects WHERE id = $1;
$$ LANGUAGE sql;
GET /projects?id=eq.3
Of course right now we support all operators on the returned set but maybe we could have a mode where only the parameters can be used as filters? Then restricting operators would be a matter of shielding views/tables behind functions.
To expose different operators, maybe we could have a convention with DOMAINs:
CREATE DOMAIN int_lt as int;
CREATE FUNCTION projects(id int_lt) RETURNS SETOF projects
LANGUAGE sql
AS $_$
SELECT * FROM test.projects WHERE id < $1;
$_$;
Then it would be possible to:
GET /projects?id=lt.3
Then restricting operators would be a matter of shielding views/tables behind functions.
This is still fuzzy but I like the above idea.
How about this, adding unused parameters to the function, which have the following signature..
CREATE DOMAIN pgrst_dynamic_op AS TEXT; -- maybe they can be `anyelement`, but this domain seems clearer
CREATE OR REPLACE FUNCTION projects("eq.id" pgrst_dynamic_op, "lt.id" pgrst_dynamic_op)
RETURNS SETOF projects
AS $$
SELECT * FROM test.projects;
$$ LANGUAGE sql;
Allows us to restrict which operators we can add dynamically..
SELECT * FROM projects().. WHERE id = $1 AND id < $2
Through the URL:
GET /projects?id=eq.3&id=lt.10
@wolfgangwalther WDYT?
I think this could even be used for restricting order
(also requested on https://github.com/PostgREST/postgrest/issues/2442), like:
CREATE DOMAIN pgrst_dynamic_order AS name;
CREATE OR REPLACE FUNCTION projects(.., id pgrst_dynamic_order, name pgrst_dynamic_order)
-- or perhaps
CREATE OR REPLACE FUNCTION projects(.., "id,name" pgrst_dynamic_order)
PGroonga offers performance and usability advantages over pg native textsearch and pg_trgm.
Tutorial: https://pgroonga.github.io/tutorial/
Reference : https://pgroonga.github.io/reference/
Proposal
Since pgroonga operators work on regular data types like
text
andjsonb
, we can conditionally enable them on a column if it's indexed. This is to prevent expensive searches on unindexed columns.Then to do
SELECT * FROM table WHERE column &@~ 'PostgreSQL';
:(&@~ is
groonga_query
for now).Then we can document this in an integrations(or extensions) page like we did for timescale.