rpc: specify argument types (GET requests on RPCs)

LeKovr commented 9 years ago

There is incorrect SQL generated when stored procedure has argument of array type:

CREATE OR REPLACE FUNCTION "1".test(q INTEGER[]) RETURNS SETOF INTEGER LANGUAGE 'sql' AS $_$
      SELECT unnest($1)
$_$;

Call:

curl -d '{"q":[1,2,3]}' http://localhost/rpc/test

Generated SQL (incorrect):

select * from "1".test(q:='[1,2,3]'::unknown)
-- ERROR:  missing "]" in array dimensions

Correct SQL:

select * from "1".test_text3(q:='{1,2,3}'::unknown);
-- or
select * from "1".test_text3(q:=ARRAY[1,2,3]);

Also, when argument has type TEXT[] we got another error (missing dimension value) but both of them might be solved by replacing [] with {}.

begriffs commented 9 years ago

Thanks for reporting! I'll look into it.

begriffs commented 9 years ago

Appears related to #282

LeKovr commented 9 years ago

Yes, string arg from #282 is working:

curl -d '{"q":"{1,2,3}"}' http://localhost/rpc/test

but real array support would be better than this string join/split method

begriffs commented 8 years ago

There may be an ambiguity in this problem. There can be two stored procedures with the same name but taking arguments of different types. What if one of them takes a postgres array and the other takes a json value? I don't think we could tell which is the right one to execute.

We could examine the functions and their argument types like this

SELECT ns.nspname, p.proname, oidvectortypes(p.proargtypes)
FROM pg_proc p
INNER JOIN pg_namespace ns 
ON (p.pronamespace = ns.oid)
WHERE ns.nspname =  'public';

Then if the function in question is not ambiguous we'll know whether to format as array or json. Might just be easier to make all your functions accept a json values rather than arrays. What do you think?

LeKovr commented 8 years ago

What if one of them takes a postgres array and the other takes a json value?

This is the same problem as with int/text args or even text/json. What is the general way to solve it?

Might just be easier to make all your functions accept a json values rather than arrays

Will it be any json supported or just json arrays?

begriffs commented 8 years ago

You're right, I hadn't thought of that. That's probably the fundamental solution -- if we allow people to specify the types of each argument in general then we'll have solved the sql-vs-json array problem.

We could add query params to an RPC post request to specify types, something like

POST /rpc/myproc?arg1=json&arg2=array&arg3=integer

Any type left unspecified by the query params would be cast as ::unknown like it is already.

LeKovr commented 8 years ago

We could add query params to an RPC post request to specify types, something like

Let's think about example like

/rpc/myproc?arg1=integer&arg2=text&arg3=decimal

Is it have sense in www world? Are we keeping things simple?

BTW, this topic was named "array type args support" because it is about simplicity of html form submitting i.e. for requests like

/rpc/myproc?arg1=a&arg1=b&arg1=c

begriffs commented 8 years ago

I thought of a few complications.

In the future we will want to support filtering through query params. When stored procs return a list of information we should allow selecting, filtering and ordering like the other endpoints (docs). Thus the names of the columns will already be used as params.
GET requests are inappropriate for the semantics of calling a stored proc, hence sending function arguments via html forms as query parameters is moot.

Other possibilities for sending argument type information

Decorate the name of the endpoint like rpc/myproc[foo:int,bar:string]. One thing I like about it is that the argument types are actually part of the full name of a proc, so it's accurate to include the parts of the name in its resource name. I don't like that it inaccurately suggests the order of the arguments matters.
Add an extra query param like /rpc/myproc?argtypes=foo:int,bar:string. It's nice that it is unlikely to conflict with a column name. It's weird that all the other query params refer to the output of the proc, and this one refers to its input.
Add an additional field in the payload, like { "_types": { "foo": "int", "bar": "string" } }. I don't like that the payload helps actually determine the proc.

calebmer commented 8 years ago

@begriffs a GET request is completely valid for functions which generate data as long as A) it's safe (no side effects) and B) it's idempotent (returns same/similar data each time). I believe actions such as getting a token should be a GET request as it satisfies the above requirements and GET makes more semantic sense then POST (whose name is derived from posting to a message board).

Second, I don't think we should worry about filtering rpc results. If a developer wants to filter the result, they can just a function and a view. I think we should adopt the syntax earlier proposed therefore (/rpc/thing?arg1=value1). It also matches our syntax for filtering maintaining the uniform interface REST constraint.

begriffs commented 8 years ago

True, if we can detect the volatility of each function we can support GET for the immutable ones.

The reason I brought up filtering is that rpc is the only way to return certain result sets, such as the query mentioned in #167, so people may want to use procs to generate results.

I think we should adopt the syntax earlier proposed therefore (/rpc/thing?arg1=value1). It also matches our syntax for filtering maintaining the uniform interface REST constraint.

Can you explain more about this, and also how it solves the issue of specifying the types of the arguments?

calebmer commented 8 years ago

@begriffs sorry, I wasn't correctly informed on what the earlier example meant before posting what I thought to be an equivalent example.

I don't think specifying the type fixes the actual big here. There is a bug in PostgREST's implementation that we should fix. Specifying types should be a secondary priority. I don't think any of the proposal are good/necessary.

However, if we really need a solution, I'm +1 to JSON meta schemas as we discuss potentially adopting that format for our other schemas. Here's how it would work:

{
  "$schema": {
    "type": "object",
    "properties": {
      "array": {
        "type": "array"
      }
    }
  },
  "array": [1, 2, 3]
}

A little verbose yeah, but this feature should never really have to be used except by unique APIs. Also don't think of it is choosing a procedure with the request body, think of the Postgres functions with different types as Haskell pattern matches. The function is already chosen by the URI, then the database must select how to handle by looking at types.

diogob commented 8 years ago

At least ordering results from a function can be quite useful. It seems to me that using GET to call functions implies a lot of code for a small benefit. Besides, the POST using JSON can give us some info about types of parameters, I fail to see the benefits of using a query string in this case.

Regarding the original example posted by @LeKovr the easiest solution to me seem to be redefining the function:

CREATE OR REPLACE FUNCTION "1".test(q json) RETURNS SETOF INTEGER LANGUAGE 'sql' AS $_$
      SELECT unnest($1)
$_$;

calebmer commented 8 years ago

Actually, @diogob, wouldn't it be easy to detect safeness/idempotent even by checking if the function is an SRF? Or maybe defined in the SQL language?

diogob commented 8 years ago

@calebmer SQL language functions can have side-effects, and also can depend on time and external sources of data, same goes for non-SRF. The safest bet would be to rely on the cache strategy attribute of the function, as @begriffs pointed out (using only immutable functions in GET).

The biggest problem IMO is that adding special cases (of any kind) adds complexity. But I understand how the GET caching strategy can be a big benefit.

I wonder if it wouldn't be more practical to come up with a grammar that can be used to call function values inside a SELECT (our current GET) statement, instead of messing with the RPC endpoints. As long as they are marked immutable. In this way, one could create an endpoint just for function call (like the dual table) and we also get a way to add function calls for other endpoints as well.

begriffs commented 8 years ago

Here's an idea for passing arguments to a function as url params. Just like we have filtering operations like eq, lt, like etc we could add a special one called arg used only for rpc routes. So the client could request

GET /rpc/myfunc?foo=arg.42

To add type casting we could say something like

GET /rpc/myfunc?foo=arg:string.42

The issue of calling a (stable) proc using the GET verb came up again in the chat room. @ctlaltdefeat wants to do HTTP caching on the response which POST prevents.

ctlaltdefeat commented 8 years ago

I think that would work just fine. It could get cumbersome if the proc inputs are complex, but that should be up to the proc writer to figure out (and the POST method should co-exist, at least for the time being), since I feel like if the proc is stable, GET seems more idiomatic.

LeKovr commented 8 years ago

GET /rpc/myfunc?foo=arg:string.42

Isn't it too complicated?

Between these two choices

?foo=42 (no "arg." because all of GET args are PROC args)
?foo=arg:string.42 and support for "two stored procedures with the same name but taking arguments of different types."

I should choose the first.

calebmer commented 8 years ago

@begriffs I really like that idea. Although there has to be a better way to do types. I like the following better GET /rpc/myfunc?foo=arg.42&foo=type.string.

However, I'd like to also propose the default addition of eq. and arg. when appropriate. In many scenarios it's easy to guess what the default prefix should be.

This makes request construction much easier. In my PostgREST client I'm currently doing some smart detection to add the eq., this way I can pass an object of key/value pairs alone to do filtering. If this could be done in PostgREST, all the better.

With that addition, to @LeKovr's point, we get both styles. 1 and 2, strict and simple, uniform across all PostgREST filtering.

begriffs commented 8 years ago

@LeKovr if ?foo=42 means to pass an argument called foo with value 42, then I think ?bar=eq.12 is ambiguous. It could either mean to filter the output rows where a column called bar is equal to 12 or to pass an argument called bar with value eq.12.

@calebmer we you like the arg thing and we can add that and defer our decision of the syntax for specifying types.

calebmer commented 8 years ago

That's fair about ambiguity. And yeah, we can defer a decision on type specification as long as it's not a pressing issue.

begriffs commented 8 years ago

Reading @LeKovr's suggestion again I noticing it suggests we consider all query params as function args which of course would remove any ambiguity. This would also remove the current ability to sort and filter the output of a proc. However maybe there are reasons for disabling proc output sorting/filtering. For instance the resultset may not (ever) be indexed so sorting or filtering could be slow for the db. It certainly simplifies the format of the query params to consider them strictly as function args for /rpc/:name type paths.

Any opinions?

diogob commented 8 years ago

@begriffs disabling sorting and filtering of functions seems a bit too arbitrary IMHO. Take for instance a use case of using a procedure to do full text searches (sometimes the @@ operator is not enough). If I want to sort and paginate the results in a certain way (let's say the user wants to change the order dynamically), I'd have to include the order as a function parameter and execute dynamic SQL inside the procedure.

So we have a simpler interface at the cost of more complex functions.

begriffs commented 8 years ago

OK, I'd rather leave the sorting and filtering possibilities there as well. Just wanted to discuss the possibilities. So the arg operator it is then.

ruslantalpa commented 8 years ago

2 cents, i think rpc is not a table, obe can return an array, another can return a string, another can return an obj... So filtering the response works only for a particular response format. Am i missing somth? Do rpc procs have to return arrays? Having said that, i would go with foo=42, and any filtering should be implemented in proc with parameters

diogob commented 8 years ago

@begriffs @ruslantalpa a compromise would be to change the default behaviour of foo=42, but keep the order parameter for ordering. Since I believe filtering is easier to add as a parameter than ordering. Not being able to create a parameter with the name order does not seem like a terrible restriction since it is not a good practice anyway.

badri commented 8 years ago

Agree with @ruslantalpa. An RPC isn't a table. We can eliminate filter params for RPC and have 2 types of calls, GET and POST. Any IMMUTABLE procedure can be a GET as it does not change the state of the system, and POST otherwise. Besides, the semantics also match. Type information can be part of the GET/POST parameters. Regarding @diogob 's comment:

If I want to sort and paginate the results in a certain way (let's say the user wants to change the order dynamically)

This can be controlled via the parameters. Or am I missing something obvious?

diogob commented 8 years ago

@badri you are right, this can be controlled through parameters, but it's not a very good design IMHO. Since you are changing your function interface and adding complexity to your code when there is already a fine mechanism for dealing with this, for the results of any function will be a relation.

calebmer commented 8 years ago

I think treating RPCs which can act like a table as a table has some super important use case. For instance, in the future one might want to build a personalized set of rows for a user. In this case, requests like /rpc/personalized_posts?personId=arg.5&topic=eq.inspiration could be super useful. On Wed, Jan 13, 2016 at 8:58 AM Diogo Biazus notifications@github.com wrote:

@badri https://github.com/badri you are right, this can be controlled through parameters, but it's not a very good design IMHO. Since you are changing your function interface and adding complexity to your code when there is already a fine mechanism for dealing with this, for the results of any function will be a relation.

— Reply to this email directly or view it on GitHub https://github.com/begriffs/postgrest/issues/328#issuecomment-171297942.

omani commented 7 years ago

hi,

my two cents: I don't need an rpc to act as a table. we should make our decision simple here. make GET requests available for immutable procs and otherwise use POST. the DBA's intent should be clear about how he wants to treat data or use an rpc. and also:

?foo=42 (no "arg." because all of GET args are PROC args)

I like this idea of having only proc args.

steve-chavez commented 7 years ago

Aside from detecting volatility for GET procs, I think we can have the rpc acting as a table(filters and order enabled) by detecting if the proc returns a postgres composite type(record type, list of field names).

For procs that returns non-composite type(scalars) we can ignore filters and order query params or maybe just output a sql error if they're used.

As for defining the query param that allows to specify the arguments, I'd prefer to not include it as an operator like in ?id=arg.5 I think it'll be somewhat inconsistent, how about creating a new query param for this case, I though about a args=(gender::bit.1, codename::int.13) that can also be used to resolve ambiguity of the args types.

So in summary the interface I propose would be be like:

# Traditional filters work(composite type)
GET /rpc/personalized_posts?args=(person_id.5,group_id.10)&topic=eq.inspiration&order=content

# Traditional filters do not work(scalar)
GET /rpc/ambiguous_scalar_func?args=(gender::bit.1,codename::int.13)

omani commented 7 years ago

what does topic mean there? isnt an rpc just expecting function arguments. where does topic belong to here?

steve-chavez commented 7 years ago

It's an example for a column name filter, filtering, embedding and other table query params are supported on rpc, see the tests here:

https://github.com/begriffs/postgrest/blob/master/test/Feature/QuerySpec.hs#L516-L535

ruslantalpa commented 7 years ago

@steve-chavez there is no need to detect what proc returns since the filters on the result of the proc are applied to the data it returns. If the user specifies some filters on a rpc that returns non-table like response, he will simply get an error from the db. what you need to know for each proc it the name of its parameters (but don't go into their types)

so that for a query like this /rpc/proc?person_id=5&group_id=10&topic=eq.inspiration you would know that the first two are arguments and the last one is a filter.

this imo it's the most natural interface however it does introduce ambiguity when the rpc param has the same name as the column returned by the rpc, so maybe a little more consideration is needed before working on this. I would also argue this feature is not crucial since it can be implemented at the proxy level (turn a specific get request into a post) so we should delay this until we come up with an interface we like (no time pressure of any kind).

omani commented 7 years ago

I would also argue this feature is not crucial since it can be implemented at the proxy level (turn a specific get request into a post) so we should delay this until we come up with an interface we like (no time pressure of any kind).

I already do this with openresty and it is painful. Not only you have to rewrite headers or set additional headers, it is also somewhat weird to turn a GET into a POST at proxy level, just because your backend can't handle it.

I would like to say that I do not think that this feature is not crucial. I think we really need this. any other attempt to solve this on any other level than the backend itself should be considered as a workaround and is not the way to go.

ruslantalpa commented 7 years ago

@omani since when 5-10 lines of config is "painful"? You are exaggerating. If this feature is that important for you, PR's are always welcomed :)

begriffs commented 7 years ago

Just thought of another simplification. We may not need to detect the volatility of functions in order to selectively allow GET requests. If someone calls a function with GET we can simply run the transaction in read-only mode, via: https://github.com/begriffs/postgrest/blob/d89937a6f300d2ca6bcda15cb84198450537467a/src/PostgREST/App.hs#L91

If the stored proc tries to modify anything I believe it would cause an error.

Supporting RPC with GET feels like the right thing to do. That HTTP verb is exactly right for non-volatile function invocation. It expresses to clients and proxies what they can expect w.r.t. not changing server state. Also it will allow caching headers to work when we eventually add the capability to set response headers.

I feel like asking people to script around this and create reverse-proxy rules for each of their functions is shirking a core http duty of postgrest.

steve-chavez commented 7 years ago

@begriffs I agree, so then we may not need to detect the proc volatility or if it returns a composite type(just let the db err).

@ruslantalpa To avoid the input/output ambiguity I proposed the args=(person_id.5,group_id.10) query param, that clearly differentiates from the traditional filters.

ruslantalpa commented 7 years ago

@begriffs i was not saying it should not be done, i was saying it can be implemented in other ways so it should not take precedence over other potential features (which can not be implemented in other places).

@steve-chavez i saw the proposal i just think it's an ugly interface :), especially when ppl just call the function without filtering its result

HTTP does not forbid having two query parameters with the same name, so even in the case of collision (which should be very rare) our code could still differentiate between two parameters by looking at the supplied value, (if just =value then it's a param, if =op.value then it's a filter). It's a bit more tricky but it's relatively well defined

ruslantalpa commented 7 years ago

if i think about it, my proposal i guess it's the same as p=arg.1 without the arg part so reading your original comment, how is that proposal inconsistent, inconsistent with what?

most of the calls to rpcs will not be doing filtering (i think) so i think the interface i proposed is usually what one would expect (and not have to learn a special format for the args parameter which will get complicated fast and hard to generate on the client side)

/rpc/myfn?param1=val1&param2=val2

steve-chavez commented 7 years ago

The inconsistency was in in the arg. part of id=arg.5, since that syntax is already used for operators id=eq.5 in filters. One other reason of why I proposed the args=(gender::bit.1,codename::int.13) was to also be able to specify the type of the arg and avoid those cases of ambiguity(proc with same name different args types), and perhaps avoid having another gender=type.bit as it was mentioned before.

However it seems that it's doable the way you proposed it, for the type we could specify a gender=1::bit, though it seems that way would involve more effort in the parsers.

Also don't think that most of the calls to rpc's will not be doing filtering, rpc is the only way to do certain queries that are not supported directly by PostgREST(queries with postgis operators, regex matching, group by, etc)

ruslantalpa commented 7 years ago

the original issue was about some error but it diverged into this rpc/get thing which was implemented so closing this. if the issue about incorrect query being generated still persists, a new issue should be opened, this one dragged on for too long :)

PostgREST / postgrest

rpc: specify argument types (GET requests on RPCs) #328