citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.57k stars 670 forks source link

Force push down of function calls in a distributed transaction #3265

Closed marcocitus closed 2 years ago

marcocitus commented 4 years ago

It would be useful if create_distributed_function would allow you to force pushdown of stored procedure calls even if the procedure is called from a bigger stored procedure/transaction, with the restriction that the inner procedure can only do local execution and will error out if it opens a connection.

This would also be useful infrastructure for implementing co-located triggers.

tejeswarm commented 3 years ago

Loosely defining, the scope of the work is

1) Function can still be delegated even when present in scope an explicit BEGIN/COMMIT transaction block. 2) Prohibit/Limit the delegated function not to do a 2PC (or any work on a remote node than the one it currently landed on) 3) Have a safety net around ensuring the (2) i.e. we should block the connections from the delegated procedure or make sure that no 2PC happens on the node.

To limit the scope of the project we are considering only Functions(not procedures) for the initial work.

marcocitus commented 3 years ago

In addition, it would be good to limit queries to the same distribution column value. Otherwise, applications might break as they move from single node to multi node, since everything would be allowed on single node, but not on multi node.

tejeswarm commented 3 years ago

Sure @marcocitus , I think we might need some to do something at the compile/planning level to prevent such cases, the biggest challenge with Postgres PLpgSQL is, it compiles the functions but not the individual embedded SQL in it.

I haven't looked at your draft PR yet, but I was trying to write some primitive testcases to see if my understanding of the requirements is correct.

https://github.com/citusdata/citus/pull/5390/commits/e569c1834dc6850f89189de722936dc71e5c4916

Please eyeball it, especially the testcases, though our goal is to support triggers, I believe we should ensure the transaction semantics are not compromised in the process. This is just a hack to see the viability and the approach to take, we can discuss more in our tomorrow's meeting.

tejeswarm commented 3 years ago

Marco: In citus_custom_scan.c there's some usage of workerJob->partitionKeyValue, which is set to a Const for single shard queries. it might be a good place to check whether it's a) not null, b) equal to the distribution column argument of the procedure call In principle, the worker could be smart enough to recognize that it is receiving a delegated procedure call (e.g. in TryToDelegateFunctionCall) and pick up the value from the distribution argument of the function call. This does assume the worker knows about distributed functions (has it in pg_dist_object).

We currently do not sync pg_dist_object metadata, though plan is to do that for Citus 11. https://github.com/citusdata/citus/pull/5132 We may also need an extra column in pg_dist_object for "forced pushdown", although changing pg_dist_object is a little annoying because it affects upgrade scripts.

An alternative might be to have something like a magic anyelement pushdown(function_call anyelement) function. That let's you write PERFORM pushdown(mongo_api_internal.insert_one(...)));.

The planner would have to recognize that function (probably in TryToDelegateFunctionCall), and force pushdown of the inner procedure call.