Open saicitus opened 3 years ago
Workaround Use a CTE to materialize the results of the volatile function on the coordinator.
@saicitus you are probably aware, but wanted to mention for the readers. Normally volatile functions would generate distinct values per updated row. If materialized, it generates a single value for all values, so be careful of this.
select count(*) FROM loc ;
count
-------
101
(1 row)
Time: 1.490 ms
UPDATE loc SET b = random();
UPDATE 101
Time: 1.773 ms
SELECT count(DISTINCT b) FROM loc;
count
-------
101
(1 row)
Time: 1.344 ms
With materialization, we break this:
WITH c1 AS (SELECT random() as r) UPDATE loc SET b = c1.r FROM c1;
UPDATE 101
Time: 1.149 ms
SELECT count(DISTINCT b) FROM loc;
count
-------
1
(1 row)
While enabling volatile functions is tricky for reference tables because we somehow need to generate the same values on every replica, it should be straight-forward for regular distributed tables. The main challenge is that our current function evaluation logic is unable to distinguish between stable functions and volatile functions, and therefore would incorrectly evaluate volatile functions on the coordinator.
I also run into this problem when I use crypt() function which is from pgcrypto extension. However, I just found a workaround to use VOLATILE functions on distributed tables.
The following SQL statement will result in an error:
UPDATE t001 SET pwd = public.crypt('new password', public.gen_salt('md5')) where tid=1;
OUT: ERROR: functions used in UPDATE queries on distributed tables must not be VOLATILE
However, with a select subquery, the update statement runs well with the expected result.
UPDATE t001 SET pwd = (select public.crypt('new password', public.gen_salt('md5'))) where tid=1;
OUT: UPDATE 1
Will my workaround cause some side effects? Any drawbacks will it bring?
UPDATE: I just understand the drawbacks. :(
Quoting @marcocitus from an internal chat for some tricky parts of function evaluation on Citus (mostly due to distributed nature of things):
just as a reminder, there are 2 reasons why we don't support volatile functions in update/delete
we need to ensure identical outcomes for replicated (reference) tables
we evaluate stable functions in the query tree on the coordinator, but do not have a way to skip volatile functions
for SELECT, we (incorrectly) skip function evaluation even for stable functions, because we cannot afford to evaluate the volatile functions, and we definitely cannot afford to disallow them in SELECT
for MERGE, I think we don't do function evaluation either, which means we would incorrectly skip stable functions, except we error out before then
this gets a bit messy when it comes to UPDATE+CTEs, since the CTEs can have stable functions that are incorrectly pushed down
but for the UPDATE itself they are evaluated on the coordinator
MERGE+source query is probably a similar case
when we solve volatile-functions-in-update problem, we should also start doing stable function evaluation for SELECT
Citus version: 10.0 Customer reported issue.
Steps to repro: CREATE TABLE reference_key (id int, s text); SELECT create_reference_table('reference_key'); citus=> UPDATE reference_key SET s=PGP_SYM_ENCRYPT('123', 'sai_test') where id=1; ERROR: functions used in UPDATE queries on distributed tables must not be VOLATILE
CREATE TABLE distributed_key (id int, s text); select create_distributed_table('distributed_key','id'); citus=> UPDATE distributed_key SET s=PGP_SYM_ENCRYPT('123', 'sai_test'); ERROR: functions used in UPDATE queries on distributed tables must not be VOLATILE
citus=> UPDATE distributed_key SET s=PGP_SYM_ENCRYPT('123', 'sai_test') where id=1; ERROR: functions used in UPDATE queries on distributed tables must not be VOLATILE
Workaround Use a CTE to materialize the results of the volatile function on the coordinator.