CartoDB / crankshaft

CARTO Spatial Analysis extension for PostgreSQL
BSD 3-Clause "New" or "Revised" License
54 stars 20 forks source link

Add parallel and volatile categories to PG functions #181

Closed Algunenano closed 6 years ago

Algunenano commented 6 years ago

As https://github.com/CartoDB/cartodb-postgresql/issues/300, with pg9.6+ there is the option to mark functions as PARALLEL (different levels) so the analyzer can decide to run the query in several processes simultaneously. In a similar fashion, marking the VOLATILE category enables optimizations to avoid re-running queries (or some parts of a query).

Should maintain compatibility with previous PostgreSQL releases (at least for now).

rafatower commented 6 years ago

Referencing https://github.com/CartoDB/cartodb-postgresql/issues/300 to give a bit more of context.

cc'ing @andy-esch and @jgoizueta for awareness

andy-esch commented 6 years ago

Thanks @rafatower. From Paul's comment in CartoDB/cartodb-postgresql#300:

Almost everything can be marked parallel. If it doesn't read from (or write to) a table, it can be marked.

Some of the PL/Python functions internally use plpy.execute('select ....') to gather data from tables. For example, CDB_AreasOfInterestLocal relies on this here. Based on that comment, are they still eligible to be marked parallel?

Algunenano commented 6 years ago

Technically speaking, reading shouldn't be an issue as long as you are sure you are only reading. For example, in the PR (#183) I've marked for now a lot of python functions as unsafe because they execute an external subquery that we don't know anything about. For example, you could pass CDB_AreasOfInterestGlobal something like this as the first parameter:

WITH t AS (
    UPDATE products SET price = price * 1.05
    RETURNING *
)
SELECT * FROM t;

This is not parallelizable (and not STABLE either). One possible solution is to also provide a signature that receives directly the data instead of the query; another one is to blindly trust the user and hope it sends a PARALLEL SAFE subquery.

Algunenano commented 6 years ago

Deploying 0.6.1 to test in Staging. There was a couple of issues with the servers (https://github.com/CartoDB/cartodb-platform/issues/3737) but I've updated a single user and it seems to be working fine:

cartodb_staging_user_9824ba31-2cbc-4be5-b723-9bde90d5b883_db=# \df+ cdb_crankshaft.cdb_crankshaft_version
                                                                                               List of functions
     Schema     |          Name          | Result data type | Argument data types |  Type  | Volatility | Parallel |  Owner   | Security | Access privileges | Language |       Source code       | Description 
----------------+------------------------+------------------+---------------------+--------+------------+----------+----------+----------+-------------------+----------+-------------------------+-------------
 cdb_crankshaft | cdb_crankshaft_version | text             |                     | normal | immutable  | safe     | postgres | invoker  |                   | sql      |                        +| 
                |                        |                  |                     |        |            |          |          |          |                   |          |   SELECT '0.6.1'::text;+| 
                |                        |                  |                     |        |            |          |          |          |                   |          |                         | 
(1 row)

cartodb_staging_user_9824ba31-2cbc-4be5-b723-9bde90d5b883_db=# Select cdb_crankshaft.cdb_crankshaft_version();
 cdb_crankshaft_version 
------------------------
 0.6.1
(1 row

Once the issue is resolved I'll update all users in staging and, if no issues appear, deploy to production.

cc/ @rafatower