citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.34k stars 655 forks source link

Results of re-partition queries are affected by concurrent INSERT/DELETE/UPDATEs #723

Open metdos opened 8 years ago

metdos commented 8 years ago

While a re-partition query runs, especially a long one, we still continue to run INSERT/UPDATE/DELETE queries in parallel. Therefore, some tasks see the shards before the INSERT/UPDATE/DELETE queries, and other tasks see the shards after INSERT/UPDATE/DELETE queries.

How to replicate;

Load data:

CREATE TABLE lineitem
(
    l_orderkey    BIGINT not null,
    l_partkey     INTEGER,
    l_suppkey     INTEGER,
    l_linenumber  INTEGER,
    l_quantity    DOUBLE PRECISION,
    l_extendedprice  DOUBLE PRECISION,
    l_discount    DOUBLE PRECISION,
    l_tax         DOUBLE PRECISION,
    l_returnflag  CHAR(1),
    l_linestatus  CHAR(1),
    l_shipdate    DATE,
    l_commitdate  DATE,
    l_receiptdate DATE,
    l_shipinstruct CHAR(25),
    l_shipmode     CHAR(10),
    l_comment      VARCHAR(44)
);

SELECT master_create_distributed_table('lineitem', 'l_orderkey', 'hash');
SELECT master_create_worker_shards('lineitem', 512, 1);

COPY lineitem from '/home/ubuntu/tpch_2_13_0/lineitem.tbl' WITH DELIMITER '|';

Set task tracker:

set citus.task_executor_type to 'task-tracker';

Run a long running query

SELECT
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
FROM
    lineitem
WHERE
    l_shipdate <= date '1998-12-01' - interval '90' day
GROUP BY
    l_returnflag,
    l_linestatus
ORDER BY
    l_returnflag,
    l_linestatus;

Run the same query above again, but this time after starting the query, start to delete some rows concurrently.

for i in `seq 1 1000`;
do
    /usr/local/pgsql/bin/psql -d postgres -c  "DELETE FROM lineitem WHERE l_orderkey = $i;"
done

Finally, observe that the result of the query above changed by the concurrent delete commands which are run after the original select query starts.

marcocitus commented 8 years ago

In principle, there's no guarantee that the SELECT will obtain a snapshot before any of the DELETEs if they are concurrent, even if the command was sent earlier. You could see the same behaviour on PostgreSQL, even though it's unlikely.

The main issue is that a real-time or task-tracker SELECT might see the DELETEs in a different order because we don't have a global transaction manager.