citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.65k stars 670 forks source link

In a transaction block, creating a distributed table whose name exceeds 64 characters may create a distributed deadlock #1664

Open ozgune opened 7 years ago

ozgune commented 7 years ago

The following code block works as expected in Citus 6.2-4:

BEGIN;
CREATE TABLE tmp_a191e16d9d53aeca7616674831a561af_client_session_opened (
    foo text NOT NULL
);
SELECT create_distributed_table('tmp_a191e16d9d53aeca7616674831a561af_client_session_opened', 'foo');

In Citus 7.0-15, the same code block results in a transaction that hangs. The distributed deadlock detection code ultimately aborts the transaction. I see the following error on the coordinator node:

ERROR:  
CONTEXT:  while executing command on citus_worker_1:5432

PostgreSQL truncates table names that exceed 64 characters. Citus does the same for shard names. However, we didn't change any logic in Citus 7 related to name truncation (or adding a hash into the shard name).

The issue here could be related to how Citus manages transactions and connections. The error message above is a symptom and we should look to identify its root cause.

marcocitus commented 7 years ago

During table creation, postgres also creates 2 types for the relation, a regular type (e.g. events) and an array type for arrays (events[]). The actual name of the array type in pg_type is _events, in other words: the relation name prefixed with an underscore. When adding an underscore to a name that's already 64 characters, the last character of the name is removed.

This means creating tables with 64 character names could lead to type name collisions in the array type when creating 2 tables which have the first 63 characters in common. The way postgres solves the collision is by adding additional underscores to the second type name, and truncating it further. Before it can generate the new array type name, it acquires a lock on the existing array type in pg_type, and if that hasn't been committed yet, CREATE TABLE will block on this lock.

When running create_distributed_table in a transaction block, we use a separate connection per shard placement in order to allow a COPY or DDL command in the same transaction. However, with 64 character table names, this creates a deadlock between the different connections that are trying to create shard placements which have the first 63 characters in common (unless there are 10 or more worker nodes). This means the second CREATE TABLE causes an array type name collision and blocks, which causes the deadlock detector to detect a self-deadlock (not sure why we sometimes get an empty error messages).

For long table names we use a shortening mechanism that truncates the table name and adds a hash of the full name before appending the shard ID. That way, we get shard names that are exactly 64 characters even if appending a shard ID would cause it to exceed 64 characters. I think we can lower the threshold such that we generate shard names that are at most 63 characters.

jasonmp85 commented 7 years ago

@marcocitus: your explanation here doesn't address what changed in 7.0… What's the transactional/connection-model change that causes this?

Trying to reason through possible approaches to this; it's a tough one.

b-l-a-c-k-b-e-a-r-d commented 6 years ago

+1

Caused a few hours of troubleshooting.. I also noticed that the WAL filled my disk when running \copies with table names >63 characters (after Citus appended shard_ids).

onderkalaci commented 4 years ago

Partitioned tables tends to have longer names, and this becomes and issue there as well:

-- to force more than 1 connection per node
set citus.force_max_query_parallelization TO ON;

CREATE TABLE test_index_creation1
(   
    tenant_id integer NOT NULL,
    timeperiod timestamp without time zone NOT NULL,
    field1 integer NOT NULL,
    inserted_utc timestamp without time zone NOT NULL DEFAULT now()
) PARTITION BY RANGE (timeperiod);

select create_distributed_table('test_index_creation1', 'tenant_id');

CREATE TABLE test_index_creation1_p2020_09_28_long_name_long_name_long PARTITION OF test_index_creation1 FOR VALUES FROM ('2020-09-28 00:00:00') TO ('2020-09-29 00:00:00');
ERROR:  canceling the transaction since it was involved in a distributed deadlock
hanefi commented 3 years ago

4709 creates the underlying structure needed for sequential execution to resolve deadlocks created by long table names in rename queries.

If we build a similar solution for CREATE TABLE .. PARTITION OF and ATTACH PARTITION commands, we can fix this issue as well.