citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0

Router executor acquires a lock on '0' instead of shardid #342

Closed: metdos closed this issue 8 years ago

metdos commented 8 years ago

RouterExecutorRun() calls AcquireExecutorShardLock(), which takes the shard lock using task->shardId:

```c
static void
AcquireExecutorShardLock(Task *task, LOCKMODE lockMode)
{
    /* for router tasks task->shardId is never set, so this reads 0 */
    int64 shardId = task->shardId;

    LockShardResource(shardId, lockMode);
}
```

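However, the shardId field of the Task structure is only set for shard fetch tasks. For the tasks the router executor runs it is left at 0, so AcquireExecutorShardLock() locks shard id 0 instead of the shard the task actually targets.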
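The failure mode can be modeled in a few lines of C. This is a hypothetical, self-contained sketch, not Citus code: the Task struct keeps only the two fields discussed in this issue, LockShardResourceModel() merely records the key it would lock instead of taking a real lock, lock modes are omitted, and MakeRouterTask() assumes, as the report observes, that shardId stays zero for router tasks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for the Citus Task struct:
 * only the two fields discussed in this issue are modeled. */
typedef struct Task
{
	uint64_t anchorShardId; /* shard a router task actually targets */
	uint64_t shardId;       /* only set for shard fetch tasks */
} Task;

/* Hypothetical stand-in for LockShardResource(): records which key
 * would be locked instead of taking a real lock. */
static uint64_t lastLockedShardId = UINT64_MAX;

static void
LockShardResourceModel(uint64_t shardId)
{
	lastLockedShardId = shardId;
}

/* Mirrors the logic quoted above: the lock key comes from task->shardId. */
static void
AcquireExecutorShardLockBuggy(const Task *task)
{
	LockShardResourceModel(task->shardId);
}

/* Builds a router task as described in the issue: only anchorShardId is
 * set, and zero-initialization leaves shardId at 0. */
static Task
MakeRouterTask(uint64_t anchorShardId)
{
	Task task;

	memset(&task, 0, sizeof(task));
	task.anchorShardId = anchorShardId;
	return task;
}
```

Building two router tasks with different (made-up) anchor shard ids and passing each to AcquireExecutorShardLockBuggy() leaves lastLockedShardId at 0 both times, which is the behavior described above: every router task contends on the same key, and none of them contends with a lock taken on its real shard.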

```c
typedef struct Task
{
    ...
    uint64 shardId;               /* only applies to shard fetch tasks */
    ...
} Task;
```

For testing purposes, you can run copy_to_distributed_table and the shard rebalancer in parallel and observe that they do not block each other.

Again for testing purposes, I changed the lock to use task->anchorShardId instead of task->shardId (I am not sure this is the proper fix, but it was sufficient for my tests). With that change, copy_to_distributed_table and the shard rebalancer blocked each other as expected.
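The reporter's experiment can be sketched the same way. In this hypothetical model (not Citus code), two shard locks are treated as conflicting only when they are taken on the same shard id, and lock modes are ignored; ExecutorLockKeyBuggy(), ExecutorLockKeyAnchor() and LocksConflict() are illustrative names, not Citus functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the two Task fields discussed above. */
typedef struct Task
{
	uint64_t anchorShardId; /* shard the router task actually targets */
	uint64_t shardId;       /* only set for shard fetch tasks; 0 otherwise */
} Task;

/* Lock key used by the code quoted in the issue. */
static uint64_t
ExecutorLockKeyBuggy(const Task *task)
{
	return task->shardId;
}

/* Lock key used in the reporter's experiment: the task's anchor shard. */
static uint64_t
ExecutorLockKeyAnchor(const Task *task)
{
	return task->anchorShardId;
}

/* Hypothetical conflict check: two shard locks can only block each other
 * if they are taken on the same shard id (lock modes are ignored here). */
static bool
LocksConflict(uint64_t executorKey, uint64_t rebalancerKey)
{
	return executorKey == rebalancerKey;
}
```

For a router task with anchorShardId = 102008 (a made-up id) and shardId = 0, LocksConflict(ExecutorLockKeyBuggy(&task), 102008) is false while LocksConflict(ExecutorLockKeyAnchor(&task), 102008) is true. Whether anchorShardId is the right key for every router task is exactly the open question the reporter raises.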

metdos commented 8 years ago

Related to issue #333.