citusdata / pg_shard

ATTENTION: pg_shard is superseded by Citus, its more powerful replacement
https://github.com/citusdata/citus

Lock metadata during cluster structural changes #123

Closed: jasonmp85 closed this issue 9 years ago

jasonmp85 commented 9 years ago

The guiding principle here is that readers must know about a relevant set of objects to perform distributed queries, and so they need guarantees that this set will not change beneath them while those queries run.

In practice, this means we need two locks: one for the set of shards related to a relation and one for the set of placements related to a shard. A caller that intends to add shards (or active placements) must hold an exclusive lock on the grouping relation (or shard). A caller that just wants to act on the current set of shards (or active placements) needs a shared lock before reading.
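As a rough sketch of that protocol (the helper names and lock levels here are illustrative placeholders, not necessarily what the patch uses), the writer and reader sides differ only in the lock level they request:

#include "postgres.h"
#include "storage/lmgr.h"   /* LockRelationOid and the lock mode constants */

/* Writer side: adding a shard excludes concurrent shard-set readers and writers. */
static void
ExtendShardSet(Oid distributedTableId)
{
    LockRelationOid(distributedTableId, ExclusiveLock);

    /* ... insert metadata rows for the new shard here ... */
}

/* Reader side: planning a query only needs to pin the current shard set. */
static void
ReadShardSet(Oid distributedTableId)
{
    LockRelationOid(distributedTableId, ShareLock);

    /* ... read shard metadata and build the distributed plan ... */
}

Because ShareLock does not conflict with itself but does conflict with ExclusiveLock, concurrent readers can proceed in parallel while any structural change waits for them to finish (and vice versa).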

I explored using row locks to accomplish this, but they'd be just as arbitrary as some advisory lock, with the added downside that they can cause disk writes. LockDatabaseObject promised a more structured way to specify what we're locking, but it is usually called with a relid-objid pair, each half of which is 32 bits. This poses two problems:

Ultimately I decided that just using an AccessExclusive lock on the relation itself was sufficient to signal that we're reading or changing that relation's shard set (the use of an AccessExclusive lock here is analogous to PostgreSQL's own use of that level for most table schema changes). The lock used to read or change the placements of a shard is an advisory lock composed of the database identifier, the shard identifier, and a lock type (metadata or actual data).
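For concreteness, the shard-level advisory lock could be built roughly as follows. This is a sketch only: it assumes a 64-bit shard identifier split across the two key fields and a lock type carried in the fourth locktag field, and the helper name is made up for illustration.

#include "postgres.h"
#include "miscadmin.h"      /* MyDatabaseId */
#include "storage/lock.h"   /* LOCKTAG, LockAcquire, SET_LOCKTAG_ADVISORY */

/* Illustrative discriminator: lock a shard's metadata or its actual data. */
typedef enum ShardLockType
{
    SHARD_LOCK_METADATA,
    SHARD_LOCK_DATA
} ShardLockType;

/*
 * Sketch: acquire an advisory lock identifying one shard within one database.
 * Readers of the placement set pass a shared level, writers an exclusive one,
 * mirroring the convention used for the shard set itself.
 */
static void
LockShard(int64 shardId, ShardLockType lockType, LOCKMODE lockMode)
{
    LOCKTAG lockTag;

    SET_LOCKTAG_ADVISORY(lockTag,
                         MyDatabaseId,
                         (uint32) (shardId >> 32),  /* high half of shard id */
                         (uint32) shardId,          /* low half of shard id */
                         lockType);

    (void) LockAcquire(&lockTag, lockMode, false, false);
}

The conflict rules are the same as for relation locks, so shared acquisitions of the same shard's lock coexist while an exclusive acquisition serializes against all of them.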

anarazel commented 9 years ago

I'm only starting to look around this, so I'm missing a fair bit of context.

One thing I noticed is that I don't think the current values for SET_LOCKTAG_ADVISORY are chosen perfectly. Right now, unless I'm missing something, you conflict with advisory locks taken by applications. There are quite a few users of advisory locks, and it's not at all unrealistic for some of them to be used together with pg_shard.

If you look at how that locktag is used to implement advisory locks:

/*
 * Functions for manipulating advisory locks
 *
 * We make use of the locktag fields as follows:
 *
 *  field1: MyDatabaseId ... ensures locks are local to each database
 *  field2: first of 2 int4 keys, or high-order half of an int8 key
 *  field3: second of 2 int4 keys, or low-order half of an int8 key
 *  field4: 1 if using an int8 key, 2 if using 2 int4 keys
 */
#define SET_LOCKTAG_INT64(tag, key64) \
    SET_LOCKTAG_ADVISORY(tag, \
                         MyDatabaseId, \
                         (uint32) ((key64) >> 32), \
                         (uint32) (key64), \
                         1)
#define SET_LOCKTAG_INT32(tag, key1, key2) \
    SET_LOCKTAG_ADVISORY(tag, MyDatabaseId, key1, key2, 2)

If we force the fourth field to a value other than 1 or 2, we'll be safe against conflicts with SQL-level advisory locks.

jasonmp85 commented 9 years ago

I didn't know that about the SQL-level locks and had honestly kind of resigned myself to the fact that we'd just conflict (resulting in deadlocks or whatnot). But you're right: by using e.g. 3 and 4 we'd avoid the problem. I'll switch to that.
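A sketch of what that switch could look like (the macro names are made up; the only load-bearing part is that the fourth field is 3 or 4, values the SQL-level advisory lock functions never set):

#include "miscadmin.h"      /* MyDatabaseId */
#include "storage/lock.h"   /* SET_LOCKTAG_ADVISORY */

/*
 * SQL-level advisory locks set the fourth locktag field to 1 (single int8
 * key) or 2 (two int4 keys), so any other value keeps these locks in a
 * namespace of their own.
 */
#define SET_LOCKTAG_SHARD_METADATA(tag, shardId) \
    SET_LOCKTAG_ADVISORY(tag, \
                         MyDatabaseId, \
                         (uint32) ((shardId) >> 32), \
                         (uint32) (shardId), \
                         3)

#define SET_LOCKTAG_SHARD_DATA(tag, shardId) \
    SET_LOCKTAG_ADVISORY(tag, \
                         MyDatabaseId, \
                         (uint32) ((shardId) >> 32), \
                         (uint32) (shardId), \
                         4)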

anarazel commented 9 years ago

So, a somewhat more general question: in your email you referenced a number of cases, like concurrent plans, that should be affected when placement states change. I'm not sure I can agree this is a good idea in all cases: if I see correctly, appending a shard will now conflict with concurrent queries? Isn't that something we'd like to avoid? ISTM that it would be perfectly fine to use the definition (i.e. the shard states) that was valid before planning started.

WRT marking a placement as inactive: in your email (or the referenced gist) you mentioned that this should require a lock besides the row-level one; that's not the case right now? I think that's okay, because otherwise we'd just delay marking it as inactive, but it runs counter to what you'd argued for.

jasonmp85 commented 9 years ago

Addressed all feedback.