
ATTENTION: pg_shard is superseded by Citus, its more powerful replacement
https://github.com/citusdata/citus

CitusDB metadata interoperability #103

Closed · jasonmp85 closed this 9 years ago

jasonmp85 commented 9 years ago

OK, so this has been entirely rebased on top of the changes from #110, i.e. we're using SPI everywhere now. This review is now all about the triggers and views needed to adapt CitusDB's metadata tables to be used within pg_shard.
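For anyone skimming, the adapter layer is roughly of this shape: views in pg_shard's metadata schema that read from the corresponding CitusDB catalogs. This is an illustrative sketch, not this branch's actual DDL; the column mapping is simplified:

```sql
-- Illustrative adapter: pg_shard reads its shard metadata through a
-- view whose rows come from CitusDB's pg_dist_shard catalog.
-- (Assumes the pgs_distribution_metadata schema already exists;
-- exact column names and mappings are simplified here.)
CREATE VIEW pgs_distribution_metadata.shard AS
SELECT shardid       AS id,
       logicalrelid  AS relation_id,
       shardstorage  AS storage,
       shardminvalue AS min_value,
       shardmaxvalue AS max_value
  FROM pg_dist_shard;
```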

Travis is building this under PostgreSQL 9.3/9.4 and CitusDB 4.0, so there is some assurance this actually works.

Code Review Tasks

Resolves #11. Resolves #27.

marcocitus commented 9 years ago

I'm aware that this is work in progress, but here's some early feedback. We were trying this out today to see whether it would make the CitusDB -> pg_shard migration path easier for CitusDB documentation purposes, since that seems to be the more common direction. We created a table using CitusDB with DISTRIBUTE BY APPEND and used this branch of pg_shard.

One problem we ran into is that an INSERT into a non-existing range gave "ERROR: cannot execute INSERT on a distributed table on master node", because pg_shard's ExecutorStart hook falls back to CitusDB for zero-shard queries. This led to some confusion.

The next problem we ran into is that pg_shard caches the metadata. Any change made to the metadata by CitusDB is not visible to pg_shard until a new session is started, which also led to some confusion. For example, performing \STAGE and then an INSERT into the staged range would give "ERROR: no placements exist for shard with ID".
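Roughly, the sequence that hits this looks like the following psql session (table, column, and file names are made up for illustration):

```sql
-- Create an append-partitioned table through CitusDB (4.x syntax):
CREATE TABLE events (id bigint, payload text)
DISTRIBUTE BY APPEND (id);

-- \STAGE creates a new shard and records it in CitusDB's metadata,
-- but pg_shard's per-session metadata cache does not see the change:
\STAGE events FROM 'events.csv' WITH CSV

-- An INSERT into the freshly staged range then fails:
INSERT INTO events VALUES (1, 'hello');
-- ERROR: no placements exist for shard with ID ...
```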

Finally, we added a new shard using \STAGE whose range overlapped with an existing shard. We then tried an INSERT on the overlapping range; it went to the first shard. We then started a new session to clear the cache, after which we got "ERROR: cannot modify multiple shards during a single query". This was expected, but I guess an INSERT should always go to at most one shard.

It seems the more fundamental issue here is the caching. We might have to make changes to CitusDB to help pg_shard clear its cache or set up a trigger on the catalog table.
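For concreteness, a trigger-based version of that idea might look like the sketch below. Both pieces are hypothetical: pgs_invalidate_metadata_cache() does not exist (it would have to be a C function exported by pg_shard), and plain PostgreSQL does not allow triggers on system catalogs, so this may be one of the CitusDB-side changes needed. A trigger also only fires in the backend making the change, so invalidating other sessions' caches would still need PostgreSQL's shared invalidation machinery:

```sql
-- Hypothetical: clear pg_shard's metadata cache whenever CitusDB's
-- shard catalog changes. pgs_invalidate_metadata_cache() is invented
-- for this sketch and would need to be provided by pg_shard.
CREATE FUNCTION pgs_metadata_changed() RETURNS trigger AS $$
BEGIN
  PERFORM pgs_invalidate_metadata_cache();  -- hypothetical function
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER clear_pgs_cache
AFTER INSERT OR UPDATE OR DELETE ON pg_dist_shard
FOR EACH STATEMENT EXECUTE PROCEDURE pgs_metadata_changed();
```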

jasonmp85 commented 9 years ago

Thanks @marcocitus… there was no intention of pg_shard honoring DISTRIBUTE BY APPEND at the moment anyway, as it should only understand hash partitioning. I'll work to ensure it understands there are tables that are distributed but which it should not touch.

As far as caching goes… bleh. I forgot we had added caching. pg_shard doesn't expect things to change underneath it. It doesn't cache empty results, so if no shards exist you shouldn't see caching. But once shards exist, we currently expect them to be long-lived (placements, on the other hand, can change state, etc.).

Do we anticipate having \STAGE work with pg_shard anytime soon? pg_shard assumes partitions are that: partitions.

ozgune commented 9 years ago

I had a quick clarification question.

With these changes, which migration scenario do we intend to handle? (a) pg_shard -> CitusDB, (b) CitusDB -> pg_shard (shard rebalancer?), or (c) CitusDB <-> pg_shard

CitusDB currently observes state changes when we \stage to a new shard (append partitioned), append to an existing shard, or rebalance a table. pg_shard observes state changes when we create shards for a new table, or fail to write to a shard placement (hash partitioned).

Two other questions -- don't know, just asking. Who has the authoritative metadata, CitusDB or pg_shard with these changes? Under what conditions does the authority change?

jasonmp85 commented 9 years ago

Because this PR actually causes pg_shard to write through to CitusDB when CitusDB is present, there is only ever one actual metadata store. When using CitusDB, CitusDB's pg_dist* tables store metadata. When using pg_shard with PostgreSQL, it uses its own tables.

That means the only possibility is stale metadata (i.e. from caching). So we'll want to address that.
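Mechanically, "write through" can be pictured as the adapter view from the PR description plus INSTEAD OF triggers, so pg_shard's metadata writes land directly in CitusDB's tables. This is a sketch under that assumption, not this branch's actual DDL; function, trigger, and column names are illustrative:

```sql
-- Sketch: redirect INSERTs against pg_shard's metadata view into
-- CitusDB's pg_dist_shard table (illustrative names only).
CREATE FUNCTION pgs_shard_insert() RETURNS trigger AS $$
BEGIN
  INSERT INTO pg_dist_shard
      (logicalrelid, shardid, shardstorage, shardminvalue, shardmaxvalue)
  VALUES
      (NEW.relation_id, NEW.id, NEW.storage, NEW.min_value, NEW.max_value);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER shard_insert
INSTEAD OF INSERT ON pgs_distribution_metadata.shard
FOR EACH ROW EXECUTE PROCEDURE pgs_shard_insert();
```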

jasonmp85 commented 9 years ago

@ozgune what do you mean "migration scenario"? I can see:

Am I missing any other possibilities?

ozgune commented 9 years ago

@jasonmp85 Yup, I was more curious about the use-cases where we need to sync metadata between pg_shard and CitusDB.

  1. Using pg_shard with PostgreSQL and upgrading your underlying DB to CitusDB (pg_shard metadata tables -> CitusDB metadata tables)
  2. Using pg_shard to create new distributed tables within CitusDB and have it know how to query them (pg_shard -> CitusDB)
  3. Using CitusDB without pg_shard and wanting to add it for INSERTs: In this case, is the CitusDB table partitioned by append? Could we overlook this scenario by guiding CitusDB users to start with pg_shard?
  4. Using pg_shard for Inserts, CitusDB for Selects, and shard rebalancer to rebalance the shards (pg_shard -> CitusDB, then CitusDB -> pg_shard): Is adding code to the shard rebalancer an option to sync the CitusDB metadata back to pg_shard?

I wonder if we can simplify the problem down to (pg_shard -> CitusDB). If we could and wanted to, could we then put a trigger on the pg_shard metadata to propagate updates to the CitusDB system catalogs?

jasonmp85 commented 9 years ago

Let's stop using the word sync. There is no syncing here.

jasonmp85 commented 9 years ago

To clarify, everything is always in sync… there is no sync step. pg_shard will directly write to CitusDB metadata tables when installed within CitusDB, and the right things are in place to permit a migration from PostgreSQL + pg_shard to CitusDB using pg_dump to write out a pg_shard master and turn it into a CitusDB master.

jasonmp85 commented 9 years ago

By the way, the migration step is to write out all normal tables except the pg_shard metadata using pg_dump -N pgs_distribution_metadata. You apply that to a fresh CitusDB install that will be your master. Then you dump the pg_shard metadata with pg_dump --inserts -t 'pgs_distribution_metadata.*' and apply that to the new CitusDB master. From that point onwards all previous pg_shard tables will be seen by CitusDB, and all pg_shard and CitusDB metadata will be identical.
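Spelled out as commands (database and host names are placeholders):

```sh
# Step 1: dump everything except the pg_shard metadata schema and
# restore it onto the fresh CitusDB master.
pg_dump -N pgs_distribution_metadata source_db | psql -h citus_master source_db

# Step 2: dump the pg_shard metadata as INSERT statements and replay
# it on the new master (--inserts makes pg_dump emit INSERTs rather
# than COPY).
pg_dump --inserts -t 'pgs_distribution_metadata.*' source_db | psql -h citus_master source_db
```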

jasonmp85 commented 9 years ago

If you're using pg_shard within CitusDB from square one, it always remains in sync: it sees all CitusDB tables, and CitusDB sees all pg_shard metadata.

jasonmp85 commented 9 years ago

I'm still figuring out how I want to test the CitusDB-specific parts of this, but the main code is complete.

jasonmp85 commented 9 years ago

Code rebased on latest develop, tests fixed, and changes from feedback made (denoted by (done) in one of my replies).

@sumedhpathak, can you see what else needs attention on this PR?