citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0

Simplify data migration from PostgreSQL to Citus #37

Open ozgune opened 8 years ago

ozgune commented 8 years ago

Simplify data migration from PostgreSQL (local tables) to Citus (distributed tables).

This issue has several components, each of which would be beneficial on its own:

  1. Migrate data from an existing PostgreSQL database to the Citus coordinator. AWS has a data migration service for PostgreSQL that may be worth looking into.
    • Do we provide this for Citus Cloud (managed), AWS, or on-prem deployments?
    • Does replicating the data require downtime?
    • Do we use logical replication (e.g., Slony, pglogical) or physical replication?
  2. Load data on the Citus coordinator into distributed tables.
    • One way to do this is to run INSERT INTO ... SELECT. #782 and #1117 provide a good workaround for this step.
    • Does loading the data require downtime?
  3. Enable schema migrations for the multi-tenant data model. During migrations, Citus may require changes to the underlying data definition. For example:
    • You may need to add a tenant_id column to your tables and then backfill it. This comes up frequently in engineering sessions.
    • You may then need to change your primary key and foreign key declarations so that they include tenant_id.
  4. Enable schema migrations from "one schema per tenant" databases to "shared tables." The Apartment gem and its accompanying blog post describe the "one schema per tenant" approach. We could make it easy for prospective users to migrate to Citus' multi-tenant model.
  5. Enable schema migrations from other relational databases to PostgreSQL. AWS has a schema migration tool that may be worth looking into.
  6. Automate data remodeling for the multi-tenant use case. For this migration task, we'd write software that automates the following: inspect the current table schema (likely a normalized relational model), pick the table at the top of the hierarchy, and convert the relational model into a hierarchical one, adding the tenant_id column to the corresponding tables.
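To make item 6 more concrete, here is a minimal Python sketch of the remodeling step, not a proposed implementation. It assumes a hypothetical input format in which foreign keys are given as (child_table, fk_column, parent_table) tuples, every parent's primary key column is named `id`, and tenant ids are `bigint`; the names `pick_root` and `remodel` are illustrative only. The root is chosen as the table with no outgoing foreign key that has the most descendants, and the hierarchy is then walked breadth-first to emit SQL that adds and backfills tenant_id (the backfill from item 3).

```python
from collections import deque

# Hypothetical input model: (child_table, fk_column, parent_table).
# Every parent table is assumed to have a primary key column named "id".
ForeignKey = tuple[str, str, str]


def pick_root(tables: list[str], fks: list[ForeignKey]) -> str:
    """Return the table at the top of the hierarchy: a table with no outgoing
    foreign key that has the most descendants (ties broken alphabetically)."""
    children: dict[str, list[str]] = {t: [] for t in tables}
    has_parent: set[str] = set()
    for child, _col, parent in fks:
        children[parent].append(child)
        has_parent.add(child)

    def count_descendants(start: str) -> int:
        seen, queue = {start}, deque([start])
        while queue:
            for c in children[queue.popleft()]:
                if c not in seen:
                    seen.add(c)
                    queue.append(c)
        return len(seen) - 1

    candidates = sorted(t for t in tables if t not in has_parent)
    return max(candidates, key=count_descendants)


def remodel(tables: list[str], fks: list[ForeignKey],
            tenant_col: str = "tenant_id") -> tuple[str, list[str]]:
    """Return (root, statements) that add and backfill tenant_col on every
    table reachable from the root, visiting parents before children."""
    root = pick_root(tables, fks)
    by_parent: dict[str, list[tuple[str, str]]] = {t: [] for t in tables}
    for child, col, parent in fks:
        by_parent[parent].append((child, col))

    statements: list[str] = []
    seen, queue = {root}, deque([root])
    while queue:
        parent = queue.popleft()
        for child, col in by_parent[parent]:
            if child in seen:
                continue  # BFS: keep the first (shallowest) path to each table
            seen.add(child)
            queue.append(child)
            # Assumes the root's "id" (the tenant id) is a bigint.
            statements.append(f"ALTER TABLE {child} ADD COLUMN {tenant_col} bigint;")
            if parent == root:
                # Direct children: the FK to the root's "id" is the tenant id.
                statements.append(f"UPDATE {child} SET {tenant_col} = {col};")
            else:
                # Deeper tables: copy the tenant id from the already-backfilled parent.
                statements.append(
                    f"UPDATE {child} AS c SET {tenant_col} = p.{tenant_col} "
                    f"FROM {parent} AS p WHERE c.{col} = p.id;"
                )
    return root, statements


if __name__ == "__main__":
    tables = ["companies", "campaigns", "ads", "users"]
    fks = [("campaigns", "company_id", "companies"),
           ("ads", "campaign_id", "campaigns")]
    root, statements = remodel(tables, fks)
    print("root:", root)
    print("\n".join(statements))
```

Walking breadth-first means each parent is backfilled before its children, so a child can copy tenant_id from its parent with a single UPDATE ... FROM join. Tables not reachable from the root (like `users` above) are left untouched, and the primary key and foreign key changes from item 3 are out of scope for this sketch.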
sumedhpathak commented 8 years ago

Noted by @samay-sharma as a requested customer feature.

ozgune commented 7 years ago

We're adding the 6.1 Release milestone to investigate this issue and better define requirements.