Simplify data migration from PostgreSQL (local tables) to Citus (distributed tables).
This issue has several components to it and each one would be beneficial in isolation:
Migrate data from an existing PostgreSQL database to the Citus coordinator. AWS has a data migration service for Postgres that could be worth looking into.
Do we provide this for Citus Cloud (managed), AWS, or on-prem deployments?
Do we take any downtime when replicating the data?
Do we follow an approach that uses logical replication (Slony, pglogical) or physical replication?
Load data on the Citus coordinator into distributed tables.
One way to do that is by running an INSERT INTO ... SELECT. #782 and #1117 provide a good workaround for this step.
Do we take any downtime when replicating the data?
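As a rough illustration of the INSERT INTO ... SELECT step, here is a minimal sketch that only builds the SQL statements one might run on the coordinator. The helper name and the table names (events_local, events) are hypothetical, create_distributed_table is the Citus UDF for sharding a table, and the sketch assumes the identifiers are trusted, since it does no quoting.

```python
def distribute_and_load_sql(local_table, distributed_table, distribution_column):
    """Return, in order, the SQL to load a local table into a distributed one.

    Hypothetical helper: identifiers are interpolated as-is, so they must
    come from a trusted source.
    """
    return [
        # Create an empty table with the same definition as the local one.
        f"CREATE TABLE {distributed_table} (LIKE {local_table} INCLUDING ALL);",
        # Citus UDF that shards the new table on the distribution column.
        f"SELECT create_distributed_table('{distributed_table}', '{distribution_column}');",
        # Copy the rows from the local table into the distributed table.
        f"INSERT INTO {distributed_table} SELECT * FROM {local_table};",
    ]


for statement in distribute_and_load_sql("events_local", "events", "tenant_id"):
    print(statement)
```

Whether the copy step needs downtime depends on whether writes to the local table continue while it runs, which is the open question above.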
Enable schema migrations for the multi-tenant data model. During migrations, Citus may require changes to the underlying data definition. For example:
You may need to add a tenant_id column to your tables and then backfill data. This particular item comes up frequently in engineering sessions.
You may then need to change your primary key or foreign key declarations.
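The add-and-backfill step could look something like the sketch below. It uses Python's built-in sqlite3 module only so the example runs without a server, and the orders and line_items tables are hypothetical; against PostgreSQL the same ALTER TABLE and UPDATE statements would apply, followed by the primary key and foreign key changes mentioned above.

```python
import sqlite3


def add_and_backfill_tenant_id(conn):
    # Hypothetical schema: orders already carries tenant_id and
    # line_items references orders through order_id.
    conn.execute("ALTER TABLE line_items ADD COLUMN tenant_id INTEGER")
    # Backfill each child row from its parent row.
    conn.execute(
        "UPDATE line_items SET tenant_id = "
        "(SELECT o.tenant_id FROM orders o WHERE o.id = line_items.order_id)"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL);
CREATE TABLE line_items (id INTEGER PRIMARY KEY,
                         order_id INTEGER REFERENCES orders(id));
INSERT INTO orders VALUES (1, 10), (2, 20);
INSERT INTO line_items VALUES (100, 1), (101, 1), (102, 2);
""")
add_and_backfill_tenant_id(conn)
rows = conn.execute(
    "SELECT id, tenant_id FROM line_items ORDER BY id"
).fetchall()
print(rows)
```

On a large table the single UPDATE would likely need to be batched, which this sketch does not attempt.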
Enable schema migrations from "one schema per tenant" databases to "shared tables." The Apartment gem and the corresponding blog post talk about the "one schema per tenant" approach. We could look for ways to easily migrate prospective users to Citus' multi-tenant model.
Enable schema migrations from other relational databases to PostgreSQL. AWS has a schema migration tool that may be worth looking into.
Automate data remodeling for the multi-tenant use case. In this migration task, we'd write software to automate the following: understand the current table schema (likely in the relational database model), pick the table that's at the top of the hierarchy, and convert the relational database model into the hierarchical one while also adding the tenant_id column to the corresponding tables.
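One possible shape for the "understand the schema, pick the top table, add tenant_id" automation is a breadth-first walk over the foreign key graph. The sketch below is only an assumption about how such a tool might start: the function name, the input format (child table mapped to the tables it references), and the example schema are all hypothetical, and a real tool would read this information from the PostgreSQL catalogs.

```python
from collections import deque


def plan_tenant_id_columns(foreign_keys, tenant_table):
    """Return (table, parent) pairs for tables that need a new tenant_id.

    foreign_keys maps each child table to the tables it references.
    Tables that reference tenant_table directly are skipped, because
    their existing foreign key column already identifies the tenant.
    """
    # Invert child -> parents into parent -> children.
    children = {}
    for child, parents in foreign_keys.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    plan = []
    seen = {tenant_table}
    queue = deque([tenant_table])
    while queue:
        parent = queue.popleft()
        for child in sorted(children.get(parent, [])):
            if child in seen:
                continue
            seen.add(child)
            queue.append(child)
            if parent != tenant_table:
                # Backfill this table's tenant_id from the given parent.
                plan.append((child, parent))
    return plan


schema = {
    "stores": ["companies"],
    "orders": ["stores"],
    "products": ["stores"],
    "line_items": ["orders", "products"],
}
plan = plan_tenant_id_columns(schema, "companies")
print(plan)
```

Because the walk is breadth-first, each table appears after the parent it would be backfilled from, so the plan can be applied in order.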