cybertec-postgresql / db_migrator

Other
21 stars 8 forks source link

Tools for migrating other databases to PostgreSQL

db_migrator is a PostgreSQL extension that provides functions for migrating databases from other data sources to PostgreSQL. This requires a foreign data wrapper for the data source you want to migrate.

You also need a plugin for db_migrator that contains the code specific to the targeted data source. Currently, plugins exist for the following data sources:

See the section Plugin API below if you want to develop a plugin (I'd be happy to add it to the list above).

See Setup below for installation instructions, Architecture so that you understand what is going on and Usage for instructions how to best migrate a database.

Showcase

This is a complete example of a simple migration of an Oracle database using the ora_migrator plugin.

A superuser sets the stage:

CREATE EXTENSION oracle_fdw;

CREATE SERVER oracle FOREIGN DATA WRAPPER oracle_fdw
   OPTIONS (dbserver '//dbserver.mydomain.com/ORADB');

GRANT USAGE ON FOREIGN SERVER oracle TO migrator;

CREATE USER MAPPING FOR migrator SERVER oracle
   OPTIONS (user 'orauser', password 'orapwd');

PostgreSQL user migrator has the privilege to create PostgreSQL schemas and Oracle user orauser has the SELECT ANY DICTIONARY privilege.

Now we connect as migrator and perform the migration so that all objects will belong to this user:

CREATE EXTENSION ora_migrator;

SELECT db_migrate(
   plugin => 'ora_migrator',
   server => 'oracle',
   only_schemas => '{TESTSCHEMA1,TESTSCHEMA2}'
);

NOTICE:  Creating staging schemas "fdw_stage" and "pgsql_stage" ...
NOTICE:  Creating foreign metadata views in schema "fdw_stage" ...
NOTICE:  Creating schemas ...
NOTICE:  Creating sequences ...
NOTICE:  Creating foreign tables ...
NOTICE:  Migrating table testschema1.baddata ...
WARNING:  Error loading table data for testschema1.baddata
DETAIL:  invalid byte sequence for encoding "UTF8": 0x00: 
NOTICE:  Migrating table testschema1.log ...
NOTICE:  Migrating table testschema1.tab1 ...
NOTICE:  Migrating table testschema1.tab2 ...
NOTICE:  Migrating table testschema2.tab3 ...
NOTICE:  Creating UNIQUE and PRIMARY KEY constraints ...
WARNING:  Error creating primary key or unique constraint on table testschema1.baddata
DETAIL:  relation "testschema1.baddata" does not exist: 
NOTICE:  Creating FOREIGN KEY constraints ...
NOTICE:  Creating CHECK constraints ...
NOTICE:  Creating indexes ...
NOTICE:  Setting column default values ...
NOTICE:  Dropping staging schemas ...
NOTICE:  Migration completed with 2 errors.
 db_migrate 
------------
          2
(1 row)

Even though the migration of one table failed because of bad data in the Oracle database, the rest of the data were migrated successfully.

Setup

Prerequisites

Foreign Data Wrapper

You need to install the foreign data wrapper for the data source from which you want to migrate. Follow the installation instructions of that software. A list of available foreign data wrappers is available in the PostgreSQL Wiki.

You need to define these objects:

Permissions

You need a database user with

The permissions can be reduced once the migration is complete.

db_migrator plugin

You also need to install the db_migrator plugin for the data source from which you want to migrate. Again, follow the installation instructions provided with the software.

Installation

The extension files must be placed in the extension subdirectory of the PostgreSQL shared files directory, which can be found with

pg_config --sharedir

If the extension building infrastructure PGXS is installed, you can do that simply with

make install

The extension is installed in the database to which you want to migrate the data with the SQL command

CREATE EXTENSION db_migrator;

This statement can be executed by any user with the right to create functions in the public schema (or the schema you specified in the optional SCHEMA clause of CREATE EXTENSION).

Architecture

db_migrator uses two auxiliary schemas, the "FDW staging schema" and the "Postgres staging schema". The names are fdw_stage and pgsql_stage by default, but you can choose different names.

In a first step, db_migrator calls the plugin to populate the FDW stage with foreign tables that provide information about the metadata of the remote data source in a standardized way (see Plugin API for details).

In the second step, the data are copied to tables in the Postgres stage, resulting in a kind of snapshot of the data in the FDW stage. These tables are described in detail at the end of this chapter. During this snapshot, table and column names may be translated using a function provided by the plugin. The plugin also provides a default mapping of remote data types to PostgreSQL data types.

As a next step, the user modifies the data in the Postgres stage to fit the requirements for the migration (different data types, edits to function and view definitions etc.). This is done with updates to the tables in the Postgres stage. Also, most tables have a boolean column migrate that should be set to TRUE for all objects that should be migrated.

The next step is to create schemas in the PostgreSQL database and populate them with foreign tables that point to the objects in the remote data source. These tables are then "materialized", that is, local tables are created, and the data from the foreign tables is inserted into the local tables.

Then the other objects and finally the indexes and constraints can be migrated.

Once migration is complete, the FDW stage and the PostgreSQL stage (and the foreign data wrapper) are not needed any more and can be removed.

Tables in the Postgres staging schema

Only edit the columns that are indicated. For example, it you want to change a schema or table name, it is better to rename the schema or table once you are done with the migration.

schemas

tables

columns

checks (check constraints)

keys (columns of primary and unique keys)

indexes

index_columns

partitions

Delete rows from this table if you don't want a partitioned table in PostgreSQL. Alternatively, you can add rows if you want to migrate a non-partitioned table to a partitioned PostgreSQL table.

subpartitions

Delete rows from this table if you don't want a subpartitioned table in PostgreSQL. Alternatively, you can add rows if you want to migrate a table without subpartitions to a PostgreSQL table with subpartitions.

views

sequences

functions (functions and procedures)

triggers

table_privs (permissions on tables)

These are not migrated by db_migrator, but can be used by the migration script to migrate permissions.

column_privs (permissions on table columns)

These are not migrated by db_migrator, but can be used by the migration script to migrate permissions.

Usage

The database user that performs the migration will be the owner of all migrated schemas and objects. Ownership can be transferred once migration is complete. Permissions on database objects are not migrated (but the plugin may offer information about the permissions on the data source).

There is no special support for translating procedural code (functions, procedures and triggers), you will have to do that yourself.

For very simple cases (no stored procedures or triggers to migrate, all views in standard SQL, no data type adaptions required) you can simply call the db_migrate function to migrate the database schemas you want.

For more complicated migrations, you will compose an SQL script that does the following (or parts thereof):

An errors (except connection problems) that happen during database migration will not terminate processing. Rather, they will be reported as warnings. Additionally, such errors are logged in the table migrate_log in the PostgreSQL staging schema.

Later errors can be consequences of earlier errors: for example, any failure to migrate an Oracle table will also make all views and constraints that depend on that table fail.

After you are done, drop the migration extensions to remove all traces of the migration.

Detailed description of the migration functions

db_migrate_prepare

Parameters:

This function must be called first. It creates the staging schemas. The remote staging schema is populated by the plugin. db_migrate_refresh is called to create a snapshot of the remote stage in the Postgres stage.

db_migrate_refresh

Parameters:

You can call this function to refresh the Postgres stage with a new snapshot of the remote stage. This will work as long as no objects on the remote data source are renamed or deleted (adding tables and columns will work fine). Edits made to the Postgres stage will be preserved.

db_migrate_mkforeign

Parameters:

Call this function once you have edited the Postgres stage to your satisfaction. It will create all schemas that should be migrated and foreign tables for all remote tables you want to migrate.

db_migrate_tables

Parameters:

This function calls materialize_foreign_table to replace all foreign tables created by db_migrate_mkforeign with actual tables. The table data are migrated unless with_data is FALSE.

db_migrate_functions

Parameters:

Call this to migrate functions and procedures. Note that migrate is set to FALSE by default for functions and procedures, so you will have to change that flag if you want to migrate functions.

db_migrate_views

Parameters:

Call this to migrate views.

db_migrate_triggers

Parameters:

Call this to migrate triggers. Note that migrate is set to FALSE by default for triggers, so you will have to change that flag if you want to migrate functions.

db_migrate_indexes

Parameters:

Call this to migrate user-defined indexes for the migrated tables.

This function should run after migrating the functions, so that all functions that are needed by indexes are already there.

db_migrate_constraints

Parameters:

Call this to migrate constraints and column defaults for the migrated tables.

This function has to run after everything else has been migrated, so that all functions that may be needed by column defaults are already there and foreign key constraints can use user-defined indexes to speed-up data validation.

db_migrate_finish

Parameters:

Call this function after you have migrated everything you need. It will drop the staging schemas and all their content.

db_migrate

Parameters:

This function provides "one-click" migration by calling the other functions in the following order:

This provides a simple way to migrate simple databases (no user defined functions and triggers, standard compliant view definitions, no data type modifications necessary).

Note that it will not migrate functions and triggers, since migrate is FALSE by default for these objects.

Low-level migration functions

These functions are called by migration functions detailed above.

They are provided as a low-level alternative and are particularly useful if you want to migrate several relations in parallel to improve processing speed with your own external tools.

materialize_foreign_table

Parameters:

This function replaces a single foreign table created by db_migrate_mkforeign with an actual table. If there are any entries for this table in the partitions tables, the table will be created as a partitioned table. Subpartitions are created if there are corresponding entries in subpartitions. The table data are migrated unless with_data is FALSE.

construct_schemas_statements

Parameters:

Will return a table composed by following columns:

construct_sequences_statements

Parameters:

Will return a table composed by following columns:

construct_foreign_tables_statements

Parameters:

Will return a table composed by following columns:

construct_functions_statements

Parameters:

Will return a table composed by following columns:

construct_views_statements

Parameters:

Will return a table composed by following columns:

construct_triggers_statements

Parameters:

Will return a table composed by following columns:

construct_indexes_statements

Parameters:

Will return a table composed by following columns:

construct_key_constraints_statements

Parameters:

Will return a table composed by following columns:

construct_fkey_constraints_statements

Parameters:

Will return a table composed by following columns:

construct_check_constraints_statements

Parameters:

Will return a table composed by following columns:

construct_defaults_statements

Parameters:

Will return a table composed by following columns:

execute_statements

Parameters:

This function iterates through an array of SQL statements and executes them inside a subtransaction. If one fails, it raises a detailed warning and inserts the failed statement and its context into migrate_log table and all previous successful statements in the subtransaction are rollbacked. Returns false on failure.

Plugin API

A plugin for db_migrator must be a PostgreSQL extension and provide a number of functions:

db_migrator_callback

There are no input parameters. The output parameters are:

These functions can have arbitrary names and are described in the following.

Metadata view creation function

Parameters:

This function is called by db_migrate_prepare after creating the FDW staging schema. It has to create a number of foreign tables (or views on foreign tables) that provide access to the metadata of the remote data source.

If the remote data source does not provide a certain feature (for example, if the data source has no concept of triggers), you can create an empty table instead of the corresponding foreign table.

It is allowed to create additional objects in the FDW staging schema if the plugin provides additional features. Similarly, it is allowed to provide other columns beside the ones required by the API specification.

These foreign tables or views must be created:

table of schemas

schemas (
   schema text NOT NULL
)

table of sequences

sequences (
   schema        text    NOT NULL,
   sequence_name text    NOT NULL,
   min_value     numeric,
   max_value     numeric,
   increment_by  numeric NOT NULL,
   cyclical      boolean NOT NULL,
   cache_size    integer NOT NULL,
   last_value    numeric NOT NULL
)

table of tables

tables (
   schema     text NOT NULL,
   table_name text NOT NULL
)

table of columns of tables and views

columns (
   schema        text    NOT NULL,
   table_name    text    NOT NULL,
   column_name   text    NOT NULL,
   position      integer NOT NULL,
   type_name     text    NOT NULL,
   length        integer NOT NULL,
   precision     integer,
   scale         integer,
   nullable      boolean NOT NULL,
   default_value text
)

Note that this table has to contain columns for both the tables and the views table.

table of check constraints

checks (
   schema          text    NOT NULL,
   table_name      text    NOT NULL,
   constraint_name text    NOT NULL,
   deferrable      boolean NOT NULL,
   deferred        boolean NOT NULL,
   condition       text    NOT NULL
)

table of primary key and unique constraint columns

keys (
   schema          text    NOT NULL,
   table_name      text    NOT NULL,
   constraint_name text    NOT NULL,
   deferrable      boolean NOT NULL,
   deferred        boolean NOT NULL,
   column_name     text    NOT NULL,
   position        integer NOT NULL,
   is_primary      boolean NOT NULL
)

For a multi-column constraint, the table will have one row per column.

table of foreign key constraint columns

foreign_keys (
   schema          text    NOT NULL,
   table_name      text    NOT NULL,
   constraint_name text    NOT NULL,
   deferrable      boolean NOT NULL,
   deferred        boolean NOT NULL,
   delete_rule     text    NOT NULL,
   column_name     text    NOT NULL,
   position        integer NOT NULL,
   remote_schema   text    NOT NULL,
   remote_table    text    NOT NULL,
   remote_column   text    NOT NULL
)

For a multi-column constraint, the table will have one row per column.

table of partitions

partitions (
    schema         name    NOT NULL,
    table_name     name    NOT NULL,
    partition_name name    NOT NULL,
    type           text    NOT NULL,
    key            text    NOT NULL,
    is_default     boolean NOT NULL,
    values         text[]
)

table of subpartitions

subpartitions (
    schema            name    NOT NULL,
    table_name        name    NOT NULL,
    partition_name    name    NOT NULL,
    subpartition_name name    NOT NULL,
    type              text    NOT NULL,
    key               text    NOT NULL,
    is_default        boolean NOT NULL,
    values            text[]
)

For explanations, see partitions above.

table of views

views (
   schema     text NOT NULL,
   view_name  text NOT NULL,
   definition text NOT NULL
)

The columns of the view are defines in the columns table.

table of functions and procedures

functions (
   schema        text    NOT NULL,
   function_name text    NOT NULL,
   is_procedure  boolean NOT NULL,
   source        text    NOT NULL
)

table of indexes

indexes (
   schema        text    NOT NULL,
   table_name    text    NOT NULL,
   index_name    text    NOT NULL,
   uniqueness    boolean NOT NULL,
   where_clause  text
)

table of index columns

index_columns (
   schema        text    NOT NULL,
   table_name    text    NOT NULL,
   index_name    text    NOT NULL,
   position      integer NOT NULL,
   descend       boolean NOT NULL,
   is_expression boolean NOT NULL,
   column_name   text    NOT NULL
)

table of triggers

triggers (
   schema            text    NOT NULL,
   table_name        text    NOT NULL,
   trigger_name      text    NOT NULL,
   trigger_type      text    NOT NULL,
   triggering_event  text    NOT NULL,
   for_each_row      boolean NOT NULL,
   when_clause       text,
   trigger_body      text    NOT NULL
)

table of table privileges

table_privs (
   schema     text    NOT NULL,
   table_name text    NOT NULL,
   privilege  text    NOT NULL,
   grantor    text    NOT NULL,
   grantee    text    NOT NULL,
   grantable  boolean NOT NULL
)

table of column privileges

column_privs (
   schema      text    NOT NULL,
   table_name  text    NOT NULL,
   column_name text    NOT NULL,
   privilege   text    NOT NULL,
   grantor     text    NOT NULL,
   grantee     text    NOT NULL,
   grantable   boolean NOT NULL
)

Data type translation function

Parameters:

Result type: text

This function translates data types from the remote data source to PostgreSQL data types. The result should include the type modifiers if applicable, for example character varying(20).

Identifier translation function

Parameters:

Result type: name

This function should generate a PostgreSQL object or column name. If no translation is required, the function should just return its argument, which will automatically be truncated to 63 bytes.

Expression translation function

Parameters:

Result type: text

This function should make a best effort in automatically translating expressions between the SQL dialects. Anything that this function cannot translate will have to be translated by hand during the migration.

Foreign table creation function

Parameters:

Result type: text

This function generates a CREATE FOREIGN TABLE statement that creates a foreign table with these definitions. This is required because the syntax varies between foreign data wrappers.

Contributing

Contributions are always welcome!

See CONTRIBUTING.md for ways to get started.

Support

Create an issue on Github or contact CYBERTEC PostgreSQL International GmbH.