GreenmaskIO / greenmask

PostgreSQL database anonymization and synthetic data generation tool
https://greenmask.io
Apache License 2.0

Greenmask

Dump anonymization and synthetic data generation tool

Greenmask is an open-source utility for logical database dumps, anonymization, synthetic data generation, and restoration. It is built on ported PostgreSQL libraries, which makes it reliable. It is stateless and does not require any changes to your database schema. It is designed to be fast, highly customizable, and backward-compatible with existing PostgreSQL utilities.




Getting started

Greenmask has a Playground: a sandbox environment in Docker with sample databases included, so you can try Greenmask without any additional setup.

  1. Clone the greenmask repository and navigate to its directory by running the following commands:

    git clone git@github.com:GreenmaskIO/greenmask.git && cd greenmask

  2. Once you have cloned the repository, start the environment by running Docker Compose:

    docker-compose run greenmask

Features

Use Cases

Greenmask is ideal for various scenarios, including:

General Information

The best approach for logical backup dumping and restoration is to use the core PostgreSQL utilities, specifically pg_dump and pg_restore. Greenmask is designed to align with these native tools and stay fully compatible with them. It manages data dumping itself while delegating schema dumping and restoration to pg_dump and pg_restore, which keeps it in step with PostgreSQL’s standard workflow.

Greenmask utilizes the directory format of pg_dump and pg_restore, ideal for parallel execution and partial restoration. This format includes metadata files to guide backup and restoration steps.
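
To make the directory format more concrete, here is a minimal Go sketch that assembles argument lists for the native pg_dump and pg_restore tools. It only illustrates the standard flags for a parallel directory-format dump and a partial, single-table restore; it is not Greenmask's own code. The helper names (dumpArgs, restoreArgs), the database names, and the /tmp/dump path are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// dumpArgs builds pg_dump arguments for a directory-format dump
// written by several parallel worker jobs. (Hypothetical helper.)
func dumpArgs(dbname, dir string, jobs int) []string {
	return []string{
		"--format=directory",
		"--jobs=" + strconv.Itoa(jobs),
		"--file=" + dir,
		"--dbname=" + dbname,
	}
}

// restoreArgs builds pg_restore arguments. When tables is non-empty,
// only those tables are restored (partial restoration).
func restoreArgs(dbname, dir string, jobs int, tables ...string) []string {
	args := []string{"--jobs=" + strconv.Itoa(jobs), "--dbname=" + dbname}
	for _, t := range tables {
		args = append(args, "--table="+t)
	}
	// The dump directory is passed as the final positional argument.
	return append(args, dir)
}

func main() {
	// Print the commands instead of running them, so the sketch works
	// without a PostgreSQL installation.
	fmt.Println("pg_dump", strings.Join(dumpArgs("source_db", "/tmp/dump", 4), " "))
	fmt.Println("pg_restore", strings.Join(restoreArgs("target_db", "/tmp/dump", 4, "users"), " "))
}
```

The resulting slices could be passed to exec.Command from os/exec; the sketch prints them so the flags are easy to inspect.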

Storage Options

Data Anonymization and Validation

Greenmask works with COPY lines, collects schema metadata using the Golang driver, and uses this driver for encoding and decoding. The validate command lets you assess the impact on both the schema (validation warnings) and the data (running the transformations and displaying the differences), so you can confirm that the anonymization process produces the desired results.
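
As an illustration of what working with COPY lines involves, the following Go sketch decodes one row in PostgreSQL's COPY text format (tab-delimited, \N for NULL, backslash escapes), masks a single column, and encodes the row again. It is a simplified sketch, not Greenmask's implementation: it handles only the most common escape sequences, and the Field type and the maskEmail transformer are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Field is one column value of a COPY text-format row; Null marks SQL NULL.
type Field struct {
	Value string
	Null  bool
}

// Only the most common backslash sequences are covered in this sketch.
var (
	unescaper = strings.NewReplacer(`\\`, `\`, `\t`, "\t", `\n`, "\n", `\r`, "\r")
	escaper   = strings.NewReplacer(`\`, `\\`, "\t", `\t`, "\n", `\n`, "\r", `\r`)
)

// decodeCopyLine splits a row on raw tabs. Tabs inside values are escaped
// as \t in this format, so a raw tab is always a column delimiter.
func decodeCopyLine(line string) []Field {
	parts := strings.Split(line, "\t")
	fields := make([]Field, len(parts))
	for i, p := range parts {
		if p == `\N` {
			fields[i] = Field{Null: true}
			continue
		}
		fields[i] = Field{Value: unescaper.Replace(p)}
	}
	return fields
}

// encodeCopyLine is the inverse of decodeCopyLine.
func encodeCopyLine(fields []Field) string {
	parts := make([]string, len(fields))
	for i, f := range fields {
		if f.Null {
			parts[i] = `\N`
		} else {
			parts[i] = escaper.Replace(f.Value)
		}
	}
	return strings.Join(parts, "\t")
}

// maskEmail is a hypothetical transformer: it hides the local part of an
// email address and leaves NULL values untouched.
func maskEmail(f Field) Field {
	if f.Null {
		return f
	}
	if at := strings.IndexByte(f.Value, '@'); at >= 0 {
		f.Value = "***" + f.Value[at:]
	}
	return f
}

func main() {
	line := "42\talice@example.com\t\\N\tline one\\nline two"
	fields := decodeCopyLine(line)
	fields[1] = maskEmail(fields[1])
	fmt.Printf("%q\n", encodeCopyLine(fields))
}
```

Keeping NULL as a separate flag rather than a sentinel string means a transformer cannot confuse a NULL with a value that happens to contain the same characters.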

Customization

If your table schema relies on functional dependencies between columns, you can address this with dynamic parameters. They handle cases such as created_at and updated_at, where updated_at must be greater than or equal to created_at.

If you need to implement custom logic imperatively, use the Cmd, TemplateRecord, or Template transformers.

PostgreSQL Version Compatibility

Greenmask is compatible with PostgreSQL versions 11 and higher.
