GreenmaskIO / greenmask

PostgreSQL database anonymization tool
https://greenmask.io
Apache License 2.0

Greenmask - dump obfuscation tool

Preface

Greenmask is a powerful open-source utility for logical database dumping, obfuscation, and restoration. It offers extensive functionality for backup, anonymization, and data masking. Greenmask is written entirely in Go and includes ported PostgreSQL libraries, which makes it platform-independent. The tool is stateless and does not require any changes to your database schema. It is designed to be highly customizable and backward-compatible with existing PostgreSQL utilities.

Features

Use Cases

Greenmask is ideal for various scenarios, including:

Our purpose

The Greenmask utility plays a central role in the Greenmask ecosystem. Our goal is to develop a comprehensive, UI-based solution for managing obfuscation procedures. We recognize the challenges of maintaining obfuscation consistency throughout the software lifecycle. Greenmask is dedicated to providing valuable tools and features that ensure the obfuscation process remains fresh, predictable, and transparent.

General Information

The most reliable way to perform logical dumps and restores is to use the core PostgreSQL utilities, pg_dump and pg_restore. Greenmask has been purposefully designed to stay compatible with these native utilities. It handles data dumping itself and delegates schema dumping and restoration to pg_dump and pg_restore, which keeps it integrated with PostgreSQL's standard tools.

Backup Process

The process of backing up PostgreSQL databases is divided into three distinct sections: pre-data, data, and post-data.

Greenmask focuses exclusively on the data section during runtime. It delegates the handling of the pre-data and post-data sections to the core PostgreSQL utilities, pg_dump and pg_restore.

Greenmask employs the directory format of pg_dump and pg_restore. This format is particularly suitable for parallel execution and partial restoration, and it includes clear metadata files that aid in determining the backup and restoration steps. Greenmask has been optimized to work seamlessly with remote storage systems and obfuscation procedures.

When performing data dumping, Greenmask utilizes the COPY command in TEXT format, maintaining reliability and compatibility with the vanilla PostgreSQL utilities.
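To make the COPY TEXT format concrete, here is a minimal Go sketch of decoding one line of it. It is not Greenmask's actual code: the `Field` type and the `DecodeCopyLine` and `unescape` functions are hypothetical names chosen for illustration. The sketch relies only on the documented rules of the format: columns are separated by tabs, `\N` marks NULL, and special characters are backslash-escaped. Octal and hexadecimal escapes are deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Field is one decoded column value (hypothetical type for this sketch).
// Null distinguishes SQL NULL from an empty string.
type Field struct {
	Value string
	Null  bool
}

// unescape decodes the single-character backslash escapes of COPY TEXT.
// Octal (\123) and hex (\x41) escapes are not handled in this sketch.
func unescape(raw string) string {
	var b strings.Builder
	for i := 0; i < len(raw); i++ {
		c := raw[i]
		if c != '\\' || i+1 == len(raw) {
			b.WriteByte(c)
			continue
		}
		i++
		switch raw[i] {
		case 'b':
			b.WriteByte('\b')
		case 'f':
			b.WriteByte('\f')
		case 'n':
			b.WriteByte('\n')
		case 'r':
			b.WriteByte('\r')
		case 't':
			b.WriteByte('\t')
		case 'v':
			b.WriteByte('\v')
		default:
			// Any other escaped character stands for itself (e.g. \\ becomes \).
			b.WriteByte(raw[i])
		}
	}
	return b.String()
}

// DecodeCopyLine splits one COPY TEXT line (without its trailing newline)
// into fields. Splitting on a literal tab is safe because tabs inside
// values are always written as the two characters \t.
func DecodeCopyLine(line string) []Field {
	cols := strings.Split(line, "\t")
	fields := make([]Field, len(cols))
	for i, col := range cols {
		if col == `\N` {
			fields[i] = Field{Null: true}
			continue
		}
		fields[i] = Field{Value: unescape(col)}
	}
	return fields
}

func main() {
	for _, f := range DecodeCopyLine("1\tJohn\\tDoe\t\\N") {
		fmt.Printf("%q null=%v\n", f.Value, f.Null)
	}
}
```

A transformer working on such decoded fields has to re-escape values in the same way before writing them back, otherwise the resulting dump would no longer be restorable with COPY.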

Additionally, Greenmask supports parallel execution, significantly reducing the time required for the dumping process.

Storage Options

The core PostgreSQL utilities, pg_dump and pg_restore, traditionally operate with files in a directory format, offering no alternative methods. To meet modern backup requirements and provide flexible approaches, Greenmask introduces the concept of Storages.

Restoration Process

In the restoration process, Greenmask combines the capabilities of different tools:

Greenmask also supports parallel restoration, which can significantly reduce the time required to complete the restoration process. This parallel execution enhances the efficiency of restoring large datasets.

Data Obfuscation and Validation

Greenmask works with COPY lines, collects schema metadata using the Golang driver, and employs this driver in the encoding and decoding process. The validate command lets you assess the impact of a configuration on both the schema (validation warnings) and the data (applying transformations and displaying the differences), so you can confirm the desired outcome of the obfuscation process.

Customization

If your table schema relies on functional dependencies between columns, you can address this challenge using the TemplateRecord transformer. This transformer enables you to define transformation logic for entire tables, offering type-safe operations when assigning new values.

Greenmask provides a framework for creating your own custom transformers, which can be reused efficiently. These transformers can be integrated without recompiling Greenmask, thanks to the PIPE (stdin/stdout) interaction.

Furthermore, Greenmask's architecture is designed to be highly extensible, making it possible to introduce other interaction protocols, such as HTTP or Socket, for conducting obfuscation procedures.

PostgreSQL Version Compatibility

Greenmask is compatible with PostgreSQL versions 11 and higher.

References

Links