10up / wp-scrubber

BETA: This plugin provides a command-line interface for scrubbing sensitive user and comment data from a WordPress installation.
GNU General Public License v2.0

JSON Configuration #11

Open darylldoyle opened 5 months ago

darylldoyle commented 5 months ago

Is your enhancement related to a problem? Please describe.

PII and other sensitive data can be stored throughout a WordPress database. Whilst this plugin does a good job of scrubbing users and comments, scrubbing anything else requires engineers to hook into the scrubbing process and handle that data manually on each project.

Allowing engineers to set up a wp-scrubber.json file that outlines the data and fields to be scrubbed could be an easy way to lower the barrier to entry for projects.

Designs

My idea for the structure would look something like this:

  1. Post Types

    • name: Identifies the post type (e.g., post, page, custom post types).
    • fields: Lists the fields within the post type for scrubbing.
    • post_meta: Specifies post_meta fields and actions.
  2. Taxonomies

    • name: Specifies the taxonomy (e.g., category, tag).
    • terms: Defines terms within the taxonomy for scrubbing.
    • term_meta: Details term_meta fields and scrubbing actions.
  3. Options

    • Lists WordPress options (e.g., admin_email, API keys, etc.) for scrubbing.
  4. User Data

    • Covers user data fields (e.g., user_email, display_name) for scrubbing.
  5. Custom Tables

    • name: Names of custom database tables.
    • columns: Specifies columns in these tables for scrubbing.
  6. Truncate Tables

    • Lists tables for complete truncation.

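As a rough illustration of how a runner might sanity-check such a file before touching the database, here is a minimal validator. It is written in Python purely for illustration (a real implementation would live in the plugin's PHP). The six section names come from the list above and the action names (faker, replace, remove) from the example configuration; the function names, and the rules that faker needs a faker_type and replace needs a value, are assumptions of this sketch rather than anything wp-scrubber implements.

```python
# Hypothetical validator for the wp-scrubber.json structure proposed in
# this issue. Names and rules here are illustrative assumptions.
import json

VALID_ACTIONS = {"faker", "replace", "remove"}
SECTIONS = {"post_types", "taxonomies", "options", "user_data",
            "custom_tables", "truncate_tables"}

# Sections whose entries nest lists of field rules under these keys.
NESTED_RULE_KEYS = {
    "post_types": ("fields", "post_meta"),
    "taxonomies": ("terms", "term_meta"),
    "custom_tables": ("columns",),
}


def check_rule(rule, where):
    """Return a list of problems found in a single field rule."""
    problems = []
    if "name" not in rule and "key" not in rule:
        problems.append(f"{where}: missing 'name' or 'key'")
    action = rule.get("action")
    if action not in VALID_ACTIONS:
        problems.append(f"{where}: unknown action {action!r}")
    elif action == "faker" and "faker_type" not in rule:
        problems.append(f"{where}: 'faker' needs 'faker_type'")
    elif action == "replace" and "value" not in rule:
        problems.append(f"{where}: 'replace' needs 'value'")
    return problems


def validate(config):
    """Return a list of problems; an empty list means the config looks valid."""
    problems = [f"unknown section {k!r}" for k in config if k not in SECTIONS]
    for section, entries in config.items():
        if section not in SECTIONS:
            continue
        for i, entry in enumerate(entries):
            where = f"{section}[{i}]"
            if section == "truncate_tables":
                # truncate_tables is a plain list of table names.
                if not isinstance(entry, str):
                    problems.append(f"{where}: expected a table name")
            elif section in NESTED_RULE_KEYS:
                for key in NESTED_RULE_KEYS[section]:
                    for j, rule in enumerate(entry.get(key, [])):
                        problems.extend(check_rule(rule, f"{where}.{key}[{j}]"))
            else:
                # options and user_data hold field rules directly.
                problems.extend(check_rule(entry, where))
    return problems


def load_config(path="wp-scrubber.json"):
    """Read a config file and raise if validation finds any problems."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    problems = validate(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config
```

Failing fast like this means a typo such as an unknown action is reported before any data is changed, rather than halfway through a scrub.
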
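Each group of field rules above (fields, post_meta, terms, term_meta, columns, etc.) would hold an array of objects with the following properties:

    • name (key for post_meta and term_meta): the field, meta key, option or column to scrub.
    • action: one of faker, replace or remove.
    • faker_type: the kind of fake data to generate when the action is faker (e.g., sentence, email, ipv4).
    • value: the replacement string when the action is replace.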

Put together, this would look something like:

{
    "post_types":
    [
        {
            "name": "post",
            "fields":
            [
                {
                    "name": "post_title",
                    "action": "faker",
                    "faker_type": "sentence"
                }
            ],
            "post_meta":
            [
                {
                    "key": "pii_containing_meta",
                    "action": "replace",
                    "value": "this string doesn't contain PII"
                }
            ]
        }
    ],
    "taxonomies":
    [
        {
            "name": "custom_taxonomy",
            "terms":
            [
                {
                    "name": "term_name",
                    "action": "faker",
                    "faker_type": "word"
                }
            ],
            "term_meta":
            [
                {
                    "key": "pii_containing_meta",
                    "action": "replace",
                    "value": "this string doesn't contain PII"
                }
            ]
        }
    ],
    "options":
    [
        {
            "name": "google_maps_api_key",
            "action": "remove"
        }
    ],
    "user_data":
    [
        {
            "name": "user_name",
            "action": "faker",
            "faker_type": "name"
        },
        {
            "name": "user_email",
            "action": "faker",
            "faker_type": "email"
        }
    ],
    "custom_tables":
    [
        {
            "name": "wp_registration_log",
            "columns":
            [
                {
                    "name": "email",
                    "action": "faker",
                    "faker_type": "email"
                },
                {
                    "name": "IP",
                    "action": "faker",
                    "faker_type": "ipv4"
                }
            ]
        }
    ],
    "truncate_tables":
    [
        "gf_entry",
        "gf_entry_meta",
        "gf_entry_notes",
        "gf_form_view"
    ]
}
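
To make the three actions in this example concrete, the sketch below shows how a runner might turn a single rule into a replacement value, and how truncate_tables might become SQL. It is Python for illustration only; the tiny fake-data table stands in for a real fake-data library, and the names scrub_value, truncate_statements, FAKE_GENERATORS and REMOVE are hypothetical. The automatic table-prefix handling is also an assumption: the example mixes unprefixed names (gf_entry) with prefixed ones (wp_registration_log), so the sketch adds the site's prefix (wp_ by default in WordPress) only when it is missing.

```python
# Hypothetical sketch of applying the proposed actions; not wp-scrubber code.
import random

# Minimal stand-in for a fake-data library, keyed by faker_type.
FAKE_GENERATORS = {
    "word": lambda rng: rng.choice(["alpha", "bravo", "charlie"]),
    "sentence": lambda rng: rng.choice(["Lorem ipsum dolor sit amet.",
                                        "The quick brown fox jumps."]),
    "name": lambda rng: rng.choice(["Alex Doe", "Sam Roe"]),
    "email": lambda rng: f"user{rng.randint(1000, 9999)}@example.com",
    "ipv4": lambda rng: ".".join(str(rng.randint(1, 254)) for _ in range(4)),
}

REMOVE = object()  # sentinel: the caller should delete the value entirely


def scrub_value(rule, rng):
    """Return the replacement for one value according to its rule, or REMOVE."""
    action = rule["action"]
    if action == "replace":
        return rule["value"]
    if action == "remove":
        return REMOVE
    if action == "faker":
        return FAKE_GENERATORS[rule["faker_type"]](rng)
    raise ValueError(f"unknown action: {action!r}")


def truncate_statements(config, prefix="wp_"):
    """Build TRUNCATE statements, adding the table prefix when it is missing."""
    statements = []
    for table in config.get("truncate_tables", []):
        full = table if table.startswith(prefix) else prefix + table
        statements.append(f"TRUNCATE TABLE `{full}`;")
    return statements
```

Passing a seeded random.Random makes a scrub reproducible across runs, which can help when comparing scrubbed exports. Whether the config should instead require fully prefixed table names is a design question this sketch leaves open.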

Describe alternatives you've considered

I considered other formats, such as YAML, but found them a lot less human-readable than JSON, which is why I went with JSON.

I think this approach gives us the most structure and flexibility, short of writing our own scrubbing code for each project.


tlovett1 commented 5 months ago

Absolutely love this.