deliciousbrains / wp-migrate-db-anonymization

An addon for WP Migrate DB Pro that anonymizes user data
https://wordpress.org/plugins/wp-migrate-db-anonymization/

Config::clean_table not handling Multisite tables correctly #28

Open tyrann0us opened 5 years ago

tyrann0us commented 5 years ago

Config::clean_table() gets a table's name without WordPress' table prefix or WP Migrate DB's _mig_ prefix: https://github.com/deliciousbrains/wp-migrate-db-anonymization/blob/cca4ad85ca9d73b5bb2ae76347a0b6d3f3c5913d/includes/Config/Config.php#L43-L45

For example, for wp_usermeta or _mig_wp_usermeta it correctly returns usermeta. In Multisite, however, per-site tables also include the site's ID in their prefix, which clean_table() does not strip: $wpdb->base_prefix returns the "original" table prefix without an ID, and even $wpdb->prefix would not help, because the migration runs in the main site, where the prefix contains no ID either. This does not affect the default config, but if you extend it (using the wpmdb_anonymization_config filter) to anonymize columns (fields) in tables created for each site, it will fail to match those tables.

// Callback for the `wpmdb_anonymization_config` filter.
add_filter( 'wpmdb_anonymization_config', function ( $config ) {
  $config['foobar'] = [
    'baz' => [
      'fake_data_type' => 'userName',
    ],
  ];
  return $config;
} );

Given the tables wp_foobar, wp_2_foobar, wp_3_foobar, etc., this works only for wp_foobar: clean_table() returns 2_foobar for wp_2_foobar, 3_foobar for wp_3_foobar, and so on, so has_table() cannot match them to the config key foobar.
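To make the mismatch concrete, here is a minimal standalone sketch. The str_replace-based stripping is an assumption standing in for the actual Config::clean_table() implementation, and the wp_ prefix is hard-coded for illustration:

```php
<?php
// Hypothetical reimplementation of the prefix stripping in clean_table(),
// assuming it simply removes `_mig_` and the base prefix.
function clean_table_sketch( string $table, string $base_prefix = 'wp_' ): string {
	return str_replace( [ '_mig_', $base_prefix ], '', $table );
}

// The main-site table matches the config key...
echo clean_table_sketch( 'wp_foobar' ), "\n";   // foobar
// ...but sub-site tables keep the site ID, so has_table() cannot match them.
echo clean_table_sketch( 'wp_2_foobar' ), "\n"; // 2_foobar
```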

tyrann0us commented 5 years ago

It would be possible to use preg_replace():

public function clean_table( $table ) {
  return preg_replace(
    [
      // Strip WP Migrate DB's temporary table prefix first.
      '/^_mig_/',
      // Then strip the table prefix plus an optional site ID (e.g. `wp_2_`).
      '/^' . preg_quote( $this->wpdb->prefix, '/' ) . '([0-9]+_)?/',
    ],
    '',
    $table
  );
}

The problem is that, theoretically, there could be a table called wp_1234_foobar whose 1234 segment has nothing to do with any site ID but would be matched anyway (a false positive).
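To illustrate the false positive (the wp_ prefix is hard-coded here instead of $this->wpdb->prefix, purely for demonstration):

```php
<?php
// `1234` is consumed by the `([0-9]+_)?` group, so the result is `foobar`
// even though no site with ID 1234 exists.
$cleaned = preg_replace( '/^(_mig_)?wp_([0-9]+_)?/', '', 'wp_1234_foobar' );
echo $cleaned, "\n"; // foobar
```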

The best thing would be to get an array of all valid site IDs with something like this:

// Empty string stands in for the main site, whose tables carry no site ID.
// Note: array_merge() instead of the `+` union operator, which would keep
// the '' at key 0 and silently drop the first site ID.
array_merge(
  [ '' ],
  get_sites(
    [
      'fields' => 'ids',
    ]
  )
);

and then loop over this list in clean_table(), stripping only prefixes that correspond to an actual site. The catch is that clean_table() is executed for every field, so the get_sites() call must be cached to avoid a massive performance degradation.
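A self-contained sketch of that approach (the function name and the injected $site_ids parameter are illustrative; in the plugin, the IDs would come from a cached get_sites( [ 'fields' => 'ids' ] ) call and the prefix from $this->wpdb->base_prefix):

```php
<?php
/**
 * Sketch: strip `_mig_`, the base prefix, and a *valid* site-ID segment only.
 *
 * @param string $table       Raw table name, e.g. `_mig_wp_2_foobar`.
 * @param string $base_prefix Site-independent prefix, e.g. `wp_`.
 * @param int[]  $site_ids    All existing site IDs (cached in real code).
 */
function clean_table_multisite( $table, $base_prefix, array $site_ids ) {
	$table = preg_replace( '/^_mig_/', '', $table );

	// Try ID-prefixed variants first, so `wp_2_foobar` is not reduced to
	// `2_foobar` by the bare base prefix below.
	foreach ( $site_ids as $id ) {
		$full = $base_prefix . $id . '_';
		if ( 0 === strpos( $table, $full ) ) {
			return substr( $table, strlen( $full ) );
		}
	}

	// Fall back to the bare base prefix (main-site tables).
	if ( 0 === strpos( $table, $base_prefix ) ) {
		return substr( $table, strlen( $base_prefix ) );
	}

	return $table;
}

// With sites 1, 2 and 3:
echo clean_table_multisite( 'wp_2_foobar', 'wp_', [ 1, 2, 3 ] ), "\n";    // foobar
echo clean_table_multisite( '_mig_wp_foobar', 'wp_', [ 1, 2, 3 ] ), "\n"; // foobar
echo clean_table_multisite( 'wp_1234_foobar', 'wp_', [ 1, 2, 3 ] ), "\n"; // 1234_foobar
```

Because unmatched numeric segments fall through to the bare base prefix, wp_1234_foobar keeps its 1234_ segment, avoiding the false positive described above.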

@polevaultweb, thoughts about this?