Smile-SA / gdpr-dump

Utility that creates anonymized database dumps (MySQL only). Provides default config templates for Magento, Drupal and Shopware.
GNU General Public License v3.0
182 stars, 47 forks

Persistent cache #130

Open: bloep opened this issue 7 months ago

bloep commented 7 months ago

Is your feature request related to a problem? Please describe.

Scenario 1:

We have two databases that partly contain the same data, because the corresponding systems exchange this data (e.g. names and addresses) with each other via APIs.

Example goal: Person X named "Richard Roe" becomes "John Doe" in both dumps.

Scenario 2:

A dump is created automatically every night for development. The same original values should not be anonymized to different values each night.

Example goal: Person X named "Richard Roe" becomes "John Doe" today and "John Doe" again tomorrow, not "Foo Bar".

In both cases, the same input currently produces different anonymized output.

Describe the solution you'd like

A way to persist the cache entries in a cache file. The next run would then find the cached value and return the same result for the same input.
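To make the requested behaviour concrete, here is a minimal, language-agnostic sketch in Python. gdpr-dump itself is written in PHP, and this is not its cache converter API: the `PersistentCache` class, its methods, and the `fake_name` generator are hypothetical. It assumes the mappings fit in memory and are stored as a JSON file that is reloaded at the start of the next run.

```python
import json
import os
import random
from typing import Callable, Dict


class PersistentCache:
    """Hypothetical cache mapping original values to anonymized ones, persisted as JSON."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.entries: Dict[str, str] = {}
        # Reload the mappings written by a previous run, if the file exists.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.entries = json.load(f)

    def get_or_create(self, original: str, generate: Callable[[], str]) -> str:
        # Only generate a new fake value for inputs never seen before,
        # so the same input maps to the same output across runs and dumps.
        if original not in self.entries:
            self.entries[original] = generate()
        return self.entries[original]

    def save(self) -> None:
        # Write to a temporary file first, then swap it in with os.replace
        # so an interrupted run does not leave a half-written cache behind.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.entries, f)
        os.replace(tmp_path, self.path)


def fake_name() -> str:
    # Stand-in for a real fake-data generator: random on every call.
    return random.choice(["John Doe", "Foo Bar", "Jane Roe"])
```

A nightly job would create `PersistentCache("cache.json")`, call `get_or_create("Richard Roe", fake_name)` for each value, and call `save()` at the end; the next night's run reloads the file and returns the same value. Note that such a file stores the original values as keys, so it contains personal data itself and needs the same protection as the source database.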

guvra commented 6 months ago

You could implement this by adding a file parameter to the cache converter, but it's really not trivial: the cache may contain billions of values when dumping a big database.
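One possible way around the size problem, sketched here in Python as an assumption rather than anything gdpr-dump provides, is to keep the mappings in an on-disk store such as SQLite instead of a file that is loaded entirely into memory. Each lookup then reads a single indexed row. The `SqliteCache` class and the `cache` table are hypothetical names.

```python
import sqlite3
from typing import Callable


class SqliteCache:
    """Hypothetical disk-backed cache: entries live in an SQLite file, not in RAM."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        # The PRIMARY KEY creates an index, so lookups stay fast as the table grows.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "original TEXT PRIMARY KEY, anonymized TEXT NOT NULL)"
        )

    def get_or_create(self, original: str, generate: Callable[[], str]) -> str:
        row = self.conn.execute(
            "SELECT anonymized FROM cache WHERE original = ?", (original,)
        ).fetchone()
        if row is not None:
            return row[0]
        value = generate()
        self.conn.execute(
            "INSERT INTO cache (original, anonymized) VALUES (?, ?)",
            (original, value),
        )
        return value

    def close(self) -> None:
        # Commit once at the end; committing after every row would be much slower.
        self.conn.commit()
        self.conn.close()
```

The trade-off is speed: a disk lookup per value is slower than an in-memory array, but memory use no longer grows with the number of cached values.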

staabm commented 6 months ago

A PHP-based cache file, which stores the output of var_export and reads it back with require, should be pretty efficient even with huge arrays (and would also avoid the need for JSON encoding/decoding, or whatever format you would use instead).

Similar to what is done here:

guvra commented 6 months ago

> A PHP-based cache file, which stores the output of var_export and reads it back with require, should be pretty efficient even with huge arrays (and would also avoid the need for JSON encoding/decoding, or whatever format you would use instead).
>
> Similar to what is done here:
>
> * cache write: https://github.com/staabm/phpstan-dba/blob/e86594d4e0d7c868c9dfbdbeda5805b93a4ca6ce/src/QueryReflection/ReflectionCache.php#L212-L219
>
> * cache read: https://github.com/staabm/phpstan-dba/blob/e86594d4e0d7c868c9dfbdbeda5805b93a4ca6ce/src/QueryReflection/ReflectionCache.php#L147

But that approach is extremely insecure, because require executes whatever PHP code the cache file contains. It's acceptable for phpstan because it's a development tool, but gdpr-dump can be run on production environments.