DekodeInteraktiv / anonymize-mysqldump

Allows you to pipe data from mysqldump or an SQL file and anonymize it.
GNU General Public License v3.0

Duplicate Key Bug #17

Closed. PeterBooker closed this issue 10 months ago

PeterBooker commented 2 years ago

It is possible for the faker to give more than one user the same email address, resulting in a duplicate key error when importing the anonymized database.

We should find a way to ensure that anonymized keys are unique.

PeterBooker commented 2 years ago

We can test with the kirkensbymisjon project, which has over 1000 users.

PeterBooker commented 2 years ago

I found that we can pass a seed to the existing faker library. However, I need to test whether this actually produces enough unique email addresses for our needs. If not, I will need to keep track of the emails generated and ensure that they are unique manually.
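As a rough way to reason about the seed question: a seed makes a pseudo-random generator reproducible, but it does not by itself prevent repeats when values are drawn from a finite pool. The sketch below is a toy illustration only. It uses Go's standard `math/rand`, not the faker library, and the `names` and `domains` pools and the helpers `fakeEmails` and `countDuplicates` are hypothetical stand-ins chosen to be deliberately small.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Hypothetical, deliberately tiny pools: only 5 * 2 = 10 distinct emails exist.
var names = []string{"anna", "ben", "cara", "dan", "eva"}
var domains = []string{"example.com", "example.org"}

// fakeEmails draws n emails from the pools using a generator seeded with seed.
func fakeEmails(seed int64, n int) []string {
	r := rand.New(rand.NewSource(seed))
	out := make([]string, n)
	for i := range out {
		out[i] = names[r.Intn(len(names))] + "@" + domains[r.Intn(len(domains))]
	}
	return out
}

// countDuplicates reports how many entries repeat an earlier entry.
func countDuplicates(emails []string) int {
	seen := make(map[string]bool)
	dups := 0
	for _, e := range emails {
		if seen[e] {
			dups++
		}
		seen[e] = true
	}
	return dups
}

func main() {
	// 1000 draws from 10 possible values must collide, whatever the seed.
	emails := fakeEmails(42, 1000)
	fmt.Println("duplicates:", countDuplicates(emails))
}
```

Running it twice with the same seed gives the same sequence, duplicates included. Whether the real faker's pool is large enough for a table of over 1000 users is exactly what would need testing.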

PeterBooker commented 11 months ago

This issue came up again for another client. Update to the above comment: passing a seed is not enough. We will need to either use another faker library that supports unique values or keep track of emails manually to ensure that they are unique.
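A minimal sketch of the "keep track of emails manually" option, assuming the generator can be wrapped as a plain `func() string`. The `UniqueEmails` type and the `withSuffix` helper are hypothetical names for illustration and are not part of this project or of any faker library. The idea is to remember every address already handed out and, on a collision, append a counter to the local part until the result is unused. It relies on `strings.Cut`, so it assumes Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// UniqueEmails wraps an email generator and never returns the same value twice.
type UniqueEmails struct {
	generate func() string
	seen     map[string]struct{}
}

// NewUniqueEmails wraps generate, which stands in for the faker call.
func NewUniqueEmails(generate func() string) *UniqueEmails {
	return &UniqueEmails{generate: generate, seen: make(map[string]struct{})}
}

// Next returns a generated email, adding a numeric suffix to the
// local part (starting at 2) until the address has not been used before.
func (u *UniqueEmails) Next() string {
	email := u.generate()
	candidate := email
	for n := 2; ; n++ {
		if _, dup := u.seen[candidate]; !dup {
			break
		}
		candidate = withSuffix(email, n)
	}
	u.seen[candidate] = struct{}{}
	return candidate
}

// withSuffix turns "jane@example.com" into "jane2@example.com" for n = 2.
func withSuffix(email string, n int) string {
	local, domain, ok := strings.Cut(email, "@")
	if !ok {
		// Not a well-formed address; just append the counter.
		return fmt.Sprintf("%s%d", email, n)
	}
	return fmt.Sprintf("%s%d@%s", local, n, domain)
}

func main() {
	// Worst case stand-in for the faker: it always returns the same address.
	u := NewUniqueEmails(func() string { return "jane@example.com" })
	fmt.Println(u.Next()) // jane@example.com
	fmt.Println(u.Next()) // jane2@example.com
	fmt.Println(u.Next()) // jane3@example.com
}
```

The trade-off is memory: the `seen` set grows with the number of rows processed, which is likely acceptable for user tables of a few thousand rows but worth measuring on larger dumps.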