Closed jenlampton closed 1 year ago
One possibility, from the GDPR extension:
"The right to be forgotten, allowing users of CiviCRM to easily anonymise a contact record, hiding any person details but keeping the financial and other history. The action also exists as an API and therefore can be bolted into other processes."
https://civicrm.org/extensions/gdpr
This code/process can probably be adapted to copy the database, run the anonymization process, and dump it. (Process confirmed by @mikeymjco on the CiviCRM Mattermost chat)
On #789 @bugfolder flagged that we'll need to figure this out before we enable CiviCRM on b.org for the first time:
one of the tasks before we start populating Civi with data on the live b.org site is a strategy for sanitization of CiviCRM data for people building local versions of b.org from sanitized data/files. If we distribute sanitized Civi dbs, we'll need to be sure to sanitize any custom data fields we create (e.g., Full Name), in addition to Civi's built-in First/Middle/Last, etc.
I'm unfamiliar with the current process for generating and distributing sanitized b.org data and files; perhaps the CiviCRM side of this can be handled the same way?
In today's meeting we discussed how we could move forward without sanitizing the Civi database. Sanitizing it looks like a monumental task, because pretty much everything in Civi is personally identifying information.
We decided that we could just not make the Civi database available to anyone who wants to work on the Backdrop site, and only grant it to those who need to work on Civi -- or who are working on parts of the site that integrate with civi.
In order to do this we may need to disable the Civi module in the sanitized backups for b.org, so that everyone else who's working on the main b.org site won't have any issues when they set things up locally.
Is it possible to share the script that's used to sanitize the b.org db? (a) I'm curious, (b) this might provide a template for doing the same for Civi.
Was this also considered? (sorry I couldn't make the meeting today)
https://github.com/backdrop-ops/backdropcms.org/issues/828#issuecomment-926919186
@laryn we only discussed "how to make this not a blocker" today. We didn't get into possible options of how to get it done yet :)
Is it possible to share the script that's used to sanitize the b.org db?
@bugfolder I think it might already be out there somewhere? I do think we could probably use it as a template for doing the same for Civi. @larsdesigns would know more.
it [sanitization script] might already be out there somewhere?...
Yeah, I was hoping that someone who knows where it is could share it. Presumably either privately or also appropriately sanitized (since it would necessarily contain db &/or other credentials).
My thinking was that since it sounds like we're going to start fairly small in what we collect, there will be relatively few fields that need sanitization, but then as we add more functionality to our Civi install, we can just add the newly affected (or, for custom fields, created) tables to the script.
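The "add tables to the script as we go" idea could be sketched as a small rule table that grows with the install. This is only an illustration in Python, not the code in the sql-dump-sanitize repo: the custom table name `civicrm_value_example_1`, its `full_name` column, and the helper names `add_rule` and `build_updates` are all hypothetical, and the replacement expressions are placeholders.

```python
# Rules map a table name to {column: SQL expression for the replacement value}.
# civicrm_contact and its name columns are built in; everything else here
# is a hypothetical placeholder for illustration.
SANITIZE_RULES = {
    "civicrm_contact": {
        "first_name": "''",
        "middle_name": "''",
        "last_name": "''",
    },
}


def add_rule(rules, table, column, expression):
    """Register one more column to sanitize, e.g. a newly created custom field."""
    rules.setdefault(table, {})[column] = expression


def build_updates(rules):
    """Turn the rule table into one UPDATE statement per table."""
    statements = []
    for table, columns in rules.items():
        assignments = ", ".join(
            f"{column} = {expression}" for column, expression in columns.items()
        )
        statements.append(f"UPDATE {table} SET {assignments};")
    return statements


# Hypothetical custom field group added later on:
add_rule(SANITIZE_RULES, "civicrm_value_example_1", "full_name", "''")
```

Each new custom field then costs one `add_rule` call rather than a change to the sanitization logic itself.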
Well, perhaps we do not make the civicrm database available for download? We could instead sanitize (remove) configuration that requires it from the backdrop.org sanitized database and files.
Unless it is deemed necessary to provide a sanitized CiviCRM database for development reasons. I cannot think of any development reasons though.
@bugfolder This is the project repository that is being used for the sanitization: ~https://github.com/serundeputy/sql-dump-sanitize~ edit: now https://github.com/backdrop-ops/sql-dump-sanitize
Unless it is deemed necessary to provide a sanitized CiviCRM database for development reasons. I cannot think of any development reasons though.
Initially, there probably won't be. However, a plausible scenario where it would be is if we're collecting any CiviCRM fields on the user registration page via a CiviCRM Profile, and we want to develop something else on the user registration page (like anti-spam checks). Then we'd probably need Civi working to provide the profile form on the page.
This isn't a blocker for getting Civi up and running by any means (there's no immediate need for it). We decided at today's meeting that initially we could either disable Civi for local builds or make the actual Civi db available to the small number of devs. Rather, I'm just looking ahead to the time when we will need a sanitized db to do local development.
This is the project repository that is being used for the sanitization...
Thanks, that's what I was looking for!
Getting back in the loop here -- we use https://github.com/scoobird/org.civicrm.contrib.anonymize at Palante to sanitize our Civi databases; perhaps that can be used and/or adapted for our purposes here?
Let's try to get this issue to resolution and/or to a point where it's not blocking progress on #789!
It seems like we've got two proposals so far:
Any thoughts on which approach seems best, either in the short term to remove the blocker or in the long term?
If we have someone available / interested in working on the script, I'd prefer #2. If not, we should go with #1 and update the documentation on how to get a local copy of b.org up and running without the civi database. (And I can help with that documentation)
We're hoping to have a closer look at the script to see if that would work nicely with sanitize.backdropcms.org, and if so, make the civi database available there too. @larsdesigns has volunteered to review the script for us. Thank you!
@BWPanda, would you be interested in collaborating with me on this?
@larsdesigns Possibly. What do you need?
@BWPanda, could I add you as a reviewer when I open a PR?
Backing up and sanitizing are both done by functions in /home/backdrop/sanitized_databases, so sanitization should probably be addressed together with backing up, which is https://github.com/backdrop-ops/backdropcms.org/issues/963.
Robert, thank you so much for taking this on.
Handing this off to @bugfolder.
@larsdesigns, @jenlampton, I have created a PR to the `sql-dump-sanitize` repo that adds both backing up and sanitization of the CiviCRM db.
It uses four new `config.ini` values of the form `*_CIVI`, which you can see in the current `config.ini` on b.org (which I've also updated). The current (old) script on b.org ignores those new values, so should still run. But if we update b.org to the new script, it should pick up those values and backup/sanitize the CiviCRM database, putting its sanitized backups in a new folder, `sanitized_civi` (parallel to the existing folder `sanitized`).
I have tested this script on a local setup, and it works. So, after you've reviewed the code, I'd like to try out the new script on b.org.
I think the script will still work on the non-CiviCRM properties (e.g., docs, forum, events); we just don't include the `*_CIVI` values in their respective `config.ini` files, and no CiviCRM backups will be attempted.
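To illustrate the opt-in behavior described here, the following is a rough Python sketch of how a script might detect the `*_CIVI` values and skip the CiviCRM backup when they are absent. It is not the code from the PR. The key names in the example (`DATABASE`, `DATABASE_CIVI`), the function name `civi_settings`, and the assumption that `config.ini` has no section header are all hypothetical.

```python
import configparser


def civi_settings(ini_text):
    """Return only the keys ending in _CIVI from the given config text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case; configparser lowercases by default
    # Assumption: the file has no [section] header, so supply a dummy one.
    parser.read_string("[main]\n" + ini_text)
    return {
        key: value
        for key, value in parser["main"].items()
        if key.endswith("_CIVI")
    }


def should_backup_civi(ini_text):
    """CiviCRM backup is attempted only when *_CIVI values are present."""
    return bool(civi_settings(ini_text))
```

With this shape, a property such as docs or forum needs no change: its `config.ini` has no `*_CIVI` keys, so `should_backup_civi` returns `False` and the rest of the script runs as before.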
A note on sanitization strategy. I modified the sanitization of Backdrop account emails to be `"user+$uid@localhost"`, so that I could easily ensure that sanitized CiviCRM email addresses in the `civicrm_address` and `civicrm_uf_match` tables were the same where appropriate.
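The point of a uid-based address is that it can be recomputed independently in each table and still match. Here is a minimal Python sketch of that idea, using SQLite and heavily simplified, assumed schemas (`users(uid, mail)` and `civicrm_uf_match(uf_id, contact_id, uf_name)`, with `uf_id` holding the Backdrop uid). The real script and database engine differ; this only covers the Backdrop account and `civicrm_uf_match` pairing.

```python
import sqlite3


def sanitized_email(uid):
    """Deterministic placeholder address derived from the Backdrop uid."""
    return f"user+{uid}@localhost"


def sanitize_emails(conn):
    """Rewrite emails in both tables from the uid alone, so they stay in sync."""
    # Backdrop accounts: the address depends only on uid.
    conn.execute("UPDATE users SET mail = 'user+' || uid || '@localhost'")
    # CiviCRM side: uf_id is assumed to hold the same Backdrop uid.
    conn.execute(
        "UPDATE civicrm_uf_match SET uf_name = 'user+' || uf_id || '@localhost'"
    )
    conn.commit()
```

Because neither update reads the other table, the two sanitized values agree without any join or lookup of the original address.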
So when you get a chance, please take a look and let me know what you think. (And happy 4th day of post-solstice!)
@bugfolder this looks fantastic. I added one request for a change to the PR (just to update or remove an inline code comment) but that can safely be ignored :) Thank you for working on this!
Change made. A higher power than me is needed to merge the PR ;o).
PR merged :D
Sanitized dbs are being created and are exposed on the sanitize.backdropcms.org site. Calling this one done.
@bugfolder ++, Nice work! Thank you so much for getting this done.
We'll need to establish a process for working with the backdropcms.org site locally, that will include disabling civi in the normal daily sanitized backups, as well as coming up with a way to sanitize the Civi database for those who do need to do work with Civi.