Itguru14 / tag-dssg-2023-lbc

MIT License

Mask PII in columns containing PII #3

Open bbrewington opened 1 year ago

bbrewington commented 1 year ago
  1. Define list of columns containing PII, and types (Phone Number, Name, Address)
  2. Write code to do the masking (Python or SQL?)
  3. Perform masking and refresh data
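Steps 1 and 2 could look something like the sketch below. It assumes the PII columns are stored as a simple column-to-type mapping and that each non-empty value is replaced wholesale with a type token. The column names (`contact_name`, `contact_phone`, `mailing_address`) and the `[NAME]`/`[PHONE]`/`[ADDRESS]` tokens are hypothetical placeholders, not the real schema.

```python
from enum import Enum
from typing import Dict, Optional


class PiiType(Enum):
    PHONE = "phone"
    NAME = "name"
    ADDRESS = "address"


# Hypothetical column names: the real list is what step 1 is meant to produce.
PII_COLUMNS: Dict[str, PiiType] = {
    "contact_name": PiiType.NAME,
    "contact_phone": PiiType.PHONE,
    "mailing_address": PiiType.ADDRESS,
}


def mask_value(value: Optional[str], pii_type: PiiType) -> Optional[str]:
    """Replace a non-empty PII value with a type token such as "[PHONE]"."""
    if not value:
        return value  # keep None / "" so missing data stays distinguishable
    return f"[{pii_type.name}]"


def mask_record(record: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Return a copy of record with every column listed in PII_COLUMNS masked."""
    return {
        column: mask_value(value, PII_COLUMNS[column]) if column in PII_COLUMNS else value
        for column, value in record.items()
    }


if __name__ == "__main__":
    print(mask_record({"contact_name": "Jane Doe", "amount": "100"}))
```

Whole-value tokens keep the column usable for counts and null checks; if masked values ever need to be joinable, a keyed hash would be an alternative.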

Might be a good idea to do this as SQL views, landing the masked data in a BigQuery dataset we would give people access to. Then we could maybe set up Google Sheets that query those views (so whenever changes are made, they're propagated instantly).
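A minimal sketch of the SQL-view idea, assuming Python is used only to generate the view DDL. BigQuery's `SELECT * REPLACE (expr AS column)` swaps out the PII columns while keeping every other column (and the column order) intact. The project, dataset, table, and column names are hypothetical, and the phone pattern is deliberately simple because BigQuery's regex functions use RE2, which has no lookbehind.

```python
from typing import Dict

# Simple RE2-compatible phone pattern (hypothetical; tune against real data).
PHONE_RE2 = r"\(?\d{3}\)?[\s.-]*\d{3}[\s.-]*\d{4}"


def build_masked_view_sql(source_table: str, view_table: str, replacements: Dict[str, str]) -> str:
    """Build a BigQuery CREATE OR REPLACE VIEW statement that masks columns via SELECT * REPLACE."""
    replace_list = ",\n  ".join(f"{expr} AS {column}" for column, expr in replacements.items())
    return (
        f"CREATE OR REPLACE VIEW `{view_table}` AS\n"
        f"SELECT * REPLACE (\n  {replace_list}\n)\n"
        f"FROM `{source_table}`"
    )


# Hypothetical project/dataset/table/column names.
sql = build_masked_view_sql(
    source_table="my-project.raw_salesforce.opportunity",
    view_table="my-project.masked_salesforce.opportunity",
    replacements={
        "contact_name": "'[NAME]'",  # fully redact a structured column
        "description": f"REGEXP_REPLACE(description, r'{PHONE_RE2}', '[PHONE]')",  # mask inside free text
    },
)

if __name__ == "__main__":
    print(sql)
```

Because a view is evaluated at query time, anything reading it (including a connected Google Sheet) sees masked data as soon as the source table is refreshed. To let people query the view without access to the raw dataset, BigQuery's authorized views are the usual mechanism.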

bbrewington commented 1 year ago

Conversation in Slack: Screenshot 2023-07-17 094240

Itguru14 commented 1 year ago

Uploaded the Python masking file, mask-data.py. This file could be used to carry out the masking requested in this issue. However, we need to know the exact formats of the Name, Phone Number, and Address columns in order to construct the appropriate regexes to match them. Because I only have the PII-stripped version of the data, it's not possible to get this info from the distributed data file.
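The contents of mask-data.py aren't shown here, so the following is only a sketch of how a regex-driven masking pass over a CSV export might be structured once the column formats are known. `MASK_RULES`, the `Description` column, and the placeholder phone pattern are all hypothetical; each configured column gets a list of (pattern, token) substitutions applied in order.

```python
import csv
import io
import re

# Hypothetical column -> [(compiled pattern, replacement token)] rules.
# The patterns are placeholders until the real Name/Phone/Address formats are known.
MASK_RULES = {
    "Description": [
        (re.compile(r"\(?\d{3}\)?[\s.-]*\d{3}[\s.-]*\d{4}"), "[PHONE]"),
    ],
}


def mask_csv(src, dst, rules=MASK_RULES):
    """Copy CSV rows from src to dst, applying regex masks to the configured columns."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for column, patterns in rules.items():
            value = row.get(column)
            if value:  # skip missing columns and empty cells
                for pattern, token in patterns:
                    value = pattern.sub(token, value)
                row[column] = value
        writer.writerow(row)


if __name__ == "__main__":
    demo_src = io.StringIO("Id,Description\n1,Call 555-867-5309 today\n")
    demo_dst = io.StringIO()
    mask_csv(demo_src, demo_dst)
    print(demo_dst.getvalue())
```

For real files, open both with `newline=""` as the `csv` module documentation recommends.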

bbrewington commented 1 year ago

@Itguru14 here are some phone number patterns I'm seeing in Salesforce Opportunity.Description (there can be multiple occurrences in a single cell):

555.867.5309
555-867-5309
(555) 867-5309
5 5, 58675309 <-- this one may have been speech-to-text or something

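One possible Python regex covering all four patterns above, as a sketch: it assumes every phone number has exactly 10 digits (US-style, no country code) and allows up to two separator characters (space, dot, comma, hyphen, parentheses) after each digit, which also catches the speech-to-text style `5 5, 58675309`. The `(?<!\d)` / `(?!\d)` guards stop it matching inside longer digit runs, and `re.sub` replaces every occurrence in a cell. `mask_phones` and the `[PHONE]` token are hypothetical names.

```python
import re

# 10 digits, each optionally followed by up to two separators, with an optional leading "(".
# The lookarounds reject matches embedded in longer digit sequences.
PHONE_PATTERN = re.compile(r"(?<!\d)\(?(?:\d[\s.,()\-]{0,2}){9}\d(?!\d)")


def mask_phones(text: str, token: str = "[PHONE]") -> str:
    """Replace every phone-number-like match in text with token."""
    return PHONE_PATTERN.sub(token, text)


if __name__ == "__main__":
    for example in ["555.867.5309", "555-867-5309", "(555) 867-5309", "5 5, 58675309"]:
        print(mask_phones(example))
```

The trade-off of tolerating loose separators is over-matching: any other 10-digit sequence with light punctuation would also be masked, so it's worth spot-checking against real data. This pattern relies on lookbehind, so it would need simplifying before use in BigQuery's RE2-based `REGEXP_REPLACE`.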
bbrewington commented 1 year ago

Per conversation w/ Joey, I'm adding myself as owner, and he's going to work on this in collaboration with Adeseye (I might be handing off to Adeseye fully).

Itguru14 commented 1 year ago

I can reach out to Joey to add him as a collaborator too, or you can send me his email address. I didn't get a chance to work on the regex because I've been busy setting up Tableau and Tableau Prep. I will work on it today.

On Thu, Jul 20, 2023 at 12:14 PM Brent Brewington wrote:

Per conversation w/ Joey, I'm adding myself as owner and he's going to work on this collaborating with Adeseye (I might be handing off to Adeseye fully)