awslabs / aws-serverless-data-lake-framework

Enterprise-grade, production-hardened, serverless data lake on AWS
https://sdlf.workshop.aws/
MIT No Attribution
405 stars, 136 forks

PII data #18

Closed (rubenssoto closed 3 years ago)

rubenssoto commented 3 years ago

Hello,

Thank you so much for answering all my questions. I'm building a pipeline like yours, and in my pipeline I add transformations to hash PII data. However, some people in my company need to view the unhashed PII data, so I'm trying to create an area in my data lake that holds all the PII data and that only certain people can access.

My difficulty is how to create this "translation" area using this architecture.

Have you run into a problem like this? Thank you.

jaidisido commented 3 years ago

Hi - Have you considered leveraging AWS Lake Formation to control permissions to your data and tables at a column level? Look at the "Enforcing column-level security in Lake Formation" section in this AWS blog: https://aws.amazon.com/blogs/big-data/enforce-column-level-authorization-with-amazon-quicksight-and-aws-lake-formation/

You could, for example, store your non-hashed data in one area of the data lake (e.g. Raw) and then apply the transformations to hash and move it to the Stage or Analytics area. Using Lake Formation permissions (and IAM if the users need S3 access), you could then limit access to the right personas.
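To make that layout concrete, here is a minimal sketch in Python, not something taken from the framework itself. It assumes a keyed SHA-256 hash for the PII transformation and uses the Lake Formation `GrantPermissions` API through boto3 with a `ColumnWildcard` that excludes PII columns. The function names, database and table names, column names, and role ARN are all hypothetical, and `apply_grant` needs AWS credentials with Lake Formation admin rights.

```python
import hashlib
import hmac


def hash_pii(value: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest (hex) of a PII value."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def to_stage_record(raw_record: dict, pii_columns: set, secret_key: bytes) -> dict:
    """Copy a raw record, replacing the values of PII columns with their hashes."""
    return {
        column: hash_pii(str(value), secret_key) if column in pii_columns else value
        for column, value in raw_record.items()
    }


def build_select_grant(principal_arn: str, database: str, table: str,
                       excluded_columns=None) -> dict:
    """Build keyword arguments for Lake Formation's GrantPermissions call.

    The principal gets SELECT on every column of the table except the
    ones listed in excluded_columns (an empty list means all columns).
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnWildcard": {
                    "ExcludedColumnNames": list(excluded_columns or [])
                },
            }
        },
        "Permissions": ["SELECT"],
    }


def apply_grant(grant: dict) -> None:
    """Send the grant to Lake Formation (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    boto3.client("lakeformation").grant_permissions(**grant)


# Hypothetical example: analysts can read the raw table without the PII columns,
# while a privileged role can read every column.
analyst_grant = build_select_grant(
    "arn:aws:iam::111122223333:role/Analyst", "raw_db", "customers",
    excluded_columns=["email", "phone"],
)
privileged_grant = build_select_grant(
    "arn:aws:iam::111122223333:role/PiiReader", "raw_db", "customers",
)
```

Calling `apply_grant(analyst_grant)` and `apply_grant(privileged_grant)` would register both grants. The hashed copies produced by `to_stage_record` can then live in the Stage or Analytics area and be opened up more broadly, since they no longer contain readable PII.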