apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

Parquet Data Masking for Column Encryption #1652

Open asfimport opened 1 year ago

asfimport commented 1 year ago

Background

What is Data Masking?

Data masking is a technique used to protect sensitive data by replacing it with modified or obscured values. The purpose of data masking is to ensure that sensitive information, such as Personally Identifiable Information (PII), remains hidden from unauthorized users while allowing authorized users to perform their tasks.
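As a rough illustration of what "replacing it with modified or obscured values" can look like, here is a minimal sketch of three common masking strategies: nullifying, partial redaction, and hashing. The function names are hypothetical and are not part of Parquet or of the proposed design; the issue text itself does not say which strategies the feature would support.

```python
import hashlib


def mask_null(value):
    """Nullify: discard the sensitive value entirely."""
    return None


def mask_redact(value, keep_last=4, fill="*"):
    """Partial redaction: hide everything except the last `keep_last` characters."""
    if len(value) <= keep_last:
        # Too short to reveal anything safely; hide the whole value.
        return fill * len(value)
    visible_start = len(value) - keep_last
    return fill * visible_start + value[visible_start:]


def mask_hash(value, salt=""):
    """Hashing: replace the value with a SHA-256 hex digest.

    Equal inputs map to equal outputs, so joins and group-bys still work,
    but the original value is not directly readable.
    """
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
```

Under this sketch, an unauthorized reader would see the output of one of these functions in place of the original column value, while an authorized reader would see the real data.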

Here are a few key points about data masking:

Reporter: Jiashen Zhang / @zhangjiashen

Note: This issue was originally created as PARQUET-2223. Please see the migration documentation for further details.

asfimport commented 1 year ago

Jiashen Zhang / @zhangjiashen: PARQUET-2223: Parquet Data Masking Feature

https://docs.google.com/document/d/1K0juUXOg0wWXBlTSBiZYs4snCKeY5MsoVYxWEixL2qU/edit#heading=h.1kzvc0jyhlyb

asfimport commented 1 year ago

Gang Wu / @wgtmac: I am new to the discussion, so I may be missing something here. Should we reach consensus on the design before reviewing the code? [~Jiashen Zhang] [~xinlishang] @ggershinsky

asfimport commented 1 year ago

Gidon Gershinsky / @ggershinsky: Yep, I also think so. I'll have a look at the current version of the design document.