Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0
6.86k stars, 2.94k forks

S3 UFS Client Side Encryption (CSE) #10920

Closed: broussea1901 closed this issue 3 years ago

broussea1901 commented 4 years ago

S3 provides multiple options to encrypt data at rest:

- Server-Side Encryption (SSE): S3 servers encrypt objects before saving them to disk and decrypt them when the objects are downloaded.
- Client-Side Encryption (CSE): the client encrypts data and uploads only encrypted data to S3. The client manages the encryption process, the encryption keys, and the KMS, so no one who only has access to the S3 servers can read the data.

For some use cases, due to regulation, CSE is the only option that ensures total independence from the S3 host/admin (cloud or on-prem).
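To make the CSE model concrete, here is a minimal Java sketch, assuming AES-256-GCM from the JDK's standard `javax.crypto` API. The in-memory `bucket` map is a hypothetical stand-in for S3, and `ClientSideEncryptionSketch` is an illustrative name, not an Alluxio or AWS SDK API. What it shows is that the key stays on the client and the storage side only ever holds the IV plus ciphertext.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/**
 * Minimal client-side encryption sketch. The "bucket" is an in-memory map
 * standing in for S3 (hypothetical); it only ever receives ciphertext.
 */
public class ClientSideEncryptionSketch {
    private static final int IV_BYTES = 12;   // 96-bit nonce, the usual size for GCM
    private static final int TAG_BITS = 128;  // GCM authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    // Stand-in for the remote bucket: object key -> stored bytes (IV || ciphertext || tag).
    final Map<String, byte[]> bucket = new HashMap<>();
    // The key stays on the client; the storage side never sees it.
    private final SecretKey clientKey;

    ClientSideEncryptionSketch(SecretKey clientKey) {
        this.clientKey = clientKey;
    }

    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts locally with AES-GCM, then "uploads" only the ciphertext. */
    void put(String objectKey, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // fresh nonce per object
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, clientKey, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plaintext);
            byte[] blob = new byte[IV_BYTES + ct.length];
            System.arraycopy(iv, 0, blob, 0, IV_BYTES);
            System.arraycopy(ct, 0, blob, IV_BYTES, ct.length);
            bucket.put(objectKey, blob);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** "Downloads" the stored bytes and decrypts them locally (fails if they were tampered with). */
    byte[] get(String objectKey) {
        byte[] blob = bucket.get(objectKey);
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, clientKey,
                    new GCMParameterSpec(TAG_BITS, blob, 0, IV_BYTES));
            return cipher.doFinal(blob, IV_BYTES, blob.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ClientSideEncryptionSketch client = new ClientSideEncryptionSketch(newAesKey());
        byte[] secret = "highly sensitive".getBytes(StandardCharsets.UTF_8);
        client.put("bucket/object", secret);
        System.out.println(Arrays.equals(client.get("bucket/object"), secret)); // true
    }
}
```

Each object gets a fresh random 12-byte nonce stored in front of the ciphertext, and the GCM authentication tag makes any modification by the storage side detectable on read.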

Server-Side Encryption (SSE) is already available (https://docs.alluxio.io/os/user/stable/en/ufs/S3.html#enabling-server-side-encryption), and there seems to be work in progress on SSE as well: https://alluxio.atlassian.net/browse/ALLUXIO-3228. It would be very handy to also have client-side encryption (CSE) available when mounting an S3 UFS. CSE-KMS, where keys are provided/granted by a KMS server, would be the preferred option because it gives the best security for key management (e.g. fs.s3.cse.enabled, fs.s3.cse.encryptionMaterialsProvider, fs.s3.cse.kms.keyId).
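A KMS-backed CSE scheme is usually built as envelope encryption: the KMS holds a master key and hands out a fresh data key per object together with a copy of that data key wrapped under the master key; the client encrypts the object locally with the data key, and only the wrapped key is stored next to the ciphertext. The Java sketch below assumes that pattern. `LocalKms` is a hypothetical in-process stand-in for a real KMS server, and `StoredObject`, `encrypt`, and `decrypt` are illustrative names, not Alluxio, EMRFS, or AWS SDK APIs.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Envelope-encryption sketch of CSE with KMS-managed keys (all names hypothetical). */
public class EnvelopeEncryptionSketch {
    static final SecureRandom RANDOM = new SecureRandom();

    /** AES-GCM seal; output layout is IV (12 bytes) || ciphertext || tag (16 bytes). */
    static byte[] seal(SecretKey key, byte[] plaintext) {
        try {
            byte[] iv = new byte[12];
            RANDOM.nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] ct = c.doFinal(plaintext);
            byte[] out = Arrays.copyOf(iv, 12 + ct.length);
            System.arraycopy(ct, 0, out, 12, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Inverse of seal; throws if the data or tag was modified. */
    static byte[] open(SecretKey key, byte[] sealed) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, sealed, 0, 12));
            return c.doFinal(sealed, 12, sealed.length - 12);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static SecretKey newAesKey() {
        try {
            KeyGenerator g = KeyGenerator.getInstance("AES");
            g.init(256);
            return g.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical in-process stand-in for a KMS: master keys never leave it. */
    static final class LocalKms {
        private final Map<String, SecretKey> masterKeys = new HashMap<>();

        void createKey(String keyId) { masterKeys.put(keyId, newAesKey()); }

        /** Returns {plaintext data key, same data key wrapped under the master key}. */
        byte[][] generateDataKey(String keyId) {
            byte[] dataKey = newAesKey().getEncoded();
            return new byte[][] {dataKey, seal(masterKeys.get(keyId), dataKey)};
        }

        byte[] unwrapDataKey(String keyId, byte[] wrapped) {
            return open(masterKeys.get(keyId), wrapped);
        }
    }

    /** What would be written to S3: ciphertext plus the wrapped data key (e.g. as metadata). */
    static final class StoredObject {
        final String keyId; final byte[] wrappedKey; final byte[] ciphertext;
        StoredObject(String k, byte[] w, byte[] c) { keyId = k; wrappedKey = w; ciphertext = c; }
    }

    static StoredObject encrypt(LocalKms kms, String keyId, byte[] plaintext) {
        byte[][] dk = kms.generateDataKey(keyId);
        SecretKey dataKey = new SecretKeySpec(dk[0], "AES"); // the spec keeps its own copy
        StoredObject obj = new StoredObject(keyId, dk[1], seal(dataKey, plaintext));
        Arrays.fill(dk[0], (byte) 0); // best-effort wipe of the raw data key bytes
        return obj;
    }

    static byte[] decrypt(LocalKms kms, StoredObject obj) {
        byte[] raw = kms.unwrapDataKey(obj.keyId, obj.wrappedKey); // requires KMS access
        return open(new SecretKeySpec(raw, "AES"), obj.ciphertext);
    }

    public static void main(String[] args) {
        LocalKms kms = new LocalKms();
        kms.createKey("example-key-id"); // hypothetical key id
        byte[] data = "sensitive row".getBytes(StandardCharsets.UTF_8);
        StoredObject obj = encrypt(kms, "example-key-id", data);
        System.out.println(Arrays.equals(decrypt(kms, obj), data)); // true
    }
}
```

Someone with access to S3 alone sees only ciphertext and a wrapped key; reading an object also requires permission to call the KMS, which is the separation from the S3 host/admin that this issue asks for.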

At the moment we are investigating a way to encrypt data before writing it and to decrypt it after retrieving it from Alluxio, mimicking HDFS transparent encryption (TDE) without any intervention from Alluxio. This option is probably not a good one for the new Alluxio Structured Data Management (https://www.alluxio.io/blog/serving-structured-data-in-alluxio-concept/), where Alluxio needs to be able to "understand" the data structure rather than simply serve "blocks".
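The interim approach described above, encrypting in the application before data reaches Alluxio and decrypting after it is read back, amounts to wrapping the storage streams. Below is a minimal Java sketch, assuming AES-GCM via the JDK's `CipherOutputStream`/`CipherInputStream`. The byte-array streams are hypothetical stand-ins for whatever streams the Alluxio client would return, and all class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/**
 * Encrypt-before-write / decrypt-after-read sketch. Byte-array streams stand in
 * for the storage client's streams (hypothetical); storage only sees ciphertext.
 */
public class StreamEncryptionSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() {
        try {
            KeyGenerator g = KeyGenerator.getInstance("AES");
            g.init(256);
            return g.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes a fresh 12-byte IV to raw, then returns a stream that AES-GCM-encrypts into raw. */
    static OutputStream encryptingStream(OutputStream raw, SecretKey key) throws IOException {
        byte[] iv = new byte[12];
        RANDOM.nextBytes(iv);
        raw.write(iv);
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            return new CipherOutputStream(raw, c);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the IV back from raw, then returns a stream of decrypted bytes. */
    static InputStream decryptingStream(InputStream raw, SecretKey key) throws IOException {
        byte[] iv = raw.readNBytes(12);
        if (iv.length != 12) throw new IOException("truncated header");
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
            return new CipherInputStream(raw, c);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] writeEncrypted(SecretKey key, byte[] plaintext) {
        ByteArrayOutputStream storage = new ByteArrayOutputStream();
        try (OutputStream out = encryptingStream(storage, key)) {
            out.write(plaintext); // close() finalizes the cipher and appends the GCM tag
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return storage.toByteArray();
    }

    static byte[] readDecrypted(SecretKey key, byte[] stored) {
        try (InputStream in = decryptingStream(new ByteArrayInputStream(stored), key)) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] data = "col_a,col_b\n1,2\n".getBytes(StandardCharsets.UTF_8);
        byte[] stored = writeEncrypted(key, data);
        System.out.println(stored.length); // 12 (IV) + 16 (data) + 16 (tag) = 44
        System.out.println(new String(readDecrypted(key, stored), StandardCharsets.UTF_8));
    }
}
```

Because Alluxio would only ever see the opaque IV-plus-ciphertext bytes, it could not parse file formats or column structure, which is why this workaround conflicts with structured data management.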

This is a nice-to-have feature in general, but it is mandatory when storing highly sensitive content on S3 (even on-prem, for some sensitive use cases).

Some references:
- https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html
- https://github.com/awsdocs/amazon-emr-management-guide/blob/master/doc_source/emr-emrfs-encryption-cse.md
- https://aioboto3.readthedocs.io/en/latest/cse.html
- https://prestosql.io/docs/current/connector/hive.html

yuzhu commented 3 years ago

No plan to support encryption in the near term.