delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[Feature Request][Other] Encrypted Delta Lakes #2269

Open bhoberman opened 12 months ago

bhoberman commented 12 months ago

Feature request

Which Delta project/connector is this regarding?

Overview

I want to propose an encrypted Delta Lake, where the data written to and read from file storage is encrypted at the file-format level.

Motivation

I want to facilitate performant analytics in a situation where compute agents (e.g., Spark) can be trusted (because they run on-prem, for example) but the underlying storage cannot (because server-side encryption is inadequate).

Further details

I think the way Iceberg does this is well-thought-out, and I propose we do the same here. Iceberg provides an encryption API that defines a set of interfaces that encryption schemes adhere to, and they provide a simple AES-GCM streaming encryption/decryption implementation which operates on Avro data and metadata files.

Because Delta Lake is a Parquet-only lakehouse format, I propose using Parquet-native encryption for data files and a manual encryption solution for metadata files/transaction logs. I'd want to:
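To make the data-file half of this concrete, here is a minimal sketch of the properties that parquet-mr's modular encryption reads. The property names and the `PropertiesDrivenCryptoFactory` class come from parquet-mr; the helper function, the key IDs, the column names, and the `com.example.MyKmsClient` class are hypothetical placeholders. Whether Delta Lake would forward these properties to its Parquet writers is an assumption of this proposal, not documented behavior, and this does not cover the transaction log at all.

```python
# Sketch only: builds the Hadoop/Parquet properties used by parquet-mr
# modular encryption. Nothing here talks to Delta Lake or a real KMS.

def parquet_encryption_conf(footer_key_id, column_keys, kms_client_class):
    """Return Parquet modular-encryption properties as a plain dict.

    footer_key_id    -- master key ID used to protect the file footer
    column_keys      -- dict mapping master key ID -> list of column names
    kms_client_class -- fully qualified KmsClient implementation
                        (deployment-specific; hypothetical below)
    """
    if not column_keys:
        raise ValueError("at least one encrypted column is required")

    # parquet-mr expects the form "keyId:col1,col2;keyId2:col3"
    column_spec = ";".join(
        f"{key_id}:{','.join(cols)}" for key_id, cols in column_keys.items()
    )

    return {
        "parquet.crypto.factory.class":
            "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory",
        "parquet.encryption.kms.client.class": kms_client_class,
        "parquet.encryption.footer.key": footer_key_id,
        "parquet.encryption.column.keys": column_spec,
    }


conf = parquet_encryption_conf(
    footer_key_id="footer_key",                  # hypothetical key ID
    column_keys={"pii_key": ["ssn", "email"]},   # hypothetical key ID and columns
    kms_client_class="com.example.MyKmsClient",  # hypothetical class
)
```

With plain Parquet writes in Spark 3.2+, properties like these can be set on the Hadoop configuration before writing. Having Delta honor them end to end, and encrypting the JSON/checkpoint log files separately, is the part this request is asking for.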

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

I'm also potentially willing to implement this functionality for delta-rs.

samzys commented 4 months ago

Hi, any news on this topic?