apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0
14.68k stars 3.56k forks source link

Modular Encryption prints encryption key in error message when the wrong key is used for decryption #35919

Open seb2704 opened 1 year ago

seb2704 commented 1 year ago

Describe the bug, including details regarding any error messages, version, and platform.

Hi, I'm trying to encrypt and decrypt a PyArrow table with an in-memory KMS client (InMemoryKmsClient). Everything works fine, but when I use the wrong key to decrypt the data, I get an error message that includes the original encryption key. Here is my code example:

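import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.parquet.encryption as pe
# Mock KMS client from pyarrow's own test suite (testing only)
from pyarrow.tests.parquet.encryption import InMemoryKmsClient

FOOTER_KEY = b"0123456789112340"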
FOOTER_KEY_NAME = "footer_key"
COL_KEY = b"0123456789112333"
COL_KEY_NAME = "col_key"

d = {'a': [1, 2], 'b': [3, 4]}
df = pd.DataFrame(data=d)

table = pa.Table.from_pandas(df)

kms_connection_config = pe.KmsConnectionConfig(
    custom_kms_conf={
        FOOTER_KEY_NAME: FOOTER_KEY.decode("UTF-8"),
        COL_KEY_NAME: COL_KEY.decode("UTF-8"),
    }
)

def kms_factory(kms_connection_configuration):
    return InMemoryKmsClient(kms_connection_configuration)

crypto_factory = pe.CryptoFactory(kms_factory)

encryption_config = pe.EncryptionConfiguration(
    footer_key=FOOTER_KEY_NAME,
    column_keys={
        # Encrypt every column of the table with the column key
        COL_KEY_NAME: list(df.columns),
    },
    encryption_algorithm="AES_GCM_V1",
    data_key_length_bits=256)

file_encryption_properties = crypto_factory.file_encryption_properties(
    kms_connection_config, encryption_config)
with pq.ParquetWriter(
        'test.parquet', table.schema,
        encryption_properties=file_encryption_properties) as writer:
    writer.write_table(table)

decryption_config = pe.DecryptionConfiguration()

# Different (wrong) master keys, used on purpose when decrypting
FOOTER_KEY2 = b"0123456123122342"
COL_KEY2 = b"0123456789112343"
kms_connection_config2 = pe.KmsConnectionConfig(
    custom_kms_conf={
        FOOTER_KEY_NAME: FOOTER_KEY2.decode("UTF-8"),
        COL_KEY_NAME: COL_KEY2.decode("UTF-8"),
    }
)
file_decryption_properties = crypto_factory.file_decryption_properties(
        kms_connection_config2, decryption_config)

result = pq.ParquetFile(
    'test.parquet', decryption_properties=file_decryption_properties)
df = result.read(use_threads=True).to_pandas()

Error message: ValueError: ('Incorrect master key used', b'0123456789112340', b'\xaf\xf7\x15\x073\xec/"Z\xe3 \x86H\xf6\x8cP')

I'm using Windows 10, Python 3.9 and pyarrow 12.0.0.

Am I doing something wrong? Or is this a bug?

Component(s)

Parquet, Python

westonpace commented 1 year ago

The class that is generating this error is InMemoryKmsClient, which is located in pyarrow/tests/parquet/encryption.py. How are you importing this in the first place? My guess is that this class is a test utility and not intended for production code.

westonpace commented 1 year ago

Also, the docstring for this class is "This is a mock class implementation of KmsClient, built for testing only."
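To see why the master key ends up in the error, here is a simplified, stdlib-only re-creation of the wrap/unwrap scheme that, as far as I can tell from pyarrow/tests/parquet/encryption.py, InMemoryKmsClient uses. No real encryption happens: the "wrapped" key is just the master key concatenated with the key being wrapped, base64-encoded, and unwrap_key raises a ValueError carrying both parts. The class name MockKmsClientSketch and the fixed 16-byte split (matching the 16-character keys in the report) are assumptions for illustration, not pyarrow API, and the real test class may differ between versions.

```python
import base64


class MockKmsClientSketch:
    """Hypothetical re-creation of the test-only mock's scheme. NOT secure."""

    def __init__(self, master_keys_map):
        # Maps master key identifier -> master key string (assumed 16 ASCII chars)
        self.master_keys_map = master_keys_map

    def wrap_key(self, key_bytes, master_key_identifier):
        master_key_bytes = self.master_keys_map[master_key_identifier].encode("utf-8")
        # "Wrapping" just stores the master key and the inner key side by side
        return base64.b64encode(master_key_bytes + key_bytes)

    def unwrap_key(self, wrapped_key, master_key_identifier):
        expected = self.master_keys_map[master_key_identifier]
        decoded = base64.b64decode(wrapped_key)
        master_key_bytes, inner_key = decoded[:16], decoded[16:]
        if master_key_bytes.decode("utf-8") == expected:
            return inner_key
        # The exception arguments echo the stored master key and the inner key,
        # which is the shape of the error reported in this issue
        raise ValueError("Incorrect master key used", master_key_bytes, inner_key)


# Writer and reader configured with different master keys for "footer_key"
writer_kms = MockKmsClientSketch({"footer_key": "0123456789112340"})
reader_kms = MockKmsClientSketch({"footer_key": "0123456123122342"})

inner_key = bytes(range(16))  # stand-in for a randomly generated key
wrapped = writer_kms.wrap_key(inner_key, "footer_key")

leaked = None
try:
    reader_kms.unwrap_key(wrapped, "footer_key")
except ValueError as exc:
    leaked = exc.args
    print(leaked)
```

Because the original master key is sitting in plain form inside the wrapped key stored in the file, the mock can only compare it and report it. A KmsClient meant for real use would send the key to an actual KMS for encryption and would keep key material out of exception messages.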