apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[Python][Parquet] Using decryption_properties parameter on pq.ParquetDataset raises exception #41273

Open tritzman opened 4 months ago

tritzman commented 4 months ago

Describe the bug, including details regarding any error messages, version, and platform.

pyarrow 15.0.2

Changing the read call in the example at python/examples/parquet_encryption/sample_vault_kms_client.py from this: result = pq.ParquetFile(path, decryption_properties=file_decryption_properties)

to this: result = pq.ParquetDataset(path, decryption_properties=file_decryption_properties)

results in this exception: __init__() got an unexpected keyword argument 'decryption_properties'

Tracking it down: pq.ParquetDataset accepts the decryption_properties parameter. It is passed to __init__, where it is collected into read_options. The read_options are then passed to ds.ParquetFileFormat (on line 1337 of core.py), which does not accept decryption_properties and raises the exception.
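The failure path described above can be sketched in plain Python without pyarrow. This is only an illustration of the mechanism as I understand it: FakeDataset and FakeFileFormat are hypothetical stand-ins, not the real pyarrow classes, and their parameter lists are simplified assumptions.

```python
# Hypothetical stand-ins that mimic the call chain described in this issue.
# They are NOT the real pyarrow classes; signatures are simplified assumptions.


class FakeFileFormat:
    """Plays the role of ds.ParquetFileFormat: it has no decryption_properties keyword."""

    def __init__(self, read_options=None, default_fragment_scan_options=None):
        self.read_options = read_options
        self.default_fragment_scan_options = default_fragment_scan_options


class FakeDataset:
    """Plays the role of pq.ParquetDataset: accepts the keyword, then forwards it."""

    def __init__(self, path, decryption_properties=None):
        self.path = path
        read_options = {}
        if decryption_properties is not None:
            # The keyword is collected here ...
            read_options["decryption_properties"] = decryption_properties
        # ... and forwarded to a constructor that does not know it, so this raises.
        self.format = FakeFileFormat(**read_options)


def try_open(path, decryption_properties=None):
    """Return the TypeError message if construction fails, otherwise None."""
    try:
        FakeDataset(path, decryption_properties=decryption_properties)
    except TypeError as exc:
        return str(exc)
    return None


# Passing any non-None value triggers the same kind of error as in the report.
message = try_open("data.parquet", decryption_properties=object())
print(message)
```

Without decryption_properties the stand-in constructs fine, which matches the observation that only the forwarded keyword trips the exception.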

I was looking at https://github.com/apache/arrow/issues/29238, thinking the changes in v14.0.0 might have affected the way decryption properties are used with datasets (vs. files). It's unclear to me how one uses decryption_properties with a dataset.

Component(s)

Parquet, Python

heyuqi1970 commented 1 month ago

pyarrow 16.0.0, macOS 11.7.10 (20G1427), Python 3.9.7: result = pq.ParquetDataset(path, decryption_properties=file_decryption_properties)

raises the same exception.