AudriusButkevicius opened this issue 6 months ago
It seems you can rebuild the dataset from what parquet_dataset returned:
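import pyarrow.dataset as ds
from pyarrow import fs

# `dataset` is what ds.parquet_dataset() returned for the _metadata file and
# `pformat` is the ds.ParquetFileFormat it was opened with (both from the reproducer).
filesystem = fs.LocalFileSystem()
remade_dataset = ds.FileSystemDataset(
    [
        # One fragment per data file, restricted to the row groups listed in _metadata.
        pformat.make_fragment(
            fragment.path,
            filesystem,
            fragment.partition_expression,
            [rg.id for rg in fragment.row_groups],
        )
        for fragment in dataset.get_fragments()
    ],
    dataset.schema,
    pformat,
)
print(remade_dataset.to_table())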
but I assume this re-reads the metadata from each data file instead of reusing what was read from the _metadata file, which defeats the purpose of having the _metadata file in the first place.
Actually, I think the issue might be on the write side: the _metadata file seems to have no encryption algorithm set, so the reader doesn't even attempt to decrypt the metadata.
Describe the bug, including details regarding any error messages, version, and platform.
Fails with:
This is using a plaintext footer.
Reproducer:
Presumably the metadata read from the _metadata file is not decrypted, or the footer incorrectly indicates whether it is encrypted.
Tried with the latest master, which contains: https://github.com/apache/arrow/commit/bd444106af494b3d4c6cce0af88f6ce2a6a327eb
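To see what a file's trailer claims about footer encryption, one can read its last 8 bytes. Per the Parquet format spec, these are a 4-byte little-endian footer length followed by a magic: `PARE` for an encrypted footer, `PAR1` otherwise. The following is a minimal stdlib-only sketch; `inspect_parquet_trailer` is a hypothetical helper, not a pyarrow API. Note that plaintext-footer encryption mode also ends in `PAR1`, so this check alone cannot distinguish an unencrypted file from a plaintext-footer encrypted one. That distinction lives in the `encryption_algorithm` field of the Thrift `FileMetaData`, which this sketch does not parse.

```python
import os
import struct
import sys

PLAINTEXT_MAGIC = b"PAR1"  # plaintext footer (also used by plaintext-footer encryption mode)
ENCRYPTED_MAGIC = b"PARE"  # encrypted footer


def inspect_parquet_trailer(path):
    """Return (footer_length, footer_is_encrypted) from the last 8 bytes of a Parquet file.

    Assumed layout (Parquet format spec): ... <footer> <4-byte LE length> <4-byte magic>
    """
    # Leading magic (4) + length (4) + trailing magic (4) is the smallest possible file.
    if os.path.getsize(path) < 12:
        raise ValueError(f"{path}: too small to be a Parquet file")
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)  # jump to the footer length + magic
        trailer = f.read(8)
    (footer_len,) = struct.unpack("<I", trailer[:4])
    magic = trailer[4:]
    if magic == ENCRYPTED_MAGIC:
        return footer_len, True
    if magic == PLAINTEXT_MAGIC:
        return footer_len, False
    raise ValueError(f"{path}: unexpected trailing magic {magic!r}")


if __name__ == "__main__":
    # Usage: python inspect_trailer.py _metadata part-0.parquet ...
    for p in sys.argv[1:]:
        print(p, inspect_parquet_trailer(p))
```

Running it on both the _metadata file and one of the data files shows whether they at least agree on the footer mode.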
Component(s)
C++, Python