apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[C++][Dataset][Python] Improve ParquetFileFragment serialization #26413

Open · asfimport opened this issue 4 years ago

asfimport commented 4 years ago

After ARROW-10131, ParquetFileFragment wraps a FileMetaData object, and all of the fragment's properties are queried from it. FileMetaData is eminently serializable, so when pickling a fragment whose metadata has already been loaded, it would be cheaper to serialize that metadata directly and avoid redundant IO. (An unpickled fragment would then also start with its metadata pre-loaded.)

https://github.com/apache/arrow/pull/8507#discussion_r512698380
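To illustrate the proposed behaviour, here is a minimal, stdlib-only sketch. It is not pyarrow's API: `Fragment`, `FileMetaData` and `read_footer` are hypothetical stand-ins for ParquetFileFragment, Parquet's FileMetaData and the footer read. The idea is that the fragment's `__reduce__` carries the cached metadata along with the path, so unpickling does not need to touch storage again.

```python
import pickle
from dataclasses import dataclass


# Hypothetical stand-in for Parquet FileMetaData: a small, picklable value.
@dataclass(frozen=True)
class FileMetaData:
    num_rows: int
    num_row_groups: int


# Counts simulated footer reads so the saved IO is observable.
READ_COUNT = {"n": 0}


def read_footer(path):
    """Hypothetical helper simulating the IO of reading a Parquet footer."""
    READ_COUNT["n"] += 1
    return FileMetaData(num_rows=1000, num_row_groups=4)


class Fragment:
    """Hypothetical fragment that lazily loads and caches its metadata."""

    def __init__(self, path, metadata=None):
        self.path = path
        self._metadata = metadata  # None until loaded (or restored by unpickling)

    @property
    def metadata(self):
        # Only touch storage if the metadata is not already cached.
        if self._metadata is None:
            self._metadata = read_footer(self.path)
        return self._metadata

    def __reduce__(self):
        # Pickle the cached metadata along with the path, so an unpickled
        # fragment starts with pre-loaded metadata instead of re-reading it.
        return (self.__class__, (self.path, self._metadata))


frag = Fragment("part-0.parquet")
_ = frag.metadata  # first access reads the footer once
clone = pickle.loads(pickle.dumps(frag))
print(clone.metadata.num_rows, READ_COUNT["n"])  # prints "1000 1"
```

If the metadata was never loaded, `None` is pickled and the restored fragment stays lazy, so pickling itself never forces a footer read.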

Reporter: Ben Kietzman / @bkietz

Note: This issue was originally created as ARROW-10435. Please see the migration documentation for further details.

asfimport commented 4 years ago

Ben Kietzman / @bkietz: @jorisvandenbossche