apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

Reduce Copying of RowGroupMetaData and FileMetaData #2530

Status: Open. tustvold opened this issue 2 years ago.

tustvold commented 2 years ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

SerializedFileReader and friends currently copy RowGroupMetaData and FileMetaData a lot while reading indexes and filtering row groups. Neither structure is small or particularly cheap to clone.
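To make the cost concrete, here is a minimal sketch built on hypothetical, heavily simplified stand-ins (`FileMeta`, `RowGroupMeta`, `ColumnChunkMeta`, `filter_by_clone`). These are not the parquet crate's actual types or functions. When a filter only has a borrow of the metadata, the only way to produce an owned result is to clone the survivors. That deep-copies every column chunk's heap-allocated fields.

```rust
/// Hypothetical stand-in for per-column-chunk metadata (not the parquet crate's type).
#[derive(Clone)]
struct ColumnChunkMeta {
    path: String,
    num_values: i64,
    // Statistics and similar blobs live on the heap, so cloning them allocates.
    min_max: Vec<u8>,
}

/// Hypothetical stand-in for row group metadata.
#[derive(Clone)]
struct RowGroupMeta {
    num_rows: i64,
    columns: Vec<ColumnChunkMeta>,
}

/// Hypothetical stand-in for file-level metadata.
#[derive(Clone)]
struct FileMeta {
    row_groups: Vec<RowGroupMeta>,
}

/// Keeps the row groups matching `keep`. Because it only borrows `meta`,
/// every surviving row group (and every column chunk inside it) is deep-copied.
fn filter_by_clone(meta: &FileMeta, keep: impl Fn(&RowGroupMeta) -> bool) -> Vec<RowGroupMeta> {
    meta.row_groups.iter().filter(|&rg| keep(rg)).cloned().collect()
}

fn main() {
    let column = ColumnChunkMeta { path: "a.b".to_string(), num_values: 100, min_max: vec![0; 32] };
    let meta = FileMeta {
        row_groups: (0..4)
            .map(|i| RowGroupMeta { num_rows: i * 10, columns: vec![column.clone(); 3] })
            .collect(),
    };

    let kept = filter_by_clone(&meta, |rg| rg.num_rows >= 20);
    // The surviving row groups are fresh heap copies, not the originals.
    println!("kept {} row groups", kept.len());
}
```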

Describe the solution you'd like

I would like to reduce the amount of unnecessary work being performed, either by altering the interfaces so that data can be moved instead of cloned, or by using Arc.
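Both options can be sketched with the same kind of hypothetical stand-ins (`FileMeta`, `RowGroupMeta`, `filter_by_move`, `Reader`). These are not the parquet crate's real API. If the function takes the metadata by value, it can move the surviving row groups out instead of cloning them. Wrapping the metadata in `Arc` turns every further hand-off into a reference-count increment rather than a deep copy.

```rust
use std::sync::Arc;

/// Hypothetical, simplified stand-in for row group metadata (not the parquet crate's type).
struct RowGroupMeta {
    num_rows: i64,
    column_paths: Vec<String>,
}

/// Hypothetical, simplified stand-in for file-level metadata.
struct FileMeta {
    row_groups: Vec<RowGroupMeta>,
}

/// Option 1: take ownership and move the surviving row groups out.
/// No `Clone` bound is required, so nothing is deep-copied.
fn filter_by_move(meta: FileMeta, keep: impl Fn(&RowGroupMeta) -> bool) -> FileMeta {
    FileMeta {
        row_groups: meta.row_groups.into_iter().filter(|rg| keep(rg)).collect(),
    }
}

/// Option 2: a hypothetical reader that shares one immutable copy behind an `Arc`.
struct Reader {
    metadata: Arc<FileMeta>,
}

impl Reader {
    /// Cloning the `Arc` only bumps a reference count; the metadata itself is shared.
    fn metadata(&self) -> Arc<FileMeta> {
        Arc::clone(&self.metadata)
    }
}

fn main() {
    let meta = FileMeta {
        row_groups: (0..4)
            .map(|i| RowGroupMeta { num_rows: i * 10, column_paths: vec!["a.b".to_string()] })
            .collect(),
    };

    // Filtering consumes `meta`, so the surviving row groups are moved, not copied.
    let filtered = filter_by_move(meta, |rg| rg.num_rows >= 20);
    println!("kept {} row groups", filtered.row_groups.len());

    let reader = Reader { metadata: Arc::new(filtered) };
    let shared = reader.metadata();
    // Both handles point at the same allocation.
    println!("shared: {}", Arc::ptr_eq(&shared, &reader.metadata));
}
```

The two options are complementary. Moving avoids copies when the caller no longer needs the unfiltered metadata. An `Arc` is the better fit when several readers or tasks need the same metadata at once.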

Describe alternatives you've considered

Additional context

alamb commented 2 weeks ago

Whenever I have run across these structures in the code recently, they have been wrapped in Arc. I am not sure this is still an issue.

tustvold commented 2 weeks ago

Here is one example, though I suspect there are others:

https://github.com/apache/arrow-rs/blob/master/parquet/src/file/serialized_reader.rs#L197