alamb opened this issue 3 years ago
Parallel column reading increases performance. I checked by creating a separate ParquetRecordBatchStream<Reader> for each column and reading all of the streams in parallel: throughput increased from 60 MB/s (with a single ParquetRecordBatchStream<Reader> per file) to 200 MB/s. The test read Parquet files that each contain 8 columns and are approximately 250 MB in size.
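For concreteness, here is a minimal sketch of that setup using the parquet crate's async reader (behind the `async` feature) with Tokio: one stream per leaf column, each spawned as its own task. The file name `data.parquet`, the 8-column count, and the row counting are illustrative assumptions, not part of the original benchmark.

```rust
use futures::TryStreamExt;
use parquet::arrow::{async_reader::ParquetRecordBatchStreamBuilder, ProjectionMask};

type Result<T> = std::result::Result<T, Box<dyn std::error::Error + Send + Sync>>;

/// Read a single leaf column from `path`, returning the number of rows decoded.
async fn read_column(path: &str, column: usize) -> Result<u64> {
    let file = tokio::fs::File::open(path).await?;
    let builder = ParquetRecordBatchStreamBuilder::new(file).await?;
    // Project one leaf column so this stream decodes only that column's pages.
    let mask = ProjectionMask::leaves(builder.parquet_schema(), [column]);
    let mut stream = builder.with_projection(mask).build()?;
    let mut rows = 0;
    while let Some(batch) = stream.try_next().await? {
        rows += batch.num_rows() as u64;
    }
    Ok(rows)
}

#[tokio::main]
async fn main() -> Result<()> {
    // One task per column: each task owns its own file handle and stream,
    // mirroring the "multiple ParquetRecordBatchStream" experiment above.
    let handles: Vec<_> = (0..8)
        .map(|col| tokio::spawn(read_column("data.parquet", col)))
        .collect();
    for handle in handles {
        handle.await.expect("task panicked")?;
    }
    Ok(())
}
```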
Instead of creating multiple readers, can this be done with a single reader?
That is an interesting question @Sach1nAgarwal -- I think @tustvold has some ideas on parallelized decode, but I am not sure how concrete they are.
Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-6712
What's the best way to read a .parquet file into a Rust ndarray structure?
Can it be done efficiently with the current API? I assume row-by-row iteration is not the best idea :)
I can imagine that even parallel column loading would be possible.
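One possible approach with the current API (a sketch, not necessarily the best way): decode with the synchronous ParquetRecordBatchReader, collect each column's values, and assemble an ndarray::Array2. This assumes every column is non-nullable Float64 and that the ndarray crate is available; the path is a placeholder.

```rust
use std::fs::File;

use arrow::array::{Array, Float64Array};
use ndarray::Array2;
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

fn read_to_ndarray(path: &str) -> Result<Array2<f64>, Box<dyn std::error::Error>> {
    let reader = ParquetRecordBatchReaderBuilder::try_new(File::open(path)?)?.build()?;
    let mut columns: Vec<Vec<f64>> = Vec::new();
    for batch in reader {
        let batch = batch?;
        columns.resize(batch.num_columns(), Vec::new());
        for (i, col) in batch.columns().iter().enumerate() {
            // Assumes non-nullable Float64; other types would need conversion.
            let values = col
                .as_any()
                .downcast_ref::<Float64Array>()
                .ok_or("expected Float64 column")?;
            columns[i].extend_from_slice(values.values());
        }
    }
    // Each inner Vec holds one column; build a (cols, rows) array from the
    // flattened data, then transpose to the conventional (rows, cols) shape.
    let rows = columns.first().map_or(0, |c| c.len());
    let cols = columns.len();
    let flat: Vec<f64> = columns.into_iter().flatten().collect();
    Ok(Array2::from_shape_vec((cols, rows), flat)?.reversed_axes())
}
```

Because the batches arrive column-oriented, this avoids per-row iteration entirely; the per-column Vecs could also be filled by parallel tasks along the lines of the earlier sketch.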