This provides an initial implementation for reading data as Arrow record batches or tables from Iceberg tables. Reads are parallelized and streamed, records are returned in a consistent order, and both filtering and row limits are supported.
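For illustration, here is a minimal, self-contained sketch of the read pattern described above: per-file readers run in parallel, record batches are streamed over a channel, and the consumer emits them in a consistent order while honoring a row limit. This is not the code added by this PR, and the Arrow import version shown may differ from the one the module actually uses.

```go
// Illustrative sketch only -- not the implementation added in this PR.
package main

import (
	"fmt"
	"sync"

	"github.com/apache/arrow/go/v14/arrow"
	"github.com/apache/arrow/go/v14/arrow/array"
	"github.com/apache/arrow/go/v14/arrow/memory"
)

type indexedBatch struct {
	idx int          // position of the source file in the scan plan
	rec arrow.Record // one record batch read from that file
}

// makeBatch stands in for reading a Parquet file; it builds a small
// in-memory record batch so the example runs on its own.
func makeBatch(start int64, n int) arrow.Record {
	schema := arrow.NewSchema([]arrow.Field{{Name: "id", Type: arrow.PrimitiveTypes.Int64}}, nil)
	bldr := array.NewRecordBuilder(memory.DefaultAllocator, schema)
	defer bldr.Release()
	for i := 0; i < n; i++ {
		bldr.Field(0).(*array.Int64Builder).Append(start + int64(i))
	}
	return bldr.NewRecord()
}

func main() {
	const numFiles, rowLimit = 3, 5

	out := make(chan indexedBatch, numFiles)
	var wg sync.WaitGroup
	for i := 0; i < numFiles; i++ {
		wg.Add(1)
		go func(idx int) { // each "file" is read by its own goroutine
			defer wg.Done()
			out <- indexedBatch{idx: idx, rec: makeBatch(int64(idx*10), 4)}
		}(i)
	}
	go func() { wg.Wait(); close(out) }()

	// Reorder by file index so the stream is deterministic, then apply the limit.
	pending := make(map[int]arrow.Record)
	next, remaining := 0, int64(rowLimit)
	for b := range out {
		pending[b.idx] = b.rec
		for {
			rec, ok := pending[next]
			if !ok {
				break
			}
			delete(pending, next)
			if remaining > 0 {
				rows := rec.NumRows()
				if rows > remaining {
					rows = remaining
				}
				fmt.Printf("batch %d: emitting %d rows\n", next, rows)
				remaining -= rows
			}
			rec.Release() // release each batch once consumed
			next++
		}
	}
}
```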
The underlying file interactions are abstracted behind interfaces in an internal package, so that ORC and Avro readers can be added later alongside the current Parquet implementation.
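As a rough sketch of what such an abstraction might look like, the interface and method names below (FileReader, GetRecords) are hypothetical and are not the actual interfaces introduced in the internal package.

```go
// Hypothetical per-format reader abstraction; names are illustrative only.
package internalsketch

import (
	"context"
	"io"

	"github.com/apache/arrow/go/v14/arrow"
)

// FileReader hides the details of a specific file format (Parquet today,
// potentially ORC or Avro later) behind a format-agnostic surface that the
// scan layer can stream record batches from.
type FileReader interface {
	// Schema reports the Arrow schema of the underlying data file.
	Schema() (*arrow.Schema, error)
	// GetRecords streams record batches, honoring column projection.
	GetRecords(ctx context.Context, cols []string) (<-chan arrow.Record, error)
	io.Closer
}
```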
This PR also adds integration tests to verify that the reads work correctly.