This provides an initial implementation for reading data as Arrow record batches or tables from Iceberg tables. Reads are parallelized and streamed, records are returned in a consistent order, and both filtering and row limits are supported.
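For illustration, here is a minimal, self-contained sketch of the read pattern described above: per-file readers run in parallel, record batches are streamed over a channel, and the consumer emits them in a consistent order while honoring a row limit. This is not the code added by this PR, and the Arrow import version shown may differ from the one the module actually uses.

```go
// Illustrative sketch only -- not the implementation added in this PR.
package main

import (
	"fmt"
	"sync"

	"github.com/apache/arrow/go/v14/arrow"
	"github.com/apache/arrow/go/v14/arrow/array"
	"github.com/apache/arrow/go/v14/arrow/memory"
)

type indexedBatch struct {
	idx int          // position of the source file in the scan plan
	rec arrow.Record // one record batch read from that file
}

// makeBatch stands in for reading a Parquet file; it builds a small
// in-memory record batch so the example runs on its own.
func makeBatch(start int64, n int) arrow.Record {
	schema := arrow.NewSchema([]arrow.Field{{Name: "id", Type: arrow.PrimitiveTypes.Int64}}, nil)
	bldr := array.NewRecordBuilder(memory.DefaultAllocator, schema)
	defer bldr.Release()
	for i := 0; i < n; i++ {
		bldr.Field(0).(*array.Int64Builder).Append(start + int64(i))
	}
	return bldr.NewRecord()
}

func main() {
	const numFiles, rowLimit = 3, 5

	out := make(chan indexedBatch, numFiles)
	var wg sync.WaitGroup
	for i := 0; i < numFiles; i++ {
		wg.Add(1)
		go func(idx int) { // each "file" is read by its own goroutine
			defer wg.Done()
			out <- indexedBatch{idx: idx, rec: makeBatch(int64(idx*10), 4)}
		}(i)
	}
	go func() { wg.Wait(); close(out) }()

	// Reorder by file index so the stream is deterministic, then apply the limit.
	pending := make(map[int]arrow.Record)
	next, remaining := 0, int64(rowLimit)
	for b := range out {
		pending[b.idx] = b.rec
		for {
			rec, ok := pending[next]
			if !ok {
				break
			}
			delete(pending, next)
			if remaining > 0 {
				rows := rec.NumRows()
				if rows > remaining {
					rows = remaining
				}
				fmt.Printf("batch %d: emitting %d rows\n", next, rows)
				remaining -= rows
			}
			rec.Release() // release each batch once consumed
			next++
		}
	}
}
```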
The underlying file interactions are abstracted behind interfaces in an internal package, so that ORC and Avro readers can be added later alongside the current Parquet implementation.
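As a rough sketch of what such an abstraction might look like, the interface and method names below (FileReader, GetRecords) are hypothetical and are not the actual interfaces introduced in the internal package.

```go
// Hypothetical per-format reader abstraction; names are illustrative only.
package internalsketch

import (
	"context"
	"io"

	"github.com/apache/arrow/go/v14/arrow"
)

// FileReader hides the details of a specific file format (Parquet today,
// potentially ORC or Avro later) behind a format-agnostic surface that the
// scan layer can stream record batches from.
type FileReader interface {
	// Schema reports the Arrow schema of the underlying data file.
	Schema() (*arrow.Schema, error)
	// GetRecords streams record batches, honoring column projection.
	GetRecords(ctx context.Context, cols []string) (<-chan arrow.Record, error)
	io.Closer
}
```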
This PR also adds integration tests to verify that the reads work correctly.