LSSTDESC / tables_io

A small package providing tools to read, write, and convert tabular data for DESC.
MIT License

Initial commit for pyarrow support #93

Closed: sidneymau closed this 3 months ago

sidneymau commented 3 months ago

Problem & Solution Description (including issue #)

This PR adds pyarrow support for:

  1. IO with parquet files (pyarrow.parquet and pyarrow.dataset)
  2. In-memory tabular representation (pyarrow.Table)

There is substantial overlap between this and pandas (pandas can use pyarrow as a backend), but in principle pyarrow scales much better for workloads such as reading/writing data in batches and streamed computation (pyarrow.acero), so it seems worth having.

Code Quality

sidneymau commented 3 months ago

A few notes:

sidneymau commented 3 months ago

Linking this PR to #66