Closed sidneymau closed 3 months ago
A few notes:
- […] `NotImplementedError`) never runs, for example
- […] `.parquet` and pandas when `.pq`/`.parq`. Not sure of the best way to resolve this... (I would also opine that it would be easier to only support `.parquet`, as that tends to be the preferred extension...)

Linking this PR to #66
Problem & Solution Description (including issue #)
This PR adds support for pyarrow for
- […] (`pyarrow.parquet` and `pyarrow.dataset`)
- […] (`pyarrow.Table`)

There is a lot of overlap between this and pandas (as pandas uses pyarrow as a backend), but in principle pyarrow offers much better scaling for, e.g., reading/writing data in batches, streamed computations (`pyarrow.acero`), etc., so it seems worth having.

Code Quality
- […] `#pragma: no cover`; in the case of a bugfix, a new test that breaks as a result of the bug has been added