Open darkblue-b opened 6 months ago
What would be the purpose of switching to an alternative implementation for Parquet reading? Is it related to the discussion on the lack of libarrow/libparquet packaging in the official Debian repositories? But libarrow/libparquet is still packaged in an Apache APT repository, so it is not that bad.
Regarding the listed alternatives:
All in all, I see nothing obvious that would justify the effort of developing a new implementation of the OGR Parquet driver. In my perception, libarrow/libparquet is the reference implementation, is actively developed and maintained, and is feature-complete.
@darkblue-b given your apparent recent success in building GDAL with Arrow support, do you still think this is desirable?
Feature description
The Parquet data format is increasingly popular; the existing GDAL-OGR code[0] relies on the Apache Arrow libraries to ingest Parquet.
There exists a pure-Python alternative, `fastparquet`[1], also known as `python-parquet`. The only unusual library dependency for `fastparquet` is named `cramjam`[2].
Enhancement: consider adding `fastparquet` as an alternate Parquet reader implementation in GDAL-OGR.
Other implementations of Parquet readers include Polars[3] and DuckDB[4].
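For context, whether a given GDAL build already includes the Arrow-backed Parquet driver can be checked from Python. The following is only a minimal sketch, assuming the GDAL Python bindings (`osgeo`) may or may not be installed; the helper name `parquet_driver_available` is hypothetical and not part of GDAL.

```python
def parquet_driver_available():
    """Report whether the local GDAL build exposes the OGR Parquet driver.

    Returns True or False when the GDAL Python bindings can be imported,
    and None when they are not installed at all.
    (Hypothetical helper, for illustration only.)
    """
    try:
        from osgeo import gdal  # GDAL Python bindings; assumed, may be absent
    except ImportError:
        return None
    # GetDriverByName returns None when the driver was not compiled in,
    # e.g. when GDAL was built without libarrow/libparquet.
    return gdal.GetDriverByName("Parquet") is not None
```

A result of `False` would suggest that GDAL was built without Arrow/Parquet support, which is the situation this request is concerned with.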
[0] https://github.com/OSGeo/gdal/blob/master/ogr/ogrsf_frmts/parquet/CMakeLists.txt
[1] https://pypi.org/project/fastparquet/
[2] https://github.com/milesgranger/cramjam
[3] https://pola.rs/
[4] https://github.com/duckdb
Additional context
No response