-
It would be nice to have options for compression. Looks like there is no compression by default?
```
parq RS_2008-04.parquet
# Metadata
created_by: parquet-cpp-arrow version 7.0.0
n…
-
### What is the bug?
According to [docs](https://gdal.org/en/latest/drivers/vector/parquet.html#dataset-partitioning-read-support) and [ogr_parquet.py](https://github.com/OSGeo/gdal/blob/master/aut…
-
Once the files are read into memory, what do you think about caching them as Parquet files instead of shapefiles? That would make IO much faster and the footprint on disk a lot smaller. If you're into it, …
-
### What happens?
I am using httpfs to query parquet files from Hugging Face.
Hugging Face initially returns a 302 redirect for the download file which DuckDB tries to follow. However DuckDB URL d…
-
When I try to copy data from a local Parquet file to the database, I get the following error:
```
type mismatch for column "bla" between table and parquet file. table expected "Utf8" but file h…
-
In the current implementation, the `__call__` method in `nemo_curator/modules/fuzzy_dedup.py` assumes that at least one LSH duplicate will be found, and the results will be saved as a parquet file…
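
A minimal sketch of the kind of guard this points at, under stated assumptions: the helper name `existing_results` and the file name `dups.parquet` are hypothetical and not part of `nemo_curator`. It only shows checking that a results file was actually written before a later step tries to read it.

```python
import os
import tempfile


def existing_results(output_path):
    """Return output_path if a results file was written, else None.

    Hypothetical helper: a None return stands for the case where no
    duplicates were found and therefore nothing was saved.
    """
    if not os.path.exists(output_path):
        return None
    return output_path


# Usage: a caller can skip the downstream read when nothing was written.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "dups.parquet")  # hypothetical file name
    if existing_results(path) is None:
        print("no duplicates found, skipping downstream steps")
```

Returning `None` rather than raising keeps the "no duplicates" case an ordinary outcome that the caller handles explicitly.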
-
### Describe the bug
Our ML pipelines run on large instances in a k8s cluster, often with a large number of CPUs available (128 or 96). It looks like the new swordfish runtime has issues handling and schedul…
-
https://github.com/apache/parquet-java/blob/master/parquet-column/src/test/java/org/apache/parquet/column/impl/TestColumnReaderImpl.java#L55
-
File "Show-o/parquet/refinedweb_dataset.py", line 20, in <module>
from parquet.parquet_dataset import CruiseParquetDataset
ModuleNotFoundError: No module named 'parquet.parquet_dataset'
-
### What happens?
I'm aggregating 4B records with duckdb.
```
SELECT
channel_id,
cast(to_timestamp(end_time) as date) AS day,
sum(end_time - start_time) AS total_…