-
**Describe the bug**
Daft's local Parquet reader is slow when reading Parquet files with many small rowgroups. The Polars Parquet writer currently writes files like that (attached a sample file for…
-
Not too familiar with refine but could have a look at adding this if someone gives me a starting point.
-
### Describe the bug
In reference to this [issue](https://github.com/aws/aws-sdk-pandas/issues/1110), it appears we are still unable to run copy_from_files when attempting to copy parquet data into a…
-
This might sound crazy, but I still wanted to propose a feature request about Parquet files.
You might ask, why? Parquet files are becoming more widespread and might even be considered as "the new …
-
## Summary
Chronologically older partitions should go to an object store to scale and be more cost-efficient. At the same time, data in these partitions should remain accessible, although with a perf…
-
I'm trying to execute the code below on Windows (I'm on Windows 11):
```r
# sample keys
key1
```
-
### What happened + What you expected to happen
I have code that reads Parquet data using Ray Data, splits the data by some rules, and writes it back using Ray Data.
Please see the code of writing d…
-
Thank you for your great work!
I have a problem after downloading data from Hugging Face. There are duplicate image uids in different Parquet files. Do these duplicate uids point to the same image?
-
### Describe the bug
When a column has a `Dictionary` data type, the Parquet metadata statistics return `Exact(Dictionary(Int32, Utf8(NULL)))` for the min and max values.
### To Reproduce
Run the test…