-
In the current implementation, the `__call__` method in `nemo_curator/modules/fuzzy_dedup.py` assumes that at least one LSH duplicate will be found, and the results will be saved as a parquet file…
-
### Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the [latest version](https://pypi.org/project/polars/) of Polars.
### Reprodu…
-
### What happens?
I am using httpfs to query parquet files from Hugging Face.
Hugging Face initially returns a 302 redirect for the download file, which DuckDB tries to follow. However, DuckDB URL d…
-
  File "Show-o/parquet/refinedweb_dataset.py", line 20, in <module>
    from parquet.parquet_dataset import CruiseParquetDataset
ModuleNotFoundError: No module named 'parquet.parquet_dataset'
-
When I try to copy data from a local Parquet file to the database, I get the following error:
```
type mismatch for column "bla" between table and parquet file. table expected "Utf8" but file h…
```
-
### Backend
CH (ClickHouse)
### Bug description
```scala
test(
"GLUTEN-8021/8022: fix orc read/write mismatch and parquet" +
"read exception when written complex column contains…
```
-
Would a PR to add `'application/vnd.apache.parquet'` here be welcome?
https://github.com/falconry/falcon/blob/b29fd5540ae58bed47198ea447f1e9194c34155c/falcon/constants.py#L137-L145
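
Until such a constant exists upstream, the media type can be registered locally. The following is a minimal sketch using only the Python standard library, not Falcon's API; the name `MEDIA_PARQUET` and the helper `content_type_for` are hypothetical and chosen only to mirror the naming style of the constants in the linked file.

```python
import mimetypes

# Hypothetical constant name (an assumption, not an existing Falcon constant).
MEDIA_PARQUET = 'application/vnd.apache.parquet'

# Register the extension with the standard library's MIME type registry.
mimetypes.add_type(MEDIA_PARQUET, '.parquet')


def content_type_for(filename: str) -> str:
    """Guess a media type from the file name, falling back to generic binary."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or 'application/octet-stream'
```

A responder could then set the response content type from `content_type_for('data.parquet')` rather than hard-coding the string in several places.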
-
### Description
The ParquetReader initializeSchema needs a refactor. There are too many branches and special cases.
https://github.com/facebookincubator/velox/blob/main/velox/dwio/parquet/reader/Par…
-
If the bug is related to a specific library below, please raise an issue in the
respective repo directly:
[TensorFlow Data Validation Repo](https://github.com/tensorflow/data-validation/issues)
…
-
Any love for Parquet files as source or destination?