-
**Summary**
Currently, when reading a Parquet file, the fields of the file schema are modified so that all field names are converted to lowercase.
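To illustrate why this matters, here is a minimal sketch in plain Python (no Parquet library involved) of how lowercasing field names can lose information. The schema and the helper names `lowercase_fields` and `find_case_collisions` are hypothetical and only mimic the behavior described above; they are not the project's actual code.

```python
def lowercase_fields(names):
    """Mimic the reported behavior: every schema field name is lowercased."""
    return [name.lower() for name in names]


def find_case_collisions(names):
    """Group original names that become identical once lowercased."""
    groups = {}
    for name in names:
        groups.setdefault(name.lower(), []).append(name)
    # Keep only the lowercased names shared by more than one original field.
    return {key: originals for key, originals in groups.items() if len(originals) > 1}


# Hypothetical schema, not taken from the issue.
original = ["userId", "UserID", "eventTime"]
lowered = lowercase_fields(original)
collisions = find_case_collisions(original)
```

With this input, two distinct fields end up with the same name after lowercasing, and the original casing can no longer be recovered from the modified schema.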
# Solution 1
parquet/ndjson add format option case_s…
-
I've tried using Sail for local development of Spark jobs, but running a simple query on a dataset a few GB in size is slower in Sail than in Spark.
When there is no join, the query runs within 10…
-
https://duckdb.org/2024/11/14/optimizers.html#filter-pull-up--filter-pushdown has a nice description of filter pull up, an optimization in DuckDB that I'd like to implement in dask-expr as a learning …
-
This issue describes a previously undetected error that arrives with the author's [updating](https://github.com/CDCgov/forecasttools-py/pull/27/files#diff-81eac04473fd8ae59afb7b15af99d1086bd59a1e76676…
-
**Is your feature request related to a problem? Please describe.**
In cudf-polars, predicate pushdown can result in arbitrary expressions being part of the parquet read phase. Not all of these expres…
-
Using the `feature/parquet` branch, I get really high memory usage from running this configuration:
```
substreams-sink-files run \
eos.substreams.pinax.network:443 \
https://github.com/pinax-netw…
-
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
Utf8 validation comes up in profiles when reading Parquet.
**Describe the solution you'd…
-
### Mountpoint for Amazon S3 version
1.10.0
### AWS Region
n/a
### Describe the running environment
Running on local S3 (Vast Data)
### Mountpoint options
```shell
mount-s3 \
--log-directo…
-
### Description
The command line DataFusion compaction utility (not used by operational Sleeper) fails when the input Parquet files are specified using a relative path such as `../../some_file/test…
-
Hi
The Java Iceberg implementation is adding support for using native Parquet modular encryption, which is being developed as part of version 3 of the Iceberg specification: https://github.com/orgs…