-
### What happens?
After copying a Parquet file into a MySQL table, the Parquet file seems to be locked and therefore cannot be deleted until the process is killed. We use temporary Parquet files to bridge…
-
### Describe the enhancement requested
FAPEC is a high-performance data compression algorithm with many options, based on efficient entropy coding and including several pre-processing algorithms for …
-
For votable.parquet we have `column_metadata` while for parquet.votable we have `metadata`.
Now, I have kept this inconsistency in #16375 as we have already run into the issue that the metadata we …
-
I have a Parquet dataset of ~1.5 TiB across ~1.7k files, with an additional `_metadata.parquet` file containing metadata of all row groups. The `_metadata` file was written with the mechanism described in the [d…
-
**Summary**
Currently, when reading a Parquet file, the fields of the file schema are modified so that all field names are converted to lowercase.
# Solution 1
parquet/ndjson add format option case_s…
-
### Search before asking
- [X] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.
### Paimon version
0.9
### Compute Engine
Spark
### Minimal reprodu…
-
I've tried using Sail for local development of Spark jobs, but a simple query on a dataset of a few GBs runs slower in Sail than in Spark.
When there is no join, the query runs within 10…
-
### Describe the bug
Daft doesn't support some feature of the Parquet file format for boolean columns.
### To Reproduce
```
import polars as pl
import daft
df = pl.DataFrame(
{"a": [1, 2, …
-
https://duckdb.org/2024/11/14/optimizers.html#filter-pull-up--filter-pushdown has a nice description of filter pull up, an optimization in DuckDB that I'd like to implement in dask-expr as a learning …
-
Using the `feature/parquet` branch, I get really high memory usage from running this configuration:
```
substreams-sink-files run \
eos.substreams.pinax.network:443 \
https://github.com/pinax-netw…