-
### Is your feature request related to a problem or challenge?
When scanning Parquet files, we'd often like to provide an expected schema, since:
1. The Parquet files might not all have an identic…
-
Currently, hubverse-transform infers the parquet schema to apply when converting incoming model-output data to parquet. Because each file arrives and is transformed as a single unit, pyarrow has a limi…
-
![image](https://github.com/google/fhir-data-pipes/assets/92530372/af47dbea-883b-442b-a283-dbc37aca4cd3)
-
### Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the [latest version](https://pypi.org/project/polars/) of Polars.
### Reprodu…
-
`--recursive / -r`: If a `.zip` file inside a `.zip` file is encountered, descend into it and add the files within it to the final parquet, instead of adding the `.zip` file itself. Should be …
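One way the recursive behavior could be sketched with the standard-library `zipfile` module (the helper name `iter_members` is hypothetical, not from any existing CLI):

```python
import io
import zipfile


def iter_members(zf, recursive=False):
    """Yield (name, data) for each file member; with recursive=True,
    descend into nested .zip members instead of yielding them whole."""
    for info in zf.infolist():
        if info.is_dir():
            continue
        data = zf.read(info)
        if recursive and info.filename.lower().endswith(".zip"):
            with zipfile.ZipFile(io.BytesIO(data)) as inner:
                yield from iter_members(inner, recursive=True)
        else:
            yield info.filename, data


# Build a zip-inside-a-zip in memory to show the difference.
inner_buf = io.BytesIO()
with zipfile.ZipFile(inner_buf, "w") as z:
    z.writestr("b.csv", "x\n1\n")
outer_buf = io.BytesIO()
with zipfile.ZipFile(outer_buf, "w") as z:
    z.writestr("a.csv", "x\n2\n")
    z.writestr("inner.zip", inner_buf.getvalue())

with zipfile.ZipFile(outer_buf) as z:
    flat = [name for name, _ in iter_members(z)]
    deep = [name for name, _ in iter_members(z, recursive=True)]

print(flat)  # ['a.csv', 'inner.zip']
print(deep)  # ['a.csv', 'b.csv']
```

The flat pass yields the nested archive as an opaque member; the recursive pass replaces it with its contents.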
-
### Modin version checks
- [x] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest released version of Modin.
- [X] I have confirmed t…
-
Native worker not writing Parquet data files for WriterVersion v1 (PARQUET_1_0)
## Your Environment
* Presto version used: 0.288-SNAPSHOT
* Storage (HDFS/S3/GCS..): S3
* Prestissimo Setup on L…
-
### Description
Cluster: 1 coordinator, 3 workers
Trino version: 441
Connector: iceberg
Hello! I'm running a query to create a new iceberg table from an existing iceberg table. Something like th…
-
### Describe the enhancement requested
`pyarrow.dataset.write_dataset(compression='lz4_raw')` currently fails with:
```
Traceback (most recent call last):
  File "/work/projects/lisa/testpyarrow…
```
-
**Background**
We've previously identified non-uniform parquet schemas as a performance culprit when using `hubData` (because you have to do a `collect()` before you can filter). That issue is logged…