-
### Description
`pl.scan_parquet()` does not seem to allow passing custom HTTP headers when the path is an HTTP URL. This can be a problem if the HTTP server requires an auth token.
It would be grea…
-
### Describe the bug
Regarding https://github.com/apache/datafusion-comet/pull/537, there are 103 Spark 4.0 SQL tests failing:
- sql1: 91 tests failing
- sql2: 12 tests failing
Fix comet shims…
-
### Component(s)
exporter/file
### Is your feature request related to a problem? Please describe.
Parquet Format:
Parquet is a columnar storage file format optimized for big data processing framew…
-
https://observablehq.com/@fil/insee-parquet
This is a great showcase and argument for advocating to re-host the origin-destination data as parquet.
Meanwhile, I tested a few data repositories, s…
-
**Affected module**
Ingestion Framework
**Describe the bug**
Failed to run S3 storage metadata ingestion due to a `_SUCCESS` file in the dataPath entries folder.
**To Reproduce**
openmetadata.json
```…
-
- [ ] ~Find AnVIL snapshots in TDR `dev` where `.accessInformation.parquet` is not `null` in the TDR retrieveSnapshot response (none at the moment)~
- [x] Alternatively, but note MA concerns: Find su…
-
**Describe the bug**
ScanTask memory resources are set to the compressed size instead of the uncompressed byte size when working with row groups:
https://github.com/Eventual-Inc/Daft/blob/395ebe8f40…
-
Currently, hubverse-transform infers the parquet schema to apply when converting incoming model-output data to parquet. Because each file arrives and is transformed as a single unit, pyarrow has a limi…
-
### Bug description
Using the test method provided by @qqibrow in https://github.com/facebookincubator/velox/issues/7478, four compression formats (GZIP, SNAPPY, LZO and UNCOMPRESSED) and two parquet …
-
### Describe the enhancement requested
In https://github.com/apache/parquet-format/pull/240 there is concern regarding the ability to add a new logical type (in this case GEOMETRY) in a backwards com…