-
`TransCompressionCommand` in parquet-tools is intended to allow translating the compression codec of Parquet files. We intend to use this functionality to debug a corrupted file, but this comma…
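For reference, the same trans-compression idea can be sketched outside parquet-tools with pyarrow; this is not the `TransCompressionCommand` implementation, and the file names and codec are made up:

```python
# Minimal stand-in for trans-compression: read a Parquet file (decompressing
# with its original codec) and rewrite it under a different codec.
import pyarrow.parquet as pq

def transcode(src: str, dst: str, codec: str = "snappy") -> None:
    table = pq.read_table(src)                     # decode with existing codec
    pq.write_table(table, dst, compression=codec)  # re-encode with the new one

transcode("corrupted.parquet", "recoded.parquet", codec="zstd")
```

Note that reading a genuinely corrupted file may fail at the `read_table` step, which is presumably what such a debugging exercise is meant to surface.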
-
Snowflake tables with date and timestamp columns (with or without timezone), when synced back to Dataiku via `sync_snowflake_to_hdfs`, are imported as `int` and `bigint` respectively. This appears to be …
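This is consistent with Parquet's physical encodings: DATE is stored as an int32 count of days since 1970-01-01 and TIMESTAMP as an int64 tick count, so a reader that drops the logical-type annotation surfaces raw integers. A hedged sketch of decoding such values by hand (the microsecond unit is an assumption; it could equally be millis or nanos depending on the writer):

```python
# Decode raw Parquet DATE/TIMESTAMP integers once the logical-type
# annotation has been lost. Sample values are illustrative.
import datetime as dt

def decode_date(days: int) -> dt.date:
    # Parquet DATE: int32 days since the Unix epoch.
    return dt.date(1970, 1, 1) + dt.timedelta(days=days)

def decode_timestamp_micros(micros: int) -> dt.datetime:
    # Assumes TIMESTAMP_MICROS; adjust the unit if the writer used millis/nanos.
    return dt.datetime(1970, 1, 1) + dt.timedelta(microseconds=micros)

print(decode_date(19723))                        # 2024-01-01
print(decode_timestamp_micros(1704067200000000)) # 2024-01-01 00:00:00
```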
-
This might sound crazy, but I still wanted to propose a feature request about parquet files.
You might ask why? Parquet files are becoming more widespread and might even be considered "the new …
-
Current formats:
* Numpy matrix + parquet for IDs (ordered collections)
* Parquet with embeddings + id

Numpy + parquet:
Benefits:
* Fast to read the numpy matrix alone
* Fast to read the parquet alone
Drawbac…
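For concreteness, here is a minimal sketch of the two layouts under comparison; the paths, shapes, and column names are made up:

```python
# Layout 1: numpy matrix + a parquet file holding IDs in matching row order.
# Layout 2: a single parquet file with id + embedding as a list<float> column.
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

ids = ["a", "b", "c"]
emb = np.random.rand(3, 4).astype(np.float32)

# Layout 1: two files, coupled only by row order.
np.save("embeddings.npy", emb)
pq.write_table(pa.table({"id": ids}), "ids.parquet")

# Layout 2: one self-contained file, but embeddings must be list-encoded.
table = pa.table({"id": ids, "embedding": [row.tolist() for row in emb]})
pq.write_table(table, "embeddings.parquet")
```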
-
### Bug description
Expected behavior:
Able to read a parquet file whose array-typed column contains 30000 empty arrays. Both parquet-tools and the Presto parquet reader are able to read the file
```
…
```
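As a hedged repro sketch (the column name and element type are assumptions, since the original schema is truncated above), a file of this shape can be generated with pyarrow:

```python
# Build a single list<int32> column holding 30000 empty lists, write it
# to Parquet, and read it back — the step that reportedly fails.
import pyarrow as pa
import pyarrow.parquet as pq

col = pa.array([[]] * 30000, type=pa.list_(pa.int32()))
pq.write_table(pa.table({"values": col}), "empty_arrays.parquet")

table = pq.read_table("empty_arrays.parquet")
assert table.num_rows == 30000
```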
-
Hi
I'm trying to write Avro messages to Parquet on GCS. These Parquet files should be queryable by the BigQuery engine, which now supports Parquet.
To do this I'm using Secor, a Kafka log persister tool from Pinter…
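Independent of Secor's internals, the core conversion step looks roughly like the following sketch, assuming fastavro and pyarrow; the input file name is a placeholder, and the final GCS upload is out of scope here:

```python
# Standalone sketch of the Avro -> Parquet step (not Secor's actual pipeline):
# decode Avro records with fastavro, then write them out as Parquet.
import fastavro
import pyarrow as pa
import pyarrow.parquet as pq

with open("messages.avro", "rb") as f:
    records = list(fastavro.reader(f))   # list of dicts, one per Avro record

table = pa.Table.from_pylist(records)
pq.write_table(table, "messages.parquet")  # then upload this object to GCS
```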
-
### Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the [latest version](https://pypi.org/project/polars/) of Polars.
### Re…
-
I'm attempting to aggregate records by id as they are processed from SQS via Lambda into S3.
I do get a merged file uploaded to S3, as I can see the file size increasing each time the Lambda runs, …
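Since a Parquet object on S3 cannot be appended in place, each Lambda invocation has to read, merge, and rewrite the whole object. A minimal sketch of that cycle, with hypothetical bucket and key names and without the per-id aggregation step:

```python
# Read-merge-rewrite cycle for a single merged Parquet object on S3.
import io
import boto3
import pyarrow as pa
import pyarrow.parquet as pq

s3 = boto3.client("s3")
BUCKET, KEY = "my-bucket", "merged/records.parquet"

def merge_into_s3(new_records: pa.Table) -> None:
    try:
        body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
        existing = pq.read_table(io.BytesIO(body))
        merged = pa.concat_tables([existing, new_records])
    except s3.exceptions.NoSuchKey:
        merged = new_records        # first run: nothing to merge yet
    buf = io.BytesIO()
    pq.write_table(merged, buf)
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=buf.getvalue())
```

If the goal is aggregation by id rather than plain concatenation, a grouping step would be needed after the `concat_tables` call; the sketch above only explains why the file size grows on every run.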
-
Would it make sense to introduce support for `avro` schemas for `TypedDataSet`?
The current code defines the schema based on the `SparkSQL` "language": https://github.com/typelevel/frameless…
-
### Problem Statement
CrateDB's current export functionality is limited to the JSON format, which loses type information and handles data suboptimally. The COPY TO command lacks support…
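Until COPY TO supports other formats, one client-side workaround is to pull rows through CrateDB's Python DB-API client and write Parquet locally. A sketch assuming the `crate` client and pyarrow, with a hypothetical connection URL and table name:

```python
# Export a CrateDB table to Parquet on the client side, preserving types
# instead of round-tripping through JSON.
from crate import client
import pyarrow as pa
import pyarrow.parquet as pq

conn = client.connect("http://localhost:4200")
cur = conn.cursor()
cur.execute("SELECT * FROM my_table")
columns = [d[0] for d in cur.description]   # column names from the cursor
rows = cur.fetchall()

table = pa.table({c: [r[i] for r in rows] for i, c in enumerate(columns)})
pq.write_table(table, "my_table.parquet")   # types are carried in the schema
```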