-
Notes on how to use `parquet-tools` from Hadoop to inspect Parquet files.
https://stackoverflow.com/questions/36140264/inspect-parquet-from-command-line
I suspect we will end up using these a…
-
`presto-orc` has very few deprecation warnings, and these are probably unavoidable (related to #37).
However, `presto-parquet` has many more. We should resolve them, as they often signal upcoming brea…
-
Would you like to merge this into the main Parquet project as a separate package? Someone has already asked for this.
-
This issue is to track the progress of adding Parquet ExampleGen support, so that it could be integrated into TFX OSS.
Related issues: https://github.com/tensorflow/tfx/issues/74
Related discuss…
-
### Feature Type
- [X] Adding new functionality to pandas
- [ ] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
### Problem Description
Currently the .t…
-
### Apache Iceberg version
1.2.1
### Query engine
Spark
### Please describe the bug 🐞
I'm using micro-batch Spark streaming to read Parquet files and write to an Iceberg table. When writing …
-
**Describe the bug**
Queries such as `read_parquet().count_rows()` should not do a full scan; they should instead be answerable from the metadata alone.
The need for the full scan should be …
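
As a rough illustration of why a row count can in principle come from metadata alone: a Parquet file ends with the serialized file metadata, a 4-byte little-endian length of that metadata, and the magic bytes `PAR1`. The sketch below uses only the Python standard library and reads just that footer without touching any data pages. `read_footer_bytes` is a hypothetical helper name, not part of any library, and decoding the Thrift-encoded `FileMetaData` (which carries the row count) is deliberately left out.

```python
import os
import struct

MAGIC = b"PAR1"  # magic bytes at the start and end of an unencrypted Parquet file


def read_footer_bytes(path):
    """Return the raw, still Thrift-encoded footer of a Parquet file.

    Only the tail of the file is read; column data is never scanned.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Minimum layout: leading magic (4) + footer length (4) + trailing magic (4).
        if size < 12:
            raise ValueError("file too small to be a Parquet file")

        # The last 8 bytes are: <4-byte little-endian footer length><"PAR1">.
        f.seek(size - 8)
        tail = f.read(8)
        (footer_len,) = struct.unpack("<I", tail[:4])
        if tail[4:] != MAGIC:
            raise ValueError("missing trailing PAR1 magic")
        if footer_len > size - 12:
            raise ValueError("footer length exceeds file size")

        # The footer sits immediately before those last 8 bytes.
        f.seek(size - 8 - footer_len)
        return f.read(footer_len)
```

The number of bytes read depends on the footer size, not on the amount of data in the file, which is what makes a metadata-only `count_rows()` plausible.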
-
We ingest Avro data from Kafka produced by Debezium.
Debezium dropped an optional column; according to the schema registry, this was a backward-compatible change.
This tool had not adjusted ice…
-
timestamp with timezone (per SQL)
timestamps are adjusted to UTC and stored as integers.
metadata in logical types PR:
See discussion here: https://github.com/apache/parquet-format/pull/51#discussion_…
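
A minimal sketch of the "adjusted to UTC and stored as integers" idea, using only the Python standard library. The microsecond unit and the helper names `to_utc_micros` and `from_utc_micros` are assumptions made for illustration; the notes above do not specify a unit.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc_micros(dt):
    """Convert a timezone-aware datetime to integer microseconds since the Unix epoch (UTC)."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("expected a timezone-aware datetime")
    # Subtracting aware datetimes compares instants, so the offset is normalized away.
    return (dt - EPOCH) // timedelta(microseconds=1)


def from_utc_micros(micros):
    """Rebuild a UTC datetime from the stored integer."""
    return EPOCH + timedelta(microseconds=micros)


# 12:00 at UTC+02:00 is the same instant as 10:00 UTC.
local = datetime(2020, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))
stored = to_utc_micros(local)
restored = from_utc_micros(stored)
```

Note that the original offset (`+02:00` here) is not recoverable from the stored integer; only the instant survives the round trip.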
-
In the latest v2.0 release of parquet-mr (issue [PARQUET-1822](https://issues.apache.org/jira/browse/PARQUET-1822)), they have added a number of wrapper classes which should allow users to use parquet…