-
### Proposed Change
There is a need to run exploratory aggregation queries with `min/max` aggregations on table data columns. Currently, the spec for the `data_file` struct specifies the following mea…
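As a rough illustration of how per-file metrics could answer such a query without scanning data: the Iceberg spec's `data_file` struct carries `lower_bounds` and `upper_bounds` maps keyed by field id. The sketch below assumes those bounds have already been decoded into Python values (the `FileStats` class and `min_max_from_stats` function are hypothetical) and folds them into a table-level min/max. It returns `None` as soon as any file lacks bounds, and it only yields exact answers when the bounds are not truncated and no row-level deletes apply.

```python
from dataclasses import dataclass, field


@dataclass
class FileStats:
    # Hypothetical decoded view of one data_file entry: field id -> bound value.
    lower_bounds: dict[int, object] = field(default_factory=dict)
    upper_bounds: dict[int, object] = field(default_factory=dict)


def min_max_from_stats(files, field_id):
    """Fold per-file bounds into (min, max); None if any file lacks bounds."""
    lo = hi = None
    for f in files:
        if field_id not in f.lower_bounds or field_id not in f.upper_bounds:
            return None  # missing metrics: the caller must fall back to a scan
        low, high = f.lower_bounds[field_id], f.upper_bounds[field_id]
        lo = low if lo is None or low < lo else lo
        hi = high if hi is None or high > hi else hi
    return None if lo is None else (lo, hi)


files = [FileStats({1: 5}, {1: 10}), FileStats({1: 2}, {1: 7})]
print(min_max_from_stats(files, 1))  # (2, 10)
```

Returning `None` on missing metrics keeps the shortcut conservative: a file written without bounds (or whose column is entirely null) forces a regular scan instead of a silently wrong answer.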
-
## What went wrong?
Both `IdentityToZeroTransformation` and `NullToZeroTransformation` are meant to handle special cases where `LinearTransformer` is used to map `Numeric` columns, but the values are e…
-
### Apache Iceberg version
1.4.3
### Query engine
Spark
### Please describe the bug 🐞
We have added two columns to a nested struct field using the Iceberg Java API. I can query and see th…
-
First of all, thanks for your continuous support.
I tested the new release and it works as expected. But I am missing the timestamp at which the VertiPaq data was exported to the lh tables so…
-
## Feature request
#### Which Delta project/connector is this regarding?
- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)
### Overview
Delta allows spec…
-
Apache Spark is widely used in the Python ecosystem for distributed computing. As a user of Spark, I would like ruff to lint problematic behaviours. The automation that ruff offers is especially usef…
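One concrete candidate for such a lint: the PySpark documentation for `DataFrame.withColumn` warns that calling it repeatedly, for instance in a loop, can generate very large query plans, and recommends a single `select()` with multiple columns instead. The sketch below is a hypothetical rule written with Python's standard `ast` module, only an illustration and not a ruff rule. The function name `find_withcolumn_in_loops` is made up; matching is purely by attribute name, so it can also flag non-Spark objects, and comprehensions are not covered.

```python
import ast


def find_withcolumn_in_loops(source: str) -> list[int]:
    """Return sorted line numbers of `.withColumn(...)` calls inside loop bodies."""
    tree = ast.parse(source)
    hits: set[int] = set()  # a set, so nested loops do not report a call twice
    for loop in ast.walk(tree):
        if not isinstance(loop, (ast.For, ast.AsyncFor, ast.While)):
            continue
        # Only inspect the loop body, not the iterable or the loop condition.
        for stmt in loop.body:
            for node in ast.walk(stmt):
                if (
                    isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr == "withColumn"
                ):
                    hits.add(node.lineno)
    return sorted(hits)


# The source is only parsed, never executed, so `spark`, `cols` and `F` need not exist.
SAMPLE = '''
df = spark.read.parquet("in")
for c in cols:
    df = df.withColumn(c, F.col(c).cast("int"))
df = df.withColumn("x", F.lit(1))
'''
print(find_withcolumn_in_loops(SAMPLE))  # [4]
```

Only the call inside the `for` body is reported; the single `withColumn` after the loop is left alone, since one projection is not the pattern the documentation warns about.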
-
**Describe the problem you faced**
Hello, I am trying to test several schema evolution use cases using Hudi 0.15 and Spark 3.5 with HMS 4.
First test: adding a column in PG --> Debezium / Schema Registry OK --…
-
I am trying to connect to a Databricks cluster and run the exploratory command `src_databases(sc)` to list databases. Not sure, but wanted to reach out for thoughts on what could be goin…
-
#### Please describe your question here
I'm using the Spark operator in minikube + MinIO to send some distributed SQL queries over 2.4 GB CSV files with 8883 lines of 20000 columns each, and recove…
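A quick back-of-the-envelope calculation on the figures above shows how unusual this data shape is. It assumes a single 2.4 GB file (GB taken as 10^9 bytes) holding all 8883 lines; the variable names are illustrative only.

```python
# Figures from the question; assumes one 2.4 GB file (10**9 bytes per GB).
file_bytes = 2.4e9
rows = 8_883
cols = 20_000

cells = rows * cols                     # total number of fields in the file
bytes_per_row = file_bytes / rows       # average size of one CSV line
bytes_per_cell = bytes_per_row / cols   # average field width, delimiter included

print(f"{cells:,} fields, ~{bytes_per_row / 1e3:.0f} KB per line, "
      f"~{bytes_per_cell:.1f} bytes per field")
# 177,660,000 fields, ~270 KB per line, ~13.5 bytes per field
```

So the table is very wide and short: each line averages roughly 270 KB. Because CSV is row-oriented, a query that selects only a few of the 20000 columns still has to read every line in full.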
-
**Description**
I have two PySpark dataframes, source_df and target_df. I ran `pip install pyspark-extension` to install diff.
Spark Version - 3.4.1
Scala Version - 2.12
When I run `source_…
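To make the expected result of a diff concrete, here is a plain-Python model of a keyed comparison between two datasets. It assumes the action codes `N` (no change), `C` (changed), `D` (deleted) and `I` (inserted) described in the spark-extension README, with `source_df` as the left side, `target_df` as the right side, and a unique, sortable `id` key. The `diff_rows` function and the sample rows are hypothetical; this is not the library's implementation.

```python
def diff_rows(source, target, key="id"):
    """Classify rows of two lists of dicts by key as N, C, D or I.

    Conceptual model only: `source` is the left side, `target` the right.
    Assumes keys are unique within each side and mutually comparable.
    """
    left = {row[key]: row for row in source}
    right = {row[key]: row for row in target}
    result = []
    for k in sorted(left.keys() | right.keys()):
        if k not in right:
            result.append(("D", left[k]))    # only in source: deleted
        elif k not in left:
            result.append(("I", right[k]))   # only in target: inserted
        elif left[k] == right[k]:
            result.append(("N", left[k]))    # identical row: no change
        else:
            result.append(("C", right[k]))   # same key, other values: changed
    return result


source = [{"id": 1, "value": "one"}, {"id": 2, "value": "two"}, {"id": 3, "value": "three"}]
target = [{"id": 1, "value": "one"}, {"id": 2, "value": "Two"}, {"id": 4, "value": "four"}]
print([action for action, _ in diff_rows(source, target)])  # ['N', 'C', 'D', 'I']
```

Keying the comparison on `id` is what lets a modified row show up as a single `C` rather than a `D` plus an `I`; without a key, any changed row can only be reported as one deletion and one insertion.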