agile-lab-dev / whitefox

Lake Sharing is an open protocol, heavily influenced by Delta Sharing, for secure exchange of large datasets, which enables organizations to share data regardless of which computing platforms and table/file format they use. Forked from Delta Sharing
https://agile-lab-dev.github.io/whitefox/
Apache License 2.0

fix(deps): update dependency io.delta:delta-standalone_2.13 to v3.2.0 #282

Closed renovate[bot] closed 1 month ago

renovate[bot] commented 1 month ago

This PR contains the following updates:

| Package | Change |
|---|---|
| io.delta:delta-standalone_2.13 (source) | 3.1.0 -> 3.2.0 |

Release Notes

delta-io/delta (io.delta:delta-standalone_2.13)

### [`v3.2.0`](https://togithub.com/delta-io/delta/releases/tag/v3.2.0): Delta Lake 3.2.0

We are excited to announce the release of Delta Lake 3.2.0! This release includes several exciting new features.

#### Highlights

- [Support for Liquid clustering](https://togithub.com/delta-io/delta/commit/4456a122929b834e5c2652f99cc64ff8a71f4113) to reduce write amplification using incremental clustering.
- Preview [support for Type Widening](https://togithub.com/delta-io/delta/commit/9b3fa0a1a05e51b38cec083afb41226beb399b0f) to allow users to change the type of columns without having to rewrite data.
- Preview [support](https://togithub.com/delta-io/delta/commit/902830369662f5a84e987b3a97e23f916da104ca) for [Apache Hudi](https://hudi.apache.org/) in Delta UniForm tables.

#### Delta Spark

Delta Spark 3.2.0 is built on [Apache Spark™ 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html). Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.

- Documentation:
- API documentation:
- Maven artifacts: [delta-spark_2.12](https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.2.0/), [delta-spark_2.13](https://repo1.maven.org/maven2/io/delta/delta-spark_2.13/3.2.0/), [delta-contribs_2.12](https://repo1.maven.org/maven2/io/delta/delta-contribs_2.12/3.2.0/), [delta-contribs_2.13](https://repo1.maven.org/maven2/io/delta/delta-contribs_2.13/3.2.0/), [delta-storage](https://repo1.maven.org/maven2/io/delta/delta-storage/3.2.0/), [delta-storage-s3-dynamodb](https://repo1.maven.org/maven2/io/delta/delta-storage-s3-dynamodb/3.2.0/), [delta-iceberg_2.12](https://repo1.maven.org/maven2/io/delta/delta-iceberg_2.12/3.2.0/), [delta-iceberg_2.13](https://repo1.maven.org/maven2/io/delta/delta-iceberg_2.13/3.2.0/)
- Python artifacts: https://pypi.org/project/delta-spark/3.2.0/

The key features of this release are:

- [Support for Liquid clustering](https://togithub.com/delta-io/delta/issues/1874): This allows for [incremental clustering](https://togithub.com/delta-io/delta/commit/4456a122929b834e5c2652f99cc64ff8a71f4113) based on ZCubes and reduces the write amplification by not touching files already well clustered (i.e., files in stable ZCubes). Users can now use the [ALTER TABLE CLUSTER BY](https://togithub.com/delta-io/delta/commit/6f4e05197) syntax to change clustering columns and use the DESCRIBE DETAIL command to check the clustering columns. In addition, Delta Spark now supports the DeltaTable `clusterBy` API in both Python and Scala to allow creating clustered tables using the DeltaTable API. See the [documentation](https://docs.delta.io/3.2.0/delta-clustering.html) and [examples](https://togithub.com/delta-io/delta/blob/branch-3.2/examples/scala/src/main/scala/example/Clustering.scala) for more information.
- Preview [support for Type Widening](https://togithub.com/delta-io/delta/commit/9b3fa0a1a05e51b38cec083afb41226beb399b0f): Delta Spark can now change the type of a column from `byte` to `short` to `integer` using the [ALTER TABLE t CHANGE COLUMN col TYPE type](https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-alter-table.html#alter-or-change-column) command or with schema evolution during MERGE and INSERT operations. The table remains readable by Delta 3.2 readers without requiring the data to be rewritten. For compatibility with older versions, a rewrite of the data can be triggered using the `ALTER TABLE t DROP FEATURE 'typeWidening-preview'` command.
  - Note that this feature is in preview and that tables created with this preview feature enabled may not be compatible with future Delta Spark releases.
- [Support for Vacuum Inventory](https://togithub.com/delta-io/delta/commit/7d41fb7bbf63af33ad228007dd6ba3800b4efe81): Delta Spark now extends the VACUUM SQL command to allow users to specify an inventory table in a VACUUM command. When an inventory table is provided, VACUUM will consider the files listed there instead of doing the full listing of the table directory, which can be time consuming for very large tables. See the docs [here](https://docs.delta.io/3.2.0/delta-utility.html#inventory-table).
- [Support for Vacuum Writer Protocol Check](https://togithub.com/delta-io/delta/commit/2e197f130765d91f201b6b649f30190a44304b29): Delta Spark now supports the `vacuumProtocolCheck` ReaderWriter feature, which ensures consistent application of reader and writer protocol checks during `VACUUM` operations, addressing potential protocol discrepancies and mitigating the risk of data corruption due to skipped writer checks.
- Preview [support for In-Commit Timestamps](https://togithub.com/delta-io/delta/commit/b15a2c97432c8892f986c1526ceb2c3f63ed5d2c): When enabled, this [preview feature](https://togithub.com/delta-io/delta/issues/2532) persists monotonically increasing timestamps within Delta commits, ensuring they are not affected by file operations. Time travel queries then yield consistent results, even if the table directory is relocated.
  - Note that this feature is in preview and that tables created with this preview feature enabled may not be compatible with future Delta Spark releases.
- Deletion Vectors Read Performance Improvements: Two improvements were introduced to DVs in Delta 3.2.
  - [Removing broadcasting of DV information to executors](https://togithub.com/delta-io/delta/commit/be7183bef85feaebfc928d5f291c5a90246cde87): This work improves stability by reducing the driver's memory consumption, preventing potential driver OOM errors for very large Delta tables (1TB+). It also improves performance by removing the fixed broadcasting overhead when reading small Delta tables.
  - [Supporting predicate pushdown and splitting in scans with DVs](https://togithub.com/delta-io/delta/pull/2982): This improves the performance of DV reads for queries with filters, thanks to predicate pushdown and splitting. It yields a 2x performance improvement on average.
- [Support for Row Tracking](https://togithub.com/delta-io/delta/commit/23b7c17628c21881fbefd04db11a31c973205d95): Delta Spark can now write to tables that maintain information that allows identifying rows across multiple versions of a Delta table. Delta Spark can now also access this tracking information using the two metadata fields `_metadata.row_id` and `_metadata.row_commit_version`.

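The monotonicity guarantee behind In-Commit Timestamps, described in the list above, can be illustrated with a small sketch. This is not Delta Spark code: the helper names `next_commit_timestamp` and `assign_timestamps`, and the rule of stepping one millisecond past the previous commit when the wall clock has not advanced, are assumptions made purely for illustration.

```python
def next_commit_timestamp(previous_ms, wall_clock_ms):
    """Return a commit timestamp strictly greater than the previous one.

    Hypothetical helper: it illustrates monotonically increasing
    timestamps and is not the actual Delta implementation.
    """
    if previous_ms is None:
        # First commit: nothing to compare against, use the clock as-is.
        return wall_clock_ms
    # If the clock stalled or went backwards, step one millisecond past
    # the previous commit so the sequence keeps increasing (assumed rule).
    return max(wall_clock_ms, previous_ms + 1)


def assign_timestamps(wall_clock_readings_ms):
    """Assign a strictly increasing timestamp to each commit, in order."""
    assigned = []
    previous = None
    for reading in wall_clock_readings_ms:
        previous = next_commit_timestamp(previous, reading)
        assigned.append(previous)
    return assigned
```

Because the timestamp is computed from the commit sequence rather than read from file modification times, copying or relocating the files would not change the values under this model.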
Other notable changes include:

- [Delta Sharing](https://togithub.com/delta-io/delta/commit/8b4b6cce7071046da3d6d3fda4b85120a7445771): reduce the minimum RPC interval in delta sharing streaming from 30 seconds to 10 seconds
- [Improve](https://togithub.com/delta-io/delta/commit/bba0e94f0) the performance of write operations by skipping collecting commit stats
- [New SQL configurations](https://togithub.com/delta-io/delta/commit/3f0496ba3) to specify Delta Log cache size (`spark.databricks.delta.delta.log.cacheSize`) and retention duration (`spark.databricks.delta.delta.log.cacheRetentionMinutes`)
- [Fix](https://togithub.com/delta-io/delta/commit/8db9617b5) bug in plan validation due to inconsistent field metadata in MERGE
- [Improved](https://togithub.com/delta-io/delta/commit/ef751d236) metrics during VACUUM for better visibility
- Hive Metastore schema sync: The truncation threshold for schemas with long fields is now [user configurable](https://togithub.com/delta-io/delta/commit/3c09d95a34b71fff20cb23753c65af95da5cb48f)

#### Delta Universal Format (UniForm)

- Documentation:
- Maven artifacts: [delta-iceberg_2.12](https://repo1.maven.org/maven2/io/delta/delta-iceberg_2.12/3.2.0/), [delta-iceberg_2.13](https://repo1.maven.org/maven2/io/delta/delta-iceberg_2.13/3.2.0/), [delta-hudi_2.12](https://repo1.maven.org/maven2/io/delta/delta-hudi_2.12/3.2.0/), [delta-hudi_2.13](https://repo1.maven.org/maven2/io/delta/delta-hudi_2.13/3.2.0/)

Hudi is now [supported](https://togithub.com/delta-io/delta/commit/902830369662f5a84e987b3a97e23f916da104ca) by Delta Universal format in addition to Iceberg. Writing to a Delta UniForm table can generate Hudi metadata, alongside Delta. This feature is contributed by XTable.

Create a UniForm-enabled table that automatically generates Hudi metadata using the following command:

```sql
CREATE TABLE T (c1 INT) USING DELTA TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'hudi');
```

See the documentation [here](https://docs.delta.io/3.2.0/delta-uniform.html) for more details.

Other notable changes include:

- [Throw](https://togithub.com/delta-io/delta/commit/726165608) a better error if Iceberg conversion fails during initial sync
- [Fix](https://togithub.com/delta-io/delta/commit/79a0581bd) a bug in Delta Universal Format to support correct table overwrites

#### Delta Kernel

- API documentation:
- Maven artifacts: [delta-kernel-api](https://repo1.maven.org/maven2/io/delta/delta-kernel-api/3.2.0/), [delta-kernel-defaults](https://repo1.maven.org/maven2/io/delta/delta-kernel-defaults/3.2.0/)

The Delta Kernel project is a set of Java libraries ([Rust](https://togithub.com/delta-incubator/delta-kernel-rs) will be coming soon!) for building Delta connectors that can read (and, soon, write to) Delta tables without the need to understand the [Delta protocol details](https://togithub.com/delta-io/delta/blob/master/PROTOCOL.md). In this release, we improved the read support to make it production-ready by adding numerous performance improvements, additional functionality, and improved protocol support.

- Support for time travel. Now you can read a table snapshot at a [version id](https://docs.delta.io/3.2.0/api/java/kernel/io/delta/kernel/Table.html#getSnapshotAsOfVersion-io.delta.kernel.engine.Engine-long-) or snapshot at a [timestamp](https://docs.delta.io/3.2.0/api/java/kernel/io/delta/kernel/Table.html#getSnapshotAsOfTimestamp-io.delta.kernel.engine.Engine-long-).
- Improved Delta protocol support.
  - [Support](https://togithub.com/delta-io/delta/pull/2826) for reading tables with [`checkpoint v2`](https://togithub.com/delta-io/delta/blob/master/PROTOCOL.md#v2-checkpoint-table-feature).
  - Support for reading tables with `timestamp` partition type data column.
  - [Support](https://togithub.com/delta-io/delta/pull/2855) for reading tables with column data type [`timestamp_ntz`](https://togithub.com/delta-io/delta/blob/master/PROTOCOL.md#timestamp-without-timezone-timestampntz).
- Improved table metadata read performance and reliability on very large tables with millions of files
  - Improved [checkpoint reading latency](https://togithub.com/delta-io/delta/pull/2872) by pushing the partition predicate to the checkpoint Parquet reader to minimize the number of checkpoint files read.
  - Improved state reconstruction latency by [using](https://togithub.com/delta-io/delta/pull/2770) `LogStore`s from the [`delta-storage`](https://togithub.com/delta-io/delta/blob/master/storage/src/main/java/io/delta/storage/LogStore.java) module for faster `listFrom` calls.
  - [Retry](https://togithub.com/delta-io/delta/pull/2812) loading the `_last_checkpoint` checkpoint in case of transient failures. Loading the last checkpoint info from this file helps construct the Delta table state faster.
  - [Optimization](https://togithub.com/delta-io/delta/pull/2817) to minimize the number of listing calls to object store when trying to find a last checkpoint at or before a version.
- Other notable changes include:
  - [Support](https://togithub.com/delta-io/delta/pull/2651) for `IS_NULL` expression. Now the `Predicate` passed to Kernel [`ScanBuilder`](https://docs.delta.io/3.2.0/api/java/kernel/io/delta/kernel/ScanBuilder.html#withFilter-io.delta.kernel.engine.Engine-io.delta.kernel.expressions.Predicate-) can include `IS_NULL` predicates.
  - [Support](https://togithub.com/delta-io/delta/pull/2701) for custom `ParquetHandler` implementations to read multiple Parquet files in parallel. The current default implementation reads one file at a time, but connectors can implement their own custom `ParquetHandler` to read the Parquet files in parallel.

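The time travel support listed above resolves a timestamp to a table version. A minimal sketch of that resolution step follows, assuming the rule is "the latest version committed at or before the requested timestamp". It does not use the Kernel API: `version_as_of_timestamp` is a hypothetical helper, and the list of commit timestamps stands in for what a connector would obtain from the Delta log.

```python
from bisect import bisect_right


def version_as_of_timestamp(commit_timestamps_ms, timestamp_ms):
    """Return the latest version committed at or before timestamp_ms.

    commit_timestamps_ms[i] is the commit timestamp of version i and is
    assumed to be sorted in ascending order. Hypothetical helper, not
    part of Delta Kernel.
    """
    # Count how many commits have a timestamp <= the requested one.
    count = bisect_right(commit_timestamps_ms, timestamp_ms)
    if count == 0:
        raise ValueError("timestamp is earlier than the first commit")
    # Versions are numbered from 0, so the last matching one is count - 1.
    return count - 1
```

The binary search relies on commit timestamps being ordered by version, which is the property the In-Commit Timestamps preview feature is meant to guarantee.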
In this release, we also added a **preview** version of APIs that allow connectors to:

- Create tables
- Insert data into tables. Currently, only blind appends are supported.
- Insert data using idempotent writes.

The above functionality is available for both partitioned and unpartitioned tables. Refer to the [examples](https://togithub.com/delta-io/delta/tree/branch-3.2/kernel/examples/kernel-examples/src/main/java/io/delta/kernel/examples) for sample connector code to create tables and blind append data to them. We are still developing and evolving these APIs. Please give them a try and provide us feedback.

For more information, refer to:

- [User guide](https://togithub.com/delta-io/delta/blob/branch-3.2/kernel/USER_GUIDE.md) on the step-by-step process of using Kernel in a standalone Java program or in a distributed processing connector.
- [Slides](https://docs.google.com/presentation/d/1PGSSuJ8ndghucSF9GpYgCi9oeRpWolFyehjQbPh92-U/edit) explaining the rationale behind Kernel and the API design.
- Example [Java programs](https://togithub.com/delta-io/delta/tree/branch-3.2/kernel/examples/table-reader/src/main/java/io/delta/kernel/examples) that illustrate how to read Delta tables using the Kernel APIs.
- Table and default Engine API Java [documentation](https://docs.delta.io/3.2.0/api/java/kernel/index.html)
- [Migration guide](https://togithub.com/delta-io/delta/blob/master/kernel/USER_GUIDE.md#migration-from-delta-lake-version-310-to-320) to upgrade your connector to use the 3.2.0 APIs

#### Credits

Adam Binford, Ala Luszczak, Allison Portis, Ami Oka, Andreas Chatzistergiou, Arun Ravi M V, Babatunde Micheal Okutubo, Bo Gao, Carmen Kwan, Chirag Singh, Chloe Xia, Christos Stavrakakis, Costas Zarifis, Daniel Tenedorio, Davin Tjong, Dhruv Arya, Felipe Pessoto, Fred Storage Liu, Fredrik Klauss, Gabriel Russo, Hao Jiang, Hyukjin Kwon, Ian Streeter, Jason Teoh, Jiaheng Tang, Jing Zhan, Jintian Liang, Johan Lasperas, Jonas Irgens Kylling, Juliusz Sompolski, Kaiqi Jin, Lars Kroll, Lin Zhou, Miles Cole, Nick Lanham, Ole Sasse, Paddy Xu, Prakhar Jain, Rachel Bushrian, Rajesh Parangi, Renan Tomazoni Pinzon, Sabir Akhadov, Scott Sandre, Simon Dahlbacka, Sumeet Varma, Tai Le, Tathagata Das, Thang Long Vu, Tim Brown, Tom van Bussel, Venki Korukanti, Wei Luo, Wenchen Fan, Xupeng Li, Yousof Hosny, Gene Pang, Jintao Shen, Kam Cheung Ting, panbingkun, ram-seek, Sabir Akhadov, sokolat, tangjiafu

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.



This PR has been generated by Mend Renovate. View repository job log here.