delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[Spark] Write a checksum after every commit #3799

Closed dhruvarya-db closed 3 weeks ago

dhruvarya-db commented 4 weeks ago

Which Delta project/connector is this regarding?

Description

This PR adds a ChecksumHook, which is responsible for writing a checksum (see https://github.com/delta-io/delta/pull/3777) of the current table state after every commit. The hook is guarded by a flag that is false by default. Currently, every checksum write triggers a full state reconstruction, which can be very expensive. An upcoming PR will try to make the checksum computation incremental so that we don't have to pay this performance penalty.
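
To illustrate why a full state reconstruction on every commit is costly, here is a minimal, self-contained sketch in Python. It is not the code in this PR: `reconstruct_state`, `checksum_hook`, and the `enabled` parameter are hypothetical names, and the checksum carries only a simplified subset of fields (`numFiles`, `tableSizeBytes`) chosen for illustration. It assumes a toy log directory where each commit is a newline-delimited JSON file of `add`/`remove` actions.

```python
import json
import tempfile
from pathlib import Path


def reconstruct_state(log_dir: Path, version: int) -> dict:
    """Replay every commit from 0 to `version` to rebuild the set of live files."""
    live_files = {}  # file path -> size in bytes
    for v in range(version + 1):
        commit_file = log_dir / f"{v:020d}.json"
        for line in commit_file.read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                live_files[action["add"]["path"]] = action["add"]["size"]
            elif "remove" in action:
                live_files.pop(action["remove"]["path"], None)
    return live_files


def checksum_hook(log_dir: Path, version: int, enabled: bool = False):
    """Hypothetical post-commit hook: write a checksum for `version` if enabled."""
    if not enabled:  # mirrors the flag that is off by default
        return None
    # Full reconstruction: the work grows with the number of commits replayed.
    live_files = reconstruct_state(log_dir, version)
    checksum = {
        "numFiles": len(live_files),
        "tableSizeBytes": sum(live_files.values()),
    }
    crc_path = log_dir / f"{version:020d}.crc"
    crc_path.write_text(json.dumps(checksum))
    return crc_path


def demo() -> dict:
    """Write three toy commits, run the hook after each, return the last checksum."""
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp)
        commits = [
            [{"add": {"path": "a.parquet", "size": 100}}],
            [{"add": {"path": "b.parquet", "size": 50}}],
            [{"remove": {"path": "a.parquet"}}],
        ]
        for v, actions in enumerate(commits):
            body = "\n".join(json.dumps(a) for a in actions)
            (log_dir / f"{v:020d}.json").write_text(body)
            checksum_hook(log_dir, v, enabled=True)
        return json.loads((log_dir / f"{2:020d}.crc").read_text())
```

In this sketch the hook after commit N rereads commits 0 through N, so the total work across N commits is quadratic. An incremental variant would instead start from the previous checksum and apply only the actions of the new commit.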

How was this patch tested?

Added a new test suite, ChecksumSuite.

Does this PR introduce any user-facing changes?

No

felipepessoto commented 4 weeks ago

Could you share an example of a CRC file? I tried to run it locally, but it throws an error:

  org.apache.spark.sql.delta.DeltaAnalysisException: [DELTA_CONFIGURE_SPARK_SESSION_WITH_EXTENSION_AND_CATALOG] This Delta operation requires the SparkSession to be configured with the DeltaSparkSessionExtension and the DeltaCatalog. Please set the necessary configurations when creating the SparkSession as shown below.

dhruvarya-db commented 3 weeks ago

> Could you share an example of a CRC file? I tried to run it locally, but it throws an error:
>
> org.apache.spark.sql.delta.DeltaAnalysisException: [DELTA_CONFIGURE_SPARK_SESSION_WITH_EXTENSION_AND_CATALOG] This Delta operation requires the SparkSession to be configured with the DeltaSparkSessionExtension and the DeltaCatalog. Please set the necessary configurations when creating the SparkSession as shown below.

Hey @felipepessoto, I will try to generate one locally and post it here.

felipepessoto commented 3 weeks ago

Don't worry, I already generated one locally after the unit test fix.