delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[BUG] Missing spec for _delta_log/00000000000000000000.crc files #1664

Open felipepessoto opened 1 year ago

felipepessoto commented 1 year ago

Bug

Describe the problem

The PROTOCOL doesn't describe the CRC files in _delta_log. Is this intentional?

Example: 00000000000000000000.crc

{"tableSizeBytes":1234567,"numFiles":123,"numMetadata":1,"numProtocol":1,"protocol":{"minReaderVersion":1,"minWriterVersion":2},"metadata....
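For illustration only, here is a minimal Python sketch of how a reader might inspect such a file. It assumes nothing beyond what the example above shows: the file name is the table version zero-padded to 20 digits plus `.crc`, and the body is a single JSON object. The helper names `crc_file_name` and `summarize_crc` are hypothetical, the sample string stops after the `protocol` field because the example above is truncated there, and none of this is a documented format.

```python
import json

# Sample content copied from the example above, cut off after "protocol"
# because the original example is truncated at the "metadata" field.
SAMPLE_CRC = (
    '{"tableSizeBytes":1234567,"numFiles":123,"numMetadata":1,'
    '"numProtocol":1,"protocol":{"minReaderVersion":1,"minWriterVersion":2}}'
)


def crc_file_name(version: int) -> str:
    """Build the file name for a table version (assumed: 20 digits + .crc)."""
    return f"{version:020d}.crc"


def summarize_crc(text: str) -> dict:
    """Pull out the fields visible in the example; absent fields become None."""
    data = json.loads(text)
    protocol = data.get("protocol") or {}
    return {
        "table_size_bytes": data.get("tableSizeBytes"),
        "num_files": data.get("numFiles"),
        "min_reader_version": protocol.get("minReaderVersion"),
        "min_writer_version": protocol.get("minWriterVersion"),
    }


if __name__ == "__main__":
    print(crc_file_name(0))
    print(summarize_crc(SAMPLE_CRC))
```

Using `.get` throughout keeps the sketch tolerant of fields that may or may not be present, which seems prudent while the format is unspecified.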


scottsand-db commented 1 year ago

Thanks for making this issue! I believe that @vkorukanti will follow up!

felipepessoto commented 8 months ago

Hi @vkorukanti, do you have any comments on this?

vkorukanti commented 8 months ago

@felipepessoto Where did you see this crc file? cc. @prakharjain09.

felipepessoto commented 8 months ago

Currently it is created by Databricks only.

felipepessoto commented 1 month ago

@prakharjain09 any insights?

prakharjain09 commented 4 weeks ago

Thanks for bringing this up, @felipepessoto. We are working on adding support for checksum files to the Delta spec and implementation.