delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

Commit timestamp is retrieved from file timestamp instead of delta logfile, breaking functionality when log files are moved and/or restored #3690

Open samvruggink opened 2 months ago

samvruggink commented 2 months ago

https://github.com/delta-io/delta/blob/93eef1112fce9c766aec504f26f09d53bbcabb03/connectors/standalone/src/main/scala/io/delta/standalone/internal/DeltaHistoryManager.scala#L202

Because the commit timestamp is taken from the file's modification time, a full restore (moving files) or a move of the _delta_log contents leaves every commit with the same timestamp. This breaks restoring a table to a point in time.

Could this be changed to the commit timestamp that is available in the actual log file?
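To illustrate the request, here is a minimal Python sketch of what reading the timestamp from the log file could look like. It is not the Delta Standalone implementation. It assumes the commit file is newline-delimited JSON and that the writer recorded a `commitInfo` action with a `timestamp` field in milliseconds since the epoch; that field is not guaranteed to be present, so the sketch falls back to the file modification time. The function names `commit_info_timestamp_ms` and `commit_timestamp_ms` are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def commit_info_timestamp_ms(commit_file: Path) -> Optional[int]:
    """Return the commitInfo timestamp (ms since epoch) stored in a commit file.

    Assumption: each non-empty line of the file is one JSON action, and the
    writer included a `commitInfo` action with a `timestamp` field.
    Returns None when no such field is found.
    """
    with commit_file.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            action = json.loads(line)
            info = action.get("commitInfo")
            if info is not None and "timestamp" in info:
                return int(info["timestamp"])
    return None


def commit_timestamp_ms(commit_file: Path) -> int:
    """Prefer the timestamp recorded in the log; fall back to the file mtime."""
    ts = commit_info_timestamp_ms(commit_file)
    if ts is not None:
        return ts
    # Fallback: file modification time, which changes when files are moved or restored.
    return int(commit_file.stat().st_mtime * 1000)
```

With this approach, moving or restoring the _delta_log files would change the result only for commits that carry no recorded timestamp.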

felipepessoto commented 1 month ago

Hi @samvruggink. Are you talking about Delta Standalone specifically?

In Spark Delta, the In-Commit Timestamps feature may help you; it is currently in preview.

https://github.com/delta-io/delta/pull/2596/files#diff-98df89d4b0ce76abb5263d4fbb6f991660083a254dcdb9d922a4973a9937a4e2