delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0
6.98k stars 1.6k forks source link

Support `TIMESTAMP AS OF`, `VERSION AS OF` in SQL #128

Closed spmp closed 1 year ago

spmp commented 4 years ago

Please add support for the time travel functions TIMESTAMP AS OF, VERSION AS OF~, and DESCRIBE HISTORY~.

Time travel is a critical Delta use.

Cheers.

-- Updated by @zsxwing : Spark has added the SQL syntax support in https://issues.apache.org/jira/browse/SPARK-37219 . This will be supported when Delta supports Spark 3.3 (#1217).

mukulmurthy commented 4 years ago

Hi @spmp ,

The functionality is currently available. We don't have custom SQL API support because that depends on changes in Spark, but there are Scala APIs to do all of those. Documentation for time travel options is available at: https://docs.delta.io/latest/delta-batch.html#query-an-older-snapshot-of-a-table-time-travel

Documentation for the Scala APIs for DESCRIBE HISTORY is available at: https://docs.delta.io/latest/delta-utility.html#history

Thanks, and please let us know if you have further questions.

spmp commented 4 years ago

(sorry for late response, I received no email notification...) This feature request is for the SQL API such that these functions are available on a registered table rather load time option. @mukulmurthy is there an expected time for this functionality to exist in Spark. I assume there is much synergy (required) between Spark and delta-io.

mukulmurthy commented 4 years ago

You're right, there's definitely some interaction with Spark needed. It's hard to give a good estimate for this because it depends on the Spark 3.0 release (which can't really be scheduled, since it's a community effort with lots of different voices), but we're hoping for sometime this fall.

I'll reopen this issue (and add SQL to the title) so we can use it to track SQL APIs for these features.

holydrinker commented 4 years ago

Hi guys, I'm reading this and it seems that SQL support was integrated in delta.

I tried this simple code snippet and it seems to work

def showDeltaTableHistoryViaSQL() = {
  spark.sql(s"DESCRIBE HISTORY '${mydata.tbAccountsPath}'").show(false)
}

Did I understand well? Can this issue be closed?

tdas commented 4 years ago

Yes, we did add support for simple SQL commands like DESCRIBE HISTORY by providing a SparkSession extension that users can add to their session. What this extension does is inject additional grammar rules that can be used to parse a SQL command that Spark's default rules fail to parse. This was easy to add and works nicely for simple SQL commands that do not take expressions. But adding support for complex SQL commands (e.g. MERGE) or adding new keywords inside existing commands (e.g. TIMESTAMP AS OF inside SELECT) is quite hard.

Since we have already added DESCRIBE HISTORY but not others, let me update the title to reflect that.

Incidentally, support for MERGE SQL command will come with 0.7.0 release (next release is 0.6.0 in March) after Spark 3.0 is released

holydrinker commented 4 years ago

Nice! Thank you very much for your support and your answer!

brucemen711 commented 3 years ago

Hi @tdas , when will this feature be available ? I'm looking for it. Thanks.

spmp commented 3 years ago

Thanks @brucemen711 for bumping this. I have looked again in 2021 at https://docs.delta.io/latest/delta-utility.html#history and seen that there is (AFAIK) still no SQL interface to TIMESTAMP AS OF and VERSION AS OF. I was looking for this for the use case where consumers are connecting via a SQL interface such as Hive.

YannByron commented 3 years ago

I agree that this feature depends on changes in Spark. It involves adding keywords, overriding some visit functions in AstBuilder, and other UnResolved LogicalPlans for time-travel. But, we need to this on Spark 2.X indeed. Shall we open a Spark issue to relate to this ? And i'll be pleasure to work on it. @tdas @mukulmurthy

zsxwing commented 3 years ago

@YannByron Feel free to raise a Spark issue to ask for the time travel SQL syntax support.

YannByron commented 3 years ago

@YannByron Feel free to raise a Spark issue to ask for the time travel SQL syntax support.

https://issues.apache.org/jira/browse/SPARK-34978

spmp commented 3 years ago

I am looking forward to it, @YannByron please keep us in the loop for testing 8) Thanks

AFFogarty commented 2 years ago

Since this currently doesn't work in SQL, can we remove the SQL examples from the public docs for now? https://docs.delta.io/1.0.0/delta-batch.html#syntax

zsxwing commented 2 years ago

@AFFogarty Good call. We will review the doc and update the examples.

spmp commented 2 years ago

Kia ora All, To clarify, is this now a Spark issue with the discussion moving to the Jira issue as above (https://issues.apache.org/jira/browse/SPARK-34978)

If so, should we call this out and close this issue?

The use case I really have in mind is that of a non technical user accessing a Delta table via Hive and hence I am looking for SQL interfaces to these commands such that the non technical user or downstream processes can access them. Is there an existing solution for accessing Delta and Timetravel with Hive or similar?

nchammas commented 2 years ago

Since this currently doesn't work in SQL, can we remove the SQL examples from the public docs for now? https://docs.delta.io/1.0.0/delta-batch.html#syntax

+1 on this. It's very confusing to look at these docs and then wonder why it doesn't work in practice. :)

dennyglee commented 2 years ago

Great call out @nchammas - this should be fixed for the next documentation release. Will keep this open until this is resolved. Thanks!

Tagar commented 2 years ago

FYI https://github.com/apache/spark/pull/34497

zjffdu commented 2 years ago

I hit the same issue today, could you update the official document ? https://docs.delta.io/latest/delta-batch.html#query-an-older-snapshot-of-a-table-time-travel Here's another document error I found. https://github.com/delta-io/delta/issues/845

zsxwing commented 2 years ago

I hit the same issue today, could you update the official document ? https://docs.delta.io/latest/delta-batch.html#query-an-older-snapshot-of-a-table-time-travel

We are working on the fix for the doc. It will be out soon.

allisonport-db commented 1 year ago

Closing this as support was added in #1288 and is being released in 2.1