Closed spmp closed 1 year ago
Hi @spmp ,
The functionality is currently available. We don't have custom SQL API support because that depends on changes in Spark, but there are Scala APIs to do all of those. Documentation for time travel options is available at: https://docs.delta.io/latest/delta-batch.html#query-an-older-snapshot-of-a-table-time-travel
Documentation for the Scala APIs for DESCRIBE HISTORY is available at: https://docs.delta.io/latest/delta-utility.html#history
Thanks, and please let us know if you have further questions.
(sorry for late response, I received no email notification...) This feature request is for the SQL API such that these functions are available on a registered table rather load time option. @mukulmurthy is there an expected time for this functionality to exist in Spark. I assume there is much synergy (required) between Spark and delta-io.
You're right, there's definitely some interaction with Spark needed. It's hard to give a good estimate for this because it depends on the Spark 3.0 release (which can't really be scheduled, since it's a community effort with lots of different voices), but we're hoping for sometime this fall.
I'll reopen this issue (and add SQL to the title) so we can use it to track SQL APIs for these features.
Hi guys, I'm reading this and it seems that SQL support was integrated in delta.
I tried this simple code snippet and it seems to work
def showDeltaTableHistoryViaSQL() = {
spark.sql(s"DESCRIBE HISTORY '${mydata.tbAccountsPath}'").show(false)
}
Did I understand well? Can this issue be closed?
Yes, we did add support for simple SQL commands like DESCRIBE HISTORY
by providing a SparkSession extension that users can add to their session. What this extension does is inject additional grammar rules that can be used to parse a SQL command that Spark's default rules fail to parse. This was easy to add and works nicely for simple SQL commands that do not take expressions. But adding support for complex SQL commands (e.g. MERGE
) or adding new keywords inside existing commands (e.g. TIMESTAMP AS OF
inside SELECT
) is quite hard.
Since we have already added DESCRIBE HISTORY
but not others, let me update the title to reflect that.
Incidentally, support for MERGE SQL command will come with 0.7.0 release (next release is 0.6.0 in March) after Spark 3.0 is released
Nice! Thank you very much for your support and your answer!
Hi @tdas , when will this feature be available ? I'm looking for it. Thanks.
Thanks @brucemen711 for bumping this. I have looked again in 2021 at https://docs.delta.io/latest/delta-utility.html#history and seen that there is (AFAIK) still no SQL interface to TIMESTAMP AS OF
and VERSION AS OF
.
I was looking for this for the use case where consumers are connecting via a SQL interface such as Hive.
I agree that this feature depends on changes in Spark. It involves adding keywords, overriding some visit functions in AstBuilder, and other UnResolved LogicalPlans for time-travel. But, we need to this on Spark 2.X indeed. Shall we open a Spark issue to relate to this ? And i'll be pleasure to work on it. @tdas @mukulmurthy
@YannByron Feel free to raise a Spark issue to ask for the time travel SQL syntax support.
@YannByron Feel free to raise a Spark issue to ask for the time travel SQL syntax support.
I am looking forward to it, @YannByron please keep us in the loop for testing 8) Thanks
Since this currently doesn't work in SQL, can we remove the SQL examples from the public docs for now? https://docs.delta.io/1.0.0/delta-batch.html#syntax
@AFFogarty Good call. We will review the doc and update the examples.
Kia ora All, To clarify, is this now a Spark issue with the discussion moving to the Jira issue as above (https://issues.apache.org/jira/browse/SPARK-34978)
If so, should we call this out and close this issue?
The use case I really have in mind is that of a non technical user accessing a Delta table via Hive and hence I am looking for SQL interfaces to these commands such that the non technical user or downstream processes can access them. Is there an existing solution for accessing Delta and Timetravel with Hive or similar?
Since this currently doesn't work in SQL, can we remove the SQL examples from the public docs for now? https://docs.delta.io/1.0.0/delta-batch.html#syntax
+1 on this. It's very confusing to look at these docs and then wonder why it doesn't work in practice. :)
Great call out @nchammas - this should be fixed for the next documentation release. Will keep this open until this is resolved. Thanks!
I hit the same issue today, could you update the official document ? https://docs.delta.io/latest/delta-batch.html#query-an-older-snapshot-of-a-table-time-travel Here's another document error I found. https://github.com/delta-io/delta/issues/845
I hit the same issue today, could you update the official document ? https://docs.delta.io/latest/delta-batch.html#query-an-older-snapshot-of-a-table-time-travel
We are working on the fix for the doc. It will be out soon.
Closing this as support was added in #1288 and is being released in 2.1
Please add support for the time travel functions
TIMESTAMP AS OF
,VERSION AS OF
~, andDESCRIBE HISTORY
~.Time travel is a critical Delta use.
Cheers.
-- Updated by @zsxwing : Spark has added the SQL syntax support in https://issues.apache.org/jira/browse/SPARK-37219 . This will be supported when Delta supports Spark 3.3 (#1217).