How can we create a Hive table on Delta Lake to access the latest snapshot?

sanjiv1980 commented 5 years ago

Can we create a Hive table on Delta to access the updated snapshot?
We know that whenever an update/delete runs on Delta Lake (through the Delta Lake APIs), it reprocesses the entire dataset and updates the delta log so that we can get the updated snapshot. When we run the same (ACID) operation through Hive, should it do the same thing the Delta Lake API does?

tdas commented 5 years ago

When you say "Hive table on delta to access snapshot", do you mean "read Hive Metastore table using Spark", or "read the table using Hive"?

If you mean "read Hive Metastore table using Spark", then the answer is that OSS Delta does not support metastore tables yet, because Apache Spark 2.x lacks the interfaces needed to make it work. Apache Spark 3.0 with DataSourceV2 will have all the necessary pluggable interfaces to make Delta work with Hive Metastore tables. We are actively working with the Spark community to make this work.
If you mean "read the table using Hive", that is also a work in progress - https://github.com/delta-io/delta/pull/111

I am not familiar with the internal details of how Hive ACID works, so I can't really compare. All I can say is that Delta does not reprocess the entire dataset: it uses Apache Spark's data skipping capabilities to do an optimized scan on the table, finds which files need to be rewritten (i.e. files containing rows that match the update/delete condition), and rewrites only those files.
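
To make the "only rewrite the matching files" point concrete, here is a toy sketch in plain Python. It is not Delta's implementation and does not use any Delta or Spark API: `DataFile`, `delete_where_equals`, the per-file min/max check, and the `-rewritten` naming are all hypothetical, chosen only to illustrate the idea of skipping files that cannot match and rewriting the rest.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for one data file holding a single integer column."""
    name: str
    rows: Tuple[int, ...]

    @property
    def min_value(self) -> int:
        return min(self.rows)

    @property
    def max_value(self) -> int:
        return max(self.rows)


def delete_where_equals(files: List[DataFile], value: int):
    """Delete rows equal to `value`; return (new_snapshot, removed, added)."""
    snapshot, removed, added = [], [], []
    for f in files:
        # Skipping step (assumed min/max check): this file cannot contain `value`.
        if value < f.min_value or value > f.max_value:
            snapshot.append(f)
            continue
        remaining = tuple(r for r in f.rows if r != value)
        if len(remaining) == len(f.rows):
            # The range overlapped but no row matched, so the file is untouched.
            snapshot.append(f)
            continue
        # The file had matching rows: drop it and write a replacement.
        removed.append(f.name)
        if remaining:
            new_file = DataFile(f.name + "-rewritten", remaining)
            snapshot.append(new_file)
            added.append(new_file.name)
    return snapshot, removed, added


files = [
    DataFile("part-0", (1, 2, 3)),
    DataFile("part-1", (10, 11)),
    DataFile("part-2", (2, 20)),
]
snapshot, removed, added = delete_where_equals(files, 2)
```

In this toy model `part-1` is never opened or rewritten, and the `removed` and `added` lists play the role of the entries a commit would record so that readers see the new snapshot.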

tdas commented 5 years ago

Hopefully, I answered your question. I am going to close this issue. Please reopen it if you have any further questions.
