delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

How can we create a Hive table on Delta Lake to access the latest snapshot? #229

Closed sanjiv1980 closed 4 years ago

sanjiv1980 commented 4 years ago
  1. Can we create a Hive table on Delta Lake to access the updated snapshot?
  2. We know that whenever an update/delete runs on Delta Lake (through the Delta Lake APIs), it reprocesses the entire dataset and updates the delta log, so that we can get the updated snapshot. When we perform the same (ACID) operation using Hive, should it do the same thing the Delta Lake API does?
tdas commented 4 years ago
  1. When you say "Hive table on delta to access snapshot", do you mean "read a Hive Metastore table using Spark", or "read the table using Hive"?
  2. I am not familiar with the internal details of how Hive ACID works, so I can't really compare. All I can say is that Delta does not reprocess the entire dataset: it uses Apache Spark's data skipping capabilities to do an optimized scan of the table, finds which files need to be rewritten (i.e. contain rows that match the update/delete condition), and rewrites only those files.
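
To make the "only rewrite those files" point concrete, here is a minimal sketch in plain Python. It is a toy model and not Delta's or Spark's actual code: `DataFile`, `delete_range`, and the `.rewritten` naming are hypothetical names made up for illustration, and each file holds a single integer column. It assumes per-file min/max statistics are available, skips files whose range cannot match the delete condition, and rewrites only the files that really contain matching rows.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DataFile:
    """Hypothetical stand-in for one data file of a table (one integer column)."""
    name: str
    rows: List[int]  # assumed non-empty

    @property
    def min_value(self) -> int:
        return min(self.rows)

    @property
    def max_value(self) -> int:
        return max(self.rows)


def delete_range(files: List[DataFile], lo: int, hi: int) -> Tuple[List[DataFile], List[str]]:
    """Delete rows with lo <= value <= hi.

    Returns the new list of files and the names of the files that were rewritten.
    """
    new_files: List[DataFile] = []
    rewritten: List[str] = []
    for f in files:
        # Data skipping: the file's min/max range does not overlap the
        # condition, so the file is kept untouched without reading its rows.
        if f.max_value < lo or f.min_value > hi:
            new_files.append(f)
            continue
        remaining = [v for v in f.rows if not (lo <= v <= hi)]
        if len(remaining) == len(f.rows):
            # The range overlapped but no row actually matched: keep as-is.
            new_files.append(f)
            continue
        rewritten.append(f.name)
        if remaining:
            # Only this file is rewritten, minus the deleted rows.
            new_files.append(DataFile(f.name + ".rewritten", remaining))
    return new_files, rewritten


files = [
    DataFile("part-0", [1, 2, 3]),
    DataFile("part-1", [10, 11, 12]),
    DataFile("part-2", [20, 25, 30]),
]
new_files, rewritten = delete_range(files, lo=11, hi=21)
print(rewritten)                      # files that had to be rewritten
print([f.name for f in new_files])    # files that make up the new snapshot
```

In this sketch `part-0` is never rewritten because its value range cannot satisfy the condition, which is the behaviour described above. In the real system the list of added and removed files would be recorded in the transaction log so that readers see the new snapshot; that part is not modelled here.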
tdas commented 4 years ago

Hopefully, I answered your question. I am going to close this issue. Please reopen it if you have any further questions.