An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Can we create a Hive table on Delta to access the updated snapshot?
We know that whenever an update/delete runs on Delta Lake (via the Delta Lake APIs), it reprocesses the entire dataset and updates the DeltaLog so that we can get the updated snapshot. So when we perform the same (ACID) operation using Hive, should it do the same thing the Delta Lake API does?
When you say "Hive table on delta to access snapshot", do you mean "read Hive Metastore table using Spark", or "read the table using Hive"?
If you mean "read Hive Metastore table using Spark", then the answer is that OSS Delta does not support metastore tables yet, because Apache Spark 2.x lacks the interfaces needed to make it work. Apache Spark 3.0 with DataSourceV2 will have all the necessary pluggable interfaces to make Delta work with Hive Metastore tables. We are actively working with the Spark community to make this work.
I am not familiar with the internal details of how Hive ACID works, so I can't really compare. All I can say is that Delta does not reprocess the entire dataset: it uses Apache Spark's data skipping capabilities to do an optimized scan on the table, find which files need to be rewritten (i.e. files with rows matching the update/delete condition), and rewrite only those files.
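To make the "only rewrite the matching files" idea concrete, here is a rough conceptual sketch in plain Python. It is not Delta's or Spark's actual code: the `DataFile` class, the `delete_where_equals` helper, the `.rewritten` file names, and the `("remove", ...)` / `("add", ...)` tuples are all made up for illustration, and it assumes per-file min/max values are available for the filtered column.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DataFile:
    """Hypothetical stand-in for one data file of a table (assumed non-empty)."""
    name: str
    rows: List[Dict[str, int]]

    def stats(self, column: str) -> Tuple[int, int]:
        # Min/max of a column; a real system would read these from stored
        # metadata rather than scanning the rows.
        values = [row[column] for row in self.rows]
        return min(values), max(values)


def delete_where_equals(files: List[DataFile], column: str, value: int):
    """Delete rows where column == value, rewriting only the affected files.

    Returns the new list of live files and the log actions to record.
    """
    new_files: List[DataFile] = []
    actions: List[Tuple[str, str]] = []
    for f in files:
        lo, hi = f.stats(column)
        if value < lo or value > hi:
            # Skipped: the min/max range proves no row can match.
            new_files.append(f)
            continue
        kept = [row for row in f.rows if row[column] != value]
        if len(kept) == len(f.rows):
            # The range overlapped, but no row actually matched.
            new_files.append(f)
            continue
        # Only this file is rewritten; the log records the swap.
        actions.append(("remove", f.name))
        if kept:
            rewritten = DataFile(f.name + ".rewritten", kept)
            new_files.append(rewritten)
            actions.append(("add", rewritten.name))
    return new_files, actions


files = [
    DataFile("part-0", [{"id": 1}, {"id": 2}, {"id": 3}]),
    DataFile("part-1", [{"id": 10}, {"id": 11}]),
    DataFile("part-2", [{"id": 20}, {"id": 21}]),
]
new_files, actions = delete_where_equals(files, "id", 11)
print(actions)
```

In this toy run only `part-1` is touched; `part-0` and `part-2` are carried over unchanged, which is the behaviour described above, as opposed to reprocessing the whole dataset.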