Booz Allen's lean manufacturing approach for holistically designing, developing and fielding AI solutions across the engineering lifecycle from data processing to model building, tuning, and training to secure operational deployment
BUG: Hive Metastore v4.0.0 is not compatible with delta lake tables #477
After the recent upgrade of our Hive Metastore to version 4.0.0, Hive is no longer compatible with the Delta connector (ref: Delta supported Hive versions). As a result, data access cannot read data stored in a Delta table, which is where large datasets from your Spark pipeline would typically be stored.
Steps to Reproduce
Create a new project from the archetype
Add a pipeline that persists data to delta lake
Access the delta lake table using data access
Expected Behavior
You are able to view the delta lake table
Actual Behavior
Hive does not have an entry for the delta lake table
Additional Context
Connecting to the Spark Thrift Server pod and querying it through beeline shows that no entry for the delta lake table is created in the Hive metastore:
0: jdbc:hive2://spark-infrastructure-sts-serv> show tables from default;
+------------+------------+--------------+
| namespace  | tableName  | isTemporary  |
+------------+------------+--------------+
+------------+------------+--------------+
No rows selected (0.077 seconds)
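To make the check above repeatable, the beeline output can be parsed in a script rather than inspected by eye. The following is a minimal sketch, not part of the project: the function names and the table name `my_delta_table` are hypothetical, and it assumes the output of `show tables` is captured as text in the ASCII-table layout shown above.

```python
def parse_beeline_tables(output: str) -> list[dict[str, str]]:
    """Parse the ASCII table printed by beeline's `show tables` into row dicts."""
    # Keep only lines that look like table rows ("| a | b |"). Border lines,
    # the prompt line, and the "No rows selected" summary are skipped.
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in output.splitlines()
        if line.strip().startswith("|")
    ]
    if not rows:
        return []
    header, *data = rows  # the first row holds the column names
    return [dict(zip(header, row)) for row in data]


def table_registered(output: str, table_name: str) -> bool:
    """Return True if `table_name` appears in the tableName column."""
    return any(
        row.get("tableName") == table_name
        for row in parse_beeline_tables(output)
    )


# Output captured in this issue: header only, no table rows.
EMPTY_OUTPUT = """\
+------------+------------+--------------+
| namespace  | tableName  | isTemporary  |
+------------+------------+--------------+
+------------+------------+--------------+
No rows selected (0.077 seconds)
"""

# "my_delta_table" is a hypothetical table name used for illustration.
print(table_registered(EMPTY_OUTPUT, "my_delta_table"))  # False
```

With the output from this issue the check returns `False`, which matches the actual behavior described above. Once the metastore registers the table, the same check should return `True`.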