TIBCOSoftware / snappydata

Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
http://www.snappydata.io

What is the plan/roadmap for supporting Hive metastore tables stored in HDFS and SnappyData tables via SQL? #762

Open LarrySchumacher opened 7 years ago

LarrySchumacher commented 7 years ago

Can SnappyData tables coexist with Hive metastore tables stored on HDFS? If not, is this capability planned for a future release?

We have a large HDFS/Spark cluster with a substantial amount of data stored in HDFS as Hive metastore tables. We are adding a SnappyData cluster to the mix and would like to access both SnappyData tables and the Hive metastore tables on HDFS via SQL.

jramnara commented 7 years ago

Is this not working? If the Hive table was created using a format supported by Spark, such as Parquet or ORC, you should be able to create an external table that simply points to the HDFS folder, no? If that works, you can then use these external tables alongside SnappyData tables in the same query, etc.
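For example, a minimal sketch of that approach (the HDFS path, table names, and columns below are placeholders, assuming the underlying data is Parquet):

```scala
import org.apache.spark.sql.{SnappySession, SparkSession}

val spark = SparkSession.builder().appName("external-table-example").getOrCreate()
val snappy = new SnappySession(spark.sparkContext)

// Register the existing HDFS Parquet data as an external table in SnappyData.
// "hdfs://namenode:8020/warehouse/sales" is a placeholder path.
snappy.sql(
  """CREATE EXTERNAL TABLE sales_ext
    |USING parquet
    |OPTIONS (path 'hdfs://namenode:8020/warehouse/sales')""".stripMargin)

// The external table can then be joined with a SnappyData table in the same query.
snappy.sql(
  """SELECT c.customer_id, SUM(s.amount)
    |FROM sales_ext s JOIN customers c ON s.customer_id = c.customer_id
    |GROUP BY c.customer_id""".stripMargin).show()
```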

Alternatively, you can create a SparkSession with Hive support enabled (org.apache.spark.sql.SparkSession.builder.enableHiveSupport.getOrCreate), then create and access Hive tables through that session instance, while accessing SnappyData tables through a SnappySession. Unfortunately, tables accessed this way cannot be joined with SnappyData tables in the same query.
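A rough sketch of this second approach (table and database names are illustrative; Hive support assumes hive-site.xml is on the classpath). Note that the two sessions use separate catalogs, which is why a single query cannot span both:

```scala
import org.apache.spark.sql.{SnappySession, SparkSession}

// Session backed by the external Hive metastore.
val hiveSpark = SparkSession.builder()
  .appName("hive-session")
  .enableHiveSupport()
  .getOrCreate()
val hiveDf = hiveSpark.sql("SELECT * FROM hive_db.events")   // Hive metastore table

// Separate session for SnappyData tables.
val snappy = new SnappySession(hiveSpark.sparkContext)
val snappyDf = snappy.sql("SELECT * FROM snappy_orders")     // SnappyData table

// The catalogs are independent, so a single SQL query cannot join
// hive_db.events with snappy_orders directly.
```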

When it comes to automatically integrating with an external Hive metastore, we would need to explore if and how this is possible.