Azure / Hadoop-Migrations

Hadoop Migrations to Azure

Common metadata between Databricks and Hive: #68

Closed: ramyerrabotu closed this issue 3 years ago

ramyerrabotu commented 3 years ago

Customers expect a common metadata database shared by Hive and Databricks, since they want to run some workloads on Hive and others on Databricks.

ram4bas-zz commented 3 years ago

It's possible to share an external metastore between HDInsight / on-prem Hive, Databricks, and Synapse.

Here are the steps, run from a Scala notebook on the Databricks cluster:

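// Create the DBFS directory that will hold the cluster init script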
dbutils.fs.mkdirs("dbfs:/databricks/init/")

dbutils.fs.put(
  "/databricks/init/external-metastore.sh",
  """#!/bin/sh
    |# Load environment variables to determine the correct JDBC driver to use.
    |source /etc/environment
    |# Quoting the label (i.e. EOF) with single quotes disables variable interpolation.
    |cat << 'EOF' > /databricks/driver/conf/00-custom-spark.conf
    |[driver] {
    |  # Hive-specific configuration options for metastores in local mode.
    |  "spark.hadoop.javax.jdo.option.ConnectionURL" = "jdbc:sqlserver://<>.database.windows.net:1433;database=<>;encrypt=true;trustServerCertificate=true;create=false;loginTimeout=300"
    |  "spark.hadoop.javax.jdo.option.ConnectionUserName" = "<>"
    |  "spark.hadoop.javax.jdo.option.ConnectionPassword" = "<>"
    |  "hive.metastore.schema.verification.record.version" = "true"
    |  "spark.sql.hive.metastore.jars" = "maven"
    |  "hive.metastore.schema.verification" = "true"
    |  "spark.sql.hive.metastore.version" = "2.1.1"
    |EOF
    |# Add the JDBC driver separately, since variable expansion must be used to choose
    |# the correct driver version.
    |cat << EOF >> /databricks/driver/conf/00-custom-spark.conf
    |  "spark.hadoop.javax.jdo.option.ConnectionDriverName" = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    |}
    |EOF
    |""".stripMargin,
  overwrite = true
)
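Once the script is attached to the cluster as an init script and the cluster is restarted, a quick sanity check is to list what the cluster sees in the metastore; databases and tables created from the Hive/HDInsight side should appear. A minimal sketch (hivesampledb is a hypothetical database assumed to already exist in the shared metastore):

// Run on the restarted cluster, which should now read from the shared external metastore.
spark.sql("SHOW DATABASES").show()
// `hivesampledb` is a hypothetical database created earlier from Hive/HDInsight.
spark.sql("SHOW TABLES IN hivesampledb").show()

Note that the init script above embeds the SQL credentials in plaintext; for anything beyond a proof of concept, consider referencing a Databricks secret from the Spark configuration instead of hard-coding the password.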