GoogleCloudDataproc / dataproc-jdbc-connector

Apache License 2.0

Add support for Spark (reusing Hive code) #14

Open karth295 opened 4 years ago

karth295 commented 4 years ago

Spark ships a fork of HiveServer2 (the Spark Thrift Server) that it uses to support JDBC: https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-thrift-server.html. You interact with it using Hive's JDBC client.

This means that the bulk of what we need is already done. Here are the remaining TODOs, afaik:

1) Create and document an init action to start a Spark thrift server. Part of that init action will be to configure Knox to expose the Spark thrift server.

2) Change the JDBC connector to accept jdbc:dataproc://spark and translate it into using the component gateway path for Spark.

3) Update the README to reflect this.
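
A rough sketch of what TODO 2 could look like, not the connector's actual code. The class name `DataprocUrlTranslator`, the gateway host parameter, port 443, and both `gateway/default/...` paths are assumptions made for illustration. The real Knox path depends on how the init action configures Knox. The only parts taken from Hive's JDBC driver are the `jdbc:hive2://` scheme and the `ssl`, `transportMode`, and `httpPath` URL parameters.

```java
/**
 * Hypothetical sketch: maps a jdbc:dataproc://<engine> URL onto a Hive JDBC
 * URL in HTTP mode, so the existing Hive client code can be reused for Spark.
 */
final class DataprocUrlTranslator {
    static final String PREFIX = "jdbc:dataproc://";

    private DataprocUrlTranslator() {}

    /** Returns the assumed Knox gateway path for the given engine name. */
    static String gatewayPath(String engine) {
        switch (engine) {
            case "hive":
                return "gateway/default/hive";   // assumed path
            case "spark":
                return "gateway/default/spark";  // assumed path
            default:
                throw new IllegalArgumentException("Unsupported engine: " + engine);
        }
    }

    /** Translates a jdbc:dataproc:// URL into a jdbc:hive2:// URL over HTTPS. */
    static String toHiveUrl(String dataprocUrl, String gatewayHost) {
        if (!dataprocUrl.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a Dataproc JDBC URL: " + dataprocUrl);
        }
        String rest = dataprocUrl.substring(PREFIX.length());

        // The engine name runs up to the first '/' or ';' (or the end of the URL).
        int end = rest.length();
        for (int i = 0; i < rest.length(); i++) {
            char c = rest.charAt(i);
            if (c == '/' || c == ';') {
                end = i;
                break;
            }
        }
        String engine = rest.substring(0, end);

        // Port 443 is an assumption about where the gateway listens.
        return "jdbc:hive2://" + gatewayHost + ":443/"
                + ";ssl=true;transportMode=http;httpPath=" + gatewayPath(engine);
    }
}
```

Since the Spark Thrift Server speaks the same protocol as HiveServer2, only the gateway path should need to differ between the two engines, and the rest of the Hive connection logic could stay shared.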

karth295 commented 3 years ago

Init action: gs://hive-http-mode-init-action/spark-http-config.sh. Note that it disables the regular hive-server2 and runs Spark's hive-server2 in its place.

Now it's just a matter of documenting this init action in the README -- I'll leave this issue open for that.