**Closed** — MattCoachCarter closed this issue 4 years ago
I was using the wrong port, sorry!
Hi, can you help with setting up the Spark config to connect to Hive using these containers? I added the following to my Spark config:
```python
from pyspark import SparkConf
from pyspark.sql import SparkSession

spark_config = {
    "spark.hive.metastore.uris": "thrift://localhost:9083",
    "spark.hadoop.dfs.namenode.http-address": "webhdfs://localhost:50070",
}

spark_conf = SparkConf()
for attribute, value in spark_config.items():
    spark_conf.set(attribute, value)

spark = SparkSession.builder.config(conf=spark_conf).enableHiveSupport().getOrCreate()
```
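The shape of the config that ended up working for me looks roughly like the sketch below. Note that the hostnames and ports here ("localhost", 9083, 8020) are assumptions — they depend on which ports your docker-compose file actually publishes, so adjust accordingly:

```python
# Sketch: Spark properties for reaching Hive/HDFS from outside the compose
# network. All hostnames/ports below are assumptions -- check the ports
# published in your docker-compose file.
spark_config = {
    # Hive metastore, as published on the host:
    "spark.hive.metastore.uris": "thrift://localhost:9083",
    # The metastore stores table locations as hdfs://namenode:<port>/...;
    # either make "namenode" resolvable from the driver (e.g. an /etc/hosts
    # entry) or override the default filesystem to a host-reachable address:
    "spark.hadoop.fs.defaultFS": "hdfs://localhost:8020",
}
```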
I am able to connect to the metastore properly, since

```python
spark.sql("describe test_db.test_table").printSchema()
```

works, where `test_db.test_table` is a table I created directly through Hive. However, when I attempt to select the contents of the table with `spark.sql("select * from test_db.test_table").show()`, it gives me the following error:
```
pyspark.sql.utils.IllegalArgumentException: java.net.UnknownHostException: namenode
```
Not sure what I am missing here. Thanks for the help in advance.
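For anyone hitting the same error: `describe` only talks to the metastore, but `select` makes the driver read from HDFS at the location the metastore recorded, which references the internal container hostname (`namenode` in the error above). A quick sanity check from the driver side is to test whether that hostname resolves at all — a sketch, with the hostname taken from the error message:

```python
# Sketch: check whether the Spark driver can resolve the hostname that the
# metastore's table locations reference. "namenode" is taken from the
# UnknownHostException above; substitute whatever hostname your error shows.
import socket

def can_resolve(host: str) -> bool:
    """Return True if `host` resolves to an IP address from this machine."""
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        return False

# If this prints False, the driver cannot resolve the container hostname,
# e.g. can_resolve("namenode")
```

If it comes back `False`, adding an `/etc/hosts` entry for the container hostname (or overriding `fs.defaultFS`) is one way forward.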
I'm trying to run a Spark app that connects to HDFS using the docker-compose file in this repo (which I have modified). The Spark container I am using is, I believe, able to connect to the HDFS container, but it receives an RPC error soon after. I've tried a handful of things with no success and was wondering if anyone had an idea of how I can troubleshoot this: