Open NateDawg97 opened 3 years ago
Hi,
First of all, thanks for making this available!
I am also trying to connect from a local Spark session to a remote AWS Glue Data Catalog through a proxy.
I tried to set the proxy on the JVM via:
spark = (
    SparkSession.builder
    .config("spark.driver.extraJavaOptions", "-Dhttps.proxyHost=aaa -Dhttps.proxyPort=aaa -Dhttps.proxyUser=aaa -Dhttps.proxyPassword=aaa")
    .getOrCreate()
)
but I still get ... Caused by: java.net.UnknownHostException: glue.hidden_region.amazonaws.com
(I've hidden the region - which is as expected).
Anything else I could try?
Thanks!
P.S.: @NateDawg97: did you manage to fix it?
Well, for anyone interested, I managed to get the proxy configured from pyspark via:
spark._jvm.java.lang.System.setProperty("https.proxyHost","aaa")
spark._jvm.java.lang.System.setProperty("https.proxyPort","aaa")
spark._jvm.java.lang.System.setProperty("https.proxyUser","aaa")
spark._jvm.java.lang.System.setProperty("https.proxyPassword","aaa")
Maybe it's like shooting an ant with a cannon, but it works 😄.
Now, when I run spark.sql("show databases").show() in local Spark,
I can see the databases from the AWS Glue Data Catalog.
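In case it helps anyone, here is a minimal sketch of how the four setProperty calls above could be wrapped in a small helper. The function names `proxy_properties` and `apply_to_running_jvm` are made up for illustration and are not part of any library. It relies on the same private `_jvm` handle as the calls above, so it may break between PySpark versions.

```python
from typing import Dict, Optional


def proxy_properties(host: str, port: int,
                     user: Optional[str] = None,
                     password: Optional[str] = None) -> Dict[str, str]:
    """Collect the https.proxy* system properties used in this thread.

    (Hypothetical helper; the property names are the ones set above.)
    """
    props = {"https.proxyHost": host, "https.proxyPort": str(port)}
    # Credentials are optional: only add them when they are provided.
    if user is not None:
        props["https.proxyUser"] = user
    if password is not None:
        props["https.proxyPassword"] = password
    return props


def apply_to_running_jvm(spark, props: Dict[str, str]) -> None:
    """Set each property on the driver JVM behind an existing SparkSession.

    Uses the private `_jvm` attribute, the same way the individual
    System.setProperty calls above do.
    """
    system = spark._jvm.java.lang.System
    for key, value in props.items():
        system.setProperty(key, value)
```

With an existing session this would be called as apply_to_running_jvm(spark, proxy_properties("proxy.example.com", 8080)), where the host and port are placeholders for your own proxy.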
Is there support for using this to connect from a local workstation to a remote AWS Glue Hive Catalog when the local client workstation has to go through an HTTP proxy?
For instance, with Spark, one can set the following to enable an HTTP proxy when reading remote S3 data into a Spark DataFrame. Is there something equivalent for this Glue Hive catalog?