gettyimages / docker-spark

Docker build for Apache Spark
MIT License

Question: Use from Java app #26

Open beradrian opened 7 years ago

beradrian commented 7 years ago

I'm using your docker-compose.yml in a Docker container.

How can I set up a SparkSession?

    SparkSession spark = SparkSession
            .builder()
            .appName("Java Spark SQL basic example")
            .master("spark://master:7077")
            .getOrCreate();

I mapped master to the Docker machine IP in /etc/hosts.

The snippet above does not seem to work.
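Before changing the SparkSession setup, it may help to confirm that the master address is reachable from the machine running the Java app. The following is a minimal diagnostic sketch using only the JDK, not part of docker-spark or Spark itself; the class name `MasterCheck`, its methods, and the 2-second timeout are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;

class MasterCheck {
    // Extracts host and port from a master URL such as spark://master:7077.
    static InetSocketAddress parseMaster(String masterUrl) {
        URI uri = URI.create(masterUrl);
        if (uri.getHost() == null || uri.getPort() == -1) {
            throw new IllegalArgumentException("Expected spark://host:port, got " + masterUrl);
        }
        // Unresolved on purpose: the name lookup happens in isReachable.
        return InetSocketAddress.createUnresolved(uri.getHost(), uri.getPort());
    }

    // Returns true if a TCP connection can be opened within the timeout.
    // An unresolvable host name or a refused connection both yield false.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Same master URL as in the question; 2000 ms is an arbitrary timeout.
        InetSocketAddress master = parseMaster("spark://master:7077");
        boolean ok = isReachable(master.getHostString(), master.getPort(), 2000);
        System.out.println(master.getHostString() + ":" + master.getPort() + " reachable: " + ok);
    }
}
```

If this reports the master as unreachable, the problem is likely in name resolution or port publishing rather than in the SparkSession code. If it reports reachable, the connection from the app to the master works at the TCP level and the cause is probably elsewhere.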

yeikel commented 6 years ago

Hi, I also need help completing the configuration.

I was able to connect by setting the master URL to spark://localhost:7077, but for some reason the executor launch command includes my host machine's name in the "--driver-url" parameter (spark://CoarseGrainedScheduler@yeikel-pc:56310).

I tried conf.set("spark.dynamicAllocation.enabled","false"); but that did not work.

Spark Executor Command: "/usr/jdk1.8.0_131/bin/java" "-cp" "/conf:/usr/spark-2.3.0/jars/*:/usr/hadoop-2.8.3/etc/hadoop/:/usr/hadoop-2.8.3/etc/hadoop/*:/usr/hadoop-2.8.3/share/hadoop/common/lib/*:/usr/hadoop-2.8.3/share/hadoop/common/*:/usr/hadoop-2.8.3/share/hadoop/hdfs/*:/usr/hadoop-2.8.3/share/hadoop/hdfs/lib/*:/usr/hadoop-2.8.3/share/hadoop/yarn/lib/*:/usr/hadoop-2.8.3/share/hadoop/yarn/*:/usr/hadoop-2.8.3/share/hadoop/mapreduce/lib/*:/usr/hadoop-2.8.3/share/hadoop/mapreduce/*:/usr/hadoop-2.8.3/share/hadoop/tools/lib/*" "-Xmx1024M" "-Dspark.driver.port=56310" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@yeikel-pc:56310" "--executor-id" "22" "--hostname" "172.19.0.3" "--cores" "2" "--app-id" "app-20180401002500-0001" "--worker-url" "spark://Worker@172.19.0.3:8881"

This produces the following error:

Caused by: java.io.IOException: Failed to connect to yeikel-pc:56310
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: yeikel-pc
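The trace suggests that the executor is handed a driver URL whose host name it cannot resolve. A small JDK-only sketch of that check is below: it pulls the host and port out of a driver URL like the one in the executor command and tests whether the name resolves on the machine where it runs. The class name `DriverUrlCheck` and its methods are hypothetical, and the result is only meaningful when run inside the worker container, which is an assumption about how it would be used.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

class DriverUrlCheck {
    // Host part of a URL such as spark://CoarseGrainedScheduler@yeikel-pc:56310.
    static String driverHost(String driverUrl) {
        String host = URI.create(driverUrl).getHost();
        if (host == null) {
            // java.net.URI returns null for names it does not accept as
            // host names, for example names containing underscores.
            throw new IllegalArgumentException("No parsable host in " + driverUrl);
        }
        return host;
    }

    // Port part of the driver URL, or -1 if none is present.
    static int driverPort(String driverUrl) {
        return URI.create(driverUrl).getPort();
    }

    // True if the name resolves on the machine running this check.
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Driver URL copied from the executor command above.
        String url = "spark://CoarseGrainedScheduler@yeikel-pc:56310";
        String host = driverHost(url);
        System.out.println(host + ":" + driverPort(url) + " resolves: " + resolves(host));
    }
}
```

If the name does not resolve from the worker, one option that may apply here is Spark's `spark.driver.host` property, which sets the address the driver advertises to executors. Pointing it at an IP the containers can reach might avoid the lookup of yeikel-pc, though whether that IP is actually routable from the Docker network depends on the local setup.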