amplab / docker-scripts

Dockerfiles and scripts for Spark and Shark Docker images

Call to master/172.17.0.3:9000 failed on connection exception: java.net.ConnectException: Connection refused #24

Closed douglaz closed 10 years ago

douglaz commented 10 years ago

I tried to follow the Spark example (spark:0.8.0 image) but I get errors because no service is running on port 9000:

$ sudo docker attach 27550fe348c3410c50ff7a7a395a7444f79945fbc980dc78b401a96b75a54a3d
sudo: unable to resolve host ip-10-244-4-249
14/02/08 16:34:56 INFO ipc.Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/02/08 16:34:57 INFO ipc.Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/02/08 16:34:58 INFO ipc.Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/02/08 16:34:59 INFO ipc.Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/02/08 16:35:00 INFO ipc.Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/02/08 16:35:01 INFO ipc.Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/02/08 16:35:02 INFO ipc.Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
put: Call to master/172.17.0.3:9000 failed on connection exception: java.net.ConnectException: Connection refused
starting Spark Shell

Of course the sample will fail:

scala> val textFile = sc.textFile("hdfs://master:9000/user/hdfs/test.txt")
14/02/08 16:43:21 INFO MemoryStore: ensureFreeSpace(36192) called with curMem=0, maxMem=530593873
14/02/08 16:43:21 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 35.3 KB, free 506.0 MB)
textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> textFile.count()
14/02/08 16:43:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/02/08 16:43:29 WARN LoadSnappy: Snappy native library not loaded
14/02/08 16:43:30 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 0 time(s).
14/02/08 16:43:31 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 1 time(s).
14/02/08 16:43:32 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 2 time(s).
14/02/08 16:43:33 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 3 time(s).
14/02/08 16:43:34 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 4 time(s).
14/02/08 16:43:35 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 5 time(s).
14/02/08 16:43:36 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 6 time(s).
14/02/08 16:43:37 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 7 time(s).
14/02/08 16:43:38 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 8 time(s).
14/02/08 16:43:39 INFO Client: Retrying connect to server: master/172.17.0.3:9000. Already tried 9 time(s).
java.net.ConnectException: Call to master/172.17.0.3:9000 failed on connection exception: java.net.ConnectException: Connection refused

Connecting to the master and checking the listening ports, I get:

# lsof -n|grep LIST
sshd      131 root    3u  IPv4              36387      0t0      TCP *:ssh (LISTEN)
sshd      131 root    4u  IPv6              36389      0t0      TCP *:ssh (LISTEN)
java      172 hdfs   12u  IPv6              36486      0t0      TCP 172.17.0.3:7077 (LISTEN)
java      172 hdfs   17u  IPv6              36490      0t0      TCP *:http-alt (LISTEN)

Running Docker version 0.8.0, build cc3a8c8
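
The lsof output above shows listeners on ssh, 7077 and http-alt, but nothing on 9000, the port used in hdfs://master:9000. The same check can be made from another container with a small TCP probe. This is only a minimal sketch, assuming Python 3 is available in that container; the helper name `port_open` and its default timeout are illustrative and not part of the docker-scripts repository.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration only. Any OSError (connection refused,
    timeout, unresolvable host name) is reported as "not open".
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_open("master", 7077)` and `port_open("master", 9000)` from a worker container would be expected to return True and False respectively in the situation described here, since only 7077 appears in the LISTEN list.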

douglaz commented 10 years ago

Update: I tested on Docker 0.7.6 and it works well. The problem seems to occur only on the new Docker version (0.8.0).

AndreSchumacher commented 10 years ago

Thanks for the report. It seems that something related to the update breaks the HDFS setup.

AndreSchumacher commented 10 years ago

I guess this issue is outdated due to the recent updates and newer Docker versions. I'm closing this now. @douglaz, please let me know if you still have problems.