amplab / docker-scripts

Dockerfiles and scripts for Spark and Shark Docker images

Error in Workers #19

Closed vivekbeniwal closed 10 years ago

vivekbeniwal commented 10 years ago

While running a Spark cluster with Docker 0.7, I get this error:

13/12/03 19:04:38 ERROR StandaloneExecutorBackend: error while creating actor
java.net.UnknownHostException: 1a183a2affd5: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:866)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1258)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1211)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at java.net.InetAddress.getAllByName(InetAddress.java:1063)
    at java.net.InetAddress.getByName(InetAddress.java:1013)
    at akka.remote.netty.ActiveRemoteClient$$anonfun$connect$1.apply$mcV$sp(Client.scala:170)
    at akka.util.Switch.liftedTree1$1(LockUtil.scala:33)
    at akka.util.Switch.transcend(LockUtil.scala:32)
    at akka.util.Switch.switchOn(LockUtil.scala:55)
    at akka.remote.netty.ActiveRemoteClient.connect(Client.scala:158)
    at akka.remote.netty.NettyRemoteTransport.send(NettyRemoteSupport.scala:153)
    at akka.remote.RemoteActorRef.$bang(RemoteActorRefProvider.scala:247)
    at org.apache.spark.executor.StandaloneExecutorBackend.preStart(StandaloneExecutorBackend.scala:48)
    at akka.actor.ActorCell.create$1(ActorCell.scala:508)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:600)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:209)
    at akka.dispatch.Mailbox.run(Mailbox.scala:178)
    at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
    at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
    at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
    at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
    at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

Versions of Ubuntu and Docker:

Ubuntu 12.04.3 LTS, Release: 12.04, Codename: precise
Docker version 0.7.0, build 0d078b6

AndreSchumacher commented 10 years ago

This seems to be the same name resolution problem that affects the driver/console container with newer versions of Docker. See the discussion around pull request 17. A temporary fix is to ssh into the master using the command that is printed when the cluster starts, and then start the shell there.
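To confirm that the failure really is a name resolution problem, one can check from inside a worker whether the container hostname in the error resolves at all. The following is only a minimal sketch: `can_resolve` is a hypothetical helper (not part of the repository), and the injectable `resolver` argument exists only so the check can be exercised without a real network.

```python
import socket


def can_resolve(hostname, resolver=socket.gethostbyname):
    """Return True if `hostname` resolves to an address, False otherwise.

    `resolver` defaults to the system lookup; it is a parameter only so
    that the function can be tried out with a stand-in resolver.
    """
    try:
        resolver(hostname)
        return True
    except OSError:
        # socket.gaierror (the Python counterpart of Java's
        # UnknownHostException) is a subclass of OSError.
        return False
```

Calling `can_resolve("1a183a2affd5")` on a worker would be expected to return False if the worker cannot resolve the driver/console container's hostname, which would match the UnknownHostException in the trace.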

vivekbeniwal commented 10 years ago

Hi Andre,

I replaced the id_rsa file with my own, but it keeps asking for a password for root@ . I think I am missing something really basic here related to RSA keys. Can you tell me whether we have to replace any file other than apache-hadoop-hdfs-precise/files/id_rsa?

Regards Vivek

AndreSchumacher commented 10 years ago

Hi Vivek, the easiest way to do that is to use the original (secret) key to scp your new public key to the master and workers. You'll need to store it in /root/.ssh. There should be plenty of tutorials for passwordless ssh login.

Note that, since the scripts start a new cluster every time, you need to do that each time you start a new cluster. If you want to make the changes permanent, the easiest thing to do would be to re-generate the image. The scripts to do that are all inside the repository.
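Because this has to be repeated for every new cluster, it may help to script it. The sketch below only builds and runs the scp/ssh commands; it is not part of the repository. It assumes the standard OpenSSH layout, where public keys allowed to log in are listed in /root/.ssh/authorized_keys, and the names `key_install_commands`, `install_key`, and `new_key.pub`, as well as any host addresses passed in, are hypothetical.

```python
import subprocess

REMOTE_COPY = "/root/.ssh/new_key.pub"  # hypothetical temporary file name


def key_install_commands(host, original_key, new_public_key):
    """Build the scp and ssh commands that install a public key for root.

    `original_key` is the private key shipped with the image; it is used
    to authenticate while the new public key is copied over.
    """
    target = f"root@{host}"
    scp_cmd = ["scp", "-i", original_key, new_public_key,
               f"{target}:{REMOTE_COPY}"]
    # Assumption: OpenSSH reads allowed keys from authorized_keys.
    remote = (f"cat {REMOTE_COPY} >> /root/.ssh/authorized_keys"
              " && chmod 600 /root/.ssh/authorized_keys")
    ssh_cmd = ["ssh", "-i", original_key, target, remote]
    return [scp_cmd, ssh_cmd]


def install_key(hosts, original_key, new_public_key):
    """Run the commands for the master and every worker address in `hosts`."""
    for host in hosts:
        for cmd in key_install_commands(host, original_key, new_public_key):
            subprocess.run(cmd, check=True)
```

For example, `install_key(["<master-ip>", "<worker-ip>"], "id_rsa", "my_key.pub")` would be run once after each cluster start, with the addresses taken from the output of the start script.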

Maybe as a future feature one could have the user specify a keypair when starting the cluster. I believe this is the way the Spark EC2 scripts do it.

vivekbeniwal commented 10 years ago

Thank you, that worked.