gettyimages / docker-spark

Docker build for Apache Spark
MIT License

SSH question? #40

Open alexleethinker opened 6 years ago

alexleethinker commented 6 years ago

Hello, thanks for sharing this great work. It worked very well on my machine. But as I am new to Docker and Hadoop/Spark, I ran into a confusing question about SSH when reading the Dockerfile.

Traditionally, when setting up a multi-node cluster, we need to set up SSH and the hosts files on the master and slave hosts to enable communication between them. But in the Dockerfile I didn't find anything related to SSH. No SSH service is even installed.

So I am really wondering: how does the master node control the slave nodes without SSH?

Sorry to bother you if this question seems stupid; I am new to this field.

Alex

OneCricketeer commented 4 years ago

Spark doesn't need SSH. It communicates over RPC calls to the scheduler between drivers and workers.
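To illustrate the idea, here is a toy sketch in Python of a worker registering with a master over a plain TCP connection. This is not Spark's actual RPC protocol or code: the message format and the names `run_master`, `register_worker`, and `demo` are hypothetical. The point it tries to show is that each worker process is started with the master's address and connects to it itself (in Spark's standalone mode that address is a URL of the form `spark://host:port`), so the master never has to log in to the worker's host.

```python
import json
import socket
import threading


def run_master(server_sock, registered, expected):
    """Accept `expected` worker connections and record each registration."""
    for _ in range(expected):
        conn, _addr = server_sock.accept()
        with conn:
            # Hypothetical wire format: one JSON object per line.
            msg = json.loads(conn.makefile("r").readline())
            if msg["type"] == "register":
                registered.append(msg["worker_id"])
                conn.sendall(b'{"type": "registered"}\n')


def register_worker(master_host, master_port, worker_id):
    """Open a TCP connection to the master and announce this worker."""
    with socket.create_connection((master_host, master_port)) as sock:
        request = {"type": "register", "worker_id": worker_id}
        sock.sendall((json.dumps(request) + "\n").encode())
        return json.loads(sock.makefile("r").readline())


def demo():
    """Run one master and two workers on localhost; no SSH involved."""
    registered = []
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen()
    port = server_sock.getsockname()[1]

    master = threading.Thread(target=run_master, args=(server_sock, registered, 2))
    master.start()
    # Each worker initiates the connection; the master only listens.
    replies = [register_worker("127.0.0.1", port, wid) for wid in ("worker-1", "worker-2")]
    master.join()
    server_sock.close()
    return registered, replies


if __name__ == "__main__":
    print(demo())
```

In a Docker setup the same pattern applies: each container starts its own master or worker process, and the workers only need the master's hostname and port to be reachable over the network.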

If you would like to ask Spark-specific questions, join the Spark mailing lists.