LukeTillman / dse-docker

DataStax Enterprise running in a Docker Container
Apache License 2.0

[Question] How can I use Jupyter or Zeppelin to do analytics through Spark? #17

Closed: harshblue closed this 7 years ago

harshblue commented 7 years ago

@LukeTillman Thanks for this excellent Docker image. Superb! I am trying to use Jupyter or Zeppelin with Spark. Do you have any idea how I can accomplish this?

LukeTillman commented 7 years ago

I don't have any examples handy, but I'm sure it's possible. The first step is to make sure you start this Docker container in Analytics mode (i.e., with the -k flag), as shown here:

https://github.com/LukeTillman/dse-docker#example-start-an-analytics-spark-node
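As a rough illustration, here is a small Python sketch that only builds the docker run command for an Analytics node and prints it. The image name and tag (luketillman/datastax-enterprise:5.1.1) and the container name my-dse are assumptions for illustration, so check the linked README for the exact values. The only part taken from this thread is that -k goes after the image name to enable Spark.

```python
import shlex
from typing import List


def analytics_node_command(name: str = "my-dse",
                           image: str = "luketillman/datastax-enterprise:5.1.1") -> List[str]:
    """Build (but do not run) a `docker run` argv that starts DSE in Analytics mode.

    The container name and image name/tag are illustrative assumptions.
    Arguments after the image name go to the container; `-k` enables Spark.
    """
    return ["docker", "run", "--name", name, "-d", image, "-k"]


if __name__ == "__main__":
    cmd = analytics_node_command()
    # Prints the shell-quoted command so you can inspect it before running it.
    print(shlex.join(cmd))
    # To actually start the container: subprocess.run(cmd, check=True)
```

Building the argv as a list, rather than a single string, avoids shell-quoting surprises if you later pass it to subprocess.run.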

I'm not sure which ports Jupyter or Zeppelin will need to communicate with Spark, but you'll also want to make sure you expose those ports when starting the container:

https://github.com/LukeTillman/dse-docker#exposing-ports-on-the-host
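A minimal Python sketch for generating the -p host:container flags, assuming you publish each port on the same number on the host. The example ports are assumptions (9042 is the CQL native port; 4040 as the Spark application UI and 7080 as the DSE Spark Master web UI should be confirmed against the DataStax ports page linked below). The resulting flags belong before the image name in the docker run command.

```python
from typing import Iterable, List


def publish_flags(ports: Iterable[int]) -> List[str]:
    """Turn container ports into `-p host:container` flags (same number on both sides)."""
    flags: List[str] = []
    for port in ports:
        if not 0 < port < 65536:
            raise ValueError(f"invalid port: {port}")
        flags += ["-p", f"{port}:{port}"]
    return flags


# Assumed example ports; confirm them against the DataStax firewall-ports page.
EXAMPLE_PORTS = [9042, 4040, 7080]

if __name__ == "__main__":
    print(" ".join(publish_flags(EXAMPLE_PORTS)))
```

Mapping each port to the same number on the host keeps the client-side configuration simple: Jupyter or Zeppelin can use the port numbers from the DSE documentation unchanged.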

For a full list of the ports that DSE exposes (including some of the Spark ports), see:

http://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/security/secFirewallPorts.html

Then it's just a matter of configuring Jupyter/Zeppelin to talk to the DSE node in Docker (their documentation should cover how to do this). If you're running Jupyter/Zeppelin on your local machine and DSE in Docker (Docker for Mac/Windows), you should be able to reach the DSE ports you exposed via localhost or 127.0.0.1.
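Before configuring Jupyter or Zeppelin, it can help to confirm the published ports actually answer on localhost. This is a small Python sketch using only the standard library. The port list is an assumption, so substitute whichever ports you published. A successful TCP connect only shows that something is listening, not that Spark itself is healthy.

```python
import socket
from typing import Dict, Iterable


def reachable_ports(host: str, ports: Iterable[int], timeout: float = 1.0) -> Dict[int, bool]:
    """Try a TCP connect to each port; True means something accepted the connection."""
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, timed out, unreachable, etc.
            results[port] = False
    return results


if __name__ == "__main__":
    # Assumed ports; use whichever ones you published with -p.
    for port, ok in reachable_ports("127.0.0.1", [9042, 4040, 7080]).items():
        print(f"{port}: {'open' if ok else 'not reachable'}")
```

If a port shows as not reachable, check that it was included in the -p flags and that the corresponding DSE service is running inside the container.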

Hope that helps. Good luck!