dylanmei / docker-zeppelin

Docker build for Zeppelin, a web-based Spark notebook

Python interpreter for parsing HTML/XML in Zeppelin #9

Open mkscala opened 8 years ago

mkscala commented 8 years ago

I want to use Python libraries such as BeautifulSoup. How can we use external Python libraries alongside pyspark?

http://omz-software.com/pythonista/docs/ios/beautifulsoup_guide.html
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-consider-HTML-files-in-Spark-td22017.html
https://pypi.python.org/pypi/beautifulsoup4

dylanmei commented 8 years ago

%sh pip install beautifulsoup4

I don't know the details of using it effectively with Spark. Spark has a very active mailing list and an #apache-spark IRC channel on freenode, both of which I'm sure will yield better tips.
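As a rough illustration of the general pattern (not something confirmed in this thread), HTML parsing can be wrapped in a plain Python function and then applied to each record. The sketch below deliberately uses the standard-library `html.parser` instead of BeautifulSoup so that it runs without installing anything. The names `TextExtractor` and `html_to_text` are hypothetical. In pyspark the same function could presumably be passed to `rdd.map`, provided any third-party parser is installed on every worker.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects the visible text chunks of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-empty text between tags.
        text = data.strip()
        if text:
            self.chunks.append(text)


def html_to_text(html):
    """Return the text content of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Plain-Python stand-in for a collection of documents.
# Assumption: with Spark this would be something like rdd.map(html_to_text).
docs = ["<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"]
texts = [html_to_text(d) for d in docs]
print(texts)
```

Keeping the parsing logic in a self-contained function makes it easy to test outside Spark first and to swap in BeautifulSoup later if it is available on the cluster.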

mkscala commented 8 years ago

I get the error below:

Process exited with an error: 127 (Exit value: 127)
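For context, an exit status of 127 from a POSIX shell usually means the command was not found, which here would suggest that `pip` is not on the PATH inside the container. The sketch below is an assumption-laden check, not a confirmed diagnosis of this image: it reproduces the 127 status with a deliberately missing command and then looks for a usable pip. The function name `find_pip_command` is hypothetical.

```python
import shutil
import subprocess
import sys


def find_pip_command():
    """Return an argv prefix that can run pip, or None if none is found."""
    # Prefer a pip executable on PATH, which is what `%sh pip ...` relies on.
    for name in ("pip", "pip3"):
        path = shutil.which(name)
        if path:
            return [path]
    # Fall back to the pip module of the current interpreter, if it is importable.
    probe = subprocess.run(
        [sys.executable, "-m", "pip", "--version"], capture_output=True
    )
    if probe.returncode == 0:
        return [sys.executable, "-m", "pip"]
    return None


# A command the shell cannot find exits with status 127.
result = subprocess.run(
    ["sh", "-c", "definitely-not-a-real-command-xyz"], capture_output=True
)
print("exit code for a missing command:", result.returncode)
print("pip command found:", find_pip_command())
```

If no pip command is found, it would need to be installed in the image before `%sh pip install beautifulsoup4` can succeed.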

mkscala commented 8 years ago

How do I add the basic Python interpreter within this docker-zeppelin image? Have you tried it?

dylanmei commented 8 years ago

There is only the pyspark interpreter. Perhaps you'd get more mileage with this: https://github.com/jupyter/docker-stacks