StatCan / aaw

Documentation for the Advanced Analytics Workspace Platform
https://statcan.github.io/aaw/

Spark-Dev: Enable Spark Development on AAW Notebooks #1883

Open Souheil-Yazji opened 1 year ago

Souheil-Yazji commented 1 year ago

A continuation of https://github.com/StatCan/aaw/issues/1867, dedicated to enabling Spark development on JupyterLab notebooks.

Enabling Spark Development on AAW Notebooks

This Dockerfile from JupyterLab contains the installation and setup for Spark, but it is missing the installation of Scala, which doesn't seem to be required. The get-spark-stuff.sh script in aaw-kubeflow-containers seems to pull this Dockerfile, and it is called in the Makefile under generate-Spark. However, generate-Spark is not called anywhere when building the actual images, since we call generate-dockerfiles instead, so the script isn't actually doing anything at the moment. We'll want to add a call to generate-Spark to produce the Spark layer.
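One way to confirm that nothing reaches generate-Spark is to scan the Makefile for rules that list it as a prerequisite or mention it in a recipe line. The sketch below is only an illustration: the sample Makefile text, its script path, and the generate-base target are hypothetical stand-ins and not the real contents of aaw-kubeflow-containers. Note that make target names are case-sensitive, so generate-spark and generate-Spark would be different targets.

```python
import re


def targets_referencing(makefile_text: str, wanted: str) -> list[str]:
    """Return the make targets whose prerequisites or recipe lines mention `wanted`."""
    # A rule line looks like "name: prerequisites"; "(?!=)" skips ":=" assignments.
    rule = re.compile(r"^([A-Za-z0-9_.-]+)\s*:(?!=)\s*(.*)$")
    # Match `wanted` as a whole word, so "generate-Spark-old" does not count.
    word = re.compile(rf"(?<![\w-]){re.escape(wanted)}(?![\w-])")

    referencing: list[str] = []
    current = None
    for line in makefile_text.splitlines():
        if line.startswith("\t"):
            # Recipe line: it belongs to the most recent rule.
            mentioned = word.search(line) is not None
        else:
            match = rule.match(line)
            if match is None:
                continue
            current = match.group(1)
            mentioned = word.search(match.group(2)) is not None
        if mentioned and current and current != wanted and current not in referencing:
            referencing.append(current)
    return referencing


# Hypothetical Makefile excerpt (assumption, for illustration only).
SAMPLE_MAKEFILE = (
    "generate-Spark:\n"
    "\t./scripts/get-spark-stuff.sh\n"
    "\n"
    "generate-dockerfiles: generate-base\n"
    "\tcat parts > output/Dockerfile\n"
    "\n"
    "generate-base:\n"
    "\techo base\n"
)

if __name__ == "__main__":
    print(targets_referencing(SAMPLE_MAKEFILE, "generate-Spark"))
```

An empty result means no other rule pulls the target in, which matches the situation described above. Adding generate-Spark as a prerequisite of generate-dockerfiles (or calling it from its recipe) would make it show up.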

Scala can be installed with pip3 install scala, or possibly with conda/mamba, though not from the default channels (I believe bioconda has it).

Adding Spark to the Dockerfiles

https://github.com/StatCan/aaw-kubeflow-containers/pull/551/files

The get-spark-stuff shell script is simply concatenating the Dockerfiles, which I'm not really a fan of. The layer also doesn't work out of the box because some upstream files are missing from our Docker context.
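To find out which upstream files are missing, one option is to list every COPY/ADD source in the generated Dockerfile and check whether it exists in the build context. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the parser only handles single-line, shell-form COPY/ADD instructions (no line continuations, no JSON-array form).

```python
from pathlib import Path


def missing_copy_sources(dockerfile_text: str, context_dir: Path) -> list[str]:
    """List COPY/ADD sources in a Dockerfile that do not exist in the build context."""
    missing: list[str] = []
    for raw_line in dockerfile_text.splitlines():
        parts = raw_line.split()
        if len(parts) < 3 or parts[0].upper() not in ("COPY", "ADD"):
            continue
        flags = [p for p in parts[1:] if p.startswith("--")]
        args = [p for p in parts[1:] if not p.startswith("--")]
        # "--from=<stage>" copies from another build stage, not from the context.
        if any(flag.startswith("--from=") for flag in flags):
            continue
        # The last argument is the destination; everything before it is a source.
        for src in args[:-1]:
            if src.startswith(("http://", "https://")):
                continue  # ADD can fetch URLs, which are not context files
            relative = src.lstrip("/")  # sources are resolved relative to the context
            if any(ch in relative for ch in "*?["):
                found = any(context_dir.glob(relative))
            else:
                found = (context_dir / relative).exists()
            if not found:
                missing.append(src)
    return missing


if __name__ == "__main__":
    # Hypothetical paths (assumption): adjust to the real output directory.
    dockerfile = Path("output/Dockerfile")
    if dockerfile.exists():
        print(missing_copy_sources(dockerfile.read_text(), dockerfile.parent))
```

Running a check like this after the concatenation step would at least surface missing context files before the image build fails.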