JahstreetOrg / spark-on-kubernetes-helm

Spark on Kubernetes infrastructure Helm charts repo
Apache License 2.0

Deploying new profile to existing k8s Hub cluster with SparkMagic #47

Closed KnutFr closed 3 years ago

KnutFr commented 3 years ago

Hi,

First, thanks for your amazing work! I have deployed your charts in cluster mode with Livy/Jupyter on a single k8s cluster and everything works great. But here's my setup: I have an existing JupyterHub instance (deployed with https://github.com/jupyterhub/zero-to-jupyterhub-k8s) that already has some profiles. For example, here is a sample of the current data scientist profile that we have:

      description: "Environment for data scientist"
      kubespawner_override:
        image: mycompanyregistry/singleuser-datascientist:stable

But now I would like to add a new one.

I have forked your Dockerfile here (https://github.com/JahstreetOrg/spark-on-kubernetes-docker/blob/master/jupyter/4.6.3-sparkmagic_0.15.0/Dockerfile), using the single-user entry point by default, and added just these env vars for testing purposes:

ENV JUPYTER_ALLOW_INSECURE_WRITES=true
ENV JUPYTER_RUNTIME_DIR=/tmp

I'm able to run these commands and the session manager is shown:

%load_ext sparkmagic.magics
%manage_spark

But when I try to run a %%configure command, I get the following error:

UsageError: Cell magic %%configure not found.

And when I try to create a new notebook, I can't choose the PySpark or Spark kernel either; only the Python 3 option is available.
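Both symptoms may be related: as far as I understand sparkmagic, %%configure is provided by its wrapper kernels (PySpark, Spark) rather than by the sparkmagic.magics extension loaded into a plain Python 3 kernel. One way to see which kernels a single-user image actually exposes is to scan the Jupyter data directories for kernel.json files. The sketch below is only an illustration: the directory list and the expected kernel names (pysparkkernel, sparkkernel) are assumptions, and `jupyter --paths` prints the real search paths for a given image.

```python
import json
from pathlib import Path

# Assumed search locations; run `jupyter --paths` in the container for the real ones.
ASSUMED_DATA_DIRS = [
    Path.home() / ".local" / "share" / "jupyter",
    Path("/opt/conda/share/jupyter"),
    Path("/usr/local/share/jupyter"),
    Path("/usr/share/jupyter"),
]


def list_kernelspecs(data_dirs):
    """Return {kernel_name: display_name} for each <dir>/kernels/<name>/kernel.json."""
    found = {}
    for data_dir in data_dirs:
        kernels_dir = Path(data_dir) / "kernels"
        if not kernels_dir.is_dir():
            continue  # skip locations that do not exist in this image
        for spec_file in sorted(kernels_dir.glob("*/kernel.json")):
            spec = json.loads(spec_file.read_text())
            # Directories listed earlier take precedence over later ones.
            found.setdefault(spec_file.parent.name, spec.get("display_name", ""))
    return found


def missing_kernels(found, expected=("pysparkkernel", "sparkkernel")):
    """Return the expected kernel names (assumed sparkmagic names) that were not found."""
    return [name for name in expected if name not in found]


if __name__ == "__main__":
    specs = list_kernelspecs(ASSUMED_DATA_DIRS)
    for name, display_name in specs.items():
        print(f"{name}: {display_name}")
    print("missing:", missing_kernels(specs))
```

If the Spark kernels show up as missing, the image probably never ran the kernelspec installation step, which would also explain why only Python 3 is offered in the launcher.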

Last thing: I manually overrode the sparkmagic config at Docker build time with this:

  "kernel_python_credentials" : {
    "username": "",
    "password": "",
    "url": "https://my-cluster.example.com/livy",
    "auth": "None"
  },
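For illustration, the same override could be applied programmatically rather than by editing the file by hand. This is a minimal sketch, not how the image does it: the helper name `set_livy_url` is hypothetical, and it assumes the config is a JSON object whose `kernel_*_credentials` sections each carry a `url` field, as in the fragment above.

```python
import copy
import json


def set_livy_url(config, livy_url):
    """Return a copy of a sparkmagic-style config with every kernel_*_credentials url replaced."""
    updated = copy.deepcopy(config)
    # Make sure at least the Python credentials section exists (defaults mirror the fragment above).
    updated.setdefault(
        "kernel_python_credentials",
        {"username": "", "password": "", "auth": "None"},
    )
    for key, section in updated.items():
        if key.startswith("kernel_") and key.endswith("_credentials"):
            section["url"] = livy_url
    return updated


if __name__ == "__main__":
    base = {
        "kernel_python_credentials": {
            "username": "",
            "password": "",
            "url": "http://localhost:8998",
            "auth": "None",
        }
    }
    print(json.dumps(set_livy_url(base, "https://my-cluster.example.com/livy"), indent=2))
```

Writing the result to the user's sparkmagic config file (typically ~/.sparkmagic/config.json) would then take the place of the manual edit.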

Of course, the spawned container has an /etc/hosts entry matching my POC Livy cluster's internal IP :)

Do you have any idea what I have missed?

Thanks a lot ! KnutFr

jahstreet commented 3 years ago

Hi. Please find the steps I used to reproduce your issue below.

  1. Build a custom Jupyter Docker image to use with the new JupyterHub profile. Diff against https://github.com/JahstreetOrg/spark-on-kubernetes-docker/tree/v3.0.1-hadoop-3.2.0-cloud :
diff --git a/jupyter/4.6.3-sparkmagic_0.15.0/Dockerfile b/jupyter/4.6.3-sparkmagic_0.15.0/Dockerfile
index 58cd97a..0a59f8d 100644
--- a/jupyter/4.6.3-sparkmagic_0.15.0/Dockerfile
+++ b/jupyter/4.6.3-sparkmagic_0.15.0/Dockerfile
@@ -8,6 +8,9 @@ FROM $BASE_CONTAINER

 LABEL maintainer="Aliaksandr Sasnouskikh <jaahstreetlove@gmail.com>"

+ENV JUPYTER_ALLOW_INSECURE_WRITES=true
+ENV JUPYTER_RUNTIME_DIR=/tmp
+
 # Install Sparkmagic kernel
 # https://github.com/jupyter-incubator/sparkmagic/blob/master/Dockerfile.jupyter
 USER root
diff --git a/jupyter/4.6.3-sparkmagic_0.15.0/build.sh b/jupyter/4.6.3-sparkmagic_0.15.0/build.sh
index 12b77b2..39b82da 100755
--- a/jupyter/4.6.3-sparkmagic_0.15.0/build.sh
+++ b/jupyter/4.6.3-sparkmagic_0.15.0/build.sh
@@ -11,7 +11,7 @@ no_cache="--no-cache"
 parent_dir_path=$(dirname ${dir_path})

 repo="$DOCKERHUB_REPO/${parent_dir_path##*/}"
-tag="${dir_path##*/}"
+tag="${dir_path##*/}-modified"

 ( cd ${dir_path}; docker build . ${no_cache} -t "${repo}:${tag}" )
 docker push "${repo}:${tag}"
  2. Modify the spark-cluster chart values.yaml:

spark-on-kubernetes-helm/charts/spark-cluster/values.yaml:

diff --git a/charts/spark-cluster/values.yaml b/charts/spark-cluster/values.yaml
index f03ea87..c84339e 100644
--- a/charts/spark-cluster/values.yaml
+++ b/charts/spark-cluster/values.yaml
@@ -325,6 +325,18 @@ jupyterhub:
         cmd:
         - "/opt/singleuser-entrypoint.sh"
         - "--NotebookApp.notebook_dir=/home/jovyan/notebooks"
+    - display_name: "Jupyter Notebooks 2"
+      description: "Sparkmagic kernel 2"
+      default: False
+      # https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html#kubespawner.KubeSpawner
+      kubespawner_override:
+        image: sasnouskikh/jupyter:4.6.3-sparkmagic_0.15.0-modified
+        environment:
+          # For Sparkmagic Kernel
+          LIVY_ENDPOINT: "http://livy-server:80"
+        cmd:
+        - "/opt/singleuser-entrypoint.sh"
+        - "--NotebookApp.notebook_dir=/home/jovyan/notebooks"

     # defaultUrl: "/lab"
     # extraTolerations: []
  3. Install the cluster following the spark-on-kubernetes-helm/DOCUMENTATION.md guide.

Following these steps, the new JupyterHub profile is added and works as expected (tested with the examples).
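For an existing Hub that is configured outside this chart, the same profile entry can be expressed as a plain data structure. The sketch below builds one in Python and prints it as JSON, which is also valid YAML. The helper names `make_profile` and `check_profiles` are hypothetical, and the two sanity checks (unique display names, at most one default) are my own assumptions about what is worth verifying, not something KubeSpawner is confirmed to enforce.

```python
import json


def make_profile(display_name, description, image, livy_endpoint, default=False):
    """Build one profile entry with the same keys as the values.yaml diff above."""
    return {
        "display_name": display_name,
        "description": description,
        "default": default,
        "kubespawner_override": {
            "image": image,
            # Environment variable read by the Sparkmagic-enabled image.
            "environment": {"LIVY_ENDPOINT": livy_endpoint},
            "cmd": [
                "/opt/singleuser-entrypoint.sh",
                "--NotebookApp.notebook_dir=/home/jovyan/notebooks",
            ],
        },
    }


def check_profiles(profiles):
    """Return a list of human-readable problems found in a profile list (assumed checks)."""
    problems = []
    names = [p.get("display_name") for p in profiles]
    for name in sorted({n for n in names if names.count(n) > 1}):
        problems.append(f"duplicate display_name: {name}")
    if sum(1 for p in profiles if p.get("default")) > 1:
        problems.append("more than one profile is marked as default")
    return problems


if __name__ == "__main__":
    profiles = [
        make_profile(
            "Jupyter Notebooks 2",
            "Sparkmagic kernel 2",
            "sasnouskikh/jupyter:4.6.3-sparkmagic_0.15.0-modified",
            "http://livy-server:80",
        )
    ]
    print(json.dumps(profiles, indent=2))
    print(check_profiles(profiles))
```

The printed list can be merged into the Hub's existing profile list next to the profiles that are already defined there.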

The possible reasons for your issue are the following:

Please let me know if it clarifies the setup for you.