canonical / data-science-stack

Stack with machine learning tools needed for local development.
Apache License 2.0

feat: update create command to support Intel GPU notebooks #162

Closed DnPlas closed 3 months ago

DnPlas commented 3 months ago

This commit introduces automatic scheduling of Notebook Servers on Nodes labelled `intel.feature.node.kubernetes.io/gpu`. It also changes the `command` and `args` used by the Notebook Server containers, as these are now required for Intel GPUs to work; they must be set exactly as shown below.
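The label-driven behaviour can be sketched as a small pure function. This is an illustrative sketch only, not the actual dss internals: the function name, signature, and return shape are assumptions; only the label key and the resulting resource limit come from the PR.

```python
# Hypothetical sketch of the scheduling logic this PR describes: when a node
# carries the Intel GPU label, the notebook Deployment gets the matching
# extended-resource limit. Names here are illustrative, not real dss code.

INTEL_GPU_LABEL = "intel.feature.node.kubernetes.io/gpu"


def gpu_resource_limits(node_labels: dict) -> dict:
    """Return the resource limits to attach when a node advertises an Intel GPU."""
    if node_labels.get(INTEL_GPU_LABEL) == "true":
        # Matches the limit checked in the manual test below.
        return {"gpu.intel.com/i915": "1"}
    return {}
```

With a labelled node this yields `{"gpu.intel.com/i915": "1"}`; with no label it yields an empty dict, so no GPU limit is set on the Deployment.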

Fixes #147

Manual testing

Assuming you have a microk8s cluster with hostpath-storage enabled

  1. Clone this repository and check out the branch of this PR
  2. Build and install from source: `pip install .`
  3. Initialise: `dss initialize --kubeconfig="$(sudo microk8s config)"`
  4. Label the node to simulate the Intel device plugin having done its job: `kubectl label node <name of your node> intel.feature.node.kubernetes.io/gpu=true`
  5. Create a notebook: `dss create my-notebook --image=ubuntu`
  6. Verify the Deployment of the server has the resource limits we are interested in:
kubectl get deployment -ndss my-notebook -oyaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    ...
spec:
  ...
  template:
    ...
    spec:
      containers:
      - env:
        - name: MLFLOW_TRACKING_URI
          value: http://mlflow.dss.svc.cluster.local:5000
        image: kubeflownotebookswg/jupyter-pytorch-full:v1.8.0
        imagePullPolicy: IfNotPresent
        name: my-notebook
        ports:
        - containerPort: 8888
          name: notebook-port
          protocol: TCP
        resources:
          limits:
            gpu.intel.com/i915: "1" # <--- we are interested in this
  7. Also verify that the command and args are always set:
          command:
            - jupyter
          args:
            - lab
            - --notebook-dir=/home/jovyan
            - --ip=0.0.0.0
            - --no-browser
            - --allow-root
            - --port=8888
            - --ServerApp.token=''
            - --ServerApp.password=''
            - --ServerApp.allow_origin='*'
            - --ServerApp.allow_remote_access=True
            - --ServerApp.authenticate_prometheus=False
            - --ServerApp.base_url='/'

Alternatively, you can run the create command without labelling the Node. In that case, the Deployment should not have any of these resource limits.
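The invariants from the manual test above (command/args always set; GPU limit present only when the node is labelled) can be captured in a small assertion helper. This is a hypothetical verification sketch, not part of dss; the helper name and the dict shape (a container spec as parsed from the Deployment YAML) are assumptions.

```python
def verify_notebook_container(container: dict, expect_gpu: bool) -> None:
    """Assert the manual-test invariants on a parsed container spec (illustrative)."""
    # The command and args must be set regardless of GPU labelling.
    assert container.get("command") == ["jupyter"]
    assert container.get("args", [])[:1] == ["lab"]
    # The Intel GPU limit must appear if and only if the node was labelled.
    limits = container.get("resources", {}).get("limits", {})
    if expect_gpu:
        assert limits.get("gpu.intel.com/i915") == "1"
    else:
        assert "gpu.intel.com/i915" not in limits
```

For example, feeding it the container spec from step 6 with `expect_gpu=True` should pass, while the same spec from an unlabelled cluster should pass only with `expect_gpu=False`.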

misohu commented 3 months ago

We need to add the command and args sections to all deployments according to the spec: https://github.com/canonical/data-science-stack/compare/main...frenchwr/intel-gpu-integration#diff-e5f395a6247e35966f0a29978433e744bacd4913c9101d1ca46e364bdc249293