Closed NohaIhab closed 10 months ago
By inspecting the NGC image nvcr.io/nvidia/pytorch:23.09-py3, the entrypoint is:
"Entrypoint": [
"/opt/nvidia/nvidia_entrypoint.sh"
],
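The comment does not say how the image was inspected. One possible way, assuming Docker is installed and the image has already been pulled locally, is the sketch below. The function name show_entrypoint is hypothetical and only used for illustration.

```shell
# Hypothetical helper: print the Entrypoint of a locally available image as JSON.
# Assumes the docker CLI is installed and the image has been pulled.
show_entrypoint() {
  docker inspect --format '{{json .Config.Entrypoint}}' "$1"
}

# Usage (not run here, since it needs Docker and the image):
#   show_entrypoint nvcr.io/nvidia/pytorch:23.09-py3
```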
Looking at the script nvidia_entrypoint.sh, it prints out some text and runs scripts that perform checks on drivers (CPU, GPU, and network drivers).
We know from building the notebook server rocks that the command needed to spin up a notebook is:
jupyter lab --notebook-dir="/home/jovyan" --ip=0.0.0.0 --no-browser --port=8888 --ServerApp.token="" --ServerApp.password="" --ServerApp.allow_origin="*" --ServerApp.base_url=${NB_PREFIX} --ServerApp.authenticate_prometheus=False
An important note here is that the base_url arg must be set to the NB_PREFIX environment variable to be able to connect to the notebook. The env NB_PREFIX is injected into the pod when it is created by the notebook controller, so we know for sure it will be in the pod spec.
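To make the dependency on NB_PREFIX concrete, here is a minimal sketch of a helper that refuses to build the launch command when the prefix is empty. The function name build_jupyter_cmd is hypothetical, the flag list is abbreviated compared to the full command above, and the example prefix in the usage comment is only an illustrative value.

```shell
# Hypothetical helper: print an abbreviated jupyter lab command line for a
# given URL prefix, or fail if the prefix is empty.
build_jupyter_cmd() {
  if [ -z "$1" ]; then
    echo "NB_PREFIX is empty; the notebook would not be reachable" >&2
    return 1
  fi
  echo "jupyter lab --notebook-dir=/home/jovyan --ip=0.0.0.0 --no-browser --port=8888 --ServerApp.base_url=$1"
}

# Usage inside the pod, where the notebook controller has injected NB_PREFIX:
#   build_jupyter_cmd "${NB_PREFIX:-}"
# Usage with an illustrative value:
#   build_jupyter_cmd /notebook/my-namespace/my-notebook
```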
The PodDefault that worked in the end is:
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: ngc
spec:
  args:
    - jupyter
    - lab
    - --notebook-dir
    - /home/jovyan
    - --ip
    - 0.0.0.0
    - --no-browser
    - --port
    - "8888"
    - --NotebookApp.token
    - ""
    - --NotebookApp.password
    - ""
    - --NotebookApp.allow_origin
    - '*'
    - --NotebookApp.base_url
    - $(NB_PREFIX)
    - --NotebookApp.authenticate_prometheus
    - "False"
  command:
    - /opt/nvidia/nvidia_entrypoint.sh
  desc: Configure NVIDIA NGC JupyterLab Notebook
  selector:
    matchLabels:
      ngc: "true"
The jupyter lab command needed to be modified slightly for the Jupyter version in the NGC image: the options above use the NotebookApp prefix instead of ServerApp.
Note: the variable is written as $(NB_PREFIX), with parentheses, because Kubernetes requires this syntax to expand an environment variable in the args field.
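Because the args are not run through a shell, a shell-style reference such as ${NB_PREFIX} is passed to the container literally. The sketch below is a small check that classifies a single argument by the reference syntax it uses. The function name check_nb_prefix_ref and its output labels are hypothetical, chosen only for illustration.

```shell
# Hypothetical lint: classify how an argument refers to NB_PREFIX.
#   kubernetes-style -> $(NB_PREFIX), expanded by Kubernetes in args/command
#   shell-style      -> ${NB_PREFIX} or $NB_PREFIX, left as a literal string
#   no-reference     -> the argument does not mention the variable
check_nb_prefix_ref() {
  case "$1" in
    *'$(NB_PREFIX)'*) echo "kubernetes-style" ;;
    *'${NB_PREFIX}'*|*'$NB_PREFIX'*) echo "shell-style" ;;
    *) echo "no-reference" ;;
  esac
}

# Usage:
#   check_nb_prefix_ref '$(NB_PREFIX)'
#   check_nb_prefix_ref '--ServerApp.base_url=${NB_PREFIX}'
```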
Using the PodDefault above, and labeling the statefulset of a notebook with ngc: "true", I'm able to create a notebook with the image nvcr.io/nvidia/pytorch:23.09-py3 and connect to the notebook.
The above PodDefault looks good and we also saw it live with Noha.
I'd only suggest we use a more descriptive value for desc. Maybe something like Configure NVIDIA NGC JupyterLab Notebook, to make it explicit what this PodDefault is for.
Thank you for reporting us your feedback!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5158.
This message was autogenerated
What needs to get done
Explore and document the configuration required to run a JupyterLab Notebook using an NGC container. It's expected to be a PodDefault that sets the entrypoint.
Why it needs to get done
Currently, NGC containers don't run on Kubeflow Jupyter Notebooks out of the box. We want to enable CKF users to spin up a Notebook with an NGC container.