Open dogukanburda opened 2 years ago
> You will likely need to configure mounts and mirrorWorkingDirs in your respective `kernel-pod.yaml` file for each applicable kernel spec.
Thank you for your quick response. Since `kernel-pod.yaml` is generated and deployed by the enterprise-gateway pod, I tried to deploy enterprise-gateway with the KERNEL_VOLUME_MOUNT env variable defined. After a deep dive into enterprise-gateway's source, it looked to me like this variable should be given in the etc/kubernetes/helm/enterprise-gateway/templates/deployment.yaml file. However, I couldn't set KERNEL_VOLUME_MOUNT and KERNEL_VOLUMES as env variables because their values are not plain strings. How is it possible to give a YAML-array-like value to an env variable? Sadly, I couldn't find anything on Google.
```yaml
KERNEL_VOLUME_MOUNT=
- name: userdir-pvc
  mountPath: "/mnt"
```
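Since an environment variable can only hold a flat string, one general workaround (a sketch of the technique, not necessarily what Enterprise Gateway itself expects) is to serialize the list on the sending side and parse it back on the receiving side:

```python
import json
import os

# Hypothetical mount list we want to pass through an env variable.
mounts = [{"name": "userdir-pvc", "mountPath": "/mnt"}]

# Sender: serialize the structured value into a plain string.
os.environ["KERNEL_VOLUME_MOUNT"] = json.dumps(mounts)

# Receiver: parse the string back into a list of dicts.
parsed = json.loads(os.environ["KERNEL_VOLUME_MOUNT"])
print(parsed[0]["mountPath"])  # -> /mnt
```

The same idea works with a YAML string instead of JSON; the key point is that both sides must agree on the encoding.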
The closest I get is by defining a `volumemounts` value in the etc/kubernetes/helm/enterprise-gateway/values.yaml file:

```yaml
volumemounts:
  - name: userdir-pvc
    mountPath: "/mnt"
```
and appending this to the env section of the etc/kubernetes/helm/enterprise-gateway/templates/deployment.yaml file:

```yaml
- name: KERNEL_VOLUME_MOUNT
  value: {{ .Values.volumemounts }}
```

Obviously, I get a YAML error when trying to interpolate `volumemounts` as an array. Any help would be greatly appreciated.
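For what it's worth, Helm cannot interpolate a YAML array directly into a scalar `value:` field; the usual workaround is to serialize it to a single string first, e.g. with `toJson` plus `quote`. A hedged sketch against the `volumemounts` value above (whether EG would accept a value in this shape is a separate question):

```yaml
# deployment.yaml env section (sketch): render the array as one quoted string
- name: KERNEL_VOLUME_MOUNT
  value: {{ .Values.volumemounts | toJson | quote }}
```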
Also, is it really a good choice to iterate over an env variable that has a custom structure such as this one?
This discussion/issue should be moved to the Enterprise Gateway repo as it has nothing to do with Elyra. That said, let me respond in an attempt to perhaps get you moving forward. Should there still be issues (which is likely), please open an issue in EG and we'll go from there.
First of all, I agree that this is a bit of a mess. Because there isn't a good way to parameterize kernel launches (which needs to span the entire Jupyter stack), the best we can do is flow environment variables as the parameters, which, yes, relegates us to encoding more complex types into strings (which is non-trivial) - particularly when the number of mounts themselves vary per user/kernel launch.
> The closest I get is by defining a volumemounts value in the etc/kubernetes/helm/enterprise-gateway/values.yaml file
These mounts will only apply to the EG pod and not to each of the kernel pods. Instead, I recommend you make the necessary adjustments to the kernel-pod.yaml.j2 template that do NOT use KERNEL_ values and get mounts working for your kernel. Once you have that working, you should be able to replace the "varying" portion of that stanza, like the user's home directory, with a templated value (e.g., `{{ kernel_home_dir }}`) where KERNEL_HOME_DIR can be supplied from the client side when the kernel is launched.
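For concreteness, a mount stanza along those lines in kernel-pod.yaml.j2 might look like the following (the claim name and the exact template variable are illustrative assumptions, not verbatim from the EG templates):

```yaml
# Hypothetical excerpt from kernel-pod.yaml.j2
spec:
  containers:
    - name: kernel
      volumeMounts:
        - name: userdir-pvc
          mountPath: "{{ kernel_home_dir }}"  # rendered from KERNEL_HOME_DIR at launch
  volumes:
    - name: userdir-pvc
      persistentVolumeClaim:
        claimName: userdir-pvc
```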
To make iterating over the kernel-pod changes easier, it is recommended that you mount the /usr/local/share/jupyter/kernels directory into your EG pod (for which you can edit the helm chart files) so that edits can be made to the respective kernel-pod.yaml.j2 files located in each kernelspec's scripts directory.
> Also, is it really a good choice to iterate over an env variable that has a custom structure such as this one?
No. Per my previous comment, it's all we have. When the number of mounts is consistent across users, I would use the fixed approach with variances described via envs, but when the requirement is that different users require different mounts entirely, then we have to take the "conditional" approach where the complete mount stanzas are encoded. Unless, of course, you have other ideas (which should be discussed over in the EG repo). Thanks.
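As a sketch of that "conditional" approach, the client could encode a per-user mount stanza into the envs it sends at kernel launch. The helper and variable names here are illustrative, not EG's actual API:

```python
import json

def build_kernel_env(user, extra_mounts):
    """Illustrative helper: encode per-user mounts into launch-time envs."""
    env = {"KERNEL_HOME_DIR": "/home/" + user}
    if extra_mounts:  # only users needing extra mounts get the encoded stanza
        env["KERNEL_VOLUME_MOUNT"] = json.dumps(extra_mounts)
    return env

env = build_kernel_env("alice", [{"name": "userdir-pvc", "mountPath": "/mnt"}])
print(env["KERNEL_HOME_DIR"])  # -> /home/alice
```

Users with a consistent set of mounts would skip the encoded stanza and rely on the fixed template, per the previous paragraph.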
**Describe the issue**
I am following the tutorial Introduction to generic pipelines, and `Part 1 - Data Cleaning.ipynb` in elyra-ai/examples is unable to find the data downloaded with `load_data.ipynb` (included in the same tutorial) when the generic pipeline runs on the local runtime. As far as I understand, when I run the pipeline, `load_data.ipynb` is connected to a kernel deployed by Jupyter Enterprise Gateway on one node and is able to download the data successfully. But when the second notebook runs, it is assigned to another node deployed by EG (Enterprise Gateway) that does not have a common file storage mounted, and is therefore unable to find the data needed to proceed. Shouldn't these notebooks run in environments that have shared storage mounted on each of them?

Persistent storage for JupyterHub instances works totally fine for each user. But when it comes to running a pipeline, their environments do not share any file resources.
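One common way to give pods scheduled on different nodes a shared filesystem is a PersistentVolumeClaim with `ReadWriteMany` access, backed by a storage class that supports it (e.g., NFS). A minimal sketch, with illustrative names:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pipeline-shared-data      # illustrative name
spec:
  accessModes:
    - ReadWriteMany               # required so pods on multiple nodes can mount it
  resources:
    requests:
      storage: 1Gi
  # storageClassName: nfs-client  # assumption: an RWX-capable storage class exists
```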
**To Reproduce**
Steps to reproduce the behavior:
**Screenshots or/and log output**
Log Output
The `data/` folder seen in the screenshot is generated by running `load_data.py` as a single node in the pipeline, and the setup validation example in elyra-ai/examples runs just fine.

**Expected behavior**
The pipeline should produce the appropriate output in the pipeline's working directory.
**Deployment information**
Describe what you've deployed and how:

Deployed Jupyter Enterprise Gateway:

```shell
helm install --namespace enterprise-gateway enterprise-gateway https://github.com/jupyter-server/enterprise_gateway/releases/download/v2.6.0/jupyter_enterprise_gateway_helm-2.6.0.tgz
```

Deployed JupyterHub with the official elyra/elyra:3.6.0 image using Helm v3 with the following command,
where `jupyter-elyra-config.yml` contains only the following:

**Pipeline runtime environment**
If the issue is related to pipeline execution, identify the environment where the pipeline is executed.
**Runtime configuration settings**
If the issue is related to pipeline execution, document the runtime configuration settings from the Elyra UI, omitting confidential information.