Remove dependency on external docker images - Githubissues

elyra-ai / elyra

Elyra extends JupyterLab with an AI centric approach.

https://elyra.readthedocs.io/en/stable/

Apache License 2.0

Remove dependency on external docker images #1349

Open mirekphd opened 3 years ago

mirekphd commented 3 years ago

This "call home" (a docker pull, in https://github.com/elyra-ai/examples/blob/master/pipelines/dax_noaa_weather_data/analyze_NOAA_weather_data.pipeline#L17, of the "Pandas" docker image (https://hub.docker.com/r/amancevice/pandas/) from your Docker Hub account) will be unacceptable in any corporate security-restricted environment, where only security-approved docker containers and image registries are permitted (even if not running on completely air-gapped servers). In fact, one should avoid any docker pull operations whatsoever (which is what was assumed to work here), as these require root-level privileges and are unlikely to work (unless the notebook is run as root).

I might have missed it when reading the docs, so please advise: is there a truly local option, running in the same docker container (supplied and approved by the corporation) where JupyterLab is being executed? I mean, it is a false premise that this environment must be resource-poor. The Jupyter client stays in a browser on a thin client machine, correct, but the Python kernel is nearly always run server-side, on a large compute node. There is no need to improve on this client-server architecture, which already works fine for individual notebooks. Just let the user scripts (the pipeline code payload) run on the same machine as the controlling notebook (but in a dedicated Python kernel!). This is how papermill works, by the way (no dependency on external docker images).
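To illustrate the kind of local execution being asked for, here is a minimal sketch that runs a script payload in a fresh Python process on the same machine, with no container image involved. It is not how Elyra or papermill are implemented (papermill executes notebooks through a Jupyter kernel); the helper name `run_script_locally` and the `step.py` payload are hypothetical and exist only for this example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script_locally(script_path: Path, timeout_s: float = 60.0) -> subprocess.CompletedProcess:
    """Run a Python script in a separate interpreter process on this host.

    Hypothetical helper: the payload gets its own process (so it cannot
    disturb the controlling notebook's state), but nothing is pulled
    from an image registry.
    """
    return subprocess.run(
        [sys.executable, str(script_path)],  # same interpreter that runs this code
        cwd=str(script_path.parent),         # run next to the script, like a pipeline step
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,                         # let the caller inspect returncode
    )


# Small demonstration with a throwaway payload.
with tempfile.TemporaryDirectory() as workdir:
    script = Path(workdir) / "step.py"
    script.write_text("print('ran locally')\n")
    result = run_script_locally(script)
    print(result.returncode, result.stdout.strip())
```

A real pipeline runner would also have to order the steps by their dependencies and pass files between them; the sketch only shows that process isolation does not need a container.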

ptitzler commented 3 years ago

> [...] will be unacceptable in any corporate security-restricted environment [...]

You are absolutely right. To address this, the container images used to run notebooks or Python scripts can also be stored in private/local container registries, as mentioned here: https://elyra.readthedocs.io/en/latest/user_guide/runtime-image-conf.html#prerequisites. I just did notice this in the documentation, which we need to fix.
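For anyone who wants to check a pipeline before running it in a restricted environment, the following is a rough sketch that lists image references not served from an approved registry. It assumes the `.pipeline` file is JSON and that each image is stored as a string under a key named `runtime_image`. The exact nesting is not assumed, so the search is recursive. The registry prefix `registry.example.internal/` and the function names are made up for illustration.

```python
import json
from typing import Any, Iterator, List, Tuple


def find_runtime_images(node: Any) -> Iterator[str]:
    """Yield every string stored under a key named 'runtime_image' (assumed key name)."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "runtime_image" and isinstance(value, str):
                yield value
            else:
                yield from find_runtime_images(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_runtime_images(item)


def unapproved_images(pipeline_text: str, approved_prefixes: Tuple[str, ...]) -> List[str]:
    """Return the sorted, de-duplicated images that match no approved prefix."""
    pipeline = json.loads(pipeline_text)
    return sorted(
        {
            image
            for image in find_runtime_images(pipeline)
            if not image.startswith(approved_prefixes)
        }
    )


# Hypothetical pipeline content, shaped only loosely like a real file.
example = json.dumps(
    {
        "pipelines": [
            {
                "nodes": [
                    {"app_data": {"runtime_image": "amancevice/pandas"}},
                    {"app_data": {"runtime_image": "registry.example.internal/ds/pandas"}},
                ]
            }
        ]
    }
)

print(unapproved_images(example, ("registry.example.internal/",)))
# prints ['amancevice/pandas']
```

Keeping the trailing slash in the approved prefix matters: without it, a host such as `registry.example.internal.attacker.test` would also pass the check.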

> [...] if there is a truly local option - running in the same docker container (supplied and approved by the corporation) where Jupyter Lab is being executed?

Yes. The "local" runtime configuration is provided for that exact purpose. It is defined by default (unlike the Kubeflow Pipelines configuration), and "local" refers to the machine where JupyterLab is running. We currently don't have this documented at https://elyra.readthedocs.io/en/latest/user_guide/runtime-conf.html, but probably should.

ptitzler commented 3 years ago

Transferring the issue to elyra-ai/elyra.

mirekphd commented 3 years ago

@ptitzler thank you for addressing both our concerns!

So if the docker image running Jupyter and Elyra can be completely user-defined and generic (i.e. without any custom runtime applications), can be scanned for vulnerabilities, and can perhaps even be hosted in an internal image registry (like Red Hat Container Registry), then it looks like "problem solved" and just a documentation issue :) I will give it a solid test drive in our on-prem OpenShift installation when time permits (which is most likely next weekend :)

ptitzler commented 3 years ago

Absolutely! We do publish an official Elyra container image based on the files in https://github.com/elyra-ai/elyra/tree/master/etc/docker/elyra that you could use as a baseline. Feel free to open an issue if you do run into trouble getting an image to work or reach out on https://gitter.im/elyra-ai/community. @lresende, @akchinSTC fyi

ptitzler commented 3 years ago

Opened https://github.com/elyra-ai/elyra/issues/1350 to improve the content for custom Elyra container images

mirekphd commented 3 years ago

> Opened #1350 to improve the content for custom Elyra container images

Perfect, so I will report any problems I might encounter with the Elyra container under OpenShift in this new issue.

ptitzler commented 3 years ago

I should probably mention that Elyra can be installed as part of Open Data Hub on Red Hat OpenShift: https://elyra.readthedocs.io/en/latest/recipes/deploying-elyra-with-opendatahub.html

mirekphd commented 3 years ago

> I should probably mention that Elyra can be installed as part of Open Data Hub on Red Hat OpenShift: https://elyra.readthedocs.io/en/latest/recipes/deploying-elyra-with-opendatahub.html

Yes, I've heard about it from Red Hat people, but the only problem (as with Kubeflow on its own) is our obsolete OpenShift installation (still running 3.11).

ptitzler commented 3 years ago

It turns out (my apologies) that container images built using https://github.com/elyra-ai/elyra/blob/master/etc/docker/elyra/Dockerfile won't work on RHOS. We've still got some work to do documenting how we are building the image for Open Data Hub.

mirekphd commented 3 years ago

> how we are building the image for Open Data Hub.

I understand you completely. It has never been easy to port containerized apps from Docker to OpenShift, due to additional security considerations (Jupyter Notebook was, I think, the only one working there out of the box :)

TiemenSch commented 2 years ago

I can't seem to find the "local" runtime option. Is it at all possible to use Elyra, but disable all containerization options and just use the available (Python) kernels on the host that runs JupyterLab?

ptitzler commented 2 years ago

> I can't seem to find the "local" runtime option.

To use the local runtime you must create pipelines using the generic pipeline editor.

When you click "run", the option is displayed.

This option is unavailable in the Kubeflow Pipelines and Airflow pipeline editors.

> Is it at all possible to use Elyra, but disable all containerization options and just use the available (Python) kernels on the host that runs JupyterLab?

Not currently. The options for Airflow and Kubeflow Pipelines are always displayed by the UI.