erikerlandson / ray-ubi

A minimalist Ray distributed computing container image, based on Red Hat UBI
Apache License 2.0

create and maintain ray images in AICoE #2

Open pacospace opened 3 years ago

pacospace commented 3 years ago

Hi @erikerlandson,

what do you think about having thoth automatically create and maintain these images, along with their dependencies, for you using thoth pipelines and bots?

This would be similar to what we do for ODH images, for example: https://quay.io/repository/thoth-station/s2i-lab-elyra?tab=tags, which is created from: https://github.com/opendatahub-io/s2i-lab-elyra.

erikerlandson commented 3 years ago

@pacospace yes, building them with thoth was on my roadmap. I had a couple questions.

  1. currently the ray libraries are installed from nightly-build wheels (not pypi) - will thoth work around that?
  2. two of my 3 images are NOT notebook images - they are the ray worker-node image and the ray operator image: they are not part of the s2i-xxx image lineage. Does thoth also build these? (would be analogous to building spark worker images)
pacospace commented 3 years ago

> @pacospace yes, building them with thoth was on my roadmap. I had a couple questions.
>
>   1. currently the ray libraries are installed from nightly-build wheels (not pypi) - will thoth work around that?

Can I ask why nightly-build wheels are used? Are there specific features in https://github.com/erikerlandson/ray-odh-jupyter/blob/44b055935c965219559b7840b23c02c9438bd385/images/ray-minimal-notebook/requirements.txt#L1 that are not available in the stable Ray release on PyPI?

>   2. two of my 3 images are NOT notebook images - they are the ray worker-node image and the ray operator image: they are not part of the s2i-xxx image lineage. Does thoth also build these? (would be analogous to building spark worker images)

AICoE pipelines can also take care of these cases, but thoth currently cannot provide advice for these images specifically.

Do you want to migrate to an AICoE repo (all thoth services are already set up there)? We could start with one repo, restructuring the Dockerfile so that thoth can provide its services for you (using Pipfile/Pipfile.lock and micropipenv). wdyt?

cc @harshad16
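For readers unfamiliar with the Pipfile/Pipfile.lock workflow mentioned above: Pipfile.lock is a JSON file whose `default` section pins each package to an exact `version` together with its `hashes`, and micropipenv installs from that lock inside the image. The sketch below is a hedged approximation of that idea, not micropipenv's actual implementation. It turns the `default` section into hash-checked requirement lines and reports entries without a version pin separately. A package installed straight from a wheel URL, such as the nightly Ray builds discussed here, may appear without a version pin and would need separate handling. The sample lock content is made up for illustration.

```python
import json


def lock_to_requirements(lock_text, section="default"):
    """Convert one section of a Pipfile.lock into hash-checked requirement lines.

    Returns (requirements, unpinned): entries without a "version" key are
    collected in `unpinned` instead of being silently dropped.
    """
    lock = json.loads(lock_text)
    requirements, unpinned = [], []
    for name, entry in sorted(lock.get(section, {}).items()):
        version = entry.get("version")  # e.g. "==2.25.1"
        if version is None:
            unpinned.append(name)
            continue
        # pip's hash-checking mode expects --hash=<algorithm>:<digest>
        hashes = " ".join(f"--hash={h}" for h in entry.get("hashes", []))
        requirements.append(f"{name}{version} {hashes}".rstrip())
    return requirements, unpinned


# Made-up lock content for illustration only (digests are placeholders).
SAMPLE_LOCK = json.dumps({
    "_meta": {"hash": {"sha256": "placeholder"}},
    "default": {
        "requests": {"version": "==2.25.1", "hashes": ["sha256:aaaa"]},
        "ray": {"file": "https://example.invalid/ray-nightly.whl"},
    },
    "develop": {},
})

if __name__ == "__main__":
    reqs, unpinned = lock_to_requirements(SAMPLE_LOCK)
    print(reqs)      # ['requests==2.25.1 --hash=sha256:aaaa']
    print(unpinned)  # ['ray']
```

Reporting unpinned entries rather than skipping them makes it visible when an image depends on something the lock does not fully pin.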

erikerlandson commented 3 years ago

> can I ask why nightly-build wheels are used?

Previous versions of Ray only allowed connections to the Ray cluster from the physical head node. Ray 2.0 allows remote connections, which is what lets a Jupyter notebook connect to a Ray cluster. However, Ray 2.0 is still under development and is available only from the head of the development branch.
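In Ray releases that include it, this kind of remote connection is exposed through Ray Client. A notebook calls `ray.init()` with a `ray://<host>:<port>` address, and the head node's client server listens on port 10001 by default. The sketch below assumes a hypothetical head-node service name, `ray-head`, and an optional `RAY_ADDRESS` environment variable. It may not match how the ray-odh-jupyter images actually wire up the connection. Only `connect()` needs Ray installed; the address helper runs without it.

```python
import os

# Assumption: the head node is reachable as "ray-head" (hypothetical service name)
# and the Ray Client server listens on its default port, 10001.
DEFAULT_HEAD_HOST = "ray-head"
DEFAULT_CLIENT_PORT = 10001


def ray_client_address(host=DEFAULT_HEAD_HOST, port=DEFAULT_CLIENT_PORT):
    """Build a Ray Client address of the form ray://<host>:<port>."""
    if not host:
        raise ValueError("host must be non-empty")
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"ray://{host}:{port}"


def connect(address=None):
    """Connect this notebook to a remote Ray cluster (requires `ray` installed)."""
    import ray  # imported lazily so the helper above works without Ray

    # Precedence in this sketch: explicit argument, then RAY_ADDRESS, then the default.
    address = address or os.environ.get("RAY_ADDRESS") or ray_client_address()
    return ray.init(address=address)
```

In a notebook, `connect()` would be called once at the top, after which ordinary `@ray.remote` tasks run on the cluster rather than in the notebook pod.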

erikerlandson commented 3 years ago

Cross-reference: the repo where I build the Ray notebook images (ray-ml-notebook is the one currently installed on MOC): https://github.com/erikerlandson/ray-odh-jupyter

erikerlandson commented 3 years ago

Elyra seems to be getting a lot of traction. Would it be useful to create ray-enabled elyra images?

pacospace commented 3 years ago

> Elyra seems to be getting a lot of traction. Would it be useful to create ray-enabled elyra images?

With that image, would I use Ray for all my notebooks? As a data scientist, I might want to use Ray in one of my steps, but not by default for all notebooks. wdyt?

As a data scientist,

I want to run hyperparameter tuning in an AI pipeline.

Currently, there is no way to do that directly from Elyra (https://github.com/elyra-ai/elyra/issues/646). You can run a pipeline from within a pipeline, for example using the kfp libraries, but that is not yet well integrated with Elyra. I'm trying to get that feature upstream.

AI pipelines in Elyra run on either the Kubeflow Pipelines engine (with an Argo or Tekton backend) or the Airflow engine, and each step requires a base image, resources, environment variables, and a notebook.

I think that, rather than having a Ray-enabled Elyra image, the question is: can we run one step in a pipeline whose notebook requires Ray (for example, for hyperparameter tuning, distributed training, or RL)?
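To make the question concrete, the sketch below models pipeline steps using the four properties listed above: notebook, base image, resources, and environment variables. Only the tuning step uses a Ray-enabled image and is pointed at a Ray cluster. The `PipelineStep` class, the image names, and the `RAY_ADDRESS` value are hypothetical illustrations. This is not Elyra's pipeline file format or API.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineStep:
    """Hypothetical model of one pipeline node; not Elyra's actual schema."""

    notebook: str
    base_image: str
    cpu: int = 1
    memory_gib: int = 2
    env: dict = field(default_factory=dict)

    def needs_ray(self):
        # Convention assumed for this sketch: a step uses Ray iff it is given RAY_ADDRESS.
        return "RAY_ADDRESS" in self.env


# Placeholder image names; only the tuning step runs on a Ray-enabled image.
PIPELINE = [
    PipelineStep("prepare_data.ipynb", "example.io/plain-notebook:latest"),
    PipelineStep(
        "tune_hyperparameters.ipynb",
        "example.io/ray-notebook:latest",
        cpu=2,
        memory_gib=8,
        env={"RAY_ADDRESS": "ray://ray-head:10001"},
    ),
    PipelineStep("evaluate.ipynb", "example.io/plain-notebook:latest"),
]

if __name__ == "__main__":
    for step in PIPELINE:
        print(step.notebook, step.base_image, "ray" if step.needs_ray() else "no ray")
```

Under this framing, Ray support becomes a per-step choice of base image and environment rather than a property of the Elyra image itself.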

erikerlandson commented 3 years ago

It depends on how one wants to work. If elyra is being used only for managing pipelines, then Ray might not be very useful. If it is being used for data science explorations, and the data scientist wants ray available as backing compute, then maybe. I think a workflow where granular pipelines are created and each node in the pipeline is a notebook is relevant; some such nodes might want ray but not others. A third possible modality is standing up a single ray cluster and having multiple nodes connect to it.
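The idea that some pipeline nodes might want Ray while others do not can be sketched as a notebook step that uses a Ray cluster as backing compute only when one is configured, and otherwise runs locally. This is a minimal sketch. Reading the cluster address from `RAY_ADDRESS` is an assumption made here, and `objective` is a hypothetical stand-in for, say, one hyperparameter-tuning trial. Pointing several nodes at the same address corresponds to the third modality, a single shared Ray cluster.

```python
import os


def run_trials(objective, configs):
    """Evaluate `objective` for each config, on Ray if a cluster address is set.

    With RAY_ADDRESS unset, trials run serially in this process, so the same
    notebook works on images that do not include Ray.
    """
    address = os.environ.get("RAY_ADDRESS")
    if not address:
        return [objective(config) for config in configs]

    import ray  # only required when a Ray cluster is actually configured

    ray.init(address=address)
    try:
        remote_objective = ray.remote(objective)  # wrap the plain function as a Ray task
        return ray.get([remote_objective.remote(config) for config in configs])
    finally:
        ray.shutdown()  # disconnect; other steps may keep using the shared cluster


def objective(config):
    """Hypothetical trial: a toy score standing in for training and validation."""
    return -(config["lr"] - 0.1) ** 2


if __name__ == "__main__":
    grid = [{"lr": lr} for lr in (0.01, 0.1, 0.5)]
    scores = run_trials(objective, grid)
    best = max(zip(scores, grid), key=lambda pair: pair[0])[1]
    print(best)  # {'lr': 0.1}
```

Falling back to local execution keeps the step portable across images. The cost is that the serial path can be far slower for large trial grids.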

erikerlandson commented 3 years ago

Yet another image variation would be jupyter-lab images, as opposed to "traditional" jupyter-hub images. Currently, I'm most interested in stand-alone explorations, but if ray gets traction I would expect it to appear in larger pipelining contexts.

pacospace commented 3 years ago

> It depends on how one wants to work.

Agree.

> If elyra is being used only for managing pipelines, then Ray might not be very useful. If it is being used for data science explorations, and the data scientist wants ray available as backing compute, then maybe.

In theory it can be used for both, because Elyra is just an extension to JupyterLab.

One correction here: from the Elyra image point of view, Ray as backing compute may be the more interesting case, because you can submit a single notebook with whatever base image you want, not only a complete AI pipeline.

> I think a workflow where granular pipelines are created and each node in the pipeline is a notebook is relevant - some such nodes might want ray but not others.

> A third possible modality is standing up a single ray cluster and having multiple nodes connect to it.

pacospace commented 3 years ago

Related-To: https://github.com/thoth-station/core/issues/283